realm / realm-swift

Realm is a mobile database: a replacement for Core Data & SQLite
https://realm.io
Apache License 2.0

RealmSwift iOS Diskusage is almost 100x when database is encrypted #7141

Closed palaniraja closed 3 years ago

palaniraja commented 3 years ago

When we use standard Realm encryption, disk usage is very high, sometimes 100x. For example, adding 1000 records shows 0.1 MB of disk usage in Instruments/Xcode without encryption; the same code with encryption enabled shows 0.1 GB.

Attaching the sample project, which consistently spikes disk usage to 50 times with encryption.

A side effect is that the allocated (but free) bytes are sometimes 4 to 8 times the size of the bytes used for the data. For example, a 500 MB file is somewhere around 700 KB after compaction in our project.

Refer screenshots and sample project attached.

sampleProject.zip

Side note: in an actual project we insert close to 2000 records for a business use case, where we see disk usage between 2.9 and 3.7 GB.

Goals

Figure out whether this is a "bug" or "by design". If it is by design, what is the recommendation for inserting around 5000 records without being logged by the iOS watchdog?

Expected Results

Disk usage is reasonable when encryption is enabled

Actual Results

Disk usage is exponential, and as a result the database sometimes grows out of proportion to the actual data stored (compared with its size after compaction).

Steps for others to Reproduce

Run the attached sample project with Carthage (dependencies are not included), then check the stats in the Xcode Debug navigator or in Instruments.

Refer screenshot attached

Code Sample

Attached. sampleProject.zip

Version of Realm and Tooling

RealmSwift v10.1.4 is used with Xcode 12.2; the project's minimum deployment target is 14.0.

We also noticed the same behavior in 5.x last Nov/Dec, but we did not zero in on this issue.

Realm framework version: RealmSwift v10.1.4

Realm Object Server version: N/A

Xcode version: 12.2

iOS/OSX version: 14.x

Dependency manager + version: Carthage 0.36/0.37

Attachments:

With encryption (start and after inserting 1000 records):

Without encryption (start and after inserting 1000 records):

sampleProject.zip

DominicFrei commented 3 years ago

@palaniraja Thank you for submitting this issue and preparing a sample project and providing all the information above. 👍

I will look into that, but I have already talked to our core team about this: encryption should indeed lead to only a marginal increase in disk usage.

DominicFrei commented 3 years ago

Update about the current status of this: I did see the same results when using the provided code example.

Without encryption:

[screenshot: no_encryption]

With encryption:

[screenshot: with_encryption]

Some notes on that:

DominicFrei commented 3 years ago

I did create a new sample project, boiling it down to the minimum amount of code necessary to make sure nothing else leads to the above mentioned result: https://github.com/DominicFrei/Playground/tree/realm/realm-cocoa/issues/7141

With this example, writes of 100+ MB happen within seconds.

Commenting out the encryption (line 22 in ContentView.swift) reduces it to just a couple MB.

We have to look further into this.

DominicFrei commented 3 years ago

@ironage and @finnschiermer have provided me with loads of helpful information. 🚀

Here is the summary:

> Are the 1000 records added in the same transaction (i.e. batched)? Or are there 1000 transactions writing 1 object each? It matters because when we make a write with encryption, we read a 4096-byte page from the Realm, decrypt it, make the change, then re-encrypt and write back that whole page. So if only one object in a page is changed, we write all 4096 bytes. Just a possible direction to investigate.
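To make the page-granularity point concrete, here is a minimal sketch of the accounting it implies: every page touched by a commit is written back in full. This is a toy model, not Realm's implementation; the function name `bytesWritten` and the byte ranges are hypothetical and exist only for illustration.

```swift
// Toy model only: assumes each modified page is rewritten in full,
// as described in the quote above. Not Realm's actual code.
let pageSize = 4096

/// Returns the number of bytes written back for one commit, given the
/// (hypothetical, non-negative) byte ranges the commit modified.
func bytesWritten(modifiedRanges: [Range<Int>]) -> Int {
    var dirtyPages = Set<Int>()
    for range in modifiedRanges where !range.isEmpty {
        let firstPage = range.lowerBound / pageSize
        let lastPage = (range.upperBound - 1) / pageSize
        for page in firstPage...lastPage {
            dirtyPages.insert(page)
        }
    }
    return dirtyPages.count * pageSize
}

// A 32-byte change costs a full page.
print(bytesWritten(modifiedRanges: [100..<132]))
// Two small changes on the same page still cost one page.
print(bytesWritten(modifiedRanges: [100..<132, 200..<232]))
// A 10-byte change straddling a page boundary costs two pages.
print(bytesWritten(modifiedRanges: [4090..<4100]))
```

In this model, many small changes committed together share pages, while the same changes committed one at a time each pay for at least one full page.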

I've done some more testing on that:

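```swift
import Foundation
import RealmSwift

// TestObject is a Realm Object subclass defined elsewhere in the project.
// Data(count: 64) is an all-zero 64-byte key: fine for a test, not for production.
Realm.Configuration.defaultConfiguration = Realm.Configuration(encryptionKey: Data(count: 64))
let realm = try! Realm()

// Variant A: 30k write transactions with one object each.
// for _ in 0..<30000 {
//     try! realm.write {
//         realm.add(TestObject())
//     }
// }

// Variant B: one write transaction with 30k objects.
var array = [TestObject]()
for _ in 0..<30000 {
    array.append(TestObject())
}
try! realm.write {
    realm.add(array)
}
```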

The results match that statement:

30k writes with 1 object encrypted: 1.1 GB
1 write with 30k objects encrypted: 3.3 MB
30k writes with 1 object unencrypted: 8.3 MB
1 write with 30k objects unencrypted: 7.3 MB
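As a rough back-of-the-envelope check, the encrypted figures above can be turned into a per-commit cost. The sketch below assumes decimal units (1 GB = 10^9 bytes) and the 4096-byte page size mentioned earlier; the function names are hypothetical.

```swift
import Foundation

// Assumptions: decimal units and a 4096-byte page. Input figures are
// the ones reported in the list above.
let pageSize = 4096.0

/// Average bytes written per write transaction.
func bytesPerCommit(totalBytes: Double, commits: Double) -> Double {
    return totalBytes / commits
}

/// Average number of full pages that many bytes correspond to.
func pagesPerCommit(totalBytes: Double, commits: Double) -> Double {
    return bytesPerCommit(totalBytes: totalBytes, commits: commits) / pageSize
}

let perObjectTotal = 1.1e9   // 30k encrypted writes, 1 object each
let batchedTotal = 3.3e6     // 1 encrypted write, 30k objects

print(bytesPerCommit(totalBytes: perObjectTotal, commits: 30_000))  // about 36,667 bytes
print(pagesPerCommit(totalBytes: perObjectTotal, commits: 30_000))  // about 8.95 pages
print(perObjectTotal / batchedTotal)                                // about 333x
```

Under these assumptions each single-object commit wrote on the order of nine full pages, and batching reduced the total by a factor of roughly 333. The thread does not say which pages make up that per-commit cost.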

Also:

> We use different system calls for writing with encryption (ordinary write + fsync) and without (page modification + msync)... [Xcode] is just considering the calls to ordinary write(). So it's the measurement that's wrong. Or rather: it's mostly correct for the encrypted case but totally wrong for the unencrypted case.
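For readers unfamiliar with the two I/O paths named in that quote, here is a minimal sketch using the plain POSIX calls available from Swift. It only illustrates the difference between `write` + `fsync` and `mmap` + `msync`; it is not Realm's code, and the function names and file handling are assumptions made for the example.

```swift
import Foundation

/// Path 1: ordinary write() followed by fsync().
/// A tool that counts write() calls sees every byte written here.
func writeWithFsync(path: String, bytes: [UInt8]) -> Bool {
    let fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0o600)
    guard fd >= 0 else { return false }
    defer { close(fd) }
    let written = bytes.withUnsafeBytes { write(fd, $0.baseAddress, $0.count) }
    guard written == bytes.count else { return false }
    return fsync(fd) == 0
}

/// Path 2: map the file, modify the mapped pages in memory, then msync().
/// No write() call is issued, so a tool that counts write() misses this I/O.
func writeWithMsync(path: String, bytes: [UInt8]) -> Bool {
    guard !bytes.isEmpty else { return false }
    let fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0o600)
    guard fd >= 0 else { return false }
    defer { close(fd) }
    // The file must be large enough before it can be mapped.
    guard ftruncate(fd, off_t(bytes.count)) == 0 else { return false }
    let mapping = mmap(nil, bytes.count, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)
    guard let base = mapping, base != MAP_FAILED else { return false }
    defer { munmap(base, bytes.count) }
    bytes.withUnsafeBytes { _ = memcpy(base, $0.baseAddress, $0.count) }
    return msync(base, bytes.count, MS_SYNC) == 0
}
```

Both functions leave the same bytes on disk; they differ only in which system calls carry the data, which is the distinction the quote draws.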

@palaniraja Does that answer all your questions?

palaniraja commented 3 years ago

@DominicFrei Thank you. Yes this confirms our understanding.

The problem is that in the real app there are relations to other objects that need to be queried, e.g. a person record has to be queried and then added or updated based on the result.

Does this behavior with encryption (dirtying a whole 4096-byte page, and the resulting disk usage) apply to reads too? Assume the worst case, where each record needs to be associated with a new record of a different object type.

In other words, does this apply only to writes and not to reads?

Appreciate your support.

DominicFrei commented 3 years ago

@ironage @finnschiermer Since we talked about this already: can you answer the question above? Does this apply to reads as well?

ironage commented 3 years ago

Reads do not dirty a page. Once we have a 4096 byte page decrypted, we reuse it to read other objects if they are found on the same page. We can have more than one page decrypted in memory and we have heuristics to free up unused pages. So there is some initial overhead to decrypt the pages you are reading from, but it is not proportional to the number of reads you make.
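Since reads do not dirty pages, the write cost discussed in this thread depends on the number of write transactions rather than on how many lookups happen inside them. One way to act on that is to group incoming records into a few larger transactions. The sketch below shows only the grouping step and is dependency-free; the helper name `chunked` and the chunk size of 1000 are assumptions for illustration, not a recommendation from the Realm team.

```swift
/// Splits `items` into consecutive chunks of at most `size` elements.
/// Hypothetical helper for illustration; not part of RealmSwift.
func chunked<T>(_ items: [T], size: Int) -> [[T]] {
    precondition(size > 0, "chunk size must be positive")
    return stride(from: 0, to: items.count, by: size).map { start in
        Array(items[start..<min(start + size, items.count)])
    }
}

// 5000 stand-in records grouped into transactions of 1000 each.
let batches = chunked(Array(1...5000), size: 1000)
print(batches.count)

// With Realm, each batch would then go into a single write transaction,
// using the same calls as the test code earlier in this thread:
//
//     for batch in batches {
//         try realm.write {
//             // query related objects here as needed, then:
//             realm.add(batch)
//         }
//     }
```

Chunking rather than using one giant transaction is a design choice that trades a few extra commits for bounded memory use per transaction; the right chunk size depends on the app and would need measuring.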

palaniraja commented 3 years ago

Thank you @ironage @DominicFrei

DominicFrei commented 3 years ago

@palaniraja Glad we could help, even though it was more an explanation than a solution. Going to close this for now, but if there is anything else regarding this topic, please let me know any time and I'll re-open the ticket.