realm / realm-object-server

Tracking of issues related to the Realm Object Server and other general issues not related to a specific SDK
https://realm.io

Sync the huge amount of data which should not sync #240

Open anujjpandey opened 7 years ago

anujjpandey commented 7 years ago

Goals

We have a MediaFile object that contains a byte array and other properties associated with that byte array. We follow a few steps to process each media file:

  1. An Android device (call it A) pushes the media file to ROS.
  2. A Node.js process listens for ROS events and, as soon as it receives a media file, downloads the file and saves it to cloud storage.
  3. Once the file has been saved to cloud storage successfully, we delete it from ROS.

Up to this point everything looks fine, and the deletion from ROS is also reflected in the ROS browser, as you can see in image 1.

[Image 1: screen shot 2017-07-23 at 8.07.18 pm]

Expected Results

As far as I can see, there is no media file in the ROS browser. There is only the schema and some metadata for the media files, which should produce only 2-3 MB of downloadable data.

Actual Results

But what I see in the sync log is far beyond that expectation:

07-23 20:31:35.613 3340-3382/com.automotive.tracker D/REALM_SYNC: Using already open Realm file: /storage/sdcard0/KentTracker/Sync/9d8b5a513622a769155609c4a1f44d23/9d8b5a513622a769155609c4a1f44d23/2056695815/private.realm
07-23 20:31:35.613 3340-3382/com.automotive.tracker D/REALM_SYNC: Connection[1]: Session[1]: Progress handler called, downloaded = 795650, downloadable = 68717705, uploaded = 0, uploadable = 0, progress version = 1, snapshot version = 4
07-23 20:31:35.623 3340-3382/com.automotive.tracker D/REALM_SYNC: message_type = download
07-23 20:31:35.643 3340-3382/com.automotive.tracker D/REALM_SYNC: Download message compression: is_body_compressed = 1, compressed_body_size=333927, uncompressed_body_size=390100
07-23 20:31:35.643 3340-3382/com.automotive.tracker D/REALM_SYNC: Connection[1]: Session[1]: Received: DOWNLOAD(scan_server_version=4, scan_client_version=0, latest_server_version=351, latest_server_session_ident=5729703240262368978, latest_client_version=0, downloadable_bytes=68717705, number_of_changesets=1)
07-23 20:31:35.643 3340-3382/com.automotive.tracker D/REALM_SYNC: Using already open Realm file: /storage/sdcard0/KentTracker/Sync/9d8b5a513622a769155609c4a1f44d23/9d8b5a513622a769155609c4a1f44d23/2056695815/private.realm
07-23 20:31:37.153 3340-3382/com.automotive.tracker D/REALM_SYNC: Connection[1]: Session[1]: 1 remote changeset integrated, producing client version 5
07-23 20:31:37.153 3340-3382/com.automotive.tracker D/REALM_SYNC: Using already open Realm file: /storage/sdcard0/KentTracker/Sync/9d8b5a513622a769155609c4a1f44d23/9d8b5a513622a769155609c4a1f44d23/2056695815/private.realm
07-23 20:31:37.153 3340-3382/com.automotive.tracker D/REALM_SYNC: Connection[1]: Session[1]: Progress handler called, downloaded = 1185725, downloadable = 68717705, uploaded = 0, uploadable = 0, progress version = 1, snapshot version = 5
07-23 20:31:37.163 3340-3382/com.automotive.tracker D/REALM_SYNC: message_type = download
07-23 20:31:37.173 3340-3382/com.automotive.tracker D/REALM_SYNC: Download message compression: is_body_compressed = 1, compressed_body_size=326705, uncompressed_body_size=389846
07-23 20:31:37.173 3340-3382/com.automotive.tracker D/REALM_SYNC: Connection[1]: Session[1]: Received: DOWNLOAD(scan_server_version=5, scan_client_version=0, latest_server_version=351, latest_server_session_ident=5729703240262368978, latest_client_version=0, downloadable_bytes=68717705, number_of_changesets=1)
07-23 20:31:37.173 3340-3382/com.automotive.tracker D/REALM_SYNC: Using already open Realm file: /storage/sdcard0/KentTracker/Sync/9d8b5a513622a769155609c4a1f44d23/9d8b5a513622a769155609c4a1f44d23/2056695815/private.realm
07-23 20:31:38.633 3340-3382/com.automotive.tracker D/REALM_SYNC: Connection[1]: Session[1]: 1 remote changeset integrated, producing client version 6
07-23 20:31:38.633 3340-3382/com.automotive.tracker D/REALM_SYNC: Using already open Realm file: /storage/sdcard0/KentTracker/Sync/9d8b5a513622a769155609c4a1f44d23/9d8b5a513622a769155609c4a1f44d23/2056695815/private.realm
07-23 20:31:38.633 3340-3382/com.automotive.tracker D/REALM_SYNC: Connection[1]: Session[1]: Progress handler called, downloaded = 1575546, downloadable = 68717705, uploaded = 0, uploadable = 0, progress version = 1, snapshot version = 6
07-23 20:31:38.643 3340-3382/com.automotive.tracker D/REALM_SYNC: message_type = download
07-23 20:31:38.653 3340-3382/com.automotive.tracker D/REALM_SYNC: Download message compression: is_body_compressed = 1, compressed_body_size=313076, uncompressed_body_size=375174
07-23 20:31:38.653 3340-3382/com.automotive.tracker D/REALM_SYNC: Connection[1]: Session[1]: Received: DOWNLOAD(scan_server_version=6, scan_client_version=0, latest_server_version=351, latest_server_session_ident=5729703240262368978, latest_client_version=0, downloadable_bytes=68717705, number_of_changesets=1)

This continues through snapshot version = n until downloaded equals downloadable.

[Image: realmdownload issue]

All of this data is actually downloaded from the Realm Object Server and written to the SD card.
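To put the log figures in perspective, here is a minimal sketch that pulls the downloaded and downloadable counters out of a "Progress handler called" line and converts them to a percentage and to MiB. The class and method names are hypothetical, and the regular expression only assumes the line format shown in the log above; it is not part of the Realm API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading REALM_SYNC progress lines; not a Realm class.
final class SyncProgress {
    // Matches the two counters exactly as they appear in the log above.
    private static final Pattern PROGRESS =
            Pattern.compile("downloaded = (\\d+), downloadable = (\\d+)");

    final long downloaded;
    final long downloadable;

    private SyncProgress(long downloaded, long downloadable) {
        this.downloaded = downloaded;
        this.downloadable = downloadable;
    }

    // Extracts the counters from one log line, or throws if the line has none.
    static SyncProgress parse(String logLine) {
        Matcher m = PROGRESS.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a progress line: " + logLine);
        }
        return new SyncProgress(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)));
    }

    // Share of the downloadable bytes that has arrived so far, in percent.
    double percentComplete() {
        return downloadable == 0 ? 100.0 : 100.0 * downloaded / downloadable;
    }

    // Converts a byte count to mebibytes.
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }
}
```

Applied to the first progress line above (downloaded = 795650, downloadable = 68717705), this reports a little over 1% of roughly 65.5 MiB, far more than the 2-3 MB expected from the schema and metadata alone.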

Steps to Reproduce

Commit media files to ROS and delete them. Then take a fresh mobile device and enable sync for the same path.

Code Sample

mPrivateConfig = new SyncConfiguration
        .Builder(user, url)
        .directory(Methods.getSyncDirectory())
        .modules(new PrivateSyncModule())
        .name(REMOTE_PRIVATE_DIRECTORY + ".realm")
        .schemaVersion(2)
        .waitForInitialRemoteData()
        .errorHandler(new SyncSession.ErrorHandler() {
            @Override
            public void onError(SyncSession session, ObjectServerError error) {
                RemoteLogger.e("Realm PrivateConfig", error.getMessage(), error);
                AppLoger.e("Realm PrivateConfig" + error.getMessage());
                error.printStackTrace();
            }
        })
        .build();
mPrivateRealmAsync = Realm.getInstanceAsync(mPrivateConfig, new Realm.Callback() {
    @Override
    public void onSuccess(Realm realm) {
        mRealmPrivate = realm;
        mPrivateRealmState = RealmState.REMOTE;
        AppLoger.e("REALM_SYNC onSuccess:");
        RealmQuery<MediaFile> query = realm.where(MediaFile.class);
        final RealmResults<MediaFile> result1 = query.findAll();
        AppLoger.e("REALM_SYNC Media File size:" + result1.size());
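        // Sum the payload sizes of the MediaFile objects still present in the local Realm.
        long length = 0;
        for (MediaFile m : result1) {
            byte[] bytes = m.getByteArray();
            if (bytes != null) {
                length += bytes.length;
            }
            AppLoger.e("REALM_SYNC media file :" + m.toString());
        }
        AppLoger.e("REALM_SYNC total length:" + length);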
    }
    @Override
    public void onError(Throwable exception) {
        super.onError(exception);
        AppLoger.e("REALM_SYNC:" + exception.getMessage());
    }
});

Version of Realm and Tooling

bigfish24 commented 7 years ago

@anujjpandey11 if I understand this issue correctly, this is the expected outcome. Today, Realm sync works by syncing the operation log, and we do not compact that log. So any operation you add, including writing a large binary blob, is appended to the log and synced to all devices, even if a later operation deletes the data.

In the coming months we will be working on log compaction to improve this situation.
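A toy model may help show why a deleted blob still costs download bandwidth. The sketch below is an assumption-laden illustration, not Realm's actual log format or API: it keeps an append-only list of operations next to the current state, so creating a blob and then deleting it leaves the state empty while the history still carries the blob's bytes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical append-only operation log; names and structure are illustrative only.
final class OperationLog {
    // One logged operation and the number of payload bytes it carries.
    static final class Op {
        final String kind;
        final String objectId;
        final int payloadBytes;

        Op(String kind, String objectId, int payloadBytes) {
            this.kind = kind;
            this.objectId = objectId;
            this.payloadBytes = payloadBytes;
        }
    }

    private final List<Op> history = new ArrayList<>();
    private final Map<String, Integer> state = new HashMap<>();

    // Creating an object appends its full payload to the history.
    void create(String objectId, byte[] blob) {
        history.add(new Op("CREATE", objectId, blob.length));
        state.put(objectId, blob.length);
    }

    // Deleting appends a small operation; the earlier CREATE stays in the history.
    void delete(String objectId) {
        history.add(new Op("DELETE", objectId, 0));
        state.remove(objectId);
    }

    // Bytes a client would need if it only fetched the current state.
    long stateBytes() {
        long total = 0;
        for (int size : state.values()) {
            total += size;
        }
        return total;
    }

    // Bytes a client needs when it has to replay the whole uncompacted history.
    long historyBytes() {
        long total = 0;
        for (Op op : history) {
            total += op.payloadBytes;
        }
        return total;
    }
}
```

In this model, a fresh client that replays the history downloads every blob ever written, which matches the behaviour reported in the issue.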

eXeDK commented 7 years ago

What is the timeline of log compaction to be implemented? I don't see us using ROS in production without this feature.

bigfish24 commented 7 years ago

@eXeDK is your main concern the initial download?

eXeDK commented 7 years ago

Yes, it is. Having to sync the entire history of the Realm file at startup doesn't seem optimal. Hopefully we won't have media files in our Realm files, so we should be able to keep the size down. However, I still see this as a crucial feature.

Can you reveal some of the timeline for this specific feature?

bigfish24 commented 7 years ago

@eXeDK we are starting work on it as part of an RMP 2.0 effort. The target for that is September, though we might be able to offer early previews sooner.

eXeDK commented 7 years ago

Okay that sounds awesome @bigfish24. We'll start our internal testing of ROS next week hopefully and then we'll start evaluating in internal builds before we decide whether or not we want to push towards production.

zachwhelchel commented 7 years ago

@bigfish24

We are seeing a similar issue. We currently see download sizes of > 300 MB for a 1.7 MB compacted Realm. Granted, the Realm does have quite a bit of history in its logs, but am I incorrect in thinking there should be a "delta logs" solution to this?

On initial download don't look at the logs at all... but save a timestamp. On subsequent downloads use the timestamp to only fetch a "delta log" of changes since the last time, etc.

The RMP isn't really viable for us until something like this is implemented, as our data is managed by clients and changes frequently (thus many history logs).

We thought about periodically compacting on the server and always having a compact version for the mobile clients to consume but I'm assuming this wouldn't work as they likely need the log in order to update properly.

Without this we're back to using plain old realm mobile database and writing our own syncing engine...

bigfish24 commented 7 years ago

@zachwhelchel thanks for this info. Quite confident we will be solving this for you. There is a PR internally to compact the history, which will be able to run on both the client and server. We are targeting this PR for the 2.0 release in September, but there will be previews leading up to that. Stay tuned!

zachwhelchel commented 7 years ago

@bigfish24 awesome. That's really encouraging to hear. Do you mind explaining a bit more about what you mean by "compacting the history"? Will this be a delta-like approach, or will the logs still be an ever-growing file? Can we expect the download size of a 1.7 MB Realm file to be closer to 1.7 MB? Or roughly 2x, 4x, 10x?

bigfish24 commented 7 years ago

> Can we expect the download size of a 1.7 MB Realm file to be closer to 1.7 MB? Or roughly 2x, 4x, 10x?

The current PR can scan the history and compact SET operations, which for most use cases will be a dramatic reduction (i.e. remove all SET operations but the last). The operations that are hard to compact are operations on List properties, due to the inherent ordering. So unless you are using List a lot, it is likely that this initial version will be able to keep the history close to the actual state in size.
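The idea of dropping all SET operations but the last can be sketched as follows. This is only an illustration under stated assumptions: the SetOp type and the compact method are hypothetical, the history is assumed to contain nothing but SET operations, and the real compaction in the PR may work quite differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of SET compaction; not the actual Realm implementation.
final class SetCompactor {
    // A SET of one property on one object to a new value.
    static final class SetOp {
        final String objectId;
        final String property;
        final String value;

        SetOp(String objectId, String property, String value) {
            this.objectId = objectId;
            this.property = property;
            this.value = value;
        }
    }

    // Keeps only the last SET for each (object, property) pair,
    // ordered by when that last write happened.
    static List<SetOp> compact(List<SetOp> history) {
        Map<String, SetOp> lastWrite = new LinkedHashMap<>();
        for (SetOp op : history) {
            String key = op.objectId + '\u0000' + op.property;
            lastWrite.remove(key); // drop the superseded SET, if any
            lastWrite.put(key, op); // re-insert so the newest write comes last
        }
        return new ArrayList<>(lastWrite.values());
    }
}
```

Because a SET fully overwrites the previous value, earlier SETs on the same property can be discarded without changing the final state. List inserts, deletes, and moves depend on the positions produced by earlier operations, which is why they do not collapse this easily.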

The current plan is to have this compaction run on the server, meaning clients will upload their operations and then the server will compact the operations periodically so other clients don't have to download as much. We are also exploring performing the compaction on the client when it is offline, so that once the network connection is reestablished it uploads less. There is a balance between uploading immediately vs. spending time compacting.

Secondarily, we are also working on functionality to download the Realm file on first connect vs. the history. This will only be available via the asyncOpen API, but this will also help alongside the compaction to make sure when clients first connect they only have to download the actual Realm data vs. any history.

We think the combination of both of these will mostly eliminate the issue.

zachwhelchel commented 7 years ago

@bigfish24 best news I've heard all day. Thanks for the insight!

Maxxan commented 7 years ago

> So unless you are using List a lot, it is likely that this initial version will be able to keep the history close to the actual state in size.

What if I use a lot of lists? Is there any hope then for a much smaller data size, closer to the compacted file size? :)

For example, I have a list of areas, and each area has a list of streets, and each street has a list of addresses, and each address has a list of contacts. Streets, with all their addresses and contacts, will be added or deleted often, and the status of a contact will also change often. So a lot of changes in the data!

I hope that version 2 will also solve the problem with everything being memory mapped. That is not really feasible once the database becomes large due to many users, e.g. 500 GB-1 TB.

bigfish24 commented 7 years ago

There might be ways to optimize lists, but we aren't working on that at the moment. One thing that has been proposed separately, and that would be easier to compact, is a Set type that has no inherent ordering. Would that still work for your use case?

bigfish24 commented 7 years ago

I should qualify that the operations related to List that are hard to compact are changes to the list itself: inserts, deletes, or moves. Changes to the objects that happen to be in a List are different operations.

Maxxan commented 7 years ago

Thanks for your clarification. What do you mean by "inherent ordering"?

If I have object A, which has a list of B objects, and I remove object A from the Realm (using a custom delete function that also calls delete from Realm on the list, so all B objects are removed too), will it then store a delete operation for each B object in the list, or just the deletion of object A, which in turn would mean that all members, lists, and objects in the list are deleted?

nirinchev commented 7 years ago

On a related note, if you don't care about the order of addresses, streets and so on, you can use an inverse relationship to represent the same model, i.e. Street has a property Area. This will not preserve ordering but will make the file size smaller.
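The modelling change described here can be sketched in plain Java. The classes below are hypothetical and deliberately avoid the Realm API so the example runs standalone: each Street points at its Area, and the streets of an area are found by querying rather than by maintaining an ordered list on Area. In Realm Java the reverse direction can be exposed with the @LinkingObjects annotation; that part is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model; the real classes would extend RealmObject.
final class InverseModel {
    static final class Area {
        final String name;

        Area(String name) {
            this.name = name;
        }
    }

    // The child holds the link to its parent instead of the parent holding a list.
    static final class Street {
        final String name;
        final Area area;

        Street(String name, Area area) {
            this.name = name;
            this.area = area;
        }
    }

    // "All streets in this area" becomes a query over streets.
    // The result has no guaranteed order beyond that of the input collection.
    static List<Street> streetsIn(Area area, List<Street> allStreets) {
        List<Street> result = new ArrayList<>();
        for (Street street : allStreets) {
            if (street.area == area) {
                result.add(street);
            }
        }
        return result;
    }
}
```

With this shape, adding or removing a street only touches that street's own data rather than producing insert or delete operations on a list owned by the area.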

Jonsapps commented 6 years ago

Is there any update on the feature to sync only the latest changes or compact the history? We are finding this to be a real problem now. We launched an app in June, and the sync of the initial data has slowly taken longer and longer (due to more and more transactions). We are now at the point where the sync on a clean install takes around 1 hour to complete.

bigfish24 commented 6 years ago

Apologies that this isn't documented well. We have released log compaction, but it needs to be enabled when you start the server. Follow the instructions here: https://github.com/realm/realm-object-server/issues/127#issuecomment-349630076

We will update our public docs to call this out.

Jonsapps commented 5 years ago

I've recently updated our self-hosted Realm server to v3.19.0, after running v3.4.5 since my last comments here.

My reason for upgrading and buying a licence was the hope that the historyTtl setting would resolve the issues described here. So far I have not noticed any difference. Our users are experiencing issues again because the amount of data syncing from the Realm exceeds 2 GB, so their initial sync falls over at about 85% complete with the "mmap() failed: Cannot allocate memory size:" error.

I have set the historyTtl setting to 30 days. Is my understanding wrong that setting it to 30 should mean new clients will only take the transaction history of the last 30 days and should therefore sync a much much smaller set of data? Currently with historyTtl set to 30 days or disabled I always see the sync attempting to sync over 2GB of data.

The size reporting in the new Realm Studio is really handy, but it shows me a realm size of 6.98GB and a data size of 160MB. If this means what I think then the transaction history is massively bloating our realm (which is understandable because we perform a lot of transactions) but our client devices are connected daily so we really don't need all that history.

ianpward commented 5 years ago

@Jonsapps please open a ticket at support.realm.io. You will also need to set enableLogCompaction: true in your ROS index.ts for historyTtl to trigger.