realm / realm-js

Realm is a mobile database: an alternative to SQLite & key-value stores
https://realm.io
Apache License 2.0
5.62k stars 558 forks source link

SIGSEGV when attempting to resume a Puased syncSession #6558

Open muddy-alex opened 2 months ago

muddy-alex commented 2 months ago

How frequently does the bug occur?

Always

Description

I'm trying to force Realm to perform a sync of all offline changes (it's an asymmetric table).

According to the documentation, we should be able to call realm.syncSession.pause() and realm.syncSession.resume() to perform this operation, however, when calling resume() a SIGSEGV error is thrown.

Is this the correct way to force Realm to perform a sync with AppSync?

Stacktrace & log output

No response

Can you reproduce the bug?

Always

Reproduction Steps

No response

Version

12.6.2

What services are you using?

Both Atlas Device Sync and Atlas App Services

Are you using encryption?

No

Platform OS and version(s)

OSX/Linux

Build environment

NodeJS

Cocoapods version

No response

sync-by-unito[bot] commented 2 months ago

➤ PM Bot commented:

Jira ticket: RJS-2762

kneth commented 2 months ago

@muddy-alex There is no API to force sync as the sync client works mostly autonomously. pause()/resume() can trigger the sync client to take an extra round trip.

If it is possible, please share the stack trace. resume() should not cause a SIGSEGV.

muddy-alex commented 1 month ago

No stack trace, unfortunately.

Pause/Resume seems to work before the initial sync has been completed. However, once it has (based on the realm.syncSession.addConnectionNotification event handler being called), calling pause/resume causes the process to abruptly exit.

This is all I get...

Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)
kneth commented 1 month ago

Could be related to https://github.com/realm/realm-core/issues/7349? It is possible for you to downgrade? If it works with an earlier version (12.4.0 or older), it could be an indication for us.

muddy-alex commented 1 month ago

We were originally testing with 12.3.1 when we found the issue, but upgraded to 12.6.2 to check if it was resolved

nirinchev commented 1 month ago

Let me get this straight - initial sync is happening, but you have created some data offline and don't want to wait for initial sync to complete before that data gets uploaded, which is why you're pausing and resuming the session?

As Kenneth said, even though this is likely not the intended flow, we shouldn't be crashing. But understanding the circumstances around the crash will allow us to repro it.

muddy-alex commented 1 month ago

We only have specific time windows on internet access, which is why we need some kind of 'force-sync' feature as we've found that the NodeJS Client doesn't seem to do any further synchronisation during the lifetime of the process (tested over 2 days).

We've found that the initial sync seems to be restricted to 10MB upload, taking the short time window of internet access into account, we would like this 'force-sync' feature to keep pushing up the 10MB chunks until all offline data has been uploaded.

nirinchev commented 1 month ago

Thanks for that context - I'll chat with our sync team and see what they say.

nirinchev commented 1 month ago

Hey, so following that conversation, here's a few more questions.

  1. What do you mean by "initial sync" here? Can you clarify what the flow looks like in your app - possibly with some example code snippets. It doesn't have to be the exact thing, but more for illustrative purposes - e.g.:

    realm.subscriptions.update(...) // Subscribe for new data
    realm.write(() => {
    for (var i = 0; i < 100; i++) {
      realm.create(...); // Create some offline data
    }
    });
    // User goes into internet accessible window
    realm.syncSession.resume();
    // wait for something?
    realm.syncSession.pause();
  2. Where does the 10mb limit come from? There's nothing in the sync protocol that we're aware of that puts 10 mb limit on uploads and there's nothing we're aware of that would cause pausing/resuming session to force sync and have it behave differently compared to the session just being active.
  3. What is the connection state you're waiting for using addConnectionNotification. This API is intended to let you know that a connection is established, but provides no information whatsoever about the state of synchronization. How are you using it and what information are you inferring from the connection state?
  4. If you can consistently reproduce the crash, can you configure logging to log at trace level and get us the logs from a run that reproduces the crash?
danieltabacaru commented 1 month ago

@muddy-alex There is no way currently to force-sync the local data: we first download any data from the server, and then we start uploading the local data. Pausing and resuming the session does not change this behavior in any way. We could add a config option to change that though. If you only have asymmetric tables in the schema, then there is nothing to download so upload would start immediately.

muddy-alex commented 1 month ago

@kneth I can replicate the SIGSEGV here...

import Realm from 'realm';

const ATLAS_APP_ID = '';
const ATLAS_API_KEY = '';

export const Point = {
  name: 'Point',
  embedded: true,
  properties: {
    type: { type: 'string', default: () => 'Point' },
    coordinates: { type: 'list', objectType: 'double' }
  },
};

class Log extends Realm.Object {
   static schema = {
    name: 'Log',
    asymmetric: true,
    primaryKey: '_id',
    properties: {
      _id: 'objectId',
      timestamp: 'date',
      organisation: 'objectId',
      key: 'string',
      value: 'mixed?[]',
      metadata: 'mixed?{}',
      location: 'Point',
      locationFix: 'int'
    }
  };
}

const app = new Realm.App({ id: ATLAS_APP_ID });

Realm.App.Sync.setLogLevel(app, "trace");

const user = app.currentUser
  ? app.currentUser
  : await app.logIn(Realm.Credentials.apiKey(ATLAS_API_KEY));

const realm = await Realm.open({
  schema: [Log, Point],
  sync: {
    cancelWaitsOnNonFatalError: false,
    existingRealmFileBehavior: {
      type: Realm.OpenRealmBehaviorType.OpenImmediately,
      timeOut: 5000,
      timeOutBehavior: Realm.OpenRealmTimeOutBehavior.OpenLocalRealm
    },
    newRealmFileBehavior: {
      type: Realm.OpenRealmBehaviorType.OpenImmediately,
      timeOut: 5000,
      timeOutBehavior: Realm.OpenRealmTimeOutBehavior.OpenLocalRealm
    },
    flexible: true,
    onError: (session, error) => {
      console.error('Sync Error!');
      console.error(error);
    },
    user
  }
});

realm.syncSession.addConnectionNotification(() => {
  console.log('Sync Connected!');
});

realm.syncSession.addProgressNotification(Realm.ProgressDirection.Download, Realm.ProgressMode.ForCurrentlyOutstandingWork, (transferred, transferable) => {
  console.log(`Sync Downloading: ${transferred}/${transferable}`);
});

realm.syncSession.addProgressNotification(Realm.ProgressDirection.Upload, Realm.ProgressMode.ForCurrentlyOutstandingWork, (transferred, transferable) => {
  console.log(`Sync Uploading: ${transferred}/${transferable}`);
});

setTimeout(async () => {
  console.log('Force Sync');
  await realm.syncSession.pause()
  await realm.syncSession.resume()
}, 15 * 1000);
muddy-alex commented 1 month ago

@danieltabacaru it was the documentation here that lead me to believe that this might be possible

muddy-alex commented 1 month ago

@nirinchev When I mention to 'initial sync', I am referring to when Realm uploads offline data for the first time the process starts. Based on my reproduction above, when you see

Sync Connected!
Sync Downloading: 0/0
Sync Uploading: 0/201
Sync Uploading: 201/201

We've found that when this initial sync runs, only a chunk (I can't remember why I assumed it was 10MB) is uploaded - rather than all of the offline data. This meant that we had to restart the process a few times before all data was pushed up to AppSync, hence the reason we are looking for a 'force-sync' solution.

danieltabacaru commented 1 month ago

@muddy-alex I believe the documentation is correct. Forcing sync in that context is forcing connecting to the server, but the behavior is as I described it.

I wrote a similar test and I see no issues with it. I cannot see where class Point is declared in your example. The one you have there looks like a schema. Maybe @kneth can have a look.

danieltabacaru commented 1 month ago

Is there any code following the resume? Maybe something is not cleaned-up properly. You can try to wait for upload completion.

muddy-alex commented 1 month ago

@danieltabacaru Nothing follows the resume(). As soon as its called we get the SIGSEGV

kneth commented 1 month ago

@muddy-alex You are mixing class-based models (Log) and JS schemas (Point), and I wonder if it has a negative impact. I mean, our tests are either-or, so you might be in the land of undefined behavior.

muddy-alex commented 1 month ago

I couldn't find a documented way of using Embedded objects with the Class syntax. Updating the above example to just use JS schemas still results in the SIGSEGV.

kneth commented 1 month ago

@muddy-alex Thank you to getting back. I'll bring up the issue at the next team meeting.