realm / realm-object-server

Tracking of issues related to the Realm Object Server and other general issues not related to the specific SDKs
https://realm.io

Crash when partial sync is used across multiple users/devices #321

Closed. Dids closed this issue 6 years ago.

Dids commented 6 years ago

Goals

Trying to use Realm .NET in mobile applications, together with the free version of Realm Object Server, to provide a real-time shared database across multiple users and multiple devices, including a single user on multiple devices at the same time.

Partial sync is also enabled. The product revolves around a single Realm database shared by dozens, if not hundreds, of users, so partial syncing makes perfect sense.

Expected Results

Partial sync works and Realm Object Server does not crash.

Actual Results

error: [sync] ServerFile[/app/__partial/76a5860a21f868b65bb494193494be48790f73a2]: Partial sync: Table 'class_class_Destination' missing in partial Realm
terminate called after throwing an instance of 'std::runtime_error'
  what():  PartialSync: Schema mismatch
Aborted (core dumped)

Steps to Reproduce

Not sure, as this is a very large application that has been in active development for a year. I'm assuming it happens when two users attempt to access the same data (it doesn't seem to matter whether it's two different users or one user on different devices).

I'm assuming it's related to partial syncing, judging by the error message, but it's hard to tell or debug any further.

Version of Realm and Tooling

Dids commented 6 years ago

It's worth noting that the only fix seems to be to purge the database. Otherwise ROS keeps crashing for as long as a user tries to open a Realm or log in (at least if the user belongs to any of the data here).

Also note that all of my users are currently admins, just to make it easier to deal with a single, shared Realm, in case that is related somehow.

nirinchev commented 6 years ago

Ping @kspangsege - it seems like something is looking for a doubly-prefixed table: class_class_Destination. Do you have any idea what might be causing this?

kspangsege commented 6 years ago

My analysis of the partial sync code, as it presently is in develop branch, tells me that the indicated error message (with the double class_ prefix) can only arise if

@simonask , as things stand right now, are Instruction::AddColumn::link_target_table and Instruction::Payload::Link::target_table supposed to contain the class_ prefix or not?

Note that Realm Object Server 2.4.2 implies Realm Sync 2.1.10, which is a few versions behind the present state. However, as far as I can see, nothing significant in the context of the analysis above has changed since 2.1.10.

kspangsege commented 6 years ago

If the error is due to Instruction::AddColumn::link_target_table and/or Instruction::Payload::Link::target_table already containing the class_ prefix, then there are three possibilities:

  1. They are supposed to contain that prefix, and the error is that partial sync adds another class_ prefix.
  2. They are not supposed to contain that prefix, and the prefix is erroneously added during parsing of the changeset.
  3. They are not supposed to contain that prefix, and the prefix is erroneously added on the client side.

kspangsege commented 6 years ago

@Dids , would you be able to dig out the reference and partial Realm files from the server? It would be helpful to be able to see whether they contain tables with double class_ prefix.

If the Realm name is foo, then the name of the reference Realm file is foo.realm, and the name of the partial Realm file is of the form foo/__partial/<unique device ident>.realm.
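To make the layout concrete, here is a small Python sketch that derives both file paths from a Realm name and a device identifier. The helper name `server_realm_paths` is hypothetical; it only encodes the naming scheme described above, relative to the server's Realm directory, and assumes nothing about where ROS keeps that directory on disk.

```python
from pathlib import PurePosixPath


def server_realm_paths(realm_name: str, device_ident: str) -> tuple[PurePosixPath, PurePosixPath]:
    """Hypothetical helper: return (reference_file, partial_file) for a Realm,
    following the naming scheme described in this thread."""
    base = PurePosixPath(realm_name)
    # The reference Realm is stored as "<name>.realm".
    reference = base.with_name(base.name + ".realm")
    # Each partial Realm lives under "<name>/__partial/<device ident>.realm".
    partial = base / "__partial" / f"{device_ident}.realm"
    return reference, partial


ref, part = server_realm_paths("foo", "76a5860a21f868b65bb494193494be48790f73a2")
print(ref)   # foo.realm
print(part)  # foo/__partial/76a5860a21f868b65bb494193494be48790f73a2.realm
```

With the Realm name from the files attached below, the reference file comes out as tyomaarain.realm.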

Dids commented 6 years ago

@kspangsege Sure thing, here you go.

tyomaarain.realm (46.1 MB): https://www.dropbox.com/s/r8hvaxb0jaf7lle/tyomaarain.realm?dl=0

76a5860a21f868b65bb494193494be48790f73a2.realm (83.9 MB): https://www.dropbox.com/s/99gh7fu3dx38mj1/76a5860a21f868b65bb494193494be48790f73a2.realm?dl=0

I'm assuming that both file sizes are "normal", considering that these only contain a few hundred simple objects (with relationships and backlinks)?

Regarding the version of Realm Object Server, version 2.4.2 is the latest available on npm.

kspangsege commented 6 years ago

Alright, I was able to reproduce the crash with the files you provided. In fact, with a debug build of the server (containing assertions), the crash happens earlier due to the following assertion violation:

ServerFile[/foo/__partial/76a5860a21f868b65bb494193494be48790f73a2]: PartialSync: Update triggered: partial_version=34->34, reference_version=23->27
ServerFile[/foo/__partial/76a5860a21f868b65bb494193494be48790f73a2]: PartialSync: Integrated reference version 24 into partial Realm
ServerFile[/foo/__partial/76a5860a21f868b65bb494193494be48790f73a2]: PartialSync: Integrated reference version 25 into partial Realm
noinst/server_history.cpp:3117: [realm-core-4.0.4] Assertion failed: status == Status::pending_creation

The actual status is Status::preexisting.

Also, none of the files actually contain a table with a double class_ prefix.

Dids commented 6 years ago

Could the double class_ prefix be happening client-side (iOS) instead? That might explain why ROS is unable to find the table in the first place. Or is it unrelated, and the actual issue lies elsewhere, judging by the new assertion you mentioned?

kspangsege commented 6 years ago

Could the double class_ prefix be happening client-side (iOS) instead

That appears to be the case. Apparently, the payload.data.link.target_table property of Instruction::ContainerInsert already contains the class_ prefix, but partial sync adds another prefix.
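The failure mode described here can be illustrated with a short Python sketch. The `class_` prefix and the table names come from this thread; the function names and the set standing in for the partial Realm's tables are hypothetical simplifications, not the actual Realm Sync code.

```python
TABLE_PREFIX = "class_"  # internal prefix for object tables, per this thread


def table_for_class(class_name: str) -> str:
    # Hypothetical model of partial sync: map a class name from the
    # instruction log to a table name by unconditionally adding the prefix.
    return TABLE_PREFIX + class_name


def resolve_link_target(target_table: str, partial_tables: set[str]) -> str:
    table = table_for_class(target_table)
    if table not in partial_tables:
        raise RuntimeError(f"Partial sync: Table '{table}' missing in partial Realm")
    return table


partial_tables = {"class_Destination"}

# Expected instruction: the link target is the bare class name.
print(resolve_link_target("Destination", partial_tables))  # class_Destination

# Buggy client instruction: the link target already carries the prefix.
try:
    resolve_link_target("class_Destination", partial_tables)
except RuntimeError as err:
    print(err)  # Partial sync: Table 'class_class_Destination' missing in partial Realm
```

In this model the server-side prefixing is only correct if every instruction carries the bare class name, which is why a prefix leaking in from the client is enough to produce the error in the original report.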

@simonask Can you clear up this confusion? Is Instruction::ContainerInsert::payload.data.link.target_table supposed to include the prefix?

kspangsege commented 6 years ago

The issue with the unexpected status seems to be separate / unrelated.

kspangsege commented 6 years ago

Can you clear up this confusion? Is Instruction::ContainerInsert::payload.data.link.target_table supposed to include the prefix?

@simonask Also, is this something that has changed recently?

kspangsege commented 6 years ago

@simonask Note that this issue revolves around version 2.1.10 of Realm sync.

kspangsege commented 6 years ago

I have filed the two issues in the sync repo: https://github.com/realm/realm-sync/issues/1894, https://github.com/realm/realm-sync/issues/1895.

@Dids We will let you know when the issues are resolved.

simonask commented 6 years ago

Is Instruction::ContainerInsert::payload.data.link.target_table supposed to include the prefix?

No, the class_ prefix is never supposed to be included anywhere in the instruction log. It is supposed to be an implementation detail.

We fixed a bug a few weeks ago related to mistakenly including the prefix. Other bugs like that may exist, though we did make an effort to comb through the code for similar bugs at the time.

morten-krogh commented 6 years ago

I can reproduce the issue with class_class_Destination, and it is likely because this Realm was created before we fixed this issue.

We will make code changes to fix this issue and hope to release the upgraded code soon.

morten-krogh commented 6 years ago

@Dids We know why this error happens. It is because we had a bug in some previous versions of the Realm Object Server. In certain instructions, there is an extra class_ in front of the class name. That is why you see class_class_Destination; it should have been class_Destination. The extra prefix is not in the schema, which is correct. It is just an instruction in the history that unfortunately has this extra class_ prefix.

We could change the code to remove the additional class_ prefix. However, that might interfere with other users who happen to use class_ at the start of their own class names.
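The concern about stripping the extra prefix can be shown with a naive repair function. This is a hypothetical Python sketch, not anything shipped in ROS. It assumes a user may legitimately name a class starting with `class_` (here the made-up `class_Schedule`), in which case the instruction log would correctly contain that name and the repair would corrupt it.

```python
PREFIX = "class_"


def naive_repair(target_table: str) -> str:
    # Hypothetical fix: drop one leading "class_" from a link target name.
    if target_table.startswith(PREFIX):
        return target_table[len(PREFIX):]
    return target_table


# Buggy instruction written by an affected client: the repair gives the right name.
print(naive_repair("class_Destination"))  # Destination

# Legitimate class whose own name starts with "class_": the repair breaks it.
print(naive_repair("class_Schedule"))  # Schedule (should have stayed class_Schedule)
```

Both inputs have the same shape, so the instruction alone cannot tell a leaked prefix from a genuine class name.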

If you can discard the data and start from scratch, the problem shouldn't be there. If not, you can copy the data into a new reference Realm, discard the partial Realms and reset the clients. If that isn't possible either, we could possibly perform a manual repair of the Realm.

What is your situation regarding this Realm?

We are really sorry about this problem.

Dids commented 6 years ago

@morten-krogh I've started from scratch numerous times, including when initially posting this issue.

This means that Realm .NET v2.1.0 and Realm Object Server v2.4.2 still have this issue, as far as I can tell.

There haven't been any updates to either package (not on nuget.org or npm, at least), so I'm up to date as far as official releases are concerned.

Dids commented 6 years ago

@morten-krogh That said, if you're 100% certain that the issue does not exist in the versions above, I could check whether any of my clients are running old or upgraded installations, which might explain it as well.

I'm assuming I could simply change the name of the Realm itself, as well as deploy an update to my clients, to test if that's the case?

nirinchev commented 6 years ago

@morten-krogh Does the problem occur just on the server? If that's the case, ROS 2.4.2 ships with Sync 2.1.10. Has the bug been fixed in that version, or does it need to be updated to a later one?

morten-krogh commented 6 years ago

@Dids By "from scratch", do you mean from an empty Realm? Did you delete all server Realms?

Dids commented 6 years ago

@morten-krogh Yes, I've deleted all data to make sure I've started from a fresh ROS install.

morten-krogh commented 6 years ago

Okay, thanks.

It should have been fixed in 2.1.1. There might be a similar bug that we didn't find. I will look for it. I might need to get a repro case from you.

@nirinchev The problem is also client related.

morten-krogh commented 6 years ago

Do you have clients that are older than 2.1.1?

Dids commented 6 years ago

@morten-krogh Not sure, since nuget.org shows the Realm .NET version as 2.1.0: https://www.nuget.org/packages/Realm/

All clients should be on that version though.

morten-krogh commented 6 years ago

2.1.0 is just too old, if it uses the same version numbering I am talking about. @nirinchev Are the version numbers the same?

nirinchev commented 6 years ago

No - the .NET SDK is using Sync 2.0.1 which is fairly old it seems. If it's a client bug, then we'll need to release a new version of the SDK.
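The confusion above comes from comparing SDK version numbers with Realm Sync version numbers. Below is a minimal Python sketch of the check. It assumes, as the replies suggest, that the 2.1.1 mentioned earlier refers to the Realm Sync version, that ROS 2.4.2 bundles Sync 2.1.10 and that Realm .NET 2.1.0 bundles Sync 2.0.1; `parse_version` and `has_prefix_fix` are hypothetical helpers. Comparing integer tuples rather than strings matters, because as strings "2.1.10" sorts before "2.1.9".

```python
def parse_version(version: str) -> tuple[int, ...]:
    # "2.1.10" -> (2, 1, 10); integer tuples compare component by component.
    return tuple(int(part) for part in version.split("."))


# Assumption: the class_ prefix fix landed in Realm Sync 2.1.1.
FIX_SYNC_VERSION = parse_version("2.1.1")


def has_prefix_fix(sync_version: str) -> bool:
    return parse_version(sync_version) >= FIX_SYNC_VERSION


# Realm Sync version bundled with each component, as stated in this thread.
bundled_sync = {
    "Realm Object Server 2.4.2": "2.1.10",
    "Realm .NET 2.1.0": "2.0.1",
}
for component, sync in bundled_sync.items():
    status = "fixed" if has_prefix_fix(sync) else "affected"
    print(f"{component} -> Sync {sync}: {status}")
```

Under these assumptions the server side is already fixed while the .NET client is not, which matches the conclusion that a new SDK release is needed.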

kspangsege commented 6 years ago

@nirinchev I'm pretty sure it is a client-side bug, so an upgrade to a more recent Realm sync version will be needed.

morten-krogh commented 6 years ago

Yes, everything makes sense now. Just release a new version of the client SDK and use the new version.

nirinchev commented 6 years ago

An updated realm-dotnet package has been released to the nightly feed.