realm / realm-core

Core database component for the Realm Mobile Database SDKs
https://realm.io
Apache License 2.0
1.01k stars 155 forks source link

Bad changeset due to untouched reciprocal history entry #7893

Closed sync-by-unito[bot] closed 4 weeks ago

sync-by-unito[bot] commented 1 month ago

Our soak tests ran into a bad changeset in PBS performing the following series of events

T1: client version 4996: local ArrayInsert to obj.intListField

T2: client receives SV 6658: Clear obj.intListField

T3: client receive SV 6723: ArrayInsert to obj.list

Based on this it looks like the client is running OT and getting the correct output (as it gets applied to state properly), but the OT output is not being reflected in reciprocal history

EVG Task

Relevant changesets from server history dump (Dump can be mongorestored from __realm_sync.gz)

// Changeset that the local change at T1 merged against
{ timestamp: 287415144155, _id: 65c629de298546c714dc15e2, createdAt: '2024-02-09 13:34:22 +0000 UTC', fileIdent: 3, clientVersion: 4736, serverVersion: 6658, instructions: [
         Update { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="_pKey"), payload: "hiwqwluokygetwkfmj", isDefault: false, priorSize: 0 },
         Clear { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld"), priorSize: 0 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.0"), payload: 3255124513315405, priorSize: 0 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.1"), payload: 1152914988180245, priorSize: 1 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.2"), payload: 8652981199148631, priorSize: 2 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.3"), payload: 5257082114615562, priorSize: 3 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.4"), payload: 4962132771478012, priorSize: 4 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.5"), payload: 610258687391357, priorSize: 5 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.6"), payload: 8030180275848615, priorSize: 6 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.7"), payload: 2692238070770941, priorSize: 7 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.8"), payload: 8389870175223855, priorSize: 8 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.9"), payload: 2550864430468186, priorSize: 9 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.10"), payload: 2448326381062568, priorSize: 10 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.11"), payload: 3920266420108281, priorSize: 11 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.12"), payload: 4748114750730776, priorSize: 12 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.13"), payload: 5290562076489202, priorSize: 13 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.14"), payload: 4763018075988447, priorSize: 14 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.15"), payload: 5558702512205115, priorSize: 15 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.16"), payload: 1470857233029320, priorSize: 16 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.17"), payload: 2414691363089715, priorSize: 17 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.18"), payload: 6136840945697697, priorSize: 18 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.19"), payload: 6240611872508867, priorSize: 19 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.20"), payload: 696600421386660, priorSize: 20 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.21"), payload: 3759166984809251, priorSize: 21 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.22"), payload: 6200379315555408, priorSize: 22 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.23"), payload: 3740818814444825, priorSize: 23 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.24"), payload: 5052040814781635, priorSize: 24 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a
205"), fullPath="intListFld.25"), payload: 8632950791858863, priorSize: 25 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.26"), payload: 268386300614666, priorSize: 26 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.27"), payload: 7300284751995718, priorSize: 27 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.28"), payload: 4891235888437641, priorSize: 28 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.29"), payload: 4931806891077047, priorSize: 29 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.30"), payload: 59838987277831, priorSize: 30 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.31"), payload: 1391073917123324, priorSize: 31 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.32"), payload: 6926012470475492, priorSize: 32 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.33"), payload: 2566972138449680, priorSize: 33 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.34"), payload: 8381966378538948, priorSize: 34 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.35"), payload: 5199522972991873, priorSize: 35 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.36"), payload: 3893666273634272, priorSize: 36 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.37"), payload: 7564985999791688, priorSize: 37 },
intListFld.26"), payload: 268386300614666, priorSize: 26 },
         Update { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="_pKey"), payload: "hiwqwluokygetwkfmj", isDefault: false, priorSize: 0 },
         Clear { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld"), priorSize: 0 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.0"), payload: 3255124513315405, priorSize: 0 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.1"), payload: 1152914988180245, priorSize: 1 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.2"), payload: 8652981199148631, priorSize: 2 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.3"), payload: 5257082114615562, priorSize: 3 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.4"), payload: 4962132771478012, priorSize: 4 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.5"), payload: 610258687391357, priorSize: 5 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.6"), payload: 8030180275848615, priorSize: 6 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.7"), payload: 2692238070770941, priorSize: 7 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.8"), payload: 8389870175223855, priorSize: 8 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.9"), payload: 2550864430468186, priorSize: 9 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.10"), payload: 2448326381062568, priorSize: 10 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.11"), payload: 3920266420108281, priorSize: 11 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.12"), payload: 4748114750730776, priorSize: 12 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.13"), payload: 5290562076489202, priorSize: 13 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.14"), payload: 4763018075988447, priorSize: 14 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.15"), payload: 5558702512205115, priorSize: 15 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.16"), payload: 1470857233029320, priorSize: 16 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.17"), payload: 2414691363089715, priorSize: 17 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.18"), payload: 6136840945697697, priorSize: 18 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.19"), payload: 6240611872508867, priorSize: 19 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.20"), payload: 696600421386660, priorSize: 20 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.21"), payload: 3759166984809251, priorSize: 21 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.22"), payload: 6200379315555408, priorSize: 22 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.23"), payload: 3740818814444825, priorSize: 23 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.24"), payload: 5052040814781635, priorSize: 24 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.25"), payload: 8632950791858863, priorSize: 25 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.26"), payload: 268386300614666, priorSize: 26 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.27"), payload: 7300284751995718, priorSize: 27 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.28"), payload: 4891235888437641, priorSize: 28 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.29"), payload: 4931806891077047, priorSize: 29 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.30"), payload: 59838987277831, priorSize: 30 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.31"), payload: 1391073917123324, priorSize: 31 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.32"), payload: 6926012470475492, priorSize: 32 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.33"), payload: 2566972138449680, priorSize: 33 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.34"), payload: 8381966378538948, priorSize: 34 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.35"), payload: 5199522972991873, priorSize: 35 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.36"), payload: 3893666273634272, priorSize: 36 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.37"), payload: 7564985999791688, priorSize: 37 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.38"), payload: 8428617618974834, priorSize: 38 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.39"), payload: 4842348580420880, priorSize: 39 },
         Update { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="linkField"), payload: RealmLink(table: "collRequiredLinked", key: ObjectID("01bfab61b060cd39cd577edf")), isDefault: false, priorSize: 0 },
  ]
}

// Changeset that triggered integration error
{ timestamp: 287415176158, _id: 65c629df298546c714dc16d1, createdAt: '2024-02-09 13:34:23 +0000 UTC', fileIdent: 3, clientVersion: 4873, serverVersion: 6723, instructions: [
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.11"), payload: 7637604215384347, priorSize: 40 },
  ]
} 
sync-by-unito[bot] commented 1 month ago

➤ On 2024-03-13, danieltabacaru commented:

I've tried reproducing the bug with no success unfortunately. I inspected the realm and it indeed contains the right data it receives at T2. I then used the history command tool to inspect the reciprocal history: the entry is empty as if clear didn't discard the insertion! It seems to be an instance of https://github.com/realm/realm-core/pull/6367, but the fix for it should be in the SDK you're using.

Can you confirm there is no other change sent by the server that updates intListField?

This being said, your soak tests use core 13.14.0 which is 9 months old. You could try using a newer version.

sync-by-unito[bot] commented 1 month ago

➤ On 2024-03-15, danieltabacaru commented:

[~sean.brandenburg@mongodb.com] Is there any way I could run this test locally and try to reproduce it?

sync-by-unito[bot] commented 1 month ago

➤ On 2024-03-18, Sean Brandenburg commented:

The soak tests can be run locally, but I don't know if there's a seeded way to re-run these tests (and I think generally they're not very reproducible since there's a variety of timing related things that can be different between runs)

I took a look at the history dump and it does look like the only changes that made it into the merge window with the local changes are the ones detailed in the ticket (the clear followed by some inserts and the next insert that triggers the prior size mismatch)

I've just attached relevant server history for this object to this ticket.

We can upgrade our sdk version for the soak tests and keep and eye on if this recurs, but it does seem like it could be related to the ticket the fix you've linked here

sync-by-unito[bot] commented 1 month ago

➤ On 2024-03-18, danieltabacaru commented:

Even if it's not always reproducible, it could be valuable to run them locally (that is if I can't figure out what's going on otherwise)

sync-by-unito[bot] commented 1 month ago

➤ On 2024-03-18, Sean Brandenburg commented:

Got it, yeah I've admittedly never run them locally and it doesn't look like we have any docs on doing so, but it should be possible (though not super simple) to run them by following the procedure we use to set up the soak tests in the "local" env in this evergreen file.

The most straightforward thing to do would probably be to run the local variant of these tests on evergreen (PBS-Local-Fast) and sub in your own build of the realm sync state binary here (if your goal is to get more verbose/custom logging)

sync-by-unito[bot] commented 1 month ago

➤ On 2024-06-13, Sean Brandenburg commented:

[~daniel.tabacaru@mongodb.com] As an FYI, we upgraded our soak tests to use the most recent JS SDK and are still hitting this issue

sync-by-unito[bot] commented 1 month ago

➤ On 2024-06-13, danieltabacaru commented:

thanks for the heads up (didn't really expect to go away). just a wild guess, are compensating writes in any way involved? either way, could we try to write a simpler test to repro it? I didn't have any luck with client tests.

sync-by-unito[bot] commented 1 month ago

➤ On 2024-06-26, Kiro Morkos commented:

[~daniel.tabacaru@mongodb.com] no compensating writes (this is happening in PBS which doesn't have compensating writes, and the QBS variants that run into this all have open permissions+truepredicate queries, so no compensating writes there either).

We can try reproducing in a simpler test. If you have a hunch of where things might be going wrong that would help us focus in on which operations might be problematic and should be reproduced.

sync-by-unito[bot] commented 1 month ago

➤ On 2024-06-27, danieltabacaru commented:

[~sean.brandenburg@mongodb.com] The history_dump does not seem to contain the changes at T1. Any chance we can recover those?

sync-by-unito[bot] commented 1 month ago

➤ On 2024-07-08, Sean Brandenburg commented:

[~daniel.tabacaru@mongodb.com] I was having trouble getting the tooling for decoding the raw changeset working, but here's the raw history entry for CV 4996 at T1

{
  "_id": {
    "$oid": "65c629e0298546c714dc17a7"
  },
  "appId": {
    "$oid": "65c626aec6dd2a420f9a8ce4"
  },
  "appPartitionId": {
    "$oid": "65c626b9908fcdd4021b995e"
  },
  "timestamp": {
    "$numberLong": "1707485664"
  },
  "originFileIdent": {
    "$numberLong": "3"
  },
  "version": {
    "$numberLong": "6779"
  },
  "version_salt": {
    "$numberLong": "3112504645168333980"
  },
  "changeset": {
    "$binary": {
      "base64": "NDk5NiA2Nzc5IDI4NzQxNTIwNjY1NSAzIDE3NyA/AAxjb2xsUmVxdWlyZWQ/AQ1saW5rTGlzdEZpZWxkPwIKaW50TGlzdEZsZD8DEmNvbGxSZXF1aXJlZExpbmtlZD8EBV9wS2V5CgAK1bW9jQnmioAnq9XFAQECBQoACsxX9H9YB0/dnaGiaQEBFyAEAApfYsfOjCoEtHk3zgkCARYBm8L7jIvM6QwbBAMKvwICgu+KgwqULk7oBAADEmhpd3F3bHVva3lnZXR3a2ZtagA=",
      "subType": "00"
    }
  },
  "history_size": {
    "$numberLong": "9695159"
  },
  "clientVersion": {
    "$numberLong": "4996"
  },
  "workerId": "baas-sync-57b4949f4-tpdxd"
} 
sync-by-unito[bot] commented 1 month ago

➤ On 2024-07-09, danieltabacaru commented:

[~sean.brandenburg@mongodb.com] Thanks. Unfortunately I didn't have much like either decoding the changeset. Shouldn't the file ident be different than 3 (checking the log files I think it should be 1)?

sync-by-unito[bot] commented 1 month ago

➤ On 2024-07-09, Sean Brandenburg commented:

Ah yep, you're right. I was looking at the fileIdent we were merging against. Here's the entry for fileIdent 1.

{
  "_id": {
    "$oid": "65c62a35c142b06dd099c9c5"
  },
  "appId": {
    "$oid": "65c626aec6dd2a420f9a8ce4"
  },
  "appPartitionId": {
    "$oid": "65c626b9908fcdd4021b995e"
  },
  "timestamp": {
    "$numberLong": "1707485749"
  },
  "originFileIdent": {
    "$numberLong": "1"
  },
  "version": {
    "$numberLong": "7440"
  },
  "version_salt": {
    "$numberLong": "2015044307439901013"
  },
  "changeset": {
    "$binary": {
      "base64": "",
      "subType": "00"
    }
  },
  "history_size": {
    "$numberLong": "10510097"
  },
  "clientVersion": {
    "$numberLong": "4996"
  },
  "workerId": "baas-sync-57b4949f4-45qvk"
} 

Also this reproduces 100% of the time in the qbsv2 soak tests so Kiro and I are going to take a look at a fresh run today and will update here with any findings

sync-by-unito[bot] commented 1 month ago

➤ On 2024-07-10, Sean Brandenburg commented:

[~daniel.tabacaru@mongodb.com] Kiro and I were able to get a repro with the apply to state tool working (In QBS). I've attached the client logs with parsed instructions and a file containing the messages for the repro

[^device_2.parsed.log]

[^repro_messages.txt]

sync-by-unito[bot] commented 1 month ago

➤ On 2024-07-11, Kiro Morkos commented:

I was able to narrow down the repro even further (only 5 downloads + 2 uploads) - [^messages_668f2ef6e8ed03bf530574a7.txt]

Taking a look at the contents of mainline vs reciprocal history in the realm file shows the bug pretty clearly I think:

[ 09:09 PM kmorkos@W-PW0959CH: /tmp/gotest_1651888681  ] 
> ~/dev/realm-core/build/src/realm/sync/tools/realm-hist-dbg ./gotest_668f2ef6e8ed03bf530574a6 -V -f changeset
# Version 6
InternStrings   0="collRequired", 1="intListFld"
Update          path=collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[3], value=Int(2923615466627780), prior_size=14
# Version 7
InternStrings   0="collRequired", 1="intListFld"
Update          path=collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[5], value=Int(5405916949283399), prior_size=14
# Version 8
InternStrings   
3 history entries (version 5 -> 8)
[ 09:09 PM kmorkos@W-PW0959CH: /tmp/gotest_1651888681  ] 
> ~/dev/realm-core/build/src/realm/sync/tools/realm-hist-dbg ./gotest_668f2ef6e8ed03bf530574a6 -V -f changeset -r 0
# Version 6
InternStrings   0="collRequired", 1="intListFld"
Update          path=collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[3], value=Int(2923615466627780), prior_size=12
# Version 7
InternStrings   0="collRequired", 1="intListFld"
Update          path=collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[5], value=Int(5405916949283399), prior_size=14
# Version 8
InternStrings   
3 history entries (version 5 -> 8)
[ 09:09 PM kmorkos@W-PW0959CH: /tmp/gotest_1651888681  ] 
> 

Note that the prior size of the first local change is properly adjusted down, but the prior size of the second one is not. However, during the first merge it appears that the in-memory changeset is properly merged against based on the log, the dirty reciprocal history is just not persisted it seems. This explains why in every repro scenario it's always the second incoming download that hits the issue.

Output of the repro for posterity -

> realm-apply-to-state -r $(mktemp realmXXX) -i ./messages_668f2ef6e8ed03bf530574a7.txt --verbose
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1, client: 0} upload: {server: 0, client: 0}, latest: 2189552}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1, clientVersion: 0, origin: 666 [changeset with 2 instructions]
Realm.Sync.Client.Changeset - Integrated 1 changesets out of 1
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1102, client: 0} upload: {server: 1102, client: 0}, latest: 2189552}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1102, clientVersion: 0, origin: 7 [changeset with 13 instructions]
Realm.Sync.Client.Changeset - Integrated 1 changesets out of 1
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1146, client: 0} upload: {server: 1146, client: 0}, latest: 2189552}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1146, clientVersion: 0, origin: 7 [changeset with 2 instructions]
Realm.Sync.Client.Changeset - Integrated 1 changesets out of 1
Realm.Sync.Client.Changeset - found upload changeset: 1146 8 300550592903 0 56
Realm.Sync.Client.Changeset - Decoded changeset: [changeset with 1 instructions]
Realm - integrated local changesets as version 6
Realm.Sync.Client.Changeset - found upload changeset: 1146 9 300550593403 0 56
Realm.Sync.Client.Changeset - Decoded changeset: [changeset with 1 instructions]
Realm - integrated local changesets as version 7
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1183, client: 0} upload: {server: 1183, client: 0}, latest: 18351976}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1183, clientVersion: 0, origin: 7 [changeset with 2 instructions]
Realm - Scanning incoming changeset [1/1] (2 instructions)
Realm.Sync.Client.Changeset - Scanning local changeset [1/2] (1 instructions)
Realm.Sync.Client.Changeset - Scanning local changeset [2/2] (1 instructions)
Realm.Sync.Client.Changeset - Indexing incoming changeset [1/1] (2 instructions)
Realm.Sync.Client.Changeset - Finished changeset indexing (incoming: 1 changeset(s) / 2 instructions, local: 2 changeset(s) / 2 instructions, conflict group(s): 1)
Realm.Sync.Client.Changeset - Transforming local changeset [1/2] through 1 incoming changeset(s) with 1 conflict group(s)
Realm.Sync.Client.Changeset - Transforming local changeset [2/2] through 1 incoming changeset(s) with 1 conflict group(s)
Realm.Sync.Client.Changeset - Finished transforming 2 local changesets through 1 incoming changesets (2 vs 2 instructions, in 1 conflict groups)
Realm.Sync.Client.Changeset - Integrated 1 changesets out of 1
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1189, client: 0} upload: {server: 1189, client: 0}, latest: 18351976}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1189, clientVersion: 0, origin: 4 [changeset with 1 instructions]
Realm - Scanning incoming changeset [1/1] (1 instructions)
Realm.Sync.Client.Changeset - Scanning local changeset [1/2] (1 instructions)
Realm.Sync.Client.Changeset - Scanning local changeset [2/2] (1 instructions)
Realm.Sync.Client.Changeset - Indexing incoming changeset [1/1] (1 instructions)
Realm.Sync.Client.Changeset - Finished changeset indexing (incoming: 1 changeset(s) / 1 instructions, local: 2 changeset(s) / 2 instructions, conflict group(s): 1)
Realm.Sync.Client.Changeset - Transforming local changeset [1/2] through 1 incoming changeset(s) with 1 conflict group(s)
Realm.Sync.Client.Changeset - Transforming local changeset [2/2] through 1 incoming changeset(s) with 1 conflict group(s)
terminate called after throwing an instance of 'realm::sync::IntegrationException'
  what():  Failed to transform received changeset: Merge error: left.prior_size == right.prior_size (instruction target: collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[11]). Please contact support.
Aborted
sync-by-unito[bot] commented 1 month ago

➤ On 2024-07-12, danieltabacaru commented:

Thanks a lot guys! This is very helpful. I hope to get to the bottom of it now. I'll keep you in the loop.

sync-by-unito[bot] commented 1 month ago

➤ PM Bot commented:

Jira ticket: RCORE-1977

sync-by-unito[bot] commented 1 month ago

➤ danieltabacaru commented:

I managed to repro it and a fix is on its way. The issue is due to a subtle bug when saving empty reciprocal changesets (while at least one other non-empty reciprocal changeset at a previous version needs to be saved too). Once again, thanks a lot guys. [~kiro.morkos@mongodb.com] fyi, your repro cannot be fully applied by the tool ({_}Assertion failed: changeset.last_integrated_remote_version <= local_version with (changeset.last_integrated_remote_version, localversion) = [360, 9]{}). May you copy-pasted from a bigger file?

sync-by-unito[bot] commented 1 month ago

➤ Kiro Morkos commented:

Nice!! Good to hear our theory was pretty close!

The messages I shared are being applied properly when I run the tool, but it's possible I'm using a different version. It seems like you already have your own repro now, but if ours would still be useful I can look into it. Let me know

sync-by-unito[bot] commented 1 month ago

➤ danieltabacaru commented:

No need to look further into it. The bug manifests before applying that download message. It was mostly to confirm if the assertion failure is expected or not. There are cases when there will be no crash, but with an empty realm there is no way the local client version will get to 360 before applying that message.

sync-by-unito[bot] commented 1 month ago

➤ Kiro Morkos commented:

Ah okay yea you're right, the header on that last changeset is incorrect, but the bug is manifesting before the invariant triggers an assertion failure.