If one deletes the level indexes and rebuilds, they also end up at 1384618 entries. And there seems to be corruption in the jitdb indexes as well. I'll do some more testing.
This got me thinking: what version of Node is iOS running? I'm wondering if it might have something to do with the M1 chip. It seems like Node 16 was the first official release to support Apple silicon.
Node.js Mobile (on both Android and iOS) is 12.19.0. iPhones don't have (I think) M1 chips; those are only in MacBooks, iMacs, and the iPad Pro.
Ah right, it is the A15 in the latest iPhone.
Since you rebuilt the indexes, could you tell me what you see for the records at seqs 1381000, 1381001, and 1381002? I'd like to know what those 3 rows should have been. And I'm curious what message has the key %XFqG...
Idea: even if we don't discover the culprit, we can still build a system to discover inconsistencies and automatically heal them. Like, if the number of entries in leveldb doesn't match the core indexes in jitdb, then rebuild parts of the leveldb index. (Of course, not fixing the actual culprit will then make it harder for us to debug the culprit, and this issue could linger on forever). But just an idea.
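For illustration, a minimal sketch of one cheap variant of that check could look like the code below. This is not ssb-db2's actual code: checkLevelIndex and metaKey are made-up names, it assumes the classic levelup-style API (level@7), and it assumes an index, like keys, that stores roughly one entry per message plus a JSON meta record { version, offset, processed } under a reserved key (as in the flush code quoted further down in this thread).
const Level = require('level')

function checkLevelIndex(indexPath, metaKey, cb) {
  const db = Level(indexPath)
  // Read the meta record that the index keeps about its own progress
  db.get(metaKey, { valueEncoding: 'json' }, (err, meta) => {
    if (err) return cb(err)
    let entries = 0
    db.createKeyStream()
      .on('data', (key) => {
        if (key !== metaKey) entries += 1
      })
      .on('error', cb)
      .on('end', () => {
        // If the entry count disagrees with what meta claims was processed,
        // flag the index so it could be scheduled for a rebuild
        cb(null, {
          processed: meta.processed,
          entries,
          suspicious: entries !== meta.processed,
        })
      })
  })
}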
Those 3 are: %XFqGE3AfQT7tsoB8KnzTAas1xGgnncj0cxe7L/jhy1c=.sha256, %cBQmvYU179KsLKdoiemhkzaCq7GxW09yHtd2ClE68Vs=.sha256, %uBNhUX420oJmq0O/2b/YxmokP+jQSAYptyrIGnd3QDw=.sha256.
One thing I noticed is that if I get the messages for my profile, I get only 1 message using the corrupted indexes, while after clearing the whole thing I get > 8000. This is how I discovered that multiple indexes are corrupt. One thing I find strange is that we now have CRC checks, meaning that there should not be any corruption in read/write; rather, it must be in how they are extracted. Unless they were written before we added these checks. Do you remember if we did a reindex after deploying those?
What I also find really strange is that it seems that both level & the jitdb indexes are corrupt.
Oh, which jitdb indexes did you detect were corrupted?
I tried running a query for my post messages. I haven't gotten exactly to the bottom of it yet, but at least the author + post and one other index were corrupt.
One thing I find strange is that we now have CRC checks, meaning that there should not be any corruption in read/write; rather, it must be in how they are extracted. Unless they were written before we added these checks. Do you remember if we did a reindex after deploying those?
Here goes some detective work:
The last jitdb without CRC was jitdb@3.0.3 and ssb-db2@2.0.2.
In Manyverse, we forced the rebuild of all leveldb indexes (not jitdb indexes) recently:
But we forced the rebuild of all jitdb indexes (with jitdb@3.1.3 and ssb-db2@2.1.2) for Android only with this commit, because we thought (see issue https://github.com/ssb-ngi-pointer/jitdb/issues/149#issuecomment-819011645) that this issue afflicted only Android with ext4:
https://github.com/staltz/manyverse/commit/82a65c58fc078622a4b024add405b620c7895a3c
Note, CRC didn't work on iOS, so I made this patch to jitdb, and finally jitdb@3.1.4 supported CRC on iOS.
And then I updated the "rebuild all jitdb indexes" logic in Manyverse to also include iOS:
https://github.com/staltz/manyverse/commit/7814647e75e3c569eaad97cc8091215579671735
And I also updated jitdb@3.1.4:
https://github.com/staltz/manyverse/commit/f033099efd07ccd26ca58979bcddc4048ffa3d56
Both of the commits above were shipped into Manyverse version 0.2105.11, so I believe that iOS users had all of their indexes rebuilt and CRC-checked, but maybe something sneaky happened that I can't see?
That is interesting, because I checked the seq numbers of my post messages in Dave's log and I can see that they go way back (they start at seq 8017). This means that they should have been reindexed last month when you made that release. I tried dumping the raw author index and can see that it is vastly different from my reindex, even at the first seq. And this is a prefix index, so once a seq has been reindexed, it should never change. The file ISSUE_1328 must be correctly written, otherwise that whole file would fail... Hmm
This means that they should have been reindexed last month when you made that release.
Just a clarification, I mentioned 0.2105.11 earlier, which means iOS should have had a full jitdb reindexing and CRC since May this year. You mentioned "last month", why?
The file ISSUE_1328 must be correctly written otherwise that whole file would fail... Hmm
Maybe it's possible that the file issue1328 wasn't written atomically, but that sounds impossible because it's a 0-byte file.
For what it's worth (it looks like it might be), I upgraded my phone to the latest model last month.
iOS automatically copied the data content of the apps during the migration, so I didn’t have to install and sync Manyverse again.
I can confirm I noticed the issue after the migration to the new phone.
Very interesting! Thanks!
Just for more context, though: there is another user who is experiencing the same symptoms, so I think it's a somewhat more general issue, and maybe they didn't change their phone.
This means that they should have been reindexed last month when you made that release.
Just a clarification, I mentioned 0.2105.11 earlier, which means iOS should have had a full jitdb reindexing and CRC since May this year. You mentioned "last month", why?
Oh, I misread the version numbers. I thought 11 meant the month :)
The file ISSUE_1328 must be correctly written otherwise that whole file would fail... Hmm
Maybe it's possible that the file issue1328 wasn't written atomically, but that sounds impossible because it's a 0-byte file.
That shouldn't happen and the folder should have been deleted before anyway.
I want to try later to rebuild all of these indexes (both jitdb and leveldb) from the log, and then do a diff on all indexes and see which one went wrong first. Because it seems that once one is wrong, it could infect the other indexes with wrong data.
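As a rough sketch of that plan (not a tool that exists in ssb-db2; the folder names and diffIndexFolders are placeholders): hash every jitdb index file in a rebuilt folder and in the suspect folder and report which ones differ.
const fs = require('fs')
const path = require('path')
const crypto = require('crypto')

function sha1(file) {
  return crypto.createHash('sha1').update(fs.readFileSync(file)).digest('hex')
}

function diffIndexFolders(rebuiltDir, badDir) {
  for (const name of fs.readdirSync(rebuiltDir)) {
    const rebuilt = path.join(rebuiltDir, name)
    const bad = path.join(badDir, name)
    if (!fs.statSync(rebuilt).isFile()) continue // leveldb folders need their own diff
    if (!fs.existsSync(bad)) console.log(name, 'is missing in the bad folder')
    else if (sha1(rebuilt) !== sha1(bad)) console.log(name, 'differs')
  }
}

diffIndexFolders('db2/indexes-rebuilt', 'db2/indexes-bad')
Note that a plain byte-for-byte hash will also flag harmless differences, such as a different amount of trailing zero bytes, so a per-file inspection is still needed to confirm real corruption.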
Can it be that the problem is that canDecrypt is not CRC-checked?
Sorry for the deleted message, I spoke too early: I was running on my own profile and needed to check whether it is also a problem with another one. So I tested cblgh and I see the same problem. I get the correct number of results by removing the author + type post + canDecrypt indexes. In other words, all 3 of them are corrupted.
Can it be that the problem is that canDecrypt is not CRC-checked?
Still seems like we could do this, right? canDecrypt.index and encrypted.index are currently not CRC'd.
Sure, that would be a good idea.
The corruption is really strange as it seems to happen to more or less everything as far as I can see.
The corruption is really strange as it seems to happen to more or less everything as far as I can see.
Can you detect whether a corruption is due to "bad data" coming from another corrupted index, or whether all of the corruptions are "original corruptions"? It would be easier for us if we detected that just canDecrypt.index and encrypted.index were corrupted and then all of the other indexes got infected with that.
But it could surely be a general filesystem-caused corruption.
I want to try later, to rebuild all of these indexes (both jitdb and leveldb) from the log, and then do a diff on all indexes and see which one went wrong first.
Hmm, I can't do this unless I have Dave's secret key (in order to create canDecrypt.index). I guess I could ask Dave to run some scripts using his secret key.
After further testing I'm not so sure that canDecrypt is actually corrupt. Let me do some more testing on the author + post that is for sure corrupt.
Yeah, it's actually hard to know if canDecrypt is corrupted unless we get Dave to rebuild it on desktop.
PS: Dave is gonna run this https://github.com/staltz/ssb-db2-issue-291
Oh my. It turns out we might have to be really careful with bipf when upgrading. I was testing with the latest bipf + ssb-db2. Pinning db2 to 2.3.3 and bipf to 1.5.1 seems to produce the correct results for author + post. So at least that is good news. Maybe the problem is only leveldb then. It will be interesting to see what you find running the script you linked.
Seeing as I'm really slow today 😩 can you tell me when this fix: https://github.com/ssb-ngi-pointer/ssb-db2/commit/557e8821f9e3483398c882dcbc735b10a9b477a0 was put into manyverse and the level folders reindexed?
It seems like 2.3.2 and 2.3.3 are not tagged in the ssb-db2 git?
Updated to ssb-db2 2.3.2 in https://github.com/staltz/manyverse/commit/ebe4bf32675759aba0687e264b3f557e15444c04 and released as Manyverse 0.2110.22
Updated to ssb-db2 2.3.3 in https://github.com/staltz/manyverse/commit/41bb9ea24ad5884b61ecd1bf01f869235d621ca1 and released as Manyverse 0.2111.11
We made so many changes, I almost forgot half of them... ;)
The reason I was so confused is that the author index change for bendy butt bumps the version to 2, but that is only taken into consideration when updating the index ;-) So nothing related to bipf.
Stepping back, I don't think this has anything to do with index corruption. It smells more like a bug similar to the level bug. Maybe related to pausing in push-stream; that is just a guess, because I noticed there is also 1 message missing from the type index: sequence 1382999. So the following query will give you the message:
const { seqs } = require('jitdb/operators')
// where, and, toCallback come from ssb-db2's operators
const { where, and, toCallback } = require('ssb-db2/operators')

ssb.db.query(
  where(
    and(
      seqs([1382999])
    )
  ),
  toCallback((err, msgs) => {
    console.log(msgs[0])
    ssb.close()
  })
)
while this will not, until you rebuild:
const { seqs } = require('jitdb/operators')
const { where, and, type, toCallback } = require('ssb-db2/operators')

ssb.db.query(
  where(
    and(
      seqs([1382999]),
      type('post')
    )
  ),
  toCallback((err, msgs) => {
    console.log(msgs[0])
    ssb.close()
  })
)
Good, we're getting closer.
I don't think this has anything to do with index corruption. It smells more like a bug similar to the level bug. Maybe related to pausing in push-stream; that is just a guess, because I noticed there is also 1 message missing from the type index.
That's my intuition too. An off-by-one (or off-by-two) error seems like a logic bug somewhere.
Work-in-progress analysis of some indexes (works as note-to-self too):
status is about the diff between rebuilt indexes and bad indexes.
index | status | comment |
---|---|---|
seq.index | :heavy_check_mark: | good except CRC mismatch due to different amount of trailing zero bytes |
sequence.index | :heavy_check_mark: | good except CRC mismatch due to different amount of trailing zero bytes |
timestamp.index | :heavy_check_mark: | good except CRC mismatch due to different amount of trailing zero bytes |
canDecrypt.index | :heavy_check_mark: | |
encrypted.index | :construction: | wasn't rebuilt fully (just a few entries rebuilt) |
meta_.index | :x: | already the first bytes in the tarr body differ, but this may be due to me building this index with a random ssb-key that cannot decrypt messages, so it never finds the meta.private section |
value_author.32prefix | :x: | rebuilt has count=1384618 and bad has count=1384111 |
If you look only at the keys leveldb index, then it's clear that the seq was too far ahead of what it should have been. I've been thinking about how it is possible that it was ahead, and that's only possible if this.processRecord(record, processedSeq) below
this.onRecord = function onRecord(record, isLive) {
let changes = 0
if (record.offset > this.offset.value) {
if (record.value) this.processRecord(record, processedSeq)
changes = this.batch.length
processedSeq++
}
processedOffset = record.offset
if (changes > chunkSize) this.flush(thenMaybeReportError)
else if (isLive) liveFlush(thenMaybeReportError)
}
was called with processedSeq higher than it should have been. In what cases can processedSeq be too far ahead?
I think only prior to this PR https://github.com/ssb-ngi-pointer/ssb-db2/pull/269
// 1st, persist the operations in the batch array
this.level.batch(
this.batch,
{ keyEncoding: this.keyEncoding, valueEncoding: this.valueEncoding },
(err2) => {
if (err2) return cb(err2)
if (this.level.isClosed()) return cb()
// 2nd, persist the META because it has its own valueEncoding
this.level.put(
META,
{ version, offset: processedOffsetAtFlush, processed: processedSeqAtFlush },
{ valueEncoding: 'json' },
(err3) => {
if (err3) cb(err3)
else {
this.offset.set(processedOffsetAtFlush)
cb()
}
}
)
}
)
In the code above, let's assume the app crashes after persisting the batch but before persisting the META. Then, the next time it loads up, processedSeq will be behind what it should be. For it to be ahead, we would need that concurrency problem to exist.
So this tells me that perhaps there is nothing new about this bug/issue; it's still the same concurrency issue that we patched. But perhaps the leveldb indexes weren't rebuilt on iOS, maybe this line didn't work on iOS (was it the /!(*.*) syntax?):
I don't think there is a problem with rimraf and /!(*.*), unless iOS is doing something super weird. I ran the following code (on my Linux desktop) and it deleted the leveldb folders as it should:
const path = require('path');
const rimraf = require('rimraf');
rimraf.sync(path.join(__dirname, 'db2', 'indexes') + '/!(*.*)');
I also double checked it :) Can you check on iOS using a simulator?
This bug is really strange. Either it is something silly like the above, or it is something a lot more complex. I'll try to write up my thinking today.
If it is the complex case then here are some of the things I have tried and my thinking.
Seeing that we both have the problem in the type index and the keys level index, it seems like it might be lower-level than ssb-db2. And it is related to updating the indexes, meaning log.stream. Luckily there are not that many places where that happens.
JITDB:
SSB-DB2:
I tried a few things. I've been poking around in AAOL to see if I could see anything. Checked push-stream lightly.
I tried running a full reindex on the log and checking the sha1 sum of the author and type indexes as the base case. Then I ran a few reindexes, killing the process while running. Next I ran a few reindexes with too hot enabled. In all cases they ended up with the same sha sum, meaning they are 100% identical to the base case.
One thing that I thought might be a bit sneaky is if, for some reason, the same values are returned twice from the underlying level. Then we might get these seqs off by 1 or 2.
I ran the rimraf.sync check on iOS (iPad, not simulator), like this:
const fs = require('fs')
const path = require('path')
const mkdirp = require('mkdirp')
const rimraf = require('rimraf')

mkdirp.sync(path.join(process.env.SSB_DIR, 'foo', 'ax'))
mkdirp.sync(path.join(process.env.SSB_DIR, 'foo', 'bx'))
mkdirp.sync(path.join(process.env.SSB_DIR, 'foo', 'cx'))
console.log("BEFORE", fs.readdirSync(path.join(process.env.SSB_DIR, 'foo')))
rimraf.sync(path.join(process.env.SSB_DIR, 'foo') + '/!(*.*)')
console.log("AFTER", fs.readdirSync(path.join(process.env.SSB_DIR, 'foo')))
And the result was:
BEFORE [ 'ax', 'bx', 'cx' ]
AFTER []
So at least we can rule out the idea that iOS is doing rimraf.sync wrong.
One thing that I thought might be a bit sneaky is if, for some reason, the same values are returned twice from the underlying level.
What do you mean by "values"?
Also, it could be even lower-level, like a filesystem problem. Note that iOS has APFS.
One really needs to be careful with this debugging. I was wrong about type_post; I had run it with a problematic key. So type post is okay, and the same goes for type vote and the root index. I checked author again, as you also identified, and it appears to go wrong around seq no 1384113. This means that a query for author @CxnSXWYjPT160y7QbmTFtWaWT09080azgErYPt1ZeZc=.ed25519 gives a different result with the corrupted database: 6071 versus 6095. Where did you get the count=1384111 from? I'm printing count from here.
Where did you get the count=1384111 from? I'm printing count from here.
I think I got it from here https://github.com/staltz/ssb-db2-issue-291/blob/main/study-prefix-map-diff.js
But one technique I've been using sometimes is
cat db2/indexes-bad/value_author.32prefix | xxd | head > bad.hex
to look at the file header and see the version | offset | count | crc | body bytes.
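As a complement to the xxd approach, here is a hedged Node.js helper that reads the first bytes of an index file and prints them as those four header fields. It assumes each field is a 32-bit little-endian integer in the order version | offset | count | crc, which should be double-checked against jitdb's file-format code before trusting the output; readJitdbHeader is a made-up name.
const fs = require('fs')

function readJitdbHeader(filename) {
  const fd = fs.openSync(filename, 'r')
  const header = Buffer.alloc(16)
  fs.readSync(fd, header, 0, 16, 0)
  fs.closeSync(fd)
  // Assumed layout: four 32-bit LE fields (verify against jitdb before relying on this)
  return {
    version: header.readUInt32LE(0),
    offset: header.readUInt32LE(4),
    count: header.readUInt32LE(8),
    crc: header.readUInt32LE(12),
  }
}

console.log(readJitdbHeader('db2/indexes-bad/value_author.32prefix'))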
Ok, I got a report from Powersource that his (Manyverse) desktop app is throwing some pranks too; it seems the db is corrupted there! I got the zip of his db2, I can share it with you later.
This would be interesting because it wouldn't be iOS-specific anymore.
Jacob's keys level index has the same kind of bug. It is off by 38 instead of 2.
1021983 | 788241486 | %/DnmQa4xqgIn5qaKgOx8ioUrU8i+ybE+hBFhWCVAuq0=.sha256
1021984 | 788242170 | %5nd7p453VUbLtm/oiWTZC8GPzAKWDNGOxz9e0MWBcF0=.sha256
1021985 | 788242654 | %FWzbolvHxC1NtcYAVzXrVHb+EurmkfX+LR0SRH19oxU=.sha256 | %D7CEf5EgPxX5/sWWgD2JREfpU3zShi85+eFsAhprwkc=.sha256
1021986 | 788243123 | %07shhpv4L6wEWpZxZdm0e5zU2QYo+o76iOgeIelSj9s=.sha256 | %LCC8qCZsZbqjKnQ0t0g+T59l15dD4wzDlMiPq+KBanc=.sha256
1021987 | 788244160 | %xc83mi/928R2MfwNP7DKLB/y5VMb2QZrlaojcrd5jpI=.sha256 | %UtAI9IUZn+MHKAYvdkc+7LU/FRXPJIGlklxkYTUtGo8=.sha256
1021988 | 788245071 | %6mFJx4VxZGswkf89liwNXAS1g0u/7CZ1qKxroremvMg=.sha256 | %j5hkP6mW4uvu0x6ujP1nY4Jdf2nHbyS7wb/KEK9pIJM=.sha256
1021989 | 788245555 | %fwRDVfejokKNk9b+P9mEBEazvIudeLXkyh48LMa1bX0=.sha256 | %+oGNg87aNM5SR404xhKFwRI8R3HcE8VqO1t4vcotc/A=.sha256
from correct:
1022023 | 788267008 | %FWzbolvHxC1NtcYAVzXrVHb+EurmkfX+LR0SRH19oxU=.sha256
these are the 38 missing, when compared to the total
I have also noticed that his author index is correct, but seq.index is broken instead. The following 42 offsets are missing:
788242654
788243123
788244160
788245071
788245555
788246056
788246525
788247018
788247502
788247995
788248717
788249210
788249992
788251454
788252162
788252646
788253121
788253918
788254393
788254868
788255352
788255836
788256320
788256813
788258017
788258497
788258981
788259957
788260906
788261331
788262059
788262543
788263012
788263505
788264349
788264818
788265302
788265795
I have a theory about what might have caused the problem with seq.index. It just doesn't explain the author corruption or the level bugs. What is a bit puzzling about the corrupt author prefix index is that, if you look at the prefixes after the point where it is corrupted, they don't point to the start of any known author.
Studying Jacob's base leveldb, I noticed there are only two wrong KVs:
key=@22YV3HX/BGClXH49ZebkcepqRpj7a9f5s0ddwhDSfxk=.ed25519
value={offset: 793494573, sequence: 1810}
key=@V+BlA4XU3ZmjKLHlyG22tfmt+d67x5vGzgnbN1EzGjw=.ed25519
value={offset: 793494080, sequence: 191}
Correct ones (they differ only by offset):
key=@22YV3HX/BGClXH49ZebkcepqRpj7a9f5s0ddwhDSfxk=.ed25519
value={offset: 788258981, sequence: 1810}
key=@V+BlA4XU3ZmjKLHlyG22tfmt+d67x5vGzgnbN1EzGjw=.ed25519
value={offset: 788265302, sequence: 191}
And the META for base is:
Bad base:
Version: 2
Entries: 1032716
Offset: 796004667
Rebuilt base (differs only by entries):
Version: 2
Entries: 1032754
Offset: 796004667
38 is the diff.
Hmmmmmmmmmmmm good find. Why does Jacob have the same message multiple times?
Check this (I exposed the getMsgByOffset method):
ssb.db.getMsgByOffset(788258981, (err, msg) => {
  console.log(788258981, msg)
})
ssb.db.getMsgByOffset(793494573, (err, msg) => {
  console.log(793494573, msg)
})
ssb.db.getMsgByOffset(788265302, (err, msg) => {
  console.log(788265302, msg)
})
ssb.db.getMsgByOffset(793494080, (err, msg) => {
  console.log(793494080, msg)
})
=>
788258981 {
key: '%LFYUCrJ6S2TKCB7o+2f8++xFRQXQYpZg++jpDXsQnFk=.sha256',
value: {
previous: '%wRGPk9ftm1BzYMCPbbf5jIDsZVTdyXsX06lZjIlueE8=.sha256',
sequence: 1810,
author: '@22YV3HX/BGClXH49ZebkcepqRpj7a9f5s0ddwhDSfxk=.ed25519',
timestamp: 1638438970621,
hash: 'sha256',
content: {
type: 'post',
root: '%juRWvyC7fH9YwdCdI0YXhjGHad+XXsZqdThCyFBmutM=.sha256',
fork: '%3rVlNUCs8OD6ceHGrRlcUz1A0BNvVQFHtFylGi2WLnA=.sha256',
branch: '%lM7Yum/PzzVmTBvaCQBZDYkgwe+gkmW/IQu4/+yhoB4=.sha256',
reply: [Object],
channel: null,
recps: null,
text: "Do you know about the snail electric fence method? I've seen it done with just some copper wire and a 9 V battery. Haven't tried it myself though.\n",
mentions: []
},
signature: 'cJKmGtkeSuJf4wg0AnuHs85iD9GAvVuiic494LDqZ5QVs9v+wGtoSqA89XOK2CEwE5hMzF80+nF+aOq3JMxMBw==.sig.ed25519'
},
timestamp: 1638625571799
}
793494573 {
key: '%LFYUCrJ6S2TKCB7o+2f8++xFRQXQYpZg++jpDXsQnFk=.sha256',
value: {
previous: '%wRGPk9ftm1BzYMCPbbf5jIDsZVTdyXsX06lZjIlueE8=.sha256',
sequence: 1810,
author: '@22YV3HX/BGClXH49ZebkcepqRpj7a9f5s0ddwhDSfxk=.ed25519',
timestamp: 1638438970621,
hash: 'sha256',
content: {
type: 'post',
root: '%juRWvyC7fH9YwdCdI0YXhjGHad+XXsZqdThCyFBmutM=.sha256',
fork: '%3rVlNUCs8OD6ceHGrRlcUz1A0BNvVQFHtFylGi2WLnA=.sha256',
branch: '%lM7Yum/PzzVmTBvaCQBZDYkgwe+gkmW/IQu4/+yhoB4=.sha256',
reply: [Object],
channel: null,
recps: null,
text: "Do you know about the snail electric fence method? I've seen it done with just some copper wire and a 9 V battery. Haven't tried it myself though.\n",
mentions: []
},
signature: 'cJKmGtkeSuJf4wg0AnuHs85iD9GAvVuiic494LDqZ5QVs9v+wGtoSqA89XOK2CEwE5hMzF80+nF+aOq3JMxMBw==.sig.ed25519'
},
timestamp: 1638625912978
}
793494080 {
key: '%bKTVmbsjc9YqpZQrP+V94+KyGMQ0n3jSUrOZY1ADlD8=.sha256',
value: {
previous: '%cquXN1dXcOkKIocYhFHjo0XnDI0EZf3e5R+wyHZHxow=.sha256',
sequence: 191,
author: '@V+BlA4XU3ZmjKLHlyG22tfmt+d67x5vGzgnbN1EzGjw=.ed25519',
timestamp: 1636321596142,
hash: 'sha256',
content: { type: 'vote', channel: null, vote: [Object] },
signature: 'E9Hvxu41Gf+TvwjLMTPFPLWYS+9OJgxfaJYAZWxu/Obp1pQ91p9e8VthA279+2LaSP3BENnU++xHV3YSLjvwDQ==.sig.ed25519'
},
timestamp: 1638625912974
}
788265302 {
key: '%bKTVmbsjc9YqpZQrP+V94+KyGMQ0n3jSUrOZY1ADlD8=.sha256',
value: {
previous: '%cquXN1dXcOkKIocYhFHjo0XnDI0EZf3e5R+wyHZHxow=.sha256',
sequence: 191,
author: '@V+BlA4XU3ZmjKLHlyG22tfmt+d67x5vGzgnbN1EzGjw=.ed25519',
timestamp: 1636321596142,
hash: 'sha256',
content: { type: 'vote', channel: null, vote: [Object] },
signature: 'E9Hvxu41Gf+TvwjLMTPFPLWYS+9OJgxfaJYAZWxu/Obp1pQ91p9e8VthA279+2LaSP3BENnU++xHV3YSLjvwDQ==.sig.ed25519'
},
timestamp: 1638625571896
}
Wait, this is literally just on the log, no indexes involved?
Exactly
I wonder what came first, the corrupt indexes or double messages 🤔
Well, one thing we know for sure is that seq.index has a saving problem. We could push that fix and reset all indexes, and then wait for more months to pass and see what happens.
Agreed, it would be good to have that fix. I'm just still trying to understand what is going on here.
I found some duplicate messages in Dave's database as well (key, first offset, duplicate offset):
duplicate key %z/MDV4mT/XaiIibeP0C5vf+0tKLAcFxtMBDHLf5eu+E=.sha256 180348963 180749729
duplicate key %+hW4qOX4xuIO8QE9RTUAaHqL0bD28r4GyF/kYJLBz38=.sha256 180295907 180753120
duplicate key %/gzGlFUQBE3C6Kd1QfQSzZxG78IL4DmJssBIIdQCnlI=.sha256 180328614 180769814
duplicate key %NWG837v3J2qhjGmK9yEZvH7gZ34fzlEKfKjeuWXmRG8=.sha256 180371842 181260814
duplicate key %3Z/wvMZQlLWR+pSMn9yERhxwNU6jVrDQGqbvZV+sEZM=.sha256 180435314 245829508
duplicate key %rRUeWOTw1GQBHPJCwAsYSnuFCjRaLh/vafGZJnoA56o=.sha256 180472215 276944048
I checked all of Jacob's and there are only the 2 you already found.
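For reference, a duplicate scan along those lines could be sketched with the public query API (toPullStream from ssb-db2's operators plus pull-stream). This is an illustration, not the exact script used above: findDuplicateKeys is a made-up name, and it only reports the duplicated keys, not their offsets, since the query API doesn't expose log offsets.
const pull = require('pull-stream')
const { toPullStream } = require('ssb-db2/operators')

function findDuplicateKeys(ssb, cb) {
  const seen = new Set()
  const duplicates = []
  pull(
    ssb.db.query(toPullStream()), // walk every message
    pull.drain(
      (msg) => {
        if (seen.has(msg.key)) duplicates.push(msg.key)
        else seen.add(msg.key)
      },
      (err) => cb(err, duplicates)
    )
  )
}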
First of all, I have been really trying to break AAOL by treating it as a black box and throwing all kinds of random input at it. I haven't seen it break yet. So my hunch is that it must be somewhere else.
Good news is I think I have found something. Look at this line and imagine we have added a new plugin. In this case we will reindex, but most plugins will just skip. BUT they will update their processedOffset, thus lowering it compared to processedSeq. If you then kill the app while some of them have updated, but NOT keys or base, then there will be a mismatch!
This issue has the same symptoms as https://github.com/ssb-ngi-pointer/jitdb/issues/161, and seems to be happening only on iOS, I haven't gotten bug reports for Android yet.
I got the db2 folder from Dave, who has this problem, and ran some scripts to study the data. Mostly, I'm concerned with the keys leveldb index, because when retrieving a message by a certain key, we get the wrong key. For all records on the log, I printed out:
If you look closely, there seems to be an off-by-two gap. Notice these in bold (and that seqs 1381000 and 1381001 are missing):
Also, if I simply load some index files, the metadata says (notice the Entries):