@arj03 If you'd like to see exactly what those lines of code were, here's the 2.4MB index.js
that matches what those users are using in production: index.zip
Will have a look
@staltz So I'm looking at the error, and it appears that something is sending down a NaN offset. This crashes the read. We should probably handle that in a nicer way.
I went looking in ssb-db2 to see if I could find something that could trigger this. I saw that you seem to have guarded against that in migrate.js. So I'm guessing you have seen a problem when newLog.since.value is NaN?
Maybe one way to handle this better is to at least send up an error in get, then we should have a better time localizing the error. And in general that seems like the right thing to do.
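Something like this, perhaps (a rough sketch with assumed names, not the actual async-append-only-log code):

```js
// Sketch only: validate the offset up front and report the problem through the
// callback, instead of letting the Buffer read deep inside random-access-file
// throw ERR_OUT_OF_RANGE and crash the process.
function get(offset, cb) {
  if (typeof offset !== 'number' || Number.isNaN(offset) || offset < 0) {
    return cb(new Error('Offset ' + offset + ' is not a number'))
  }
  // ...proceed with the normal block read and eventually call cb(null, record)...
}

// get(NaN, (err) => console.log(err.message)) // "Offset NaN is not a number"
```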
Aha that migrate check doesn't work:
```js
const offset = NaN
console.log(typeof offset !== 'number' || offset < 0)
// => false (typeof NaN is 'number' and NaN < 0 is false, so a NaN offset slips through)
```
You know what? When I wrote that check in migrate, I actually thought that a NaN could sneak in because the typeof doesn't cover it, but I dismissed the thought because "how could this possibly be a NaN?" Like, you'd need to divide by zero or something. And it should come from since.value, which I have a hard time believing could be NaN.
Sigh. Still don't know why the NaN happens, but we can add a check for it anyway.
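Something along these lines should do it (sketch only, not the actual ssb-db2 code):

```js
// Number.isInteger(NaN) is false, so this also catches the NaN that
// `typeof offset !== 'number'` lets through.
function isValidOffset(offset) {
  return Number.isInteger(offset) && offset >= 0
}

console.log(isValidOffset(NaN)) // false
console.log(isValidOffset(-1))  // false
console.log(isValidOffset(0))   // true
```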
The migrate hunch was probably not correct. It's really annoying that random-access-file seems to use a queue that destroys the backtraces. With the updated async log it should at least not die there; instead, the error should bubble up. A more likely case would be adding undefined to a number, which also gives you NaN. Without any more stack trace to go by, it's like looking for a needle in a haystack.
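To illustrate that last point (just the mechanism, not code from the log):

```js
// undefined silently turns into NaN as soon as it enters arithmetic:
const recordOffset = undefined // e.g. a lookup that quietly returned nothing
console.log(recordOffset + 64 * 1024) // => NaN
```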
@staltz Question, do these backtraces include the debug() calls?
I still can't get this out of my head. There are actually not that many calls to get, especially ones with an fs call right in front of them. The since is such a case, but I can't seem to see where since would end up being NaN. In this line, if any of those 3 are undefined, since will be NaN. It doesn't seem like it could be len, because then the read would have failed; blockSize should be constant, so no; and if it were recordOffset, then the readUInt16LE right below it should fail before get has a chance to fail.
There is actually this old issue where you had this problem with jitdb. This was even with flumelog-aligned-offset. Still the same kind of error.
I'll check if there are more logs in those crash reports.
I think I know what it might be now. In jitdb if the offset index is not saved properly but the other indexes are, then the lookup in offset will return undefined. Will try to reproduce.
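Roughly this shape of failure (illustration only; the typed array stands in for jitdb's seq -> offset index, and the names are my assumptions, not the actual jitdb code):

```js
// If seq.index is empty or truncated, it has fewer entries than the other
// indexes expect, so looking up a seq yields undefined...
const offsetIndex = new Uint32Array(0) // stands in for an empty seq.index
const seq = 42
const offset = offsetIndex[seq]
console.log(offset) // => undefined
// ...and once that undefined reaches log.get(), the buffer read blows up
// with the NaN / ERR_OUT_OF_RANGE error.
```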
Hello there old friend :-)
```
arj@arj:~/dev/jitdb$ rm indexes/seq.index
arj@arj:~/dev/jitdb$ touch indexes/seq.index
arj@arj:~/dev/jitdb$ node example.js asynclog.bipf
internal/buffer.js:75
throw new ERR_OUT_OF_RANGE(type || 'offset', 'an integer', value);
^
RangeError [ERR_OUT_OF_RANGE]: The value of "offset" is out of range. It must be an integer. Received NaN
```
And the latest async log now prints this error instead: Offset undefined is not a number
Great! In any case, I couldn't find anything in the crash report logs, only Java-specific stuff (users would have to manually enable debug() for us to get those in the logcat).
Got this crash report from Manyverse, so it means async-append-only-log version 3.0.1
App: se.manyver 0.2101.5-beta-googlePlay (95)
Device: Xiaomi Redmi Note 4 (arm64-v8a | armeabi-v7a | armeabi)
OS: Android 9 (SDK 28)
User comment: the app just crashed
Manually interpreting the modules in the stack trace:
Update: and the same crash for an entirely different user.
App: se.manyver 0.2101.5-beta-googlePlay (95)
Device: Google Pixel 3a (arm64-v8a | armeabi-v7a | armeabi)
OS: Android 11 (SDK 30)
User comment: N/A
And another
App: se.manyver 0.2101.5-beta-googlePlay (95)
Device: Google Pixel 5 (arm64-v8a | armeabi-v7a | armeabi)
OS: Android 11 (SDK 30)
User comment: crashes on startup every time