ssbc / async-append-only-log

A new append-only-log for SSB purposes
16 stars 6 forks source link

Crash: RangeError [ERR_OUT_OF_RANGE] #17

Closed staltz closed 3 years ago

staltz commented 3 years ago

Got this crash report from Manyverse, so it means async-append-only-log version 3.0.1

App: se.manyver 0.2101.5-beta-googlePlay (95) Device: Xiaomi Redmi Note 4 (arm64-v8a | armeabi-v7a | armeabi) OS: Android 9 (SDK 28) User comment: the app just crashed

The value of \\\"offset\\\" is out of range. It must be an integer. Received NaN
RangeError [ERR_OUT_OF_RANGE]: The value of \\\"offset\\\" is out of range. It must be an integer. Received NaN
    at boundsError (internal/buffer.js:75:11)
    at Buffer.readUInt16LE (internal/buffer.js:238:5)
    at getData (/data/data/se.manyver/files/nodejs-project/index.js:63060:25)
    at /data/data/se.manyver/files/nodejs-project/index.js:63075:7
    at Request._callback (/data/data/se.manyver/files/nodejs-project/index.js:63054:9)
    at Request.Trwq+KqeWzxsbUet+Omb+wC9uH2QMBcZNBUiXOCqCv4=.Request.callback (/data/data/se.manyver/files/nodejs-project/index.js:44790:8)
    at onread (/data/data/se.manyver/files/nodejs-project/index.js:75871:31)
    at FSReqCallback.wrapper [as oncomplete] (fs.js:520:5)

Manually interpreting the modules in the stack trace:

The value of \\\"offset\\\" is out of range. It must be an integer. Received NaN
RangeError [ERR_OUT_OF_RANGE]: The value of \\\"offset\\\" is out of range. It must be an integer. Received NaN
    at boundsError (internal/buffer.js:75:11)
    at Buffer.readUInt16LE (internal/buffer.js:238:5)
    at getData (async-append-only-log)
    at get (async-append-only-log)
    at Request._callback (raf.read callback in async-append-only-log)
    at Request.prototype.callback (random-access-storage)
    at RandomAccessFile.prototype._read callback onread (random-access-file)
    at FSReqCallback.wrapper [as oncomplete] (fs.js:520:5)

Update: And same crash for an entirely different user.

App: se.manyver 0.2101.5-beta-googlePlay (95) Device: Google Pixel 3a (arm64-v8a | armeabi-v7a | armeabi) OS: Android 11 (SDK 30) User comment: N/A


And another

App: se.manyver 0.2101.5-beta-googlePlay (95) Device: Google Pixel 5 (arm64-v8a | armeabi-v7a | armeabi) OS: Android 11 (SDK 30) User comment: crashes on startup every time

staltz commented 3 years ago

@arj03 If you'd like to see exactly what those lines of code were, here's the 2.4MB index.js that matches what those users are using in production: index.zip

arj03 commented 3 years ago

Will have a look

arj03 commented 3 years ago

@staltz so I'm looking at the error and it appears that something it sending down a NaN offset. This crashes the read. We should probably handle that in a nicer way.

I went looking in ssb-db2 to see if I could find something that could trigger this. I saw that you seem to have guarded against that in migrate.js. So I'm guessing you have seen a problem when newLog.since.value is NaN?

Maybe one way to handle this better is to at least send up an error in get, then we should have a better time localizing the error. And in general that seems like the right thing to do.

arj03 commented 3 years ago

Aha that migrate check doesn't work:

offset = NaN
console.log(typeof offset !== 'number' || offset < 0)
=> false
staltz commented 3 years ago

You know what? When I wrote that check in migrate, I actually thought that a NaN could sneak in because the typeof doesn't cover it, but I dismissed the thought because "how could this possibly be a NaN?" Like, you need to divide by zero or something. And it should come from the since.value, which I have a hard time believing it could be NaN.

Sigh. Still don't know why the NaN happens but we can add a check for it anyways.

arj03 commented 3 years ago

The migrate hunch was probably not correct. It's really annoying that random access file seems to use a queue that destroys the backtraces. With the updated async log it should at least not die there, instead the error should bubble up. Maybe a more likely case would be if you add undefined to a number, you also end up with NaN. Without any more stacktrace to go by its like looking for a needle in a haystack.

arj03 commented 3 years ago

@staltz Question, do these backtraces include the debug() calls?

I still can't get this out of my head. There are actually not that many calls to get. Especially ones where there is a fs call right in front of it. The since is such a case, but I can't seem to see where since would end up being NaN. In this line if any of those 3 are undefined since will be NaN. Doesn't seem like it could be len because then the read would have failed, blockSize should be constant so no and if it was recordOffset then the readUint16LE right below it should fail before get has a chance to fail.

arj03 commented 3 years ago

There is actually this old issue where you had this problem with jitdb. This was even with flumelog-aligned-offset. Still the same kind of error.

staltz commented 3 years ago

I'll check if there's more logs in those crash reports

arj03 commented 3 years ago

I think I know what it might be now. In jitdb if the offset index is not saved properly but the other indexes are, then the lookup in offset will return undefined. Will try to reproduce.

arj03 commented 3 years ago

Hello there old friend :-)

arj@arj:~/dev/jitdb$ rm indexes/seq.index 
arj@arj:~/dev/jitdb$ touch indexes/seq.index
arj@arj:~/dev/jitdb$ node example.js asynclog.bipf
internal/buffer.js:75
    throw new ERR_OUT_OF_RANGE(type || 'offset', 'an integer', value);
    ^

RangeError [ERR_OUT_OF_RANGE]: The value of "offset" is out of range. It must be an integer. Received NaN
arj03 commented 3 years ago

And the latest async log now prints this error instead: Offset undefined is not a number

staltz commented 3 years ago

Great! I anyway couldn't find anything in the crash report logs, only Java-specific stuff (users would have to manually enable debug() for us to get those in the logcat).