kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB

Segmentation fault in LZ4_compress_fast_continue on armv7l #174

supernes closed this issue 2 years ago

supernes commented 2 years ago

Hi there,

I'm trying to use parcel 2.6.0 on a Raspberry Pi, which includes LMDB-js as a dependency, but I get a segfault which I traced to LMDB/LZ4.

Version 2.2.4 of LMDB (parcel 2.5.0) worked without any issues. Version 2.3.10 (parcel 2.6.0) fails with the backtrace below. Version 2.4.5 (package.json override) crashes like 2.3.10.

Node.js version: v16.15.1
Kernel version: 5.15.32 (armv7l)
glibc version: 2.31

Thread 10 "node" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb273d400 (LWP 8918)]
0xb3aa6416 in LZ4_compress_fast_continue () from .../node_modules/@lmdb/lmdb-linux-arm/node.napi.glibc.node

#0  0xb3aa6416 in LZ4_compress_fast_continue () from .../node_modules/@lmdb/lmdb-linux-arm/node.napi.glibc.node
#1  0xb3abb650 in CompressionWorker::Execute() () from .../node_modules/@lmdb/lmdb-linux-arm/node.napi.glibc.node
#2  0x03a7c270 in zero_statbuf ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
$ ldd -v node_modules/@lmdb/lmdb-linux-arm/node.napi.glibc.node 
  linux-vdso.so.1 (0xbefac000)
  /usr/lib/arm-linux-gnueabihf/libarmmem-${PLATFORM}.so => /usr/lib/arm-linux-gnueabihf/libarmmem-v7l.so (0xb6eec000)
  libstdc++.so.6 => /lib/arm-linux-gnueabihf/libstdc++.so.6 (0xb6d64000)
  libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb6cf5000)
  libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb6cc8000)
  libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb6c9c000)
  libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb6b48000)
  /lib/ld-linux-armhf.so.3 (0xb6f64000)

Version information:
node_modules/@lmdb/lmdb-linux-arm/node.napi.glibc.node:
  ld-linux-armhf.so.3 (GLIBC_2.4) => /lib/ld-linux-armhf.so.3
  libc.so.6 (GLIBC_2.17) => /lib/arm-linux-gnueabihf/libc.so.6
  libc.so.6 (GLIBC_2.4) => /lib/arm-linux-gnueabihf/libc.so.6
  libgcc_s.so.1 (GCC_3.5) => /lib/arm-linux-gnueabihf/libgcc_s.so.1
  libstdc++.so.6 (CXXABI_1.3.9) => /lib/arm-linux-gnueabihf/libstdc++.so.6
  libstdc++.so.6 (GLIBCXX_3.4.11) => /lib/arm-linux-gnueabihf/libstdc++.so.6
  libstdc++.so.6 (CXXABI_1.3) => /lib/arm-linux-gnueabihf/libstdc++.so.6
  libstdc++.so.6 (GLIBCXX_3.4.21) => /lib/arm-linux-gnueabihf/libstdc++.so.6
  libstdc++.so.6 (GLIBCXX_3.4.14) => /lib/arm-linux-gnueabihf/libstdc++.so.6
  libstdc++.so.6 (GLIBCXX_3.4) => /lib/arm-linux-gnueabihf/libstdc++.so.6
  libpthread.so.0 (GLIBC_2.12) => /lib/arm-linux-gnueabihf/libpthread.so.0
  libpthread.so.0 (GLIBC_2.4) => /lib/arm-linux-gnueabihf/libpthread.so.0
kriszyp commented 2 years ago

@supernes thanks for the detailed report, this is helpful. I think the most obvious and likely pertinent difference between lmdb@2.2 and newer versions is that lmdb@2.2 did not include any prebuilt binaries for armv7l, so lmdb was built/compiled from source when it was installed. My best guess is that your locally compiled version of lmdb works, but the prebuilt binary that I cross-compiled is bad in some way (this would definitely not be the first time I have messed up cross-compilation!).

To test this, if you are willing, I believe you should be able to force npm/node-gyp to compile from source with npm install --build-from-source. Or, more brute-force: delete the node_modules/@lmdb folder (with lmdb-linux-arm) and run npm run install inside node_modules/lmdb; that should also trigger a build (you should visibly see it compiling).

Here is the script that I am using for cross-compiling for armv7, in case you have any insights or wisdom about how the cross-compilation might be wrong: https://github.com/DoctorEvidence/lmdb-js/blob/master/.github/workflows/prebuild.yml#L86-L97

Anyway, I could certainly be wrong about this; it could be a different issue, but it's my best guess for now. And it's really cool that you have this (an older version, at least) running on a Raspberry Pi. Nice job!

supernes commented 2 years ago

Building it from source did not resolve the issue, and after a lot of digging I'm starting to think that the problem is actually on parcel's side, not lmdb's.

I believe Compression::compressInstruction is being passed an invalid address or freed buffer, and LZ4_compress_generic_validated tries to write to a null pointer, hence the crash.

Bypassing parcel's HMR removes the segfaults, even with cache enabled.

Sorry for the bother 😃. Seems lmdb works just fine (when given valid buffers). I'll close the issue.

kriszyp commented 2 years ago

Ah, interesting. So I'm guessing you tried parcel 2.5 with the latest lmdb to verify that it was working. Perhaps the armv7 prebuilds are actually working, then, and I can keep them in place.

supernes commented 2 years ago

Parcel 2.5.0 + HMR still crashes with lmdb 2.3.x and 2.4.x. Seems to work fine without HMR. I'll clone the repo tomorrow and run tests to confirm that everything's alright with lmdb, and to give you a verdict on the prebuilds.

supernes commented 2 years ago

I ran some more tests, and lmdb 2.4.5 crashes in even more exotic ways when running the test suite (a bus error this time). This happens both with the prebuilt binary and when built from source.

I did try it on 64-bit ARM (aarch64) on Fedora 34, and it works with the prebuilds.

My conclusion, for the time being, is that something is wrong with the code on 32-bit ARM. Even though the CPU on the Raspberry Pi is 64-bit, the default OS is still 32-bit. There are no issues running parcel or lmdb on an AWS a1.large (first-gen Graviton).

Unfortunately I can't dedicate any more time to this, and I'm a bit out of my depth trying to debug this. I did see some warnings when building, but don't know for sure if they're related.

I'll stick to a known working config with the older version for now, and maybe try to update the OS on the Raspberry later.

../src/writer.cpp: In static member function ‘static int WriteWorker::DoWrites(MDB_txn*, EnvWrap*, uint32_t*, WriteWorker*)’:
../src/writer.cpp:204:32: warning: comparison is always false due to limited range of data type [-Wtype-limits]
  204 |      if ((size_t)value.mv_data > 0x1000000000000)
      |          ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
../src/compression.cpp: In member function ‘void (* Compression::compress(MDB_val*, void (*)(MDB_val&)))(MDB_val&)’:
../src/compression.cpp:133:45: warning: right shift count >= width of type [-Wshift-count-overflow]
  133 |    compressedData[2] = (uint8_t)(dataLength >> 40u);
      |                                  ~~~~~~~~~~~^~~~~~
../src/compression.cpp:134:45: warning: right shift count >= width of type [-Wshift-count-overflow]
  134 |    compressedData[3] = (uint8_t)(dataLength >> 32u);
      |                                  ~~~~~~~~~~~^~~~~~
kriszyp commented 2 years ago

Turns out I was able to reproduce segfaults pretty quickly by testing lmdb-js on 32-bit node. And indeed, I tracked down an incorrect assignment of a 64-bit int to a (32-bit) pointer. The fix should be in v2.5.1, if you're willing to give it a try (and cool to hear about running parcel/lmdb on a Raspberry Pi).

supernes commented 2 years ago

Just downloaded 2.5.1, but sadly it still crashes like the versions above 2.2.x. I tried running the test suite with the prebuild, and it fails almost immediately (after the first test case) with a SIGBUS ("Bus error").

Tried a debug rebuild, the test suite goes a couple of steps further, but still crashes.

Here's the output from running a debug build on node v18.3.0 under GDB:

Starting program: /usr/local/lib/nodejs/node-v18.3.0-linux-armv7l/bin/node /home/pi/.local/bin/npm run test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0xb6be3400 (LWP 6266)]
[New Thread 0xb61ff400 (LWP 6267)]
[New Thread 0xb59fe400 (LWP 6268)]
[New Thread 0xb51fd400 (LWP 6269)]
[New Thread 0xb49fc400 (LWP 6270)]
[New Thread 0xb63e2400 (LWP 6271)]
[New Thread 0xb3bff400 (LWP 6272)]
[New Thread 0xb33fe400 (LWP 6273)]
[New Thread 0xb2bfd400 (LWP 6274)]
[New Thread 0xb23fc400 (LWP 6275)]

> lmdb@2.5.1 test
> mocha test/**.test.js --recursive && npm run test:types

[Attaching after Thread 0xb6fefac0 (LWP 6263) fork to child process 6276]
[New inferior 2 (process 6276)]
[Detaching after fork from parent process 6263]
[Inferior 1 (process 6263) detached]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
process 6276 is executing new program: /usr/bin/dash
[Attaching after process 6276 vfork to child process 6277]
[New inferior 3 (process 6277)]
[Detaching vfork parent process 6276 after child exec]
[Inferior 2 (process 6276) detached]
process 6277 is executing new program: /usr/bin/env
process 6277 is executing new program: /usr/local/lib/nodejs/node-v18.3.0-linux-armv7l/bin/node
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0xb6be3400 (LWP 6278)]
[New Thread 0xb61ff400 (LWP 6279)]
[New Thread 0xb59fe400 (LWP 6280)]
[New Thread 0xb51fd400 (LWP 6281)]
[New Thread 0xb49fc400 (LWP 6282)]
[New Thread 0xb63e2400 (LWP 6283)]
[New Thread 0xb3bff400 (LWP 6284)]
[New Thread 0xb33fe400 (LWP 6285)]
[New Thread 0xb2bfd400 (LWP 6286)]
[New Thread 0xb23fc400 (LWP 6287)]

  lmdb-js
    Basic use
      ✔ will not open non-existent db with create disabled
      - 
      ✔ zero length values
      ✔ query of keys
      ✔ reverse query range
      ✔ more reverse query range
      ✔ clear between puts
      ✔ string

Thread 3.10 "node" received signal SIGBUS, Bus error.
[Switching to Thread 0xb2bfd400 (LWP 6286)]
0xb3d6b14c in WriteWorker::DoWrites (txn=0x41f7b48, envForTxn=0x40b9310, instruction=0x419b0b8, worker=0x4082408) at ../src/writer.cpp:234
234                     validated = validated && conditionalVersion == *((double*)conditionalValue.mv_data);
(gdb) bt
#0  0xb3d6b14c in WriteWorker::DoWrites (txn=0x41f7b48, envForTxn=0x40b9310, instruction=0x419b0b8, worker=0x4082408) at ../src/writer.cpp:234
#1  0xb3d6b84c in WriteWorker::Write (this=0x4082408) at ../src/writer.cpp:372
#2  0xb3d6a89c in AsyncWriteWorker::Execute (this=0x4082408, execution=...) at ../src/writer.cpp:77
#3  0xb3d70db4 in Napi::AsyncProgressWorker<char>::Execute (this=0x4082428) at ../node_modules/node-addon-api/napi-inl.h:5907
#4  0xb3d6e208 in Napi::AsyncWorker::OnExecute (this=0x4082428) at ../node_modules/node-addon-api/napi-inl.h:4895
#5  0xb3d6e1d0 in Napi::AsyncWorker::OnAsyncWorkExecute (env=0x41f3090, asyncworker=0x4082428) at ../node_modules/node-addon-api/napi-inl.h:4881
#6  0x011153fc in worker (arg=0x0) at ../deps/uv/src/threadpool.c:122
#7  0xb6d41300 in start_thread (arg=0xb2bfd400) at pthread_create.c:477
#8  0xb6cc5208 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

And here's a memory snapshot at the time of the crash:

SIGBUS
kriszyp commented 2 years ago

That's disappointing; I certainly thought the fix would help. Thank you for another detailed and helpful report. As for the SIGBUS error under debugging, your details clearly show what the issue is: a misaligned memory access when reading a double. I was actually aware of this hazard, but from testing had concluded it was safe on all modern architectures (and it seemed the fastest approach). Anyway, I think I can fix that particular error. However...

Even on 2.5.1, you still get a segmentation fault in LZ4? Did you happen to check whether 2.2.x passes the unit tests (the double conversion certainly shouldn't have changed since 2.2.x)? Anyway, it's unfortunate that I don't have a good way to test this; perhaps qemu might help.

supernes commented 2 years ago

I made a very barebones test script to check 2.5.1 compression, and it still segfaults, but I came across some strange behavior.

It only fails on large-ish values (around 32,000 bytes). If I set the compression threshold low and write small values, it seems to work OK. Also, the amount of data written before the crash is not always the same: cold/warm starts, IO, and/or scheduling seem to affect when it bails. It's always above 32K, though (up to 45-50 KB on some runs).

Here's the script I used to stress it:

import { open } from 'lmdb';

let mydb = open({
  path: 'test-db',
  compression: true,
  threshold: 1
});

let index = 1, mult = 64;

process.on('SIGSEGV', _ => {
  console.error(`Segfault after ${index} iterations (${index*mult} bytes)`);
  process.kill(process.pid, 'SIGKILL');
});

while (index++ < 1000) {
  const newBuff = Buffer.alloc(index*mult);
  await mydb.put('test-key', newBuff);
}

And here's the output from running it a bunch of times:

$ for i in {1..10}; do node index.js; done
Segfault after 500 iterations (32000 bytes)
Segfault after 500 iterations (32000 bytes)
Segfault after 715 iterations (45760 bytes)
Segfault after 500 iterations (32000 bytes)
Segfault after 500 iterations (32000 bytes)
Segfault after 500 iterations (32000 bytes)
Segfault after 715 iterations (45760 bytes)
Segfault after 707 iterations (45248 bytes)
Segfault after 500 iterations (32000 bytes)
Segfault after 711 iterations (45504 bytes)

Here's a trace of the crash on a debug build of 2.5.1:

Thread 10 "node" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb2bfd400 (LWP 10589)]
0xb630fe50 in LZ4_read32 (memPtr=0x0) at ../dependencies/lz4/lib/lz4.c:336
336 static U32 LZ4_read32(const void* memPtr) { return *(const U32*) memPtr; }
(gdb) bt
#0  0xb630fe50 in LZ4_read32 (memPtr=0x0) at ../dependencies/lz4/lib/lz4.c:336
#1  0xb633513c in LZ4_hashPosition (tableType=byU32, p=0x0) at ../dependencies/lz4/lib/lz4.c:721
#2  LZ4_putPosition (srcBase=0xffff0000 "\001\004", tableType=byU32, tableBase=0xae109948, p=0x0) at ../dependencies/lz4/lib/lz4.c:763
#3  LZ4_compress_generic_validated (acceleration=1, dictIssue=dictSmall, dictDirective=usingExtDict, tableType=byU32, outputDirective=limitedOutput, maxOutputSize=33108, inputConsumed=0x0, 
    inputSize=32963, dest=0xae10d974 "Oŀ\200", source=0x0, cctx=0xae109948) at ../dependencies/lz4/lib/lz4.c:924
#4  LZ4_compress_generic (acceleration=1, dictIssue=<optimized out>, dictDirective=<optimized out>, tableType=byU32, outputDirective=<optimized out>, dstCapacity=33108, inputConsumed=0x0, srcSize=32963, 
    dst=0xae10d974 "Oŀ\200", src=0x0, cctx=0xae109948) at ../dependencies/lz4/lib/lz4.c:1277
#5  LZ4_compress_fast_continue (LZ4_stream=0xae109948, source=0x0, dest=0xae10d974 "Oŀ\200", inputSize=32963, maxOutputSize=33108, acceleration=1) at ../dependencies/lz4/lib/lz4.c:1628
#6  0xb635b934 in Compression::compress (this=0x40af340, value=0xb2bfcd68, freeValue=0x0) at ../src/compression.cpp:126
#7  0xb635b624 in Compression::compressInstruction (this=0x40af340, env=0x40ecda0, compressionAddress=0x4122908) at ../src/compression.cpp:94
#8  0xb635be10 in CompressionWorker::Execute (this=0x4161560) at ../src/compression.cpp:169
#9  0xb634d208 in Napi::AsyncWorker::OnExecute (this=0x4161560) at ../../node-addon-api/napi-inl.h:4895
#10 0xb634d1d0 in Napi::AsyncWorker::OnAsyncWorkExecute (env=0x4056250, asyncworker=0x4161560) at ../../node-addon-api/napi-inl.h:4881
#11 0x011153fc in worker (arg=0x0) at ../deps/uv/src/threadpool.c:122
#12 0xb6d41300 in start_thread (arg=0xb2bfd400) at pthread_create.c:477
#13 0xb6cc5208 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6

Running the built-in tests on 2.2.4 does indeed fail with a bus error, as you expected. Running the above script on 2.2.4 completes successfully, though - no segfaults.

As I've mentioned previously, lmdb seems to work just fine on 64-bit systems. 32-bit is a bit niche nowadays, and the Raspberry Pi OS has had an official 64-bit release for a few months now. I think I'll try installing that and see if the problems go away, or just switch back to a cloud-based remote dev setup.

I'm not sure if anyone else is impacted by these issues, and whether it's worth the effort just to support this particular use case. Thank you for taking the time to look into it nonetheless!

kriszyp commented 2 years ago

Thank you for the great investigative effort; this certainly provides some valuable clues. However, I am still at a loss as to the root cause. Based on your detailed stack trace, it appears that a null/0 pointer is being passed in as the address of the buffer to compress. I am curious whether it is coming from the attempt to get the address of the buffer, or being lost when passed into the compress function (or altered by another thread).

I attempted to install qemu locally but was never able to get node running on it. However, I was able to find and set up a qemu GitHub action that seems to provide full emulation of armv7 and does successfully run lmdb-js's test suite: https://github.com/kriszyp/lmdb-js/runs/6757147979?check_suite_focus=true I also added your script for writing larger buffers (with compression on), and that passes fine as well. Perhaps the emulation misses some of the nuances of your actual ARM processor; not sure.

If you are interested in looking further, you could try the latest master. I have added a specific check for a null/0 pointer on the JS side that should indicate whether this null pointer originates from where we retrieve the buffer's address (it also includes a fix for the misaligned memory access). Thanks!

supernes commented 2 years ago

The good news is that the SIGBUS issue seems fixed. I cloned master (69543e6) and ran the test suite, and it now reaches the "big keys" case before segfaulting.

I managed to trace the segfault to this line; when mv_size is larger than 32K bytes, the conversion there results in a null pointer: https://github.com/kriszyp/lmdb-js/blob/69543e65e55d137faf1def4daa8cf054e255258e/src/compression.cpp#L92

The instructions generated by the compiler look like this:

ldr            r3, [r11, #-48]  ; 0xffffffd0
sub            r3, r3, #8
vldr           d7, [r3]
vcvt.u32.f64   s15, d7
vmov           r3, s15
str            r3, [r11, #-32]  ; 0xffffffe0

Case 1: mv_size = 451 (no segfault)

Before conversion

r3             0x416d520           68605216
d7             {u8 = {0x0, 0x0, 0x0, 0x40, 0x37, 0x6c, 0x90, 0x41}, u16 = {0x0, 0x4000, 0x6c37, 0x4190}, u32 = {0x40000000, 0x41906c37}, u64 = 0x41906c3740000000, f32 = {0x2, 0x12}, f64 = 0x41b0dd0}

Memory pointed to by r3

0x416d520:  0x40000000  0x41906c37  0x00000001  0x00000000

After conversion

d7             {u8 = {0x0, 0x0, 0x0, 0x40, 0xd0, 0xd, 0x1b, 0x4}, u16 = {0x0, 0x4000, 0xdd0, 0x41b}, u32 = {0x40000000, 0x41b0dd0}, u64 = 0x41b0dd040000000, f32 = {0x2, 0x0}, f64 = 0x0}
s15            1.82265048e-36      (raw 0x041b0dd0)

mv_data pointer is then set to 0x041b0dd0

Case 2: mv_size = 38022 (segfault)

Before conversion

r3             0x417c0e8           68665576
d7             {u8 = {0x0, 0x0, 0x0, 0xfe, 0xc7, 0xb, 0xd3, 0xc1}, u16 = {0x0, 0xfe00, 0xbc7, 0xc1d3}, u32 = {0xfe000000, 0xc1d30bc7}, u64 = 0xc1d30bc7fe000000, f32 = {0x0, 0xffffffe6}, f64 = 0xffffffffb3d0e008}

Memory pointed to by r3

0x417c0e8:  0xfe000000  0xc1d30bc7  0x00000001  0x00000000

After conversion

d7             {u8 = {0x0, 0x0, 0x0, 0xfe, 0x0, 0x0, 0x0, 0x0}, u16 = {0x0, 0xfe00, 0x0, 0x0}, u32 = {0xfe000000, 0x0}, u64 = 0xfe000000, f32 = {0x0, 0x0}, f64 = 0x0}
s15            0                   (raw 0x00000000)

And hence mv_data points to (null)

Coincidentally, 32 KiB is the per-core L1 data cache size on the Cortex-A72 (ARMv8-A) CPU, so that may shed some light on why that particular data size triggers it. Strangely enough, if I evaluate the C++ line above in GDB, it gives the correct result. Maybe eval'ing it in the debug console issues different instructions, and that's why it doesn't convert to zero.

I can't quite wrap my head around the pointer arithmetic and the expected memory layout to say if there's an error in the math or if it's a problem with the double-to-int conversion. There are probably some compiler flags needed to adjust the behavior if it's the latter.

kriszyp commented 2 years ago

if I evaluate the C++ line above in GDB, it gives the correct result.

What is the correct result that it returns? By my math, the floating point value of 0xfe000000 0xc1d30bc7 (from the second run) is -1278156792 (-4c2f1ff8). I presume converting that to an unsigned number truncates to 0.

This floating point (double) value is assigned at https://github.com/kriszyp/lmdb-js/blob/master/write.js#L182 and is supposed to contain the address of your supplied buffer. I have been suspicious that perhaps the value is incorrect at this point, and the problem might be coming from getAddress. As you can see on line 180, I was trying to debug if the address is 0, although I hadn't considered that the address might be negative at that point. If you want to check that, you could change that check to address <= 0 or log all the addresses and see if the incoming one is in fact negative.

Anyway, thanks again for the great info!

kriszyp commented 2 years ago

My current theory is that once the pointer addresses get above 0x80000000, the conversions are triggering the sign bit and yielding negative values for the address.

supernes commented 2 years ago

Here's an example from the "big keys" test case:

=> vcvt.u32.f64 s15, d7
-exec i r d7
d7             {u8 = ..., u16 = ..., u32 = {0xfe000000, 0xc1d39bdf}, u64 = ..., f32 = ..., f64 = 0xffffffffb1908008}

-exec x/2xw (compressionAddress-1)
0x4195a68:  0xfe000000  0xc1d39bdf

(void*)((size_t) * (compressionAddress - 1))
0xb1908008
-exec x/32cb 0xb1908008
0xb1908008: -38 '\332'  -108 '\224' -125 '\203' 65 'A'  65 'A'  65 'A'  65 'A'  65 'A'
0xb1908010: 65 'A'  65 'A'  65 'A'  65 'A'  65 'A'  65 'A'  65 'A'  65 'A'
0xb1908018: 65 'A'  65 'A'  65 'A'  65 'A'  65 'A'  65 'A'  65 'A'  65 'A'
0xb1908020: 65 'A'  65 'A'  65 'A'  65 'A'  65 'A'  65 'A'  65 'A'  65 'A'
kriszyp commented 2 years ago

OK, so we are indeed hitting addresses like 0xb1908008 with the first bit set (the sign bit for signed numbers). And I believe it is undefined behavior (or implementation-specific) to convert a negative double to an unsigned int. I think there is a decent chance my theory is correct. Will make a fix...

kriszyp commented 2 years ago

OK, master should have this attempted fix. I will probably also cherry-pick it for a patch release.

supernes commented 2 years ago

Just pulled 2b7bf12 and all tests pass. The stress-test script I wrote also runs without issues, even with absurdly large buffers. I also added the latest commit as a dependency to parcel 2.6.0 and it runs, confirming this fixes the original issue. I'll check again with the prebuilds after an official release.

Thank you for the awesome work, hope others can benefit from it as well!

kriszyp commented 2 years ago

These fixes are published in v2.5.2. Thank you again for all your help with this!