JayKickliter closed this issue 3 years ago.
On a couple of runs with ASan, it reported a double free:
```
=================================================================
==30946==ERROR: AddressSanitizer: attempting double-free on 0x7f77284801d8 in thread T4 (1_scheduler):
#0 0x7f776cfb97a8 in __interceptor_free (/usr/lib/gcc/x86_64-linux-gnu/7/libasan.so+0xde7a8)
#1 0x7f77227e6afe in xor8_free /home/jay/repos/exor_filter/c_src/xorfilter.h:165
#2 0x7f77227e6afe in destroy_xor8_filter_resource exor_filter/c_src/xor_filter_nif.c:69
#3 0x5612138c53dc in run_resource_dtor beam/erl_nif.c:2529
#4 0x561213781df1 in handle_misc_aux_work beam/erl_process.c:1853
#5 0x561213781df1 in handle_aux_work beam/erl_process.c:2638
#6 0x56121377d7bd in scheduler_wait beam/erl_process.c:3402
#7 0x56121377cb6a in erts_schedule beam/erl_process.c:9543
#8 0x56121376f409 in process_main beam/beam_emu.c:684
#9 0x56121378ba38 in sched_thread_func beam/erl_process.c:8498
#10 0x56121397e79c in thr_wrapper pthread/ethread.c:118
#11 0x7f776c2da6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
#12 0x7f776bdfba3e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x121a3e)
0x7f77284801d8 is located 472 bytes inside of 32768-byte region [0x7f7728480000,0x7f7728488000)
allocated by thread T0 here:
#0 0x7f776cfba790 in posix_memalign (/usr/lib/gcc/x86_64-linux-gnu/7/libasan.so+0xdf790)
#1 0x5612138ecddb in erts_sys_aligned_alloc sys/unix/sys.c:819
#2 0x5612137b5951 in erts_alcu_sys_alloc beam/erl_alloc_util.c:1111
#3 0x5612137b8eed in create_carrier beam/erl_alloc_util.c:4262
#4 0x5612137c1af2 in erts_alcu_start beam/erl_alloc_util.c:6782
#5 0x5612138d2c78 in erts_aoffalc_start beam/erl_ao_firstfit_alloc.c:343
#6 0x5612137aad57 in start_au_allocator beam/erl_alloc.c:1130
#7 0x5612137abbbe in erts_alloc_init beam/erl_alloc.c:888
#8 0x5612137c6698 in early_init beam/erl_init.c:1181
#9 0x5612137c6a16 in erl_start beam/erl_init.c:1266
#10 0x561213766b22 in main sys/unix/erl_main.c:30
#11 0x7f776bcfbb96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
Thread T4 (1_scheduler) created by T0 here:
#0 0x7f776cf12d2f in __interceptor_pthread_create (/usr/lib/gcc/x86_64-linux-gnu/7/libasan.so+0x37d2f)
#1 0x56121397ea12 in ethr_thr_create pthread/ethread.c:400
SUMMARY: AddressSanitizer: double-free (/usr/lib/gcc/x86_64-linux-gnu/7/libasan.so+0xde7a8) in __interceptor_free
==30946==ABORTING
```
Interesting, thank you for the detailed bug report. Running it ~10 times, it does seem to show up intermittently, and only in the xor16 tests. I'll look into this more this evening. Have you run into this while using the library, or just with the test suite?
I have only run this under ASan with the test suite, not in an Erlang application.
Some background on how I discovered this: filter serialization (`[to,from]_bin`) returns non-deterministic binaries. Filters round-trip through [de]serialization just fine, but two identical filters serialize to different binaries. I figured it had something to do with uninitialized memory, and it turns out it does. Changing `#define malloc(size) enif_alloc(size)` to `#define malloc(size) calloc(1, size)` fixes the non-deterministic binary problem.
Don't get hung up on the above. It isn't (obviously? directly?) related to the issue I filed, but it's how I discovered it and may speak to wider memory management bugs.
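A minimal sketch of why zeroing the allocation makes serialization deterministic, assuming the serializer copies the whole fingerprint buffer, including slots that construction never writes. `zeroed_alloc`, `demo_filter`, `demo_build`, and `demo_serialize` are hypothetical names, not part of exor_filter. One design note: swapping in `calloc` means the matching release must be `free`, not `enif_free`; zeroing an `enif_alloc` block with `memset` keeps the allocator pair intact.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the NIF's malloc macro. In the NIF itself
 * this would call enif_alloc(size) and then memset, so releases still
 * go through the matching enif_free. Plain malloc is used here only so
 * the sketch compiles without erl_nif.h. */
static void *zeroed_alloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL) {
        memset(p, 0, size);
    }
    return p;
}

/* Hypothetical filter: a fingerprint buffer that construction only
 * partially writes. */
typedef struct {
    size_t len;
    uint8_t *fingerprints;
} demo_filter;

static int demo_build(demo_filter *f, size_t len) {
    f->len = len;
    f->fingerprints = zeroed_alloc(len);
    if (f->fingerprints == NULL) {
        return -1;
    }
    /* Only even slots are written; odd slots keep whatever the
     * allocator returned, which is all zero here. */
    for (size_t i = 0; i < len; i += 2) {
        f->fingerprints[i] = (uint8_t)(i * 31u);
    }
    return 0;
}

/* Serialization copies the whole buffer, untouched slots included,
 * so any uninitialized byte would leak into the output. */
static void demo_serialize(const demo_filter *f, uint8_t *out) {
    memcpy(out, f->fingerprints, f->len);
}
```

With plain, non-zeroing `malloc` in `zeroed_alloc`, the odd slots of two identically built filters could differ, which matches the symptom described above.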
I can see where the double free can occur: `destroy_xor8_filter_resource` does not check for NULL before calling free. That should be an easy enough fix. I'll see what I can do tonight and run the suite extensively to check for more segfaults.
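A minimal sketch of a destructor that tolerates being called twice, using hypothetical names (`demo_xor8`, `demo_xor8_free`, `demo_destroy_resource`) rather than the real `xor8_t`/`xor8_free` from `xorfilter.h`. One caveat on the NULL check: in C, `free(NULL)` is already a no-op, so the check only prevents a double free when the pointer is also reset to `NULL` after the first free, and even then only for repeated calls on the same struct, not for two structs holding copies of one pointer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the filter stored in the NIF resource. */
typedef struct {
    uint64_t seed;
    size_t block_length;
    uint8_t *fingerprints;
} demo_xor8;

/* Idempotent release: the pointer is cleared after freeing, so a
 * second call on the same struct sees NULL and does nothing. */
static void demo_xor8_free(demo_xor8 *filter) {
    if (filter == NULL || filter->fingerprints == NULL) {
        return;
    }
    free(filter->fingerprints);
    filter->fingerprints = NULL;
    filter->seed = 0;
    filter->block_length = 0;
}

/* Hypothetical resource destructor: safe on a filter that was never
 * populated or that has already been released. */
static void demo_destroy_resource(void *obj) {
    demo_xor8_free((demo_xor8 *)obj);
}
```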
> Running it ~10 times seems like it does show up intermittently, xor16 tests only.
Strange. It fails about 70% of the time for me.
We run CI tests for all our C NIF modules in GitHub workflows with ASan enabled. If you're interested in setting that up and want some guidance, please feel free to reach out and @ me.
We call `enif_release_resource(filter)` so that the VM owns the resource. It could sometimes be freed before we call destroy. I think this is the source of the non-determinism.
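A toy model, in plain C, of the reference-counting contract that the erl_nif documentation describes for `enif_alloc_resource`, `enif_make_resource`, and `enif_release_resource`. The `toy_*` names are hypothetical and this is not how the VM implements resources; it only illustrates that after the release, the term is the sole owner and the destructor runs once, when that last reference goes away. Any separate manual free of the same memory on top of that would be a double free.

```c
#include <stdlib.h>

/* Toy model (not the erl_nif implementation) of a reference-counted
 * resource whose destructor runs exactly once, at the last release. */
typedef struct {
    int refs;
    int dtor_calls;
    unsigned char *payload;
} toy_resource;

static void toy_dtor(toy_resource *r) {
    r->dtor_calls++;
    free(r->payload);
    r->payload = NULL;
}

/* Models enif_alloc_resource: the NIF holds the first reference. */
static void toy_alloc(toy_resource *r, size_t n) {
    r->refs = 1;
    r->dtor_calls = 0;
    r->payload = malloc(n); /* may be NULL; free(NULL) is harmless */
}

/* Models enif_make_resource: the returned term holds a reference. */
static void toy_make_term(toy_resource *r) {
    r->refs++;
}

/* Models enif_release_resource, and also the VM dropping the term
 * during garbage collection. */
static void toy_release(toy_resource *r) {
    if (--r->refs == 0) {
        toy_dtor(r);
    }
}
```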
I've always meant to set that up. This might be the perfect time. I'll let you know if I run into any issues. Thanks.
Here is the minimal reproduction:

```
$ ./rebar3 shell
1> Filter0 = xor16:new_empty().
{builder,#Ref<0.136826599.2258763777.81971>}
2> q().
ok
3> Segmentation fault (core dumped)
```
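One possible reading of this reproduction, offered only as a hypothesis: if the resource behind `new_empty()` holds pointers that are never initialized, the destructor that runs at shutdown would free garbage. The sketch below uses hypothetical `demo_builder` names, with plain `malloc` standing in for the resource allocation and assuming that memory is not zeroed. It puts the state into a known empty condition immediately after allocation, so the destructor is safe on a builder that never received any keys.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical builder state held in the resource returned by new_empty. */
typedef struct {
    uint64_t *keys;
    size_t len;
    size_t cap;
} demo_builder;

/* Put freshly allocated (possibly garbage-filled) memory into a known
 * empty state before anything else can read it. */
static void demo_builder_init_empty(demo_builder *b) {
    b->keys = NULL;
    b->len = 0;
    b->cap = 0;
}

/* Destructor: frees only what was actually allocated, then resets.
 * For a never-populated builder, keys is NULL and free is a no-op. */
static void demo_builder_dtor(demo_builder *b) {
    free(b->keys);
    demo_builder_init_empty(b);
}
```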
It looks like the problem began with commit 037900f10f2fcf0edd0c9e448a474720b44152f9.
The error:
Compiling and running the tests with AddressSanitizer (ASan) gives us a little more info.
Modifications:
Errors (they are non-deterministic and change from run to run):
Commenting out the `malloc` and `free` macros in `xor_filter_nif.c` gives more insightful info, but I don't know whether I introduced a new bug that causes the NIF to `free` memory allocated with `enif_alloc`, or vice versa:
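Separately from the logs above, the `malloc`/`enif_free` mismatch concern can be checked mechanically in a debug build. A minimal sketch, assuming the NIF's `malloc` and `free` macros are temporarily routed through wrappers like these: every block is tagged with the allocator family that produced it, and a release through the other family is refused instead of corrupting the heap. `tagged_alloc`, `tagged_free`, `ALLOC_LIBC`, and `ALLOC_NIF` are hypothetical names; plain `malloc`/`free` back both families here so the sketch compiles without erl_nif.h.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tags standing in for malloc/free and enif_alloc/enif_free. */
enum alloc_family { ALLOC_LIBC = 1, ALLOC_NIF = 2 };

/* Header stored in front of each block; the max_align_t member keeps
 * the pointer handed to the caller suitably aligned. */
typedef union {
    int family;
    max_align_t align;
} alloc_header;

static void *tagged_alloc(size_t size, enum alloc_family family) {
    alloc_header *h = malloc(sizeof *h + size);
    if (h == NULL) {
        return NULL;
    }
    h->family = (int)family;
    return h + 1; /* caller sees the bytes after the header */
}

/* Returns 0 on success, or -1 (without freeing) on a mismatched release. */
static int tagged_free(void *ptr, enum alloc_family family) {
    if (ptr == NULL) {
        return 0;
    }
    alloc_header *h = (alloc_header *)ptr - 1;
    if (h->family != (int)family) {
        return -1;
    }
    free(h);
    return 0;
}
```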