dfn-certling closed this issue 4 years ago
Just forgot to add this: we didn't see these errors with 0.24. They started once we upgraded to 0.33.
If it's not directly reproducible with a single lookup of a particular CIDR string and is instead somehow related to earlier operations trashing memory at some point, then a good way to try to catch that may be to run a build compiled with AddressSanitizer:
# Example of how to compile (may use -O1 instead of -O0, if desired):
CFLAGS="-fsanitize=address,undefined -fno-omit-frame-pointer -fno-optimize-sibling-calls -O0" CXXFLAGS="-fsanitize=address,undefined -fno-omit-frame-pointer -fno-optimize-sibling-calls -O0" ./configure --enable-debug
( cd build && make )
# Example of how to run:
LD_PRELOAD=$(gcc -print-file-name=libasan.so) ASAN_OPTIONS='detect_leaks=0' PYTHONPATH=$(pwd)/build python -c 'import SubnetTree; s = SubnetTree.SubnetTree(); print("xxx" in s)'
Otherwise, I haven't been able to read much into that call stack. There is some extra-paranoid return value checking that may be worth adding and testing to see if it triggers:
diff --git a/include/SubnetTree.h b/include/SubnetTree.h
index 643837d..0ce9d00 100644
--- a/include/SubnetTree.h
+++ b/include/SubnetTree.h
@@ -22,12 +22,22 @@ extern "C" {
return NULL;
}
- PyBytes_AsStringAndSize(ascii, &$1, &len);
+ if ( PyBytes_AsStringAndSize(ascii, &$1, &len) == -1 )
+ {
+ PyErr_SetString(PyExc_TypeError, "Failed to convert CIDR to null-terminated string");
+ return NULL;
+ }
+
$2 = len;
}
else if ( PyBytes_Check($input) )
{
- PyBytes_AsStringAndSize($input, &$1, &len);
+ if ( PyBytes_AsStringAndSize($input, &$1, &len) == -1 )
+ {
+ PyErr_SetString(PyExc_TypeError, "Failed to convert CIDR to null-terminated string");
+ return NULL;
+ }
+
$2 = len;
}
else
but after reading through the code, I don't yet see how the strchr() at https://github.com/zeek/pysubnettree/blob/bff5ed2976ee385f0b52195f6e9ee03068f6573e/SubnetTree.cc#L56-L59 ends up reading from an invalid address, since the other possible situations seem to be handled correctly: the PyBytes_AsStringAndSize() calls either fail and leave the SWIG-initialized null pointer alone, or else set it to point at the intended, null-terminated string.
Thanks. I'll test a build with your suggestions and get back as soon as I see something.
I caught the error with AddressSanitizer. Unfortunately the output doesn't look too helpful to me:
AddressSanitizer:DEADLYSIGNAL
=================================================================
==10507==ERROR: AddressSanitizer: SEGV on unknown address 0x7f68b7ea4050 (pc 0x7f68c9cf82b3 bp 0x7ffdd3592fe0 sp 0x7ffdd3592758 T0)
==10507==The signal is caused by a READ memory access.
#0 0x7f68c9cf82b2 in __strchr_sse2 (/lib64/libc.so.6+0x9b2b2)
#1 0x7f68ca83bd5c (/usr/lib64/libasan.so.5+0xa6d5c)
#2 0x7f68bcf295a2 (/usr/lib64/python3.6/site-packages/_SubnetTree.cpython-36m-x86_64-linux-gnu.so+0x2b5a2)
#3 0x7f68bcf2b770 in SubnetTree::lookup(char const*, int) const (/usr/lib64/python3.6/site-packages/_SubnetTree.cpython-36m-x86_64-linux-gnu.so+0x2d770)
#4 0x7f68bcf43f96 (/usr/lib64/python3.6/site-packages/_SubnetTree.cpython-36m-x86_64-linux-gnu.so+0x45f96)
#5 0x7f68bcf4eef9 (/usr/lib64/python3.6/site-packages/_SubnetTree.cpython-36m-x86_64-linux-gnu.so+0x50ef9)
#6 0x7f68bcf4f9a5 (/usr/lib64/python3.6/site-packages/_SubnetTree.cpython-36m-x86_64-linux-gnu.so+0x519a5)
#7 0x7f68ca36861e in PyCFunction_Call (/usr/lib64/libpython3.6m.so.1.0+0x13161e)
#8 0x7f68ca3b9a40 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x182a40)
#9 0x7f68ca3bd473 in _PyFunction_FastCallDict (/usr/lib64/libpython3.6m.so.1.0+0x186473)
#10 0x7f68ca331c3d in _PyObject_FastCallDict (/usr/lib64/libpython3.6m.so.1.0+0xfac3d)
#11 0x7f68ca3324e0 in _PyObject_Call_Prepend (/usr/lib64/libpython3.6m.so.1.0+0xfb4e0)
#12 0x7f68ca331b6b in _PyObject_FastCallDict (/usr/lib64/libpython3.6m.so.1.0+0xfab6b)
#13 0x7f68ca37b4a5 (/usr/lib64/libpython3.6m.so.1.0+0x1444a5)
#14 0x7f68ca3b86c0 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x1816c0)
#15 0x7f68ca3bbe2a (/usr/lib64/libpython3.6m.so.1.0+0x184e2a)
#16 0x7f68ca3bb7f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#17 0x7f68ca3b5541 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17e541)
#18 0x7f68ca3bbac9 (/usr/lib64/libpython3.6m.so.1.0+0x184ac9)
#19 0x7f68ca3bb7f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#20 0x7f68ca3b47c9 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17d7c9)
#21 0x7f68ca3bbac9 (/usr/lib64/libpython3.6m.so.1.0+0x184ac9)
#22 0x7f68ca3bb7f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#23 0x7f68ca3b47c9 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17d7c9)
#24 0x7f68ca3bbac9 (/usr/lib64/libpython3.6m.so.1.0+0x184ac9)
#25 0x7f68ca3bb7f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#26 0x7f68ca3b47c9 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17d7c9)
#27 0x7f68ca3b3a50 in PyEval_EvalCodeEx (/usr/lib64/libpython3.6m.so.1.0+0x17ca50)
#28 0x7f68ca3472bd (/usr/lib64/libpython3.6m.so.1.0+0x1102bd)
#29 0x7f68ca331f9a in PyObject_Call (/usr/lib64/libpython3.6m.so.1.0+0xfaf9a)
#30 0x7f68ca3b5ee0 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17eee0)
#31 0x7f68ca3b411f in PyEval_EvalCodeEx (/usr/lib64/libpython3.6m.so.1.0+0x17d11f)
#32 0x7f68ca3472bd (/usr/lib64/libpython3.6m.so.1.0+0x1102bd)
#33 0x7f68ca331f9a in PyObject_Call (/usr/lib64/libpython3.6m.so.1.0+0xfaf9a)
#34 0x7f68ca3b5ee0 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17eee0)
#35 0x7f68ca3bc1d4 (/usr/lib64/libpython3.6m.so.1.0+0x1851d4)
#36 0x7f68ca3bb7f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#37 0x7f68ca3b47c9 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17d7c9)
#38 0x7f68ca3bbe2a (/usr/lib64/libpython3.6m.so.1.0+0x184e2a)
#39 0x7f68ca3bb7f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#40 0x7f68ca3b5541 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17e541)
#41 0x7f68ca3b3a50 in PyEval_EvalCodeEx (/usr/lib64/libpython3.6m.so.1.0+0x17ca50)
#42 0x7f68ca3472bd (/usr/lib64/libpython3.6m.so.1.0+0x1102bd)
#43 0x7f68ca331f9a in PyObject_Call (/usr/lib64/libpython3.6m.so.1.0+0xfaf9a)
#44 0x7f68ca3b5ee0 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17eee0)
#45 0x7f68ca3bc1d4 (/usr/lib64/libpython3.6m.so.1.0+0x1851d4)
#46 0x7f68ca3bb7f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#47 0x7f68ca3b47c9 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17d7c9)
#48 0x7f68ca3bd473 in _PyFunction_FastCallDict (/usr/lib64/libpython3.6m.so.1.0+0x186473)
#49 0x7f68ca331c3d in _PyObject_FastCallDict (/usr/lib64/libpython3.6m.so.1.0+0xfac3d)
#50 0x7f68ca3324e0 in _PyObject_Call_Prepend (/usr/lib64/libpython3.6m.so.1.0+0xfb4e0)
#51 0x7f68ca331f9a in PyObject_Call (/usr/lib64/libpython3.6m.so.1.0+0xfaf9a)
#52 0x7f68ca37a0bc (/usr/lib64/libpython3.6m.so.1.0+0x1430bc)
#53 0x7f68ca37799d (/usr/lib64/libpython3.6m.so.1.0+0x14099d)
#54 0x7f68ca331b6b in _PyObject_FastCallDict (/usr/lib64/libpython3.6m.so.1.0+0xfab6b)
#55 0x7f68ca3bb86c (/usr/lib64/libpython3.6m.so.1.0+0x18486c)
#56 0x7f68ca3b47c9 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17d7c9)
#57 0x7f68ca3b3a50 in PyEval_EvalCodeEx (/usr/lib64/libpython3.6m.so.1.0+0x17ca50)
#58 0x7f68ca3b377a in PyEval_EvalCode (/usr/lib64/libpython3.6m.so.1.0+0x17c77a)
#59 0x7f68ca436491 (/usr/lib64/libpython3.6m.so.1.0+0x1ff491)
#60 0x7f68ca4369ec in PyRun_FileExFlags (/usr/lib64/libpython3.6m.so.1.0+0x1ff9ec)
#61 0x7f68ca4366b5 in PyRun_SimpleFileExFlags (/usr/lib64/libpython3.6m.so.1.0+0x1ff6b5)
#62 0x7f68ca43d6e4 in Py_Main (/usr/lib64/libpython3.6m.so.1.0+0x2066e4)
#63 0x56276cd33de4 in main (/usr/bin/python3.6+0xde4)
#64 0x7f68c9c81349 in __libc_start_main (/lib64/libc.so.6+0x24349)
#65 0x56276cd33f59 in _start (/usr/bin/python3.6+0xf59)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib64/libc.so.6+0x9b2b2) in __strchr_sse2
==10507==ABORTING
Thanks, I guess it pointing to the same place as before at least helps rule out some memory corruption theories, but yeah, still stumped. Can you try running the topic/jsiwek/gh-19 branch and see if it's stable?
https://github.com/zeek/pysubnettree/tree/topic/jsiwek/gh-19
The new guess is that Python is returning a pointer to some location that violates an expectation of the vectorized SSE2 implementation of strchr(). If that's the case, then stepping through one byte at a time will work. Otherwise, I have no other guesses for why Python seems to be returning a pointer to a completely bogus/unusable location.
Built and being deployed.
I saw that you included the return value checking in the branch. I forgot to mention that that was also included in the address sanitized build that I used before. The branch is now also built with address sanitizer.
Crashed again =(:
AddressSanitizer:DEADLYSIGNAL
=================================================================
==32272==ERROR: AddressSanitizer: SEGV on unknown address 0x7f2bb7614050 (pc 0x7f2bbd1a96cb bp 0x7fffd3feec30 sp 0x7fffd3feeaf0 T0)
==32272==The signal is caused by a READ memory access.
#0 0x7f2bbd1a96ca (/usr/lib64/python3.6/site-packages/_SubnetTree.cpython-36m-x86_64-linux-gnu.so+0x2b6ca)
#1 0x7f2bbd1ab8de in SubnetTree::lookup(char const*, int) const (/usr/lib64/python3.6/site-packages/_SubnetTree.cpython-36m-x86_64-linux-gnu.so+0x2d8de)
#2 0x7f2bbd1c4104 (/usr/lib64/python3.6/site-packages/_SubnetTree.cpython-36m-x86_64-linux-gnu.so+0x46104)
#3 0x7f2bbd1cf067 (/usr/lib64/python3.6/site-packages/_SubnetTree.cpython-36m-x86_64-linux-gnu.so+0x51067)
#4 0x7f2bbd1cfb13 (/usr/lib64/python3.6/site-packages/_SubnetTree.cpython-36m-x86_64-linux-gnu.so+0x51b13)
#5 0x7f2bca5b061e in PyCFunction_Call (/usr/lib64/libpython3.6m.so.1.0+0x13161e)
#6 0x7f2bca601a40 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x182a40)
#7 0x7f2bca605473 in _PyFunction_FastCallDict (/usr/lib64/libpython3.6m.so.1.0+0x186473)
#8 0x7f2bca579c3d in _PyObject_FastCallDict (/usr/lib64/libpython3.6m.so.1.0+0xfac3d)
#9 0x7f2bca57a4e0 in _PyObject_Call_Prepend (/usr/lib64/libpython3.6m.so.1.0+0xfb4e0)
#10 0x7f2bca579b6b in _PyObject_FastCallDict (/usr/lib64/libpython3.6m.so.1.0+0xfab6b)
#11 0x7f2bca5c34a5 (/usr/lib64/libpython3.6m.so.1.0+0x1444a5)
#12 0x7f2bca6006c0 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x1816c0)
#13 0x7f2bca603e2a (/usr/lib64/libpython3.6m.so.1.0+0x184e2a)
#14 0x7f2bca6037f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#15 0x7f2bca5fd541 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17e541)
#16 0x7f2bca603ac9 (/usr/lib64/libpython3.6m.so.1.0+0x184ac9)
#17 0x7f2bca6037f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#18 0x7f2bca5fc7c9 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17d7c9)
#19 0x7f2bca603ac9 (/usr/lib64/libpython3.6m.so.1.0+0x184ac9)
#20 0x7f2bca6037f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#21 0x7f2bca5fc7c9 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17d7c9)
#22 0x7f2bca603ac9 (/usr/lib64/libpython3.6m.so.1.0+0x184ac9)
#23 0x7f2bca6037f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#24 0x7f2bca5fc7c9 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17d7c9)
#25 0x7f2bca5fba50 in PyEval_EvalCodeEx (/usr/lib64/libpython3.6m.so.1.0+0x17ca50)
#26 0x7f2bca58f2bd (/usr/lib64/libpython3.6m.so.1.0+0x1102bd)
#27 0x7f2bca579f9a in PyObject_Call (/usr/lib64/libpython3.6m.so.1.0+0xfaf9a)
#28 0x7f2bca5fdee0 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17eee0)
#29 0x7f2bca5fc11f in PyEval_EvalCodeEx (/usr/lib64/libpython3.6m.so.1.0+0x17d11f)
#30 0x7f2bca58f2bd (/usr/lib64/libpython3.6m.so.1.0+0x1102bd)
#31 0x7f2bca579f9a in PyObject_Call (/usr/lib64/libpython3.6m.so.1.0+0xfaf9a)
#32 0x7f2bca5fdee0 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17eee0)
#33 0x7f2bca6041d4 (/usr/lib64/libpython3.6m.so.1.0+0x1851d4)
#34 0x7f2bca6037f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#35 0x7f2bca5fc7c9 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17d7c9)
#36 0x7f2bca603e2a (/usr/lib64/libpython3.6m.so.1.0+0x184e2a)
#37 0x7f2bca6037f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#38 0x7f2bca5fd541 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17e541)
#39 0x7f2bca5fba50 in PyEval_EvalCodeEx (/usr/lib64/libpython3.6m.so.1.0+0x17ca50)
#40 0x7f2bca58f2bd (/usr/lib64/libpython3.6m.so.1.0+0x1102bd)
#41 0x7f2bca579f9a in PyObject_Call (/usr/lib64/libpython3.6m.so.1.0+0xfaf9a)
#42 0x7f2bca5fdee0 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17eee0)
#43 0x7f2bca6041d4 (/usr/lib64/libpython3.6m.so.1.0+0x1851d4)
#44 0x7f2bca6037f5 (/usr/lib64/libpython3.6m.so.1.0+0x1847f5)
#45 0x7f2bca5fc7c9 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17d7c9)
#46 0x7f2bca605473 in _PyFunction_FastCallDict (/usr/lib64/libpython3.6m.so.1.0+0x186473)
#47 0x7f2bca579c3d in _PyObject_FastCallDict (/usr/lib64/libpython3.6m.so.1.0+0xfac3d)
#48 0x7f2bca57a4e0 in _PyObject_Call_Prepend (/usr/lib64/libpython3.6m.so.1.0+0xfb4e0)
#49 0x7f2bca579f9a in PyObject_Call (/usr/lib64/libpython3.6m.so.1.0+0xfaf9a)
#50 0x7f2bca5c20bc (/usr/lib64/libpython3.6m.so.1.0+0x1430bc)
#51 0x7f2bca5bf99d (/usr/lib64/libpython3.6m.so.1.0+0x14099d)
#52 0x7f2bca579b6b in _PyObject_FastCallDict (/usr/lib64/libpython3.6m.so.1.0+0xfab6b)
#53 0x7f2bca60386c (/usr/lib64/libpython3.6m.so.1.0+0x18486c)
#54 0x7f2bca5fc7c9 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.6m.so.1.0+0x17d7c9)
#55 0x7f2bca5fba50 in PyEval_EvalCodeEx (/usr/lib64/libpython3.6m.so.1.0+0x17ca50)
#56 0x7f2bca5fb77a in PyEval_EvalCode (/usr/lib64/libpython3.6m.so.1.0+0x17c77a)
#57 0x7f2bca67e491 (/usr/lib64/libpython3.6m.so.1.0+0x1ff491)
#58 0x7f2bca67e9ec in PyRun_FileExFlags (/usr/lib64/libpython3.6m.so.1.0+0x1ff9ec)
#59 0x7f2bca67e6b5 in PyRun_SimpleFileExFlags (/usr/lib64/libpython3.6m.so.1.0+0x1ff6b5)
#60 0x7f2bca6856e4 in Py_Main (/usr/lib64/libpython3.6m.so.1.0+0x2066e4)
#61 0x555a902d5de4 in main (/usr/bin/python3.6+0xde4)
#62 0x7f2bc9ec9349 in __libc_start_main (/lib64/libc.so.6+0x24349)
#63 0x555a902d5f59 in _start (/usr/bin/python3.6+0xf59)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/usr/lib64/python3.6/site-packages/_SubnetTree.cpython-36m-x86_64-linux-gnu.so+0x2b6ca)
==32272==ABORTING
What SWIG version is being used?
Do you have a build/CMakeFiles/SubnetTree.dir/SubnetTreePYTHON_wrap.cxx you can attach?
I can look at that generated code to see if there's something different/unexpected, but on my end still didn't notice anything.
SWIG version is 3.0.12.
But I just found an error in the deployment of the custom-built package: the sanitized runs above did use AddressSanitizer, but not the rebuilt binding. I ran configure and make but used the unmodified setup.py to install. I now have a fixed build, so maybe the next error is more helpful.
I didn't find a SubnetTreePYTHON_wrap.cxx in the path you mentioned. I have one directly below build. In the path you mentioned, there is only the compiled file SubnetTreePYTHON_wrap.cxx.o. I've attached the cxx one: https://gist.github.com/dfn-certling/c7e44d44329c1c79e32f73b1f3a54320.
The latest build has run stable for ~14 days. I have now deployed a build without any debug options or patches, just using the CMake build instead of the included SWIG bindings. Maybe that is indeed where the culprit is.
That one runs stable as well. I'll revert to the PyPI version to see if the bug is still present or if something else that caused it in the first place has disappeared.
No more crashes. Either the data doesn't trigger it anymore or something else changed in the environment. Thanks for your help in trying to debug this.
We experience segmentation faults with 0.33 that look very similar to #15. We have a daemon that manages a couple hundred SubnetTrees in a dict. Each tree contains a good handful of networks. The daemon processes files from a directory. With each file the trees are reinitialized by emptying the dictionary and setting it back up by inserting the current networks returned by an API. After that the trees aren't modified and the IP addresses found in the files are looked up against all the trees. For the next file this process repeats: reinitialize the trees, lookup all IPs.
From time to time the daemon dies with a segfault. Unfortunately it does not always choke on the same file: restarting it and processing the same file again sometimes succeeds, which suggests the crash is connected to the history of files processed.
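For anyone trying to reproduce this outside the daemon, here is a minimal sketch of the workload described above: a dict of trees that is emptied and rebuilt each round, followed by lookups against every tree. Everything in it is an assumption for illustration: the names run_rounds and StandInTree are hypothetical, the random 10.x.y.0/24 networks stand in for whatever the API returns, and the pure-Python stand-in only mimics the item-assignment and "in" interface so the loop can be dry-run without the extension.

```python
import ipaddress
import random


class StandInTree:
    """Hypothetical pure-Python stand-in with a SubnetTree-like interface."""

    def __init__(self):
        self._nets = []

    def __setitem__(self, cidr, value):
        # Store the parsed network together with its associated value.
        self._nets.append((ipaddress.ip_network(cidr), value))

    def __contains__(self, addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net, _ in self._nets)


def run_rounds(tree_factory, rounds=3, trees=200, nets_per_tree=5,
               lookups=100, seed=0):
    """Rebuild all trees each round, then look up random addresses in each."""
    rng = random.Random(seed)
    registry = {}
    hits = 0
    for _ in range(rounds):
        # Mirror the daemon: empty the dict, then set it back up.
        registry.clear()
        for name in range(trees):
            tree = tree_factory()
            for _ in range(nets_per_tree):
                cidr = "10.%d.%d.0/24" % (rng.randrange(256),
                                          rng.randrange(256))
                tree[cidr] = name
            registry[name] = tree
        # After setup the trees are only read, never modified.
        for _ in range(lookups):
            addr = "10.%d.%d.%d" % (rng.randrange(256), rng.randrange(256),
                                    rng.randrange(256))
            hits += sum(1 for tree in registry.values() if addr in tree)
    return hits
```

To exercise the real binding, one would pass SubnetTree.SubnetTree as tree_factory, assuming it accepts the same tree[cidr] = value assignment, and run it under the AddressSanitizer build with many more rounds. Whether this synthetic data triggers the crash at all is unknown.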
I have a traceback from a core dump. The actual IP address is substituted below by 'aaa.bbb.ccc.ddd'.
The location of the segfault seems to be similar to #15. I cannot get the local variables from the frames above 0, but looking into the code at frame 0
As far as I read it, rdi should contain the CIDR here and trying to access it shows
We are trying the workaround from #15 to see if it helps in our case as well.
I know this is not much to work on. Anything else I can try to extract from the core dump? Or things we could try to log or capture if it segfaults again? As I said, we weren't able to reproduce it with a test case, therefore it's the regular process that runs until it eventually dies.
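On the question of what could be logged or captured for the next crash, one option is sketched below, under assumptions: Python's standard faulthandler module can dump the Python-level traceback when the process receives SIGSEGV, and a thin wrapper can record each key before the lookup reaches the extension, so the last line of the log is the input that was in flight. LoggingTree and enable_crash_diagnostics are hypothetical names, not part of pysubnettree.

```python
import faulthandler
import sys


def enable_crash_diagnostics(stream=None):
    """Dump Python tracebacks of all threads on fatal signals like SIGSEGV."""
    # The stream must stay open and have a real file descriptor.
    target = stream if stream is not None else sys.stderr
    faulthandler.enable(file=target, all_threads=True)


class LoggingTree:
    """Hypothetical wrapper that records each key before looking it up."""

    def __init__(self, tree, log_file):
        self._tree = tree
        self._log = log_file

    def __contains__(self, key):
        # Write and flush first, so the key survives a crash in the lookup.
        self._log.write(repr(key) + "\n")
        self._log.flush()
        return key in self._tree
```

In the daemon this would mean calling enable_crash_diagnostics() once at startup and wrapping each tree before the lookup phase. Logging with repr() keeps the exact type and bytes of the key visible, which matters if the crash depends on whether a str or bytes object was passed. The per-lookup flush costs throughput, so it is probably only worth enabling while hunting the bug.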