Closed laissezfarrell closed 8 months ago
Thank you so much for this. It's clear that there is a hang in the 2.0 multithreaded dispatch system. Does this happen reliably? Can you give me the data that makes this happen?
What happens when you type ^C and then restart the program?
This happened reliably with that set of files. Killing the process and restarting resulted in the same behavior.
In other cases, different sets of files resulted in "All data read; waiting for threads to finish..." but only for a period of time before, presumably, the threads finished and the application completed its run.
If helpful, I can provide both hung/aborted bulk_extractor 2.0 report XML files and successfully completed bulk_extractor 1.5 report XML files reporting on the same set of files.
Oh, I’m pretty sure the issue is with the multi-threading code. I’ll get you a temp release to try, once I have a chance to work on this.
This seems to be a problem in the find scanner...
command line:
% src/bulk_extractor --notify_main_thread -S ssn_mode=1 -e outlook -x zip -x rar -x winpe -x exif -x pdf -J -d8 -o out1 -Z -F tests/patterns.txt tests/Images/nps-2010-emails.E01
opening tests/Images/nps-2010-emails.E01
bulk_extractor version: 2.0.3
Input file: "tests/Images/nps-2010-emails.E01"
Output directory: "out1"
Disk Size: 10485760
Scanners: aes base64 elf evtx facebook find gzip httplogs json kml_carved msxml net ntfsindx ntfslogfile ntfsmft ntfsusn outlook sqlite utmp vcard_carved windirs winlnk winprefetch accts email gps
Threading Disabled
running single-threaded (DEBUG)...
Then I typed ^T a few times:
load: 5.37 cmd: bulk_extractor 68295 running 88.72u 18.47s
load: 4.40 cmd: bulk_extractor 68295 running 23046.41u 8449.16s
load: 4.13 cmd: bulk_extractor 68295 running 23048.93u 8450.26s
So I attached in another window:
(base) simsong@Seasons src % lldb bulk_extractor (402-fix-J)bulk_extractor
(lldb) target create "bulk_extractor"
Current executable set to '/Users/simsong/gits/bulk_extractor/src/bulk_extractor' (arm64).
(lldb) attach 68295
Process 68295 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x0000000188e4fd88 libsystem_kernel.dylib`_kernelrpc_mach_vm_deallocate_trap + 8
libsystem_kernel.dylib`:
-> 0x188e4fd88 <+8>: ret
libsystem_kernel.dylib`task_dyld_process_info_notify_get:
0x188e4fd8c <+0>: mov x16, #-0xd
0x188e4fd90 <+4>: svc #0x80
0x188e4fd94 <+8>: ret
Target 0: (bulk_extractor) stopped.
(lldb) where
error: 'where' is not a valid command.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x0000000188e4fd88 libsystem_kernel.dylib`_kernelrpc_mach_vm_deallocate_trap + 8
frame #1: 0x0000000188e519dc libsystem_kernel.dylib`mach_vm_deallocate + 88
frame #2: 0x0000000188cbdfdc libsystem_malloc.dylib`mvm_deallocate_pages + 144
frame #3: 0x0000000188cbc4ec libsystem_malloc.dylib`free_large + 416
frame #4: 0x0000000188cc667c libsystem_malloc.dylib`_szone_free + 720
frame #5: 0x00000001022fbb88 bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__match_at_start_ecma<std::__1::allocator<std::__1::sub_match<char const*> > >(char const*, char const*, std::__1::match_results<char const*, std::__1::allocator<std::__1::sub_match<char const*> > >&, std::__1::regex_constants::match_flag_type, bool) const [inlined] void std::__1::__libcpp_operator_delete[abi:v15006]<void*>(__args=<unavailable>) at new:256:3 [opt]
frame #6: 0x00000001022fbb84 bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__match_at_start_ecma<std::__1::allocator<std::__1::sub_match<char const*> > >(char const*, char const*, std::__1::match_results<char const*, std::__1::allocator<std::__1::sub_match<char const*> > >&, std::__1::regex_constants::match_flag_type, bool) const [inlined] void std::__1::__do_deallocate_handle_size[abi:v15006]<>(__ptr=<unavailable>, __size=<unavailable>) at new:280:10 [opt]
frame #7: 0x00000001022fbb84 bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__match_at_start_ecma<std::__1::allocator<std::__1::sub_match<char const*> > >(char const*, char const*, std::__1::match_results<char const*, std::__1::allocator<std::__1::sub_match<char const*> > >&, std::__1::regex_constants::match_flag_type, bool) const [inlined] std::__1::__libcpp_deallocate[abi:v15006](__ptr=<unavailable>, __size=<unavailable>, __align=8) at new:296:14 [opt]
frame #8: 0x00000001022fbb84 bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__match_at_start_ecma<std::__1::allocator<std::__1::sub_match<char const*> > >(char const*, char const*, std::__1::match_results<char const*, std::__1::allocator<std::__1::sub_match<char const*> > >&, std::__1::regex_constants::match_flag_type, bool) const [inlined] std::__1::allocator<std::__1::__state<char> >::deallocate[abi:v15006](this=0x000000016db17730, __p=<unavailable>, __n=<unavailable>) at allocator.h:128:13 [opt]
frame #9: 0x00000001022fbb84 bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__match_at_start_ecma<std::__1::allocator<std::__1::sub_match<char const*> > >(char const*, char const*, std::__1::match_results<char const*, std::__1::allocator<std::__1::sub_match<char const*> > >&, std::__1::regex_constants::match_flag_type, bool) const [inlined] std::__1::allocator_traits<std::__1::allocator<std::__1::__state<char> > >::deallocate[abi:v15006](__a=0x000000016db17730, __p=<unavailable>, __n=<unavailable>) at allocator_traits.h:282:13 [opt]
frame #10: 0x00000001022fbb84 bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__match_at_start_ecma<std::__1::allocator<std::__1::sub_match<char const*> > >(char const*, char const*, std::__1::match_results<char const*, std::__1::allocator<std::__1::sub_match<char const*> > >&, std::__1::regex_constants::match_flag_type, bool) const [inlined] std::__1::vector<std::__1::__state<char>, std::__1::allocator<std::__1::__state<char> > >::~vector[abi:v15006](this=0x000000016db17720 size=0) at vector:437:9 [opt]
frame #11: 0x00000001022fbb34 bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__match_at_start_ecma<std::__1::allocator<std::__1::sub_match<char const*> > >(char const*, char const*, std::__1::match_results<char const*, std::__1::allocator<std::__1::sub_match<char const*> > >&, std::__1::regex_constants::match_flag_type, bool) const [inlined] std::__1::vector<std::__1::__state<char>, std::__1::allocator<std::__1::__state<char> > >::~vector[abi:v15006](this=0x000000016db17720 size=0) at vector:430:5 [opt]
frame #12: 0x00000001022fbb34 bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__match_at_start_ecma<std::__1::allocator<std::__1::sub_match<char const*> > >(this=0x0000600003e44000, __first="GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG"..., __last="", __m=0x000000016db17860, __flags=<unavailable>, __at_first=<unavailable>) const at regex:5850:1 [opt]
frame #13: 0x0000000102301f0c bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__search<std::__1::allocator<std::__1::sub_match<char const*> > >(char const*, char const*, std::__1::match_results<char const*, std::__1::allocator<std::__1::sub_match<char const*> > >&, std::__1::regex_constants::match_flag_type) const [inlined] bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__match_at_start<std::__1::allocator<std::__1::sub_match<char const*> > >(this=0x0000600003e44000, __first="GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG"..., __last="", __m=0x000000016db17860, __flags=match_prev_avail, __at_first=false) const at regex:6056:16 [opt]
frame #14: 0x0000000102301ef0 bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__search<std::__1::allocator<std::__1::sub_match<char const*> > >(this=0x0000600003e44000, __first="GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG"..., __last="", __m=0x000000016db17860, __flags=match_prev_avail) const at regex:6090:17 [opt]
frame #15: 0x0000000102307c74 bulk_extractor`regex_vector::search_all(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, unsigned long*, unsigned long*) const [inlined] bool std::__1::regex_search[abi:v15006]<std::__1::char_traits<char>, std::__1::allocator<char>, std::__1::allocator<std::__1::sub_match<std::__1::__wrap_iter<char const*> > >, char, std::__1::regex_traits<char> >(__s="gD\xc9\xf3\x86\xa9f\x9e\x9e\xf5\x8c[Ź\xc99\xb1\xb7in\x9eN\xfb\U00000003\xb7\U00000006\xfb\x86\x8d9\U0000001d]9\xf5\x8c[ŹɞN\xb1\xb7i9\U0000001d\x869\xf5\x8c[Ź\xc99\xf5\x8c;\xc5 DgɞGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG\xdd.\x85GGGGGG\xe6\xe44绤\x8fGѝ|\xf0\x945\xa04\x86\xcb\xe74n;\x879&IF8\xe0y<\ay\xae\xb7\vgi\x8a\xea\f#=\x8a\xe1X_\xfe\xdb@\U00000011\xed\U0000001b\U0000001a=\xd6\U0000007f\xa8\xbd\xa8\b\xc5~I\U00000012\xe0*sf\xf1GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG\x87Md\xe6GG\xec\xec\xe6G\U00000016N_\xebGG\xf1GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG"..., __m=0x000000016db177f0, __e=0x0000600003e44000, __flags=match_default) at regex:6210:20 [opt]
frame #16: 0x0000000102307c28 bulk_extractor`regex_vector::search_all(this=<unavailable>, probe="gD\xc9\xf3\x86\xa9f\x9e\x9e\xf5\x8c[Ź\xc99\xb1\xb7in\x9eN\xfb\U00000003\xb7\U00000006\xfb\x86\x8d9\U0000001d]9\xf5\x8c[ŹɞN\xb1\xb7i9\U0000001d\x869\xf5\x8c[Ź\xc99\xf5\x8c;\xc5 DgɞGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG\xdd.\x85GGGGGG\xe6\xe44绤\x8fGѝ|\xf0\x945\xa04\x86\xcb\xe74n;\x879&IF8\xe0y<\ay\xae\xb7\vgi\x8a\xea\f#=\x8a\xe1X_\xfe\xdb@\U00000011\xed\U0000001b\U0000001a=\xd6\U0000007f\xa8\xbd\xa8\b\xc5~I\U00000012\xe0*sf\xf1GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG\x87Md\xe6GG\xec\xec\xe6G\U00000016N_\xebGG\xf1GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG"..., found="", offset=0x000000016db17bb0, len=0x000000016db17a78) const at regex_vector.cpp:31:9 [opt]
frame #17: 0x000000010236a934 bulk_extractor`::scan_find(sp=0x000000016db180a8) at scan_find.cpp:95:28 [opt]
frame #18: 0x0000000102319b30 bulk_extractor`scanner_set::process_sbuf(this=0x000000016db1a800, sbufp=0x000000011df08f30, scanner=(bulk_extractor`::scan_find(scanner_params &) at scan_find.cpp:48))(scanner_params&)) at scanner_set.cpp:873:9 [opt]
frame #19: 0x0000000102316e24 bulk_extractor`scanner_set::process_sbuf(this=0x000000016db1a800, sbufp=0x000000011df08f30) at scanner_set.cpp:1022:13 [opt]
frame #20: 0x00000001023160d0 bulk_extractor`scanner_set::schedule_sbuf(this=0x000000016db1a800, sbufp=0x000000011df08f30) at scanner_set.cpp:684:9 [opt]
frame #21: 0x000000010237aa30 bulk_extractor`::scan_outlook(sp=0x000000016db18708) at scan_outlook.cpp:83:16 [opt]
frame #22: 0x0000000102319b30 bulk_extractor`scanner_set::process_sbuf(this=0x000000016db1a800, sbufp=0x000000011df088f0, scanner=(bulk_extractor`::scan_outlook(scanner_params &) at scan_outlook.cpp:61))(scanner_params&)) at scanner_set.cpp:873:9 [opt]
frame #23: 0x0000000102316e24 bulk_extractor`scanner_set::process_sbuf(this=0x000000016db1a800, sbufp=0x000000011df088f0) at scanner_set.cpp:1022:13 [opt]
frame #24: 0x00000001023160d0 bulk_extractor`scanner_set::schedule_sbuf(this=0x000000016db1a800, sbufp=0x000000011df088f0) at scanner_set.cpp:684:9 [opt]
frame #25: 0x000000010234e6ec bulk_extractor`Phase1::read_process_sbufs(this=0x000000016db18e50) at phase1.cpp:207:24 [opt]
frame #26: 0x000000010234faf0 bulk_extractor`Phase1::phase1_run(this=0x000000016db18e50) at phase1.cpp:290:5 [opt]
frame #27: 0x0000000102330404 bulk_extractor`bulk_extractor_main(cout=<unavailable>, cerr=<unavailable>, argc=<unavailable>, argv=<unavailable>) at bulk_extractor.cpp:608:16 [opt]
frame #28: 0x0000000188b37f28 dyld`start + 2236
(lldb)
Run a bit more and see where we get:
(lldb) cont
Process 68295 resuming
(lldb)
error: Process is running. Use 'process interrupt' to pause execution.
(lldb) bt
error: Command requires a process which is currently stopped.
(lldb) process interrupt
Process 68295 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x0000000188ccc460 libsystem_malloc.dylib`_nanov2_free + 284
libsystem_malloc.dylib`:
-> 0x188ccc460 <+284>: cmp w9, #0x7fc
0x188ccc464 <+288>: b.eq 0x188ccc494 ; <+336>
0x188ccc468 <+292>: b 0x188ccc478 ; <+308>
0x188ccc46c <+296>: sub w9, w9, #0x7fe
Target 0: (bulk_extractor) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x0000000188ccc460 libsystem_malloc.dylib`_nanov2_free + 284
frame #1: 0x00000001022fba3c bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__match_at_start_ecma<std::__1::allocator<std::__1::sub_match<char const*> > >(this=0x0000600003e44000, __first="GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG"..., __last="", __m=0x000000016db17860, __flags=match_prev_avail, __at_first=<unavailable>) const at regex:0 [opt]
frame #2: 0x0000000102301f0c bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__search<std::__1::allocator<std::__1::sub_match<char const*> > >(char const*, char const*, std::__1::match_results<char const*, std::__1::allocator<std::__1::sub_match<char const*> > >&, std::__1::regex_constants::match_flag_type) const [inlined] bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__match_at_start<std::__1::allocator<std::__1::sub_match<char const*> > >(this=0x0000600003e44000, __first="GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG"..., __last="", __m=0x000000016db17860, __flags=match_prev_avail, __at_first=false) const at regex:6056:16 [opt]
frame #3: 0x0000000102301ef0 bulk_extractor`bool std::__1::basic_regex<char, std::__1::regex_traits<char> >::__search<std::__1::allocator<std::__1::sub_match<char const*> > >(this=0x0000600003e44000, __first="GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG"..., __last="", __m=0x000000016db17860, __flags=match_prev_avail) const at regex:6090:17 [opt]
frame #4: 0x0000000102307c74 bulk_extractor`regex_vector::search_all(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, unsigned long*, unsigned long*) const [inlined] bool std::__1::regex_search[abi:v15006]<std::__1::char_traits<char>, std::__1::allocator<char>, std::__1::allocator<std::__1::sub_match<std::__1::__wrap_iter<char const*> > >, char, std::__1::regex_traits<char> >(__s="gD\xc9\xf3\x86\xa9f\x9e\x9e\xf5\x8c[Ź\xc99\xb1\xb7in\x9eN\xfb\U00000003\xb7\U00000006\xfb\x86\x8d9\U0000001d]9\xf5\x8c[ŹɞN\xb1\xb7i9\U0000001d\x869\xf5\x8c[Ź\xc99\xf5\x8c;\xc5 DgɞGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG\xdd.\x85GGGGGG\xe6\xe44绤\x8fGѝ|\xf0\x945\xa04\x86\xcb\xe74n;\x879&IF8\xe0y<\ay\xae\xb7\vgi\x8a\xea\f#=\x8a\xe1X_\xfe\xdb@\U00000011\xed\U0000001b\U0000001a=\xd6\U0000007f\xa8\xbd\xa8\b\xc5~I\U00000012\xe0*sf\xf1GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG\x87Md\xe6GG\xec\xec\xe6G\U00000016N_\xebGG\xf1GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG"..., __m=0x000000016db177f0, __e=0x0000600003e44000, __flags=match_default) at regex:6210:20 [opt]
frame #5: 0x0000000102307c28 bulk_extractor`regex_vector::search_all(this=<unavailable>, probe="gD\xc9\xf3\x86\xa9f\x9e\x9e\xf5\x8c[Ź\xc99\xb1\xb7in\x9eN\xfb\U00000003\xb7\U00000006\xfb\x86\x8d9\U0000001d]9\xf5\x8c[ŹɞN\xb1\xb7i9\U0000001d\x869\xf5\x8c[Ź\xc99\xf5\x8c;\xc5 DgɞGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG\xdd.\x85GGGGGG\xe6\xe44绤\x8fGѝ|\xf0\x945\xa04\x86\xcb\xe74n;\x879&IF8\xe0y<\ay\xae\xb7\vgi\x8a\xea\f#=\x8a\xe1X_\xfe\xdb@\U00000011\xed\U0000001b\U0000001a=\xd6\U0000007f\xa8\xbd\xa8\b\xc5~I\U00000012\xe0*sf\xf1GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG\x87Md\xe6GG\xec\xec\xe6G\U00000016N_\xebGG\xf1GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG"..., found="", offset=0x000000016db17bb0, len=0x000000016db17a78) const at regex_vector.cpp:31:9 [opt]
frame #6: 0x000000010236a934 bulk_extractor`::scan_find(sp=0x000000016db180a8) at scan_find.cpp:95:28 [opt]
frame #7: 0x0000000102319b30 bulk_extractor`scanner_set::process_sbuf(this=0x000000016db1a800, sbufp=0x000000011df08f30, scanner=(bulk_extractor`::scan_find(scanner_params &) at scan_find.cpp:48))(scanner_params&)) at scanner_set.cpp:873:9 [opt]
frame #8: 0x0000000102316e24 bulk_extractor`scanner_set::process_sbuf(this=0x000000016db1a800, sbufp=0x000000011df08f30) at scanner_set.cpp:1022:13 [opt]
frame #9: 0x00000001023160d0 bulk_extractor`scanner_set::schedule_sbuf(this=0x000000016db1a800, sbufp=0x000000011df08f30) at scanner_set.cpp:684:9 [opt]
frame #10: 0x000000010237aa30 bulk_extractor`::scan_outlook(sp=0x000000016db18708) at scan_outlook.cpp:83:16 [opt]
frame #11: 0x0000000102319b30 bulk_extractor`scanner_set::process_sbuf(this=0x000000016db1a800, sbufp=0x000000011df088f0, scanner=(bulk_extractor`::scan_outlook(scanner_params &) at scan_outlook.cpp:61))(scanner_params&)) at scanner_set.cpp:873:9 [opt]
frame #12: 0x0000000102316e24 bulk_extractor`scanner_set::process_sbuf(this=0x000000016db1a800, sbufp=0x000000011df088f0) at scanner_set.cpp:1022:13 [opt]
frame #13: 0x00000001023160d0 bulk_extractor`scanner_set::schedule_sbuf(this=0x000000016db1a800, sbufp=0x000000011df088f0) at scanner_set.cpp:684:9 [opt]
frame #14: 0x000000010234e6ec bulk_extractor`Phase1::read_process_sbufs(this=0x000000016db18e50) at phase1.cpp:207:24 [opt]
frame #15: 0x000000010234faf0 bulk_extractor`Phase1::phase1_run(this=0x000000016db18e50) at phase1.cpp:290:5 [opt]
frame #16: 0x0000000102330404 bulk_extractor`bulk_extractor_main(cout=<unavailable>, cerr=<unavailable>, argc=<unavailable>, argv=<unavailable>) at bulk_extractor.cpp:608:16 [opt]
frame #17: 0x0000000188b37f28 dyld`start + 2236
(lldb)
Turns out that the -S ssn_mode=1 isn't necessary to replicate the crash, but the -e outlook is.
As of 7e2f14c814e86dd6ad30d7156055c2c3390d0cff, this command line no longer causes a hang:
$ src/bulk_extractor --notify_main_thread -S ssn_mode=1 -e outlook -x zip -x rar -x winpe -x exif -x pdf -J -d8 -o out1 -Z -F tests/patterns.txt tests/Images/nps-2010-emails.E01
I've now determined that this is an STL regex issue. Other people have encountered it.
So I think that I need to come up with a new approach for scan_find so that it doesn't send megabytes or gigabytes to std::regex_search.
It's going to have to be a lot smarter.
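One possible shape for "smarter" is to hand std::regex a bounded window at a time instead of the whole buffer. The following is only a sketch under stated assumptions, not the actual scan_find fix: `search_windowed`, `Hit`, `window`, and `overlap` are hypothetical names, and the approach assumes no match of interest is longer than `overlap` bytes. Longer matches may be truncated or split, so results can differ from a single whole-buffer search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One match: absolute offset into the buffer plus the matched text.
struct Hit {
    std::size_t offset;
    std::string text;
};

// Hypothetical helper (not bulk_extractor code): search `buf` in windows of
// `window` bytes. Consecutive windows overlap by `overlap` bytes, so any match
// of at most `overlap` bytes is seen whole by at least one window.
std::vector<Hit> search_windowed(const std::string& buf, const std::regex& re,
                                 std::size_t window, std::size_t overlap) {
    assert(window > overlap);
    std::vector<Hit> hits;
    const std::size_t step = window - overlap;
    const char* base = buf.data();
    std::size_t resume = 0;  // end of the last reported hit
    for (std::size_t start = 0; start < buf.size(); start += step) {
        const std::size_t end = std::min(buf.size(), start + window);
        const bool last = (end == buf.size());
        // After the first window the preceding byte exists, so tell the
        // matcher; this keeps ^ and \b honest at the window start.
        const std::regex_constants::match_flag_type flags =
            start > 0 ? std::regex_constants::match_prev_avail
                      : std::regex_constants::match_default;
        for (std::cregex_iterator it(base + start, base + end, re, flags), done;
             it != done; ++it) {
            const std::size_t off =
                static_cast<std::size_t>((*it)[0].first - base);
            // Hits that begin in the overlap tail belong to the next window.
            if (!last && off >= start + step) break;
            // Skip hits that overlap one already reported.
            if (off < resume) continue;
            hits.push_back({off, it->str()});
            resume = off + static_cast<std::size_t>(it->length());
        }
        if (last) break;
    }
    return hits;
}
```

With a 64-byte window and a 16-byte overlap, for example, a phone-number-like pattern that straddles byte 64 is missed by the first window but picked up intact by the second. Real window sizes would be far larger; the point is only that the cost of each std::regex_search call is bounded by the window size rather than by the size of the sbuf.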
Oh! Is scan_find using C++’s std::regex in 2.0? It is notorious for poor performance. Can you describe the issue in more depth for me?
We really aren’t too far away from a new lightgrep release in support of the lightgrep PR. Now that I’m through this weekend, I have crossed many things off my to-do list and have time to think and work.
Jon
On Jan 15, 2024, at 7:34 PM, Simson L. Garfinkel wrote:
> So I think that I need to come up with a new approach for scan_find so that it doesn't send megabytes or gigabytes to std::regex_search. It's going to have to be a lot smarter.
My plan is to replace std::regex with RE2, for people who can't get lightgrep to work.
std::regex has been replaced with RE2, and BE no longer hangs.
Checklist:
Original Report
Running bulk_extractor 2.02 with this command:
bulk_extractor -S ssn_mode=1 -e outlook -x zip -x rar -x winpe -x exif --no_threads -o /home/accessions/b_e2x_errors/debug_mode02 -R /home/accessions/UA2023-0021/objects/OPD/ -F /home/scripts/be_regex/uaregex.txt
Ran bulk_extractor in multi-threaded mode. When bulk_extractor reported
All data read; waiting for threads to finish...
the process proceeded to hang for over three days until I killed the process manually.