Closed laissezfarrell closed 10 months ago
Hi. I'm trying to understand the error here. Is it that -J makes it use multiple threads, or is it that it hung?
Can you provide /home/accessions/UA2023-0021/objects/OPD/ and /home/scripts/be_regex/uaregex.txt?
Hi -
With -J, I expected bulk_extractor to use only the primary thread, not run in multi-threaded mode.
The regex file is here: https://github.com/laissezfarrell/rl-bitcurator-scripts/blob/master/be_regex/uaregex.txt
I'm not able to share the content files because of access restrictions. I can share some summary data about them (filetypes, sizes, aggregate size, etc), if that would be helpful, though I'm not sure that would be.
Okay. Let me check this out over the weekend. Thanks.
This will almost certainly require review in the be20_api.
Linked with https://github.com/simsong/be20_api/issues/92
Here is a command line that may exercise the bug:
src/bulk_extractor -S ssn_mode=1 -e outlook -x zip -x rar -x winpe -x exif -x pdf -J -d8 -o out1 -Z -F tests/patterns.txt tests/Images/nps-2010-emails.E01
Also:
src/bulk_extractor --notify_main_thread -S ssn_mode=1 -e outlook -x zip -x rar -x winpe -x exif -x pdf -J -d8 -o out1 -Z -F tests/patterns.txt tests/Images/nps-2010-emails.E01
What's the use case for -J/--no-threads? It's not obvious to me; I don't see the advantage of -J over -j 1 (where processing happens on one background thread while the main thread monitors). The disadvantage is the complexity of supporting a different codepath. If there's no use case, maybe this option could just be deleted?
It's very useful for debugging.
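To make the distinction concrete, here is a minimal Python sketch of the two dispatch styles being compared. It is not bulk_extractor or be20_api code; `scan`, `run_inline`, and `run_one_worker` are hypothetical names used only for illustration. The point is that in the inline style every unit of work runs on the calling thread, so a debugger breakpoint or stack trace stays in one place, while the one-worker style hands the work to a separate thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def scan(page):
    # Hypothetical stand-in for a scanner: report which thread handled the page.
    return page, threading.current_thread().name


def run_inline(pages):
    # Roughly analogous to -J: all work happens on the calling thread.
    return [scan(p) for p in pages]


def run_one_worker(pages):
    # Roughly analogous to -j 1: one background worker does the work
    # while the caller only waits for the results.
    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="worker") as pool:
        futures = [pool.submit(scan, p) for p in pages]
        return [f.result() for f in futures]


pages = ["page0", "page1", "page2"]
caller = threading.current_thread().name
inline_results = run_inline(pages)
worker_results = run_one_worker(pages)
```

Both styles process the pages in the same order; only the thread that executes `scan` differs, which is what matters when stepping through a scanner in a debugger.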
Fixed. -J now works properly:
simsong@Simsons-MacBook-Pro bulk_extractor % src/bulk_extractor -1 -S ssn_mode=1 -e outlook -x zip -x rar -x winpe -x exif -x pdf -J -o out1 -R . (main)bulk_extractor
bulk_extractor version: 2.0.6
Input file: "."
Output directory: "out1"
Disk Size: 1577
Scanners: aes base64 elf evtx facebook find gzip httplogs json kml_carved msxml net ntfsindx ntfslogfile ntfsmft ntfsusn outlook sqlite utmp vcard_carved windirs winlnk winprefetch accts email gps
Threading Disabled
running single-threaded (DEBUG)...
20:21:47 Offset 0MB (0.00%) Done in n/a at 2024-01-15 20:21:46
20:21:48 Offset 0MB (2.09%) Done in 0:00:47 at 2024-01-15 20:22:35
20:21:49 Offset 0MB (2.09%) Done in 0:01:34 at 2024-01-15 20:23:23
20:21:50 Offset 0MB (2.09%) Done in 0:02:21 at 2024-01-15 20:24:11
20:21:51 Offset 0MB (2.09%) Done in 0:03:08 at 2024-01-15 20:24:59
...
report.xml when using -J
sbuf when analyzed in single-threaded mode.
Running bulk_extractor 2.02 with command:
bulk_extractor -S ssn_mode=1 -e outlook -x zip -x rar -x winpe -x exif -x pdf -J -d8 -o /home/accessions/b_e2x_errors/debug_mode05 -R /home/accessions/UA2023-0021/objects/OPD/ -F /home/scripts/be_regex/uaregex.txt
reported "going multi-threaded (24)" and eventually reported "All data read; waiting for threads to finish..."
Execution environment: `