simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases

find scanner change in BE2.0? Missing a search string. #292

Status: Closed (bgrundy closed 2 years ago)

bgrundy commented 2 years ago

I'm testing BE2.0 and ran across an issue where the find scanner does not behave as it did in 1.6.0-dev. Test results are below. Running the find scanner on an EWF image with 1.6.0-dev finds the expected string. Running the same command with 2.0.0-dev produces a zero-length find.txt. I've read the documentation and cannot find a change that should matter in this case. Any pointers would be appreciated.

Platform: Slackware-Current
Target File: NTFS_Pract_2017_E01.tar.gz (LinuxLEO.com)
Report.xml is available on request.

bulk_extractor-20210928_692ee97

$ bulk_extractor -V
bulk_extractor 2.0.0-dev

$ bulk_extractor -o bo2021 -f "Uranium-235" NTFS_Pract_2017.E01
mkdir "bo2021"

bulk_extractor version: 2.0.0-dev
Input file: "NTFS_Pract_2017.E01"
Output directory: "bo2021"
Disk Size: 524288000
Scanners: aes base64 elf evtx exif facebook find gzip httplogs json kml msxml
net ntfsindx ntfslogfile ntfsmft ntfsusn pdf rar sqlite utmp vcard windirs
winlnk winpe winprefetch zip accts email gps
Threads: 2
going multi-threaded...( 2 )
bulk_extractor      Fri Nov 12 15:44:10 2021
...

$ cat bo2021/find.txt
<zero length - empty>

bulk_extractor-20170403_779dbe1

$ bulk_extractor -V
bulk_extractor 1.6.0-dev

$ bulk_extractor -o bo2017 -f "Uranium-235" NTFS_Pract_2017.E01
bulk_extractor version: 1.6.0-dev
Hostname: classbox.lab
Input file: NTFS_Pract_2017.E01
Output directory: bo2017
Disk Size: 524288000
...

$ cat bo2017/find.txt
# BANNER FILE NOT PROVIDED (-b option)
# BULK_EXTRACTOR-Version: 1.6.0-dev ($Rev: 10844 $)
# Feature-Recorder: find
# Filename: NTFS_Pract_2017.E01
# Feature-File-Version: 1.1
445901295-ZIP-9745  Uranium-235 ference between Uranium-235 and Uranium-238
445901295-ZIP-0-MSXML-857   Uranium-235 ference between Uranium-235 and Uranium-238

simsong commented 2 years ago

Thank you for the well-documented test case. Is the image NTFS_Pract_2017.E01 publicly available?

simsong commented 2 years ago

Oh, it's attached above. Thanks!

bgrundy commented 2 years ago

In case it's helpful, I uploaded report.xml from the BE2.0 output:

report.xml (LinuxLEO.com)

bgrundy commented 2 years ago

With the release of 2.0.0 (Thank You!), I retested this issue and it appears to be fixed. I've been following the BE code changes as closely as I can, but not being a programmer (of any sort), I must have missed the relevant commit.

This was tested on both a raw and an E01 version of the same image and worked for both. Note that the BE build compiled with libewf used the libewf-legacy 20140812 version instead of the experimental one. Both Sleuthkit and Plaso are tested with libewf-legacy and will reside alongside BE on production forensic machines.

I believe this issue can be closed.

$ bulk_extractor -V
bulk_extractor 2.0.0

$ bulk_extractor -o bulk_out  -f "Uranium-235" NTFS_Pract_2017.E01

bulk_extractor      Sun Feb 13 09:19:52 2022

available_memory: 3236741120
bytes_queued: 26008967
depth0_bytes_queued: 20971520
depth0_sbufs_queued: 1
elapsed_time:  0:00:38
estimated_date_completion: 2022-02-13 09:19:52
estimated_time_remaining:  0:00:00
fraction_read: 100.000000 %
max_offset: 522373709
sbufs_created: 2000151
sbufs_queued: 4384
sbufs_remaining: 148
tasks_queued: 4382
thread-1: 522373709-ZIP-13982-BASE64-0: accts (1049 bytes)
thread-2: 503316480: email (20971520 bytes)
thread_count: 2
==========================================================================================>|

$ cat bulk_out/find.txt
# BANNER FILE NOT PROVIDED (-b option)
# BULK_EXTRACTOR-Version: 2.0.0
# Feature-Recorder: find
# Filename: NTFS_Pract_2017.E01
# Feature-File-Version: 1.1
445901295-ZIP-9745  Uranium-235 ference between Uranium-235 and Uranium-238
445901295-ZIP-0-MSXML-857   Uranium-235 ference between Uranium-235 and Uranium-238
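
For anyone who wants to inspect find.txt programmatically, here is a minimal Python sketch that reads the feature lines shown above. It assumes the on-disk file is tab-separated (forensic path, feature, context) and that the tabs appear as spaces in this thread. The names `Feature`, `parse_feature_lines`, and `initial_offset` are made up for illustration and are not part of bulk_extractor.

```python
from typing import Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    path: str      # forensic path, e.g. "445901295-ZIP-9745"
    feature: str   # the matched string, e.g. "Uranium-235"
    context: str   # surrounding bytes as written by the recorder


def parse_feature_lines(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield features from feature-file lines, skipping '#' header lines.

    Assumption: fields are separated by tab characters.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) < 2:
            continue  # not a recognizable feature line
        context = parts[2] if len(parts) > 2 else ""
        yield Feature(parts[0], parts[1], context)


def initial_offset(path: str) -> int:
    """Return the leading byte offset of a forensic path."""
    return int(path.split("-", 1)[0])
```

Something like `list(parse_feature_lines(open("bulk_out/find.txt", encoding="utf-8", errors="replace")))` would give one entry per hit, and an empty list would correspond to the zero-length result originally reported.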

Thanks!

simsong commented 2 years ago

Yes. Thank you very much for reporting this. There is now an explicit unit test for the find scanner:

https://github.com/simsong/bulk_extractor/blob/17c2a0d52d67f3dd9bb46f62ed8678c6e48cf525/src/test_be3.cpp#L142-L159

This means that the find scanner will be tested on every pull-request and commit to GitHub.
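
For readers who want a similar check on their own output directories, the following is a rough Python sketch of the idea. It is not the project's C++ unit test. It assumes feature files are tab-separated with the feature in the second field, and the function names are hypothetical.

```python
from typing import Iterable, List, Set


def found_features(feature_file_text: str) -> Set[str]:
    """Collect the feature column from feature-file text.

    Assumption: non-comment lines are tab-separated and the
    second field holds the feature string.
    """
    found: Set[str] = set()
    for line in feature_file_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            found.add(fields[1])
    return found


def missing_features(feature_file_text: str, expected: Iterable[str]) -> List[str]:
    """Return the expected strings that never appear as a feature."""
    return sorted(set(expected) - found_features(feature_file_text))
```

Calling `missing_features(open("bulk_out/find.txt", encoding="utf-8", errors="replace").read(), ["Uranium-235"])` would return `["Uranium-235"]` for a zero-length find.txt and an empty list when the string was found.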

Thank you again!