simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases

File system iterator needs to keep a set of directories and files visited and not process the same directory or file twice. #472

Open martinmdp opened 2 months ago

martinmdp commented 2 months ago

Hi

Thanks for this amazing tool.

I managed to compile and install bulk_extractor on Debian 12 Linux, but when I run it with the command:

bulk_extractor -o results -R / -E accts

it fails with:

terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
what(): filesystem error: status: Too many levels of symbolic links [/run/udev/watch/25]
Aborted

Any suggestions?

Thanks in advance

simsong commented 1 month ago

This likely happened because you have a recursive symlink - a symlink that points to a directory that ultimately points back to the symlink. The filesystem iterator does not keep a set of all directories and symlinks that have been previously visited to make sure that it never processes the same symlink or directory twice. That's a good thing to add. It wouldn't be hard to do. Would you like to add it?

martinmdp commented 1 month ago

@simsong thanks for the response. Unfortunately I'm not a C++ developer. Is there a way to exclude directories from the scan? On Linux, some folders such as /sys, /dev or /run expose devices or dynamic files rather than ordinary data. When bulk_extractor scans them, it fails with "segmentation fault" (for example at /sys/kernel/notes), and if the tool wasn't compiled with the libexpat library, it is not possible to resume the scan.

Thanks in advance

simsong commented 1 month ago

Well, what is your goal in scanning /sys and /dev and /run?

martinmdp commented 1 month ago

I need a full system scan because of a requirement, so I run bulk_extractor -o results -R /

This scans the whole system, but when it reaches the folders mentioned above, there is a segmentation fault or a problem following symbolic links.

simsong commented 1 month ago

Is there a reason you can’t just scan the raw drive?

martinmdp commented 1 month ago

The scan is run on the live filesystem; I can't scan an image because it is a production server.

simsong commented 1 month ago

I am also unclear why you want to scan /dev/random and /dev/zero. Can you try running strings or wc on these devices for me and print the results here?

simsong commented 1 month ago

> The scan is run on the live filesystem; I can't scan an image because it is a production server.

You can't read the file system's raw disk partition on a production system? What OS are you running?

martinmdp commented 1 month ago

I don't need to scan those particular directories. I need to scan the entire system for certain evidence. The problem is that passing "-R /" to scan the entire system and generate a single report makes the process visit every folder without exception, and it throws an error in the system folders. If I have to scan folder by folder, I must generate more than 10 different reports (for example, running the command once each for /etc, /home, /var and so on). I need a single scan of the entire root "/". I am using Debian 11 and 12 Linux, with the latest version of bulk_extractor compiled with git pull --recursive, bootstrap.sh, ./configure, make and make install.

simsong commented 1 month ago

Well, -R / says start at the root directory and scan every single readable object in every folder. And one of the folders you will get from / is /dev. So -R / is asking the computer to scan /dev/chargen and /dev/zero and I think that you will be very unhappy when you read those. And /dev/stdin will read from your keyboard until you type ^d. So I think that you really do not want to do a -R /. Either that, or you are asking that the iterator also only scan regular files and not devices or pipes or FIFOs or other things that are in the file system. Right?

I'm still curious why you are only interested in scanning the allocated blocks associated with files, and then, you are only interested in the primary stream of file systems that support multiple data streams. Are you sure that you do not want data from deleted files?

Don't get me wrong — the mods that you are asking for make sense. But I want to be sure that I understand exactly what you need before I put in a change request.

martinmdp commented 1 month ago

Thanks for your response. I'm not really requesting modifications; I've just been tasked with scanning a file system in search of certain information. Honestly, I'm not interested in system folders or deleted files. I just need a single report of a complete system scan, and if there are findings, let them appear in the report.

I have run bulk_extractor on a Windows system without problems, but on Linux I cannot do it due to this problem. That is why I was asking if there is a way to "exclude" folders or symlinks, because that way I would run the command with -R / and have the possibility of excluding or ignoring system folders like /dev, /sys, etc.
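To make the requested behaviour concrete, an exclusion test could look roughly like the sketch below. This thread does not identify an existing bulk_extractor option for it, so the names `is_under` and `is_excluded` and the idea of an exclusion list are assumptions for illustration only.

```cpp
#include <algorithm>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helpers, not part of bulk_extractor.
// True when `p` is `root` itself or lies beneath it. Whole path components are
// compared, so "/devices" is not treated as being under "/dev".
bool is_under(const fs::path& p, const fs::path& root) {
    auto ends = std::mismatch(root.begin(), root.end(), p.begin(), p.end());
    return ends.first == root.end();   // every component of root matched
}

// True when `p` falls under any of the excluded roots.
// Roots are expected without a trailing slash, e.g. "/dev" rather than "/dev/".
bool is_excluded(const fs::path& p, const std::vector<fs::path>& excluded) {
    return std::any_of(excluded.begin(), excluded.end(),
                       [&](const fs::path& root) { return is_under(p, root); });
}
```

The comparison is purely lexical, so it does not resolve symlinks. In a walk built on `std::filesystem::recursive_directory_iterator`, calling `disable_recursion_pending()` on the iterator when `is_excluded` returns true for a directory would skip that whole subtree.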

Thanks for your time

simsong commented 1 month ago

Hi. If you are asking for -R / to not include all of the directories and file system entries under /, then you are very much requesting modifications.

You haven't had these problems on Windows because Windows doesn't have a unified file system that places devices underneath a single root directory, and because, unlike on Linux, not everything on Windows is a file.