simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases

It is not clear to many beginners how to git clone and then build bulk_extractor because of the git submodule structure. #393

Closed lic-8 closed 1 year ago

lic-8 commented 1 year ago

Hello, I am trying to compile BE for Linux and Windows. I have not tried Windows yet, because I don't see any instructions for it. For Linux, I cloned the repo locally on my machine and ran ./bootstrap.sh, which fails with an error saying that the submodule be20_api is not present.

I then cloned the be20_api git repo inside the bulk_extractor directory, changed into the be20_api directory, and ran git submodule update --init --recursive, but the error remains.

How do I compile a headless bulk_extractor for Linux and Windows?

simsong commented 1 year ago

What version of Linux are you using?

simsong commented 1 year ago

Oh, where precisely would you like to find the step-by-step instructions to do this?

lic-8 commented 1 year ago

I'm using Ubuntu 22.04. I am not expecting step-by-step instructions, but since the main README has basic instructions for compiling on Ubuntu, I was wondering whether there is an equivalent for Windows.

lic-8 commented 1 year ago

What I did is:

git clone https://github.com/simsong/bulk_extractor.git
cd bulk_extractor 
git clone https://github.com/simsong/be20_api.git
cd be20_api
git submodule update --init --recursive
cd ..
./bootstrap.sh

Then ./bootstrap.sh gives the error.

simsong commented 1 year ago

Hi. I'd like to know precisely where you expect to see the instructions, and then I will put the instructions there for you. There are instructions in several locations, but you apparently haven't found them. So please let me know where you expect to find them.

Your error was that your original git clone should have included the --recurse-submodules flag.
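For readers following along, the fix described above can be sketched as a pair of small shell helpers. This is only an illustration: the function names `clone_with_submodules` and `init_submodules` are hypothetical and not part of bulk_extractor, and the sketch assumes `git` is installed and the network is reachable.

```shell
#!/bin/sh
# Hypothetical helper: clone a repository together with all of its submodules.
clone_with_submodules() {
    # $1 = repository URL, $2 = destination directory
    # --recurse-submodules initializes and checks out every nested submodule.
    git clone --recurse-submodules "$1" "$2"
}

# Hypothetical helper: repair a tree that was cloned WITHOUT the flag.
# $1 must be the top level of the existing checkout (the superproject),
# not a separately cloned copy of a submodule.
init_submodules() {
    git -C "$1" submodule update --init --recursive
}

# Usage (needs network access):
#   clone_with_submodules https://github.com/simsong/bulk_extractor.git bulk_extractor
#   cd bulk_extractor && ./bootstrap.sh
```

The second helper is the likely remedy for the sequence shown earlier in the thread: the submodule update has to run in the bulk_extractor checkout itself, so that git can populate the submodule paths it has recorded there.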

However, you probably wanted to prepare the VM by running https://github.com/simsong/bulk_extractor/blob/main/etc/CONFIGURE_UBUNTU20LTS.bash, which I should update for Ubuntu 22.04. So my request is for you to give me the URL of the web page where you would like to find instructions, and I will put them there.

Finally, you should know that the Windows version can only be built from Fedora.

lic-8 commented 1 year ago

I found this link that has information about that, but I think it would be cool to add some details for newbie users like me. Why is it only possible to compile a Windows executable from Fedora? What I need is to be able to "embed" bulk_extractor in another program, so that I can run bulk_extractor commands from that program's code by executing OS commands. No terminal window should pop up; I need a headless mode.

simsong commented 1 year ago

Hi. It would be super-nice if you would provide for me a URL where you would like to see the instructions. This would help other people in your position.

Fedora has better-maintained mingw compilers than Ubuntu. That is why it is not possible to use Ubuntu to compile the Windows executable.

Embedding bulk_extractor in another program is no different from calling any other command line program. The details depend on the language you are using and the operating system.
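As a rough illustration of that point, here is a minimal POSIX shell sketch of calling bulk_extractor non-interactively, the way a parent program on Linux might. The function name `run_bulk_extractor` and the `BULK_EXTRACTOR_BIN` override variable are hypothetical names introduced for this example; the sketch assumes the `bulk_extractor` binary is on `PATH` and uses only its `-o` option to name the output directory. Suppressing a console window on Windows is handled differently and is not covered here.

```shell
#!/bin/sh
# Hypothetical wrapper: run bulk_extractor with no terminal interaction.
run_bulk_extractor() {
    # $1 = disk image, $2 = output directory, $3 = log file
    # stdin is taken from /dev/null and stdout/stderr go to the log file,
    # so nothing is read from or written to a terminal.
    "${BULK_EXTRACTOR_BIN:-bulk_extractor}" -o "$2" "$1" \
        </dev/null >"$3" 2>&1
}

# Usage:
#   if run_bulk_extractor disk.raw be_out be.log; then
#       echo "scan finished"
#   else
#       echo "scan failed, see be.log"
#   fi
```

The function returns the exit status of the command it ran, so the caller can branch on success or failure and inspect the log afterwards.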

lic-8 commented 1 year ago

I would expect to find them here: https://github.com/simsong/bulk_extractor/wiki/Installing-bulk_extractor. Thank you for your help!

zdavatz commented 1 year ago

These steps just worked very well for me on Ubuntu: https://github.com/simsong/bulk_extractor#building-bulk_extractor

simsong commented 1 year ago

Thank you for the update, and thank you for prompting me to put the instructions in a more obvious place.

zdavatz commented 1 year ago

Thank you for the great software!

One question I also wanted to ask: Why do you not output the number of PDFs found?

simsong commented 1 year ago

> One question I also wanted to ask: Why do you not output the number of PDFs found?

I don't understand this question.

zdavatz commented 1 year ago

> I don't understand this question.

Let's say I have 2000 PDF files on my hard disk drive. I would find it interesting to know which information comes from the PDF files.

simsong commented 1 year ago

The program doesn't know about files. You might want to review the paper to remind yourself as to how the program works, what it does, and what it does not do.

zdavatz commented 1 year ago

Where do I find the latest version of the paper?

simsong commented 1 year ago

https://github.com/simsong/bulk_extractor/tree/main/doc