simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases

Add RTTI thumbnail option #460

Open Rannek opened 9 months ago

Rannek commented 9 months ago

bulk_extractor automatically finds and extracts JPG images but the tool does not currently support the extraction of RTTI thumbnail files.

RTTI is a thumbnail file format generated by Raw Therapee. These files cannot be extracted by standard methods, which means that they are currently overlooked by bulk_extractor and by other software too.

I have developed a script that successfully extracts RTTI thumbnail files. I believe integrating this script into bulk_extractor would significantly increase the tool's yield and make it even more versatile.

https://github.com/Rannek/raw-therapee-thumbnail-extractor

simsong commented 9 months ago

Right now, your script is in Python. If you want to rewrite it as a bulk_extractor module, we can take it now. Otherwise, it will need to wait until we can take on Python modules. They will run much slower, because Python runs much slower than C.

How widely is RTTI used?

Rannek commented 9 months ago

Thank you for the answer and the clarification. I will try to rewrite my Python script as a bulk_extractor module in C++.

How widely is it used? That’s a good question. Raw Therapee is an open-source alternative to Adobe Lightroom, used mainly to edit RAW images, though it can handle other formats too. I think it’s pretty well known in the photographic community.

Every Raw Therapee Thumbnail Image (RTTI) begins with either Image8 or Image16, which indicates the number of bits per channel in the thumbnail image in RGB layout.
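
A signature check like the one described above can be sketched in a few lines of C++. This is only an illustration, not code from the repositories linked in this thread: the function name `rtti_bits_per_channel` is hypothetical, and the newline byte after the signature is an assumption taken from the hex dump posted later in this thread.

```cpp
#include <string>

// Hypothetical helper: returns 8 or 16 when `data` starts with an RTTI
// signature ("Image8\n" or "Image16\n"), and 0 otherwise.
// Assumption: the signature is followed by a newline byte (0x0a).
int rtti_bits_per_channel(const std::string& data) {
    const std::string image8 = "Image8\n";
    const std::string image16 = "Image16\n";
    // compare(pos, len, str) clamps len to the available bytes, so short
    // inputs simply fail the comparison instead of reading out of bounds.
    if (data.compare(0, image8.size(), image8) == 0) return 8;
    if (data.compare(0, image16.size(), image16) == 0) return 16;
    return 0;
}
```

Returning 0 for "no match" keeps the caller's logic simple: any non-zero result is the bit depth announced by the header.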


simsong commented 9 months ago

It's not very hard. You will need to have a test file as well. Please check out the src directory.

simsong commented 9 months ago

I'm happy to review your code and otherwise help out!

Rannek commented 9 months ago

I converted my script to C++ and also made it support binary files, but I can't integrate it as a bulk_extractor module because it exceeds my skills. I am pretty much a noob at C++ and Makefiles and still learning.

You can find my script here, with test files: https://github.com/Rannek/rtti_cpp/

I hope it can be implemented into bulk_extractor somehow.

simsong commented 9 months ago

Thanks. Congrats on getting the program to work. Your program depends on OpenCV. I don't want to build OpenCV into bulk_extractor, so I really can't use your code as-is. But it's a start.

Can you give me an idea of how widely RTTI is used?

Can you put together a corpus of 3-4 RTTI files that I can use for tests?

What are the tools that read and write RTTI images?

Rannek commented 9 months ago

Thank you very much! Not wanting to integrate OpenCV is completely understandable. Maybe there is a more elegant solution to this. I need to further investigate it.

How widely is Raw Therapee used?

From the standpoint of how often this filetype would appear in an evidence scan scenario, it is a good question. It is definitely not as common as Windows thumbcache files, but I think every piece of evidence matters in evidence searching. No other forensic tool can search for this thumbnail type, only for common image formats (as far as I know).

Maybe it will appear in one hard drive out of 50 when searching for evidence, but if that one helps, it was worth it.

Of course, Windows thumbcache is not limited to photographers like Raw Therapee. But this could also be an advantage because if you find this type of thumbnail, you can be sure that there will be a lot of thumbnail files (photographers usually have a lot of pictures).

One additional advantage is that regular file cleaners (CCleaner, the Windows built-in cleaner) clear the thumbcache folder but miss this folder. This is not true on Linux, though, because BleachBit cleans the .cache folder.

What are the tools that read and write RTTI images?

Only Raw Therapee uses it. It builds a thumbnail folder so the next time the user opens the program, it does not need to generate the thumbnails again (similar to Windows Thumbcache). It does this for every image displayed (even if it is not edited).

I placed 4 .rtti files in the rtti_cpp/rtti_testfiles/ folder in their unmodified format (as the program outputs them) in different aspect ratios for testing.

You can process them one by one, or you can even cat them into one file and the program will still read them.
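
Reading several thumbnails from one concatenated file amounts to scanning the buffer for every signature. The sketch below shows one way to do that; it is not the code from rtti_cpp. The name `find_rtti_offsets` is hypothetical, and the newline after the signature is an assumption based on the hex dump later in this thread. Pixel data could in principle contain the same bytes by chance, so a real carver would want to validate each hit further.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the offset of every "Image8\n" or
// "Image16\n" signature found in `buf`.
std::vector<std::size_t> find_rtti_offsets(const std::string& buf) {
    std::vector<std::size_t> offsets;
    std::size_t pos = 0;
    // Both signatures share the prefix "Image", so search for that first
    // and then check the bytes that follow it.
    while ((pos = buf.find("Image", pos)) != std::string::npos) {
        // pos + 5 never exceeds buf.size() because "Image" matched at pos.
        if (buf.compare(pos + 5, 2, "8\n") == 0 ||
            buf.compare(pos + 5, 3, "16\n") == 0) {
            offsets.push_back(pos);
        }
        ++pos;  // continue searching after this candidate
    }
    return offsets;
}
```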

simsong commented 9 months ago

Got it. Okay, I'll add it in.

Rannek commented 9 months ago

Thank you very much! If I can assist you in any way, please let me know.

Rannek commented 9 months ago

I found something useful. I discovered that the thumbnail images are actually PPM files (see Wikipedia).

If I replace the RTTI header with a PPM header, the file becomes a PPM image, so you don't actually need OpenCV or anything else. The dimensions are stored as four-byte little-endian integers after the newline character. I think I overcomplicated this a bit.

For example:

496d61676538 0a 8002 0000 2003 0000 1a Image8..... ....

After conversion, the header becomes:

5036 0a36 3430 2038 3030 0a32 3535 0a1a P6.640 800.255..

After that, you can convert the .ppm file to .jpg with the convert program from ImageMagick, or in some better way. Maybe this helps.
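
The header rewrite described above can be sketched as follows. This is a minimal illustration rather than the actual rtti_cpp code. The function names are hypothetical, and it assumes the layout shown in the hex dump: the `Image8` signature, a newline, two four-byte little-endian integers (width, then height), and then the raw pixel bytes. Only the 8-bit case is handled, since the thread does not show how `Image16` data is laid out.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Read a 32-bit little-endian unsigned integer starting at buf[pos].
std::uint32_t read_le32(const std::string& buf, std::size_t pos) {
    std::uint32_t value = 0;
    for (int i = 3; i >= 0; --i) {
        value = (value << 8) | static_cast<unsigned char>(buf[pos + i]);
    }
    return value;
}

// Hypothetical helper: swap an Image8 RTTI header for a binary PPM (P6)
// header and keep the remaining bytes unchanged.
std::string rtti8_to_ppm(const std::string& rtti) {
    const std::string sig = "Image8\n";
    const std::size_t header_len = sig.size() + 8;  // signature + width + height
    if (rtti.size() < header_len || rtti.compare(0, sig.size(), sig) != 0) {
        throw std::invalid_argument("not an Image8 RTTI buffer");
    }
    const std::uint32_t width = read_le32(rtti, sig.size());
    const std::uint32_t height = read_le32(rtti, sig.size() + 4);
    // P6 header: magic, width and height in decimal, maximum sample value.
    std::string ppm = "P6\n" + std::to_string(width) + " " +
                      std::to_string(height) + "\n255\n";
    ppm.append(rtti, header_len, std::string::npos);  // pixel bytes as-is
    return ppm;
}
```

Fed the example bytes from this comment, the function produces the `P6`, `640 800`, `255` header shown in the second hex dump.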

simsong commented 9 months ago

Yay! That's great research. Do you want to change your program and see if it still works? Do you wish to add tests in your program?

Rannek commented 9 months ago

Thank you!

Yes, I will try to rewrite my program to be a simple header replacement, so it does not need OpenCV at all.

Also, I will try to find a way to convert the output to JPG without too much overcomplication or external libraries. A lot of image editors and viewers support PPM by default, so the JPG conversion is a bonus step.

Yes, I'm planning to add tests to my program. I want to first ensure that I have found the simplest and most effective solution. Do you have any recommendations?

I will update this thread soon with my progress.

simsong commented 9 months ago

Check out the bulk_extractor source code. You will need to refactor your code to support unit testing.



Rannek commented 9 months ago

I revised my code. Now, it functions without OpenCV, and the output files are in .bmp format. It operates the same as before. https://github.com/Rannek/rtti_cpp/

Now, I need to figure out how to convert it into a bulk_extractor module. If I understand correctly, the basic logic involves changing std::ifstream to the bulk_extractor stream. I need to review the bulk_extractor code more thoroughly.
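
One way to prepare for that change is to move the parsing logic into a function that works on a memory buffer instead of a std::ifstream, which also makes it easy to unit test. The sketch below deliberately does not use any bulk_extractor API. The function name `rtti8_record_size` and the `kMaxDim` limit are hypothetical, and the payload size of width * height * 3 bytes is an assumption based on the 8-bit RGB layout described earlier in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sanity limit on thumbnail dimensions (an assumption).
constexpr std::uint64_t kMaxDim = 16384;

// Returns the total byte length of the Image8 RTTI record that starts at
// data[offset], or 0 if there is no complete, plausible record there.
std::size_t rtti8_record_size(const unsigned char* data, std::size_t len,
                              std::size_t offset) {
    static const char kSig[] = "Image8\n";
    const std::size_t sig_len = sizeof(kSig) - 1;
    const std::size_t header_len = sig_len + 8;  // signature + width + height
    if (offset > len || len - offset < header_len) return 0;
    const unsigned char* p = data + offset;
    if (std::memcmp(p, kSig, sig_len) != 0) return 0;

    // Decode a four-byte little-endian integer.
    auto le32 = [](const unsigned char* q) -> std::uint64_t {
        return q[0] | (q[1] << 8) | (q[2] << 16) |
               (static_cast<std::uint64_t>(q[3]) << 24);
    };
    const std::uint64_t width = le32(p + sig_len);
    const std::uint64_t height = le32(p + sig_len + 4);
    if (width == 0 || height == 0 || width > kMaxDim || height > kMaxDim) {
        return 0;
    }
    // Assumption: three bytes (R, G, B) per pixel for Image8.
    const std::uint64_t total = header_len + width * height * 3;
    if (total > len - offset) return 0;  // record is truncated
    return static_cast<std::size_t>(total);
}
```

Because the function takes only a pointer, a length, and an offset, the same code can be called from a command-line tool, from a test, or from a scanner that is handed a block of bytes.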

simsong commented 9 months ago

Correct. Look at the jpeg carver.

