sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

Fix metadata error (#2110) #2111

Closed wladimirleite closed 4 months ago

wladimirleite commented 4 months ago

Fixes #2110.

This was a tricky issue. First, the provided file was carved as a video, but it was actually an audio file. In MetadataUtil, it received both the "audio:" and "video:" prefixes because "Indexer-Content-Type" = "video/mp4" and "Content-Type" = "audio/mp4". The video check in the code uses both properties ("Indexer-Content-Type" and "Content-Type"): if at least one of them starts with "video", the item receives the "video:" prefix. The audio check uses only the second property.

I kept this behavior (the video check uses both properties), but changed the code to accept just a single prefix (among "video:", "audio:", "image:" and "pdf:"). This was enough to solve the issue.
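The single-prefix rule described above can be sketched as follows. This is only an illustration, not the actual MetadataUtil code: the class name `MediaPrefixSketch`, the method `choosePrefix`, the precedence order (video, then audio, then image, then pdf) and the choice of "Content-Type" for the image and pdf checks are all assumptions.

```java
import java.util.Locale;

// Hypothetical sketch; class and method names are not taken from IPED.
final class MediaPrefixSketch {

    // Returns at most one prefix. The precedence order
    // (video, audio, image, pdf) is an assumption of this sketch.
    static String choosePrefix(String indexerContentType, String contentType) {
        String indexer = normalize(indexerContentType);
        String content = normalize(contentType);

        // The video check looks at both properties, as described in the PR.
        if (indexer.startsWith("video") || content.startsWith("video")) {
            return "video:";
        }
        // The audio check looks only at Content-Type, as described in the PR.
        if (content.startsWith("audio")) {
            return "audio:";
        }
        // Assumption: image and pdf checks also read Content-Type only.
        if (content.startsWith("image")) {
            return "image:";
        }
        if (content.equals("application/pdf")) {
            return "pdf:";
        }
        // No media prefix applies.
        return "";
    }

    // Treats null as empty and ignores case and surrounding whitespace.
    private static String normalize(String value) {
        return value == null ? "" : value.trim().toLowerCase(Locale.ROOT);
    }
}
```

Under these assumptions, the file from #2110 ("Indexer-Content-Type" = "video/mp4", "Content-Type" = "audio/mp4") would get only the "video:" prefix, since the first matching check wins.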

I also refined the header signatures used by MOVCarver so that it handles "audio/mp4"; files like the provided one will now be carved as audio.

lfcnassif commented 4 months ago

> In MetadataUtil, it received both "audio:" and "video:" prefix because "Indexer-Content-Type" = "video/mp4" and "Content-Type" = "audio/mp4".

Regarding the different values in those properties, here is some brief background: Indexer-Content-Type was created in IPED because old versions of Tika used Content-Type both to store detection results and as a hint about the MediaType before detection, such as hints returned by HTTP response headers and other unreliable sources, so it is not always correct. Recent versions of Tika use a new Content-Type-Hint, created after I suggested it. On the other hand... some parsers can overwrite the Content-Type key with a much better value after deeply parsing the file, even better than custom-coded detectors (which are usually simpler than parsers). That seems to be the case with MP4Parser...

The problem is that, after parsing, Content-Type may hold a better value, but categorization and other tasks that depend on the MediaType, like VideoThumbTask, have already run... If we change the value, it would cause inconsistencies between the detected type and the modules that were executed. I don't know how to fix that, but I wouldn't like to run tasks again...

lfcnassif commented 4 months ago

Just a quick question @wladimirleite: I saw you restricted the ftypmp42 signature to test later values. We are still carving the same number of MP4 files, right? If not (or if you are not sure), maybe we could keep both the new signatures and the old one, leaving the mimeType empty in the old one and deferring the final mime detection to the signature detection module.
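A minimal sketch of the "empty mimeType" idea, assuming a carver signature carries an optional mime string and that some detector can be called on the carved bytes. The names `CarvedMimeSketch` and `resolveMime` and the `Function`-based detector are hypothetical and do not reflect the real IPED carver configuration or API.

```java
import java.util.function.Function;

// Hypothetical sketch; not the IPED carver API.
final class CarvedMimeSketch {

    // If the matched signature declares a mime type, trust it.
    // If it is empty (or null), defer to the signature detector,
    // which inspects the carved bytes to decide the final type.
    static String resolveMime(String signatureMime, byte[] carvedData,
                              Function<byte[], String> detector) {
        if (signatureMime != null && !signatureMime.isEmpty()) {
            return signatureMime;
        }
        return detector.apply(carvedData);
    }
}
```

With this arrangement a broad legacy signature can still trigger carving without committing to "video/mp4", while the narrower signatures keep their explicit mime types.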

wladimirleite commented 4 months ago

Good question! I missed the option of leaving the mimeType empty. What I did was export all MP4 files (carved or not) from a few cases I have here and write a quick program to count the different signatures. The new restricted signature covered all the samples. I tried to check online resources about it, but couldn't reach a conclusion on whether the restricted signature would really cover all cases.
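A counting program of that kind could look roughly like the sketch below. It is not the program actually used: it assumes each sample is given as its first bytes, that the file starts with an "ftyp" box (4-byte size, the 4 bytes "ftyp", then a 4-byte major brand), and that the "signature" being counted is that major brand. The names `FtypBrandCounter`, `majorBrand` and `countBrands` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a brand-counting helper.
final class FtypBrandCounter {

    // Returns the 4 bytes that follow "ftyp" (the major brand),
    // or null when the header does not start with an ftyp box.
    static String majorBrand(byte[] header) {
        if (header == null || header.length < 12) {
            return null;
        }
        // Bytes 0-3 hold the box size; bytes 4-7 hold the box type.
        String boxType = new String(header, 4, 4, StandardCharsets.ISO_8859_1);
        if (!boxType.equals("ftyp")) {
            return null;
        }
        // Bytes 8-11 hold the major brand, e.g. "mp42".
        return new String(header, 8, 4, StandardCharsets.ISO_8859_1);
    }

    // Counts how many headers carry each major brand, sorted by brand.
    static Map<String, Integer> countBrands(List<byte[]> headers) {
        Map<String, Integer> counts = new TreeMap<>();
        for (byte[] header : headers) {
            String brand = majorBrand(header);
            if (brand != null) {
                counts.merge(brand, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Feeding it the first 12 bytes of every exported file gives a brand histogram, which makes it easy to spot brands that a restricted signature would miss.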

I will try leaving the mime empty with the samples I have and check the results.

lfcnassif commented 4 months ago

Great! So I'm confident it is fine.

wladimirleite commented 4 months ago

> Great! So I'm confident it is fine.

I am running now with a larger set of files (LED database). Let's see if I can find a counterexample.

wladimirleite commented 4 months ago

> I am running now with a larger set of files (LED database). Let's see if I can find a counterexample.

I am sorry, I should have done this before. I found three other signatures: "M4V", "iso" and "ndh". Although it would be possible to add them, the empty mime seems a much safer option. I am running a large test with it and will update here.