sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

Link modified WhatsApp images by (long) path #2044

Closed wladimirleite closed 7 months ago

wladimirleite commented 9 months ago

While working on a real case, I noticed that some images I could see in the WhatsApp chats, looking directly at the device, were not present in the chat HTML generated by the internal parser. Analyzing the situation, I found that these unmatched images have a different hash and file size than what is recorded in the database. All of them are slightly smaller than the size in the DB, so it seems some kind of compression was applied. This affects 114 images out of 1398 (~8%). Their paths in the WhatsApp database contain strings like Media/5511987654321@s.whatsapp.net/b/6/b6e1d31d-5a9c-4754-915c-af45b16764d7.jpg, so they are very long. The existing parser code already matches by name+size and allows slightly different sizes (#486), but none of these fallbacks handles this case. The external parser did find all these images.

My proposal is to add one more fallback that matches WhatsApp media when the whole path matches (not just the name) and the path is long enough (to avoid false positives). The item path would need to be something like /XXX/YYY/Media/5511987654321@s.whatsapp.net/b/6/b6e1d31d-5a9c-4754-915c-af45b16764d7.jpg, i.e. it must end with the media's path. There will be a parser parameter to define the minimum path length that allows this kind of matching (default value 40). One thing to point out: the proposed fallback can reuse the query results of the existing fallback by name and approximate size, which is enabled by default, so usually no additional queries will be needed.
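To make the proposed rule concrete, here is a minimal Python sketch of the suffix check. It is only an illustration of the idea described above, not IPED's actual code (IPED is written in Java); the function name `matches_by_long_path` and the parameter name `min_length` are hypothetical, and the case-sensitive comparison is an assumption.

```python
def matches_by_long_path(item_path: str, media_path: str, min_length: int = 40) -> bool:
    """Return True if item_path ends with the full media_path recorded in the DB.

    Hypothetical helper: names and the case-sensitive comparison are assumptions.
    """
    # Normalize separators so Windows-style paths compare the same way.
    media = media_path.replace("\\", "/").lstrip("/")
    item = item_path.replace("\\", "/")

    # Short paths are too ambiguous, so this fallback is skipped for them.
    if len(media) < min_length:
        return False

    # Require a match on a directory boundary, not in the middle of a name.
    return item == media or item.endswith("/" + media)


media = "Media/5511987654321@s.whatsapp.net/b/6/b6e1d31d-5a9c-4754-915c-af45b16764d7.jpg"
print(matches_by_long_path("/XXX/YYY/" + media, media))
print(matches_by_long_path("/XXX/YYY/Media/a.jpg", "Media/a.jpg"))
```

The first call matches because the item path ends with the whole media path and that path is longer than 40 characters. The second does not, because the media path is too short to be matched safely by path alone.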

The evidence I am working on is an iPhone extraction. I processed two other UFDRs that I recently generated from iPhones and found the same situation.

lfcnassif commented 9 months ago

+1. Maybe the new long path fallback could come before the name + approximate_size fallback, since the current fallbacks didn't work on your real cases and I think the long path fallback may produce fewer false positives.

wladimirleite commented 9 months ago

There is a fallback that looks for the name and exact file size, and another one by name and approximate size. The second one (added by #486) checks other conditions (such as a small difference in file size caused by padding, with the extra bytes being zero). It seems extremely unlikely to link to incorrect files.
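As a rough illustration of why the approximate-size check is hard to fool, the conditions mentioned above could look something like the Python sketch below. This is not the code from #486. It assumes the candidate file is the padded one (slightly larger than the size recorded in the database), and the function name `matches_by_approximate_size` and the limit `max_padding` are hypothetical.

```python
def matches_by_approximate_size(candidate: bytes, recorded_size: int, max_padding: int = 16) -> bool:
    """Accept a same-name candidate whose size differs only by zero padding.

    Assumptions (not taken from IPED): the candidate is the padded file,
    and max_padding is an arbitrary illustrative limit.
    """
    extra = len(candidate) - recorded_size

    # Exact-size matches belong to the earlier fallback; large gaps are rejected.
    if extra <= 0 or extra > max_padding:
        return False

    # Every byte beyond the recorded size must be zero.
    return all(b == 0 for b in candidate[recorded_size:])


print(matches_by_approximate_size(b"JPEGDATA" + b"\x00" * 4, 8))
print(matches_by_approximate_size(b"JPEGDATA" + b"\x00\x01", 8))
```

A file that happens to share the name and is a few bytes off would still need all of its extra bytes to be zero, which is what makes an incorrect link so unlikely.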

I will take a closer look to see if I can rearrange their order. I was trying to minimize the changes in existing code (and behavior), so adding as the last fallback was my first choice.

lfcnassif commented 9 months ago

Adding it as the last fallback is good @wladimirleite! I remembered the padding but forgot about the check for zeroed bytes, so it seems very unlikely to return false positives! Please forget my suggestion, and thank you for adding this new fallback!