sepinf-inc / IPED

IPED Digital Forensic Tool. It is open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

WhatsApp Enhancements #2048

Closed wladimirleite closed 4 months ago

wladimirleite commented 5 months ago

Closes #1725, #1923, #2030, #2031, #2035, #2036 and #2044.

Running a large test with ~1.8K WhatsApp databases (~1.7M chats, ~157M messages), I found that the number of "unknown messages" dropped from ~1.956M to ~927K (less than half). Note that the test set includes many "old" Android databases, which use a different data model and were not my focus, as that model is no longer used on recent devices. So the reduction factor for currently used databases (iOS and Android) is larger than these overall numbers suggest.
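
As a quick check of the "less than half" figure, the arithmetic can be sketched as below. It uses only the approximate counts quoted above; the variable names are illustrative.

```python
# Approximate counts quoted in the comment above.
unknown_before = 1_956_000
unknown_after = 927_000

# Fraction of unknown messages that remain, and the overall reduction.
remaining = unknown_after / unknown_before
reduction = 1 - remaining

print(f"remaining: {remaining:.1%}, reduction: {reduction:.1%}")
```

This is the overall figure, including the old Android databases, so it is a lower bound on the reduction for current data models.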

There are many different message types and different codes/database models used for Android and iOS, so #1923 can be further improved, but as this PR already contains many changes, I am stopping here.

Basically there are 4 ways to find out how to deal with these unknown messages:

aberenguel commented 5 months ago

Great job, @wladimirleite !!!

lfcnassif commented 5 months ago

Awesome work @wladimirleite! Thank you very much!

wladimirleite commented 5 months ago

I just added a performance improvement for quote messages (see https://github.com/sepinf-inc/IPED/issues/1916#issuecomment-1894583484), which would affect only very large databases with a lot of chats.

wladimirleite commented 5 months ago

I am working with a few other iOS devices, so I am checking for unknown messages and adding support for them in this open PR until the review starts (I know that there are other PRs to be reviewed before this one).

gfd2020 commented 5 months ago

@wladimirleite, there is still a part of the iPhone quoted messages that is not implemented: the case where the message reference is not found. Could you take a look? I haven't managed to understand the logic yet, but I think you can. It's on line 388 of the ExtractorIOS.java file.

wladimirleite commented 5 months ago

> @wladimirleite, there is still a part of the iPhone quoted messages that is not implemented: the case where the message reference is not found. Could you take a look? I haven't managed to understand the logic yet, but I think you can. It's on line 388 of the ExtractorIOS.java file.

Sure! I will take a closer look at that. I have a couple of iPhone extractions with the devices at hand, which makes verification much easier and more reliable. The encoded fields in BLOBs are an odd choice. In the Android DB, data is better structured, using new tables and columns to store information related to new features.

wladimirleite commented 5 months ago

> @wladimirleite, there is still a part of the iPhone quoted messages that is not implemented: the case where the message reference is not found. Could you take a look? I haven't managed to understand the logic yet, but I think you can. It's on line 388 of the ExtractorIOS.java file.

@gfd2020, I started looking into this. I noticed that the metadata of some quote messages does not start with 0x2A (e.g. when you send an image quoting some other message). I am reviewing this condition a bit, using the "full decoder" of the metadata bytes.
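
To illustrate why checking only the first byte is fragile, here is a minimal sketch of a generic tag/length/value walker. It assumes the metadata BLOB follows the protobuf wire format, which the thread does not state explicitly. Under that assumption, 0x2A is simply the tag for field 5 with wire type 2 (length-delimited), since 0x2A == (5 << 3) | 2, so a record that stores another field first would not start with it. The function names and the sample bytes are hypothetical, and this is not the decoder used in IPED.

```python
from typing import List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(buf: bytes) -> List[Tuple[int, int, object]]:
    """Walk the buffer and return (field_number, wire_type, value) tuples."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type} at offset {pos}")
        fields.append((field_number, wire_type, value))
    return fields


# Hypothetical sample: field 5 (tag 0x2A) holding b"abc", then field 1 = 150.
sample = bytes([0x2A, 0x03]) + b"abc" + bytes([0x08, 0x96, 0x01])
decoded = decode_fields(sample)
print(decoded)
```

Looking up a field by number in the decoded list, instead of testing the first byte, would not depend on the order in which fields were written.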

gfd2020 commented 5 months ago

> > @wladimirleite, there is still a part of the iPhone quoted messages that is not implemented: the case where the message reference is not found. Could you take a look? I haven't managed to understand the logic yet, but I think you can. It's on line 388 of the ExtractorIOS.java file.

> @gfd2020, I started looking into this. I noticed that the metadata of some quote messages does not start with 0x2A (e.g. when you send an image quoting some other message). I am reviewing this condition a bit, using the "full decoder" of the metadata bytes.

Hi @wladimirleite. Yes, I guess some cases don't really start with 0x2A...

gfd2020 commented 5 months ago

Hi @wladimirleite, it looks like the quote messages are complete now, very cool!

wladimirleite commented 5 months ago

> Hi @wladimirleite, it looks like the quote messages are complete now, very cool!

Thanks! I am still polishing some details, but I guess it is almost finished. As you noticed, there are many details and corner cases. You did an excellent job finding all the information and implementing the quote message support. It is much more complex than it looks (when you are using the app).

gfd2020 commented 5 months ago

Hi @wladimirleite. Just a small detail. When you delete a message that was quoted, the UI has to show that it was deleted. However, I believe that a message reference not being found does not always mean the message was actually deleted. Perhaps for backup reasons, you may not be able to find the reference. Therefore, I put 'recovered' in the text field...

wladimirleite commented 5 months ago

> Hi @wladimirleite. Just a small detail. When you delete a message that was quoted, the UI has to show that it was deleted. However, I believe that a message reference not being found does not always mean the message was actually deleted. Perhaps for backup reasons, you may not be able to find the reference. Therefore, I put 'recovered' in the text field...

Hi @gfd2020! First, I changed it to use a specific message (not the generic "recovered") so these messages can be found easily by searching for that text. I am not sure if "recovered" would be the best term, as the data we are presenting comes from an active record of the database and it does not contain the whole quoted message, just some fields that allow the application to render it. And we are not trying to actually recover the original quoted message in the sense of adding it to the chat.

Currently the message is "Quoted message deleted". We can change the message to something else. Any suggestion? Maybe also use "Quoted message not found" for this situation. In the devices that I am using to check, there are 2 situations: the quoted message was actually deleted or it is older than the first chat message (user changed the device and did not import older messages).

gfd2020 commented 5 months ago

> Hi @gfd2020! First, I changed it to use a specific message (not the generic "recovered") so these messages can be found easily by searching for that text. I am not sure if "recovered" would be the best term, as the data we are presenting comes from an active record of the database and it does not contain the whole quoted message, just some fields that allow the application to render it. And we are not trying to actually recover the original quoted message in the sense of adding it to the chat.

Ok, I agree.

> Currently the message is "Quoted message deleted". We can change the message to something else. Any suggestion? Maybe also use "Quoted message not found" for this situation. In the devices that I am using to check, there are 2 situations: the quoted message was actually deleted or it is older than the first chat message (user changed the device and did not import older messages).

I think these are the cases. Saying it was deleted may not actually be the case. Maybe it's better as you suggested... "Quoted message not found". I'm also in doubt...

Unless there is a way to track whether the message was actually deleted. I don't know if there is a flag in the database...

wladimirleite commented 5 months ago

> I think these are the cases. Saying it was deleted may not actually be the case. Maybe it's better as you suggested... "Quoted message not found". I'm also in doubt...

Let's go with "Quoted message not found". It is more generic, but seems good enough at this point.

> Unless there is a way to track whether the message was actually deleted. I don't know if there is a flag in the database...

Good point! I will try to find out.
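
The fallback agreed on above can be sketched roughly as follows. This is a simplified illustration, not the IPED implementation: the `Message` class, its fields, and `render_quote` are hypothetical names, and only the placeholder text "Quoted message not found" comes from this discussion.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Placeholder text agreed on in this thread.
QUOTE_NOT_FOUND = "Quoted message not found"


@dataclass
class Message:
    msg_id: str                      # hypothetical unique message identifier
    text: str
    quoted_id: Optional[str] = None  # identifier of the quoted message, if any


def render_quote(message: Message, messages_by_id: Dict[str, Message]) -> Optional[str]:
    """Return the text shown in the quote box, or None if nothing is quoted."""
    if message.quoted_id is None:
        return None
    quoted = messages_by_id.get(message.quoted_id)
    if quoted is None:
        # The reference is missing: the message may have been deleted, or it
        # may predate the first message in the chat. We cannot tell which,
        # so a neutral placeholder is used instead of claiming deletion.
        return QUOTE_NOT_FOUND
    return quoted.text


chat = {"m1": Message("m1", "See you at 8")}
reply_ok = Message("m2", "OK!", quoted_id="m1")
reply_missing = Message("m3", "Agreed", quoted_id="m0")
print(render_quote(reply_ok, chat), "|", render_quote(reply_missing, chat))
```

Using one fixed string also keeps these cases easy to find with a text search, as noted earlier in the thread.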

gfd2020 commented 5 months ago

Hi @wladimirleite, another observation. In Android's message_template, wouldn't it be worth concatenating the footer_text_data column with content_text_data? In the examples I have, footer_text_data is null, but someone may have populated it, and that data would then be missing... [image]
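
A minimal sketch of the suggested concatenation, handling the null footer seen in the examples. The column names come from the comment above; the function name and the newline separator are assumptions made for illustration.

```python
from typing import Optional


def template_text(content_text_data: Optional[str],
                  footer_text_data: Optional[str],
                  separator: str = "\n") -> str:
    """Join template content and footer, skipping null or blank values."""
    parts = [
        part.strip()
        for part in (content_text_data, footer_text_data)
        if part and part.strip()
    ]
    return separator.join(parts)


print(repr(template_text("Your order has shipped", None)))
print(repr(template_text("Your order has shipped", "Reply STOP to unsubscribe")))
```

With a null footer the output is just the content, so databases like the ones described above would render exactly as before.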

gfd2020 commented 5 months ago

Hi @wladimirleite. Sorry to bother you again. See that underline? Is it the expected layout?

[image]

wladimirleite commented 5 months ago

> Hi @wladimirleite. Sorry to bother you again.

Not at all! It is always great to have a second look, especially in cases like this, which involve a large number of changes.

> See that underline? Is it the expected layout?

You mean below the duration, right? No, it is not expected, but I believe I fixed that yesterday morning (https://github.com/sepinf-inc/IPED/pull/2048/commits/9b019c38e3d6b09a3c45b7c944c43a6b9115dafe). Are you using the latest version? If you are, let me know and I will try to find out what is going on. I processed a few cases last night, and the underline is not showing here.

gfd2020 commented 5 months ago

> You mean below the duration, right? No, it is not expected, but I believe I fixed that yesterday morning (9b019c3). Are you using the latest version? If you are, let me know and I will try to find out what is going on. I processed a few cases last night, and the underline is not showing here.

I'm sorry, I hadn't seen it. I got the latest version and it's really ok.

lfcnassif commented 5 months ago

Hi @wladimirleite, just saw you marked this as ready, excellent! Thank you very much!

@gfd2020, since you are already testing this, would you mind running your tests with the latest commit?

gfd2020 commented 5 months ago

> Hi @wladimirleite, just saw you marked this as ready, excellent! Thank you very much!

> @gfd2020, since you are already testing this, would you mind running your tests with the latest commit?

Hi @lfcnassif. OK, I'll do some tests.

wladimirleite commented 5 months ago

> Hi @lfcnassif. OK, I'll do some tests.

I processed another iOS UFDR today, and I am committing a small change to handle a few more message types (in a few minutes). This is kind of an infinite task, so I will stop here so this PR can be reviewed and (hopefully) merged.

wladimirleite commented 5 months ago

> I processed another iOS UFDR today, and I am committing a small change to handle a few more message types (in a few ~minutes~ hours). This is kind of an infinite task, so I will stop here so this PR can be reviewed and (hopefully) merged.

Just finished here.

gfd2020 commented 5 months ago

@wladimirleite and @lfcnassif . I did my tests and everything is ok.

wladimirleite commented 4 months ago

I added support for other message types that I found over the last few days while working with a few recently seized iPhones.

lfcnassif commented 4 months ago

> I added support for other message types that I found over the last few days while working with a few recently seized iPhones.

Thanks @wladimirleite!

lfcnassif commented 4 months ago

Hi @wladimirleite, do you plan to push more improvements here?

wladimirleite commented 4 months ago

No :-)

lfcnassif commented 4 months ago

Running final regression tests to merge this...

lfcnassif commented 4 months ago

Results running on about 100 Android databases:

Master: [image]

This PR: [image]

No timeouts, similar processing times, 2 parsing exceptions on both (invalid DBs). TONS more search hits for common words. So it seems we are much better off, thank you @wladimirleite!

I'll run a similar test for iOS DBs.

PS: The query used in the pictures filters just the first HTML report for each chat.

lfcnassif commented 4 months ago

Not related to this PR, since it is also happening on master, but I noticed some weird line breaks in the results table when executing searches, and also in the processing UI, in WA chat names:

[image]

[image]

Maybe it's related to recent changes in other merged WA-related PRs.

wladimirleite commented 4 months ago

> Not related to this PR, since it is also happening on master, but I noticed some weird line breaks in the results table when executing searches, and also in the processing UI, in WA chat names. Maybe it's related to recent changes in other merged WA-related PRs.

I already spent some time trying to fix this or find a workaround, but it is a JDK bug. It is not related to WhatsApp; it just happens with WhatsApp chats because many chat names have emojis, but it can be reproduced with simple files.

https://bugs.openjdk.org/browse/JDK-8269854?attachmentSortBy=fileName

wladimirleite commented 4 months ago

One workaround for WhatsApp chats (or any chats and contacts info) would be filtering out emojis (or "special" characters) in the chat name. I am not sure if that would be misleading to the user. It won't solve all the cases, as there may be files with such names, but it would reduce this issue a lot.
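
A rough sketch of what such a filter could look like. The heuristic is an assumption chosen for illustration, not something defined in this thread: it drops characters outside the Basic Multilingual Plane, "other symbol" characters, format characters such as the zero-width joiner, and variation selectors. The function name is hypothetical.

```python
import unicodedata


def strip_special_chars(name: str) -> str:
    """Remove emoji-like characters from a display name (heuristic)."""
    kept = []
    for ch in name:
        code = ord(ch)
        category = unicodedata.category(ch)
        if code > 0xFFFF:              # outside the BMP (most emojis)
            continue
        if category in ("So", "Cf"):   # other symbols, format chars (e.g. ZWJ)
            continue
        if 0xFE00 <= code <= 0xFE0F:   # variation selectors
            continue
        kept.append(ch)
    # Collapse the whitespace left behind by removed characters.
    return " ".join("".join(kept).split())


# "Family <man ZWJ woman ZWJ girl> <red heart> group", written with escapes.
raw = "Family \U0001F468\u200D\U0001F469\u200D\U0001F467 \u2764\uFE0F group"
print(strip_special_chars(raw))
```

A filter like this also removes legitimate non-BMP text and symbols such as "©", which is part of why it could mislead the user. Keeping the original name in a separate field would be one way to limit that.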

lfcnassif commented 4 months ago

Test results on ~150 iOS databases:

Master: [image]

This PR: [image]

1 exception happened on master, 0 exceptions with this PR. That exception caused the database to be expanded using the fallback SQLiteParser, extracting tables and blobs classified into "Databases", "Plain Texts" and "Other Files" categories. We are also getting more search hits for common words with this PR.

So, this was perfect work, thank you very much @wladimirleite!