sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.
Other
924 stars 217 forks source link

Whatsapp quote messages #1920

Closed gfd2020 closed 8 months ago

gfd2020 commented 10 months ago

Implementation of WhatsApp quote messages. Android Old DB - OK ( carved quotes fully implemented ) IPHONE - OK ( carved quotes NOT fully implemented ) Android New DB - OK ( carved quotes fully implemented )

lfcnassif commented 10 months ago

Thank you @gfd2020!

lfcnassif commented 10 months ago

Hi @gfd2020, the medium size regression test didn't finish yet, maybe they are expected, but I got many traces like this in the log:

java.sql.SQLException: Column quoted not found. no such column
    at iped.parsers.sqlite.SQLiteUndeleteTableResultSetAdapter.getMappedColumnName(SQLiteUndeleteTableResultSetAdapter.java:76) ~[iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.sqlite.SQLiteUndeleteTableResultSetAdapter.getInt(SQLiteUndeleteTableResultSetAdapter.java:377) ~[iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.ExtractorAndroid.createMessageFromDBRow(ExtractorAndroid.java:531) ~[iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.ExtractorAndroid.extractMessages(ExtractorAndroid.java:383) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.ExtractorAndroid.extractChatList(ExtractorAndroid.java:219) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.Extractor.getChatList(Extractor.java:34) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.WhatsAppParser.extractChatList(WhatsAppParser.java:537) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.WhatsAppParser.parseDB(WhatsAppParser.java:407) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.WhatsAppParser.parseAndCheckIfIsMainDb(WhatsAppParser.java:469) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.WhatsAppParser.parse(WhatsAppParser.java:250) [iped-parsers-impl-4.2-snapshot.jar:?]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) [tika-core-2.4.0-p1.jar:2.4.0]
    at iped.parsers.standard.StandardParser.parse(StandardParser.java:245) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247) [iped-engine-4.2-snapshot.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
    at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
    at java.lang.Thread.run(Unknown Source) [?:?]
gfd2020 commented 10 months ago

Hi @gfd2020, the medium size regression test didn't finish yet, maybe they are expected, but I got many traces like this in the log:

Hi @lfcnassif . This is odd. I have a DB without message_quotes table and work fine. Did you sync the last version of PR?

EDIT: It seems that the database has the message_quotes table but when searching for deleted data it doesn't... I think I have to map it in MESSAGES_TABLE_COL_MAP

Can you share a single database with this case?

lfcnassif commented 10 months ago

Did you sync the last version of PR?

Yes.

Can you share a single database with this case?

Yes, I'll search for it soon and send to you.

lfcnassif commented 10 months ago

Hi @gfd2020, I merged the older #1889 and, as I anticipated, there are merge conflicts to resolve here. When you are done, please push all your changes and let me know, so I'll try to resolve them.

gfd2020 commented 10 months ago

hi @lfcnassif , just finished the changes. I think that now is robust.

lfcnassif commented 10 months ago

@gfd2020 please don't push more commits, I'm trying to resolve conflicts right now, 6f14e31 caused more conflicts after I resolved the previous ones...

lfcnassif commented 10 months ago

Just pushed to avoid more merge conflicts, please pull the changes before making more changes. I'm running a regression test to check if I introduced errors while merging, it was more difficult than I thought...

lfcnassif commented 10 months ago

I'm running a regression test to check if I introduced errors while merging, it was more difficult than I thought...

It finished. There are some exceptions like this in the log:

org.apache.tika.exception.TikaException: WAExtractorException Exception
    at iped.parsers.whatsapp.WhatsAppParser.parseDB(WhatsAppParser.java:414)
    at iped.parsers.whatsapp.WhatsAppParser.parseAndCheckIfIsMainDb(WhatsAppParser.java:469)
    at iped.parsers.whatsapp.WhatsAppParser.parse(WhatsAppParser.java:250)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at iped.parsers.standard.StandardParser.parse(StandardParser.java:245)
    at iped.engine.task.MakePreviewTask$1.run(MakePreviewTask.java:181)
Caused by: iped.parsers.whatsapp.WAExtractorException: java.sql.SQLException: no such column: 'forwarded'
    at iped.parsers.whatsapp.ExtractorAndroid.extractChatList(ExtractorAndroid.java:256)
    at iped.parsers.whatsapp.Extractor.getChatList(Extractor.java:34)
    at iped.parsers.whatsapp.WhatsAppParser.extractChatList(WhatsAppParser.java:537)
    at iped.parsers.whatsapp.WhatsAppParser.parseDB(WhatsAppParser.java:407)
    ... 5 more
Caused by: java.sql.SQLException: no such column: 'forwarded'
    at org.sqlite.jdbc3.JDBC3ResultSet.findColumn(JDBC3ResultSet.java:49)
    at org.sqlite.jdbc3.JDBC3ResultSet.getInt(JDBC3ResultSet.java:400)
    at iped.parsers.whatsapp.ExtractorAndroid.createMessageFromDBRow(ExtractorAndroid.java:549)
    at iped.parsers.whatsapp.ExtractorAndroid.extractMessages(ExtractorAndroid.java:359)
    at iped.parsers.whatsapp.ExtractorAndroid.extractChatList(ExtractorAndroid.java:217)
    ... 8 more

It doesn't seem related to the merge since ExtractorAndroid class didn't cause merge conflicts.

lfcnassif commented 10 months ago

I would also appreciate if you could take a look if I messed up the quoted messages logic after the merge.

gfd2020 commented 10 months ago

It doesn't seem related to the merge since ExtractorAndroid class didn't cause merge conflicts.

The cause is whatsapp quote messages PR. I just find the issue. I will make the fix commit.

gfd2020 commented 10 months ago

I would also appreciate if you could take a look if I messed up the quoted messages logic after the merge.

Hi, @lfcnassif I make the fix commit. Please, ran the regression tests again.

In my tests, it seems that they were correct after the merge. I would like to ask you not to publish this feature for now. I wanted to double-check the queries in case of carved messages. And it may take a while.

Semi-off. It seems that in iOS conversations, there is a division of chats by groups and users. However, it seems that sometimes, some groups send messages to users, so it does not appear in conversations. I noticed this when seeing some users' message quotes.

PS: See issue 1923, please. Can we take advantage of this PR and implement it?

lfcnassif commented 10 months ago

I make the fix commit. Please, ran the regression tests again.

Sure! I'll run the small regression test again. When this is really finished, I'll run the large one.

I would like to ask you not to publish this feature for now. I wanted to double-check the queries in case of carved messages. And it may take a while.

Don't worry, this probably will be put into 4.2, to be released hopefully in November.

PS: See issue 1923, please. Can we take advantage of this PR and implement it?

I think so. Since it also deals with the same classes, it would avoid more merge conflicts.

lfcnassif commented 10 months ago

I'll run the small regression test again.

The small regression test on 22 DBs seems good now, thank you!

lfcnassif commented 10 months ago

I left the large test on 300 DBs running last night, it finished with 77 timeouts. But I think master also gives a few timeouts, I'll run with master and compare results.

wladimirleite commented 10 months ago

There is some kind of concurrency at SQLite level, when a process accesses many different databases. I was able to reproduce the issue in a standalone program, but couldn't find a solution or a workaround. I had to reduce the number of IPED threads to run tests with several hundred WhatsApp databases.

lfcnassif commented 10 months ago

I had to reduce the number of IPED threads to run tests with several hundred WhatsApp databases.

How many did you use @tc-wleite?

wladimirleite commented 10 months ago

I used only 8 threads and increased parsing timeout, as I was trying to compare extracted messages (not speed). It was enough to avoid timeouts in my tests, but didn't try other setups.

lfcnassif commented 10 months ago

I left the large test on 300 DBs running last night, it finished with 77 timeouts. But I think master also gives a few timeouts, I'll run with master and compare results.

Processing with master, I got just 1 timeout, using all 48 cores from my computer. Obviously, it recovered much more messages. I'll run the test again with this PR. And actually there are 170 DBs in this data set, not 300.

gfd2020 commented 10 months ago

Processing with master, I got just 1 timeout, using all 48 cores from my computer. Obviously, it recovered much more messages. I'll run the test again with this PR. And actually there are 170 DBs in this data set, not 300.

Does this test you did consider the optimizations you integrated in this PR?

lfcnassif commented 10 months ago

Does this test you did consider the optimizations you integrated in this PR?

Yes, it is running again.

lfcnassif commented 10 months ago

Yes, it is running again.

Until now, it triggered 6 timeouts. It is taking a long time to parse a 4GB DB. I took 2 thread dumps yesterday in the first test, and 2 more now, all 4 dumps show the execution is/was at this point:

   java.lang.Thread.State: RUNNABLE
        at org.sqlite.core.NativeDB.step(Native Method)
        - locked <0x0000001048053878> (a org.sqlite.core.NativeDB)
        at org.sqlite.core.DB.execute(DB.java:851)
        - locked <0x0000001048053878> (a org.sqlite.core.NativeDB)
        at org.sqlite.jdbc3.JDBC3PreparedStatement.executeQuery(JDBC3PreparedStatement.java:77)
        at iped.parsers.whatsapp.ExtractorAndroid.extractMessages(ExtractorAndroid.java:357)
        at iped.parsers.whatsapp.ExtractorAndroid.extractChatList(ExtractorAndroid.java:217)
        at iped.parsers.whatsapp.Extractor.getChatList(Extractor.java:34)
        at iped.parsers.whatsapp.WhatsAppParser.extractChatList(WhatsAppParser.java:537)
        at iped.parsers.whatsapp.WhatsAppParser.parseDB(WhatsAppParser.java:407)
        at iped.parsers.whatsapp.WhatsAppParser.parseAndCheckIfIsMainDb(WhatsAppParser.java:469)
        at iped.parsers.whatsapp.WhatsAppParser.parse(WhatsAppParser.java:250)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
        at iped.parsers.standard.StandardParser.parse(StandardParser.java:245)
        at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
        at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.13/Unknown Source)
        at java.util.concurrent.FutureTask.run(java.base@11.0.13/Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/Unknown Source)
        at java.lang.Thread.run(java.base@11.0.13/Unknown Source)

Seems the query for quoted messages is the current bottleneck, is it possible to optimize it?

lfcnassif commented 10 months ago

It is taking a long time to parse a 4GB DB

It timed out, probably in the same point I posted above. It was also being processed alone, so I think there is no SQLite native synchronization bottleneck in this specific case.

gfd2020 commented 10 months ago

Seems the query for quoted messages is the current bottleneck, is it possible to optimize it?

I think so. All timouts are in old whatsapp? what type? NO_THUMBS_TABLE or THUMBS_TABLE?

lfcnassif commented 10 months ago

All timouts are in old whatsapp?

Took a look at some timed out samples, seems most of them are from the new Android WhatsApp schema.

lfcnassif commented 10 months ago

Until now, it triggered 6 timeouts.

It finished, this time there were a total 56 timeouts from 170 DBs.

gfd2020 commented 10 months ago

It finished, this time there were a total 56 timeouts from 170 DBs.

Ok. I am taking a look at the whatsapp android new db query. I think it can be optimized.

gfd2020 commented 10 months ago

Just push a fix to optimize android New. Android old is a little more tricky.

lfcnassif commented 10 months ago

Just push a fix to optimize android New.

Great! I'll kick off another run.

gfd2020 commented 10 months ago

Just push Android Old DB. Please, run full tests later ... thank you.

lfcnassif commented 10 months ago

Hi @gfd2020! Just to let you know, with last commits, just one timeout happened! Tested twice. It also happens with master on the same DB, I will take a closer look into it. The parsing speed was 15% slower in average, what seems fine to me since the parser is doing more work. Thank you very much for the optimizations!

Another detail, I got a bit more messages than master, is it expected? Interestingly the last 2 runs with this PR recovered a different number of messages between them. I'll check if they came from the timed out DB. If yes, that should be fine, if not, that would be weird.

gfd2020 commented 10 months ago

Hi @gfd2020! Just to let you know, with last commits, just one timeout happened! Tested twice. It also happens with master on the same DB, I will take a closer look into it. The parsing speed was 15% slower in average, what seems fine to me since the parser is doing more work. Thank you very much for the optimizations!

Oh! That's good news.

Another detail, I got a bit more messages than master, is it expected? Interestingly the last 2 runs with this PR recovered a different number of messages between them. I'll check if they came from the timed out DB. If yes, that should be fine, if not, that would be weird.

This is really strange, as they should be the same messages, just with added replies...

But, What do you mean more messages? More instant messages or more chats? Because if there are more chats, it is normal, as there will be more html for the responses, so there may be more partitioned chats.

lfcnassif commented 10 months ago

More instant messages...

gfd2020 commented 10 months ago

More instant messages...

That is strange. In my test ( 4 DBs) all have the same. Can you filter and find the strange one?

lfcnassif commented 10 months ago

Can you filter and find the strange one?

Yes, I'll take a closer look soon.

gfd2020 commented 10 months ago

I will try to implement interactive messages in this PR as well. I'll need a little more time.

lfcnassif commented 10 months ago

I will try to implement interactive messages in this PR as well. I'll need a little more time.

Wouldn't it be better to implement it in a separate PR after this is merged to don't delay this feature? I was waiting for the check below to continue and to finish the review:

I would like to ask you not to publish this feature for now. I wanted to double-check the queries in case of carved messages. And it may take a while.

Let me know if you still plan to check it or not, or if you already did.

And sorry for not answering your last question, I forgot it. I'll try to check it tomorrow or in the next days.

gfd2020 commented 10 months ago

Wouldn't it be better to implement it in a separate PR after this is merged to don't delay this feature? I was waiting for the check below to continue and to finish the review:

Ok. I leave it for another PR.

Let me know if you still plan to check it or not, or if you already did.

Not yet. I'm still on vacation this week. But I'll take a look until Friday.

And sorry for not answering your last question, I forgot it. I'll try to check it tomorrow or in the next days.

Please. See which types of dbs (android or iphone) did not have the same number of instant messages.

gfd2020 commented 9 months ago

Hi @lfcnassif , Unfortunately I wasn't able to review last week. I'm already looking at it now. Did you redo that duplicate message test?

lfcnassif commented 9 months ago

Did you redo that duplicate message test?

Sorry @gfd2020, still not yet. I plan to do this week...

gfd2020 commented 9 months ago

Did you redo that duplicate message test?

Sorry @gfd2020, still not yet. I plan to do this week...

Ok @lfcnassif . Take your time.

I finished the sqls review. They seem to be ok ( I did a small update to get 2 more fields on android new). There are some raw texts in reportgenerator. I would do the localization but I don't know how to do it in all languages. Do I only do Portuguese and English?

lfcnassif commented 9 months ago

I finished the sqls review. They seem to be ok ( I did a small update to get 2 more fields on android new).

Thanks @gfd2020!

There are some raw texts in reportgenerator. I would do the localization but I don't know how to do it in all languages. Do I only do Portuguese and English?

Just copy the English localization to the non Portuguese ones and add the [TBT] suffix to them, to flag they should be sent to foreign collaborators for translation later.

gfd2020 commented 9 months ago

Hi @lfcnassif . I just finished localization fix. I think that is ready do review ( except for the duplicate instant messages ...).

PS: If the quoted message is on another chat page, it will not be able to be marked when clicked. I think it's a small bug that won't occur very often and if fixed it will take a lot of effort.

lfcnassif commented 9 months ago

Hi @lfcnassif . I just finished localization fix.

Thanks @gfd2020!

PS: If the quoted message is on another chat page, it will not be able to be marked when clicked. I think it's a small bug that won't occur very often and if fixed it will take a lot of effort.

I don't think it's a bug, but a limitation. Actually, I thought in an approach to make it work: we could set an ID (or UUID) into each chat page metadata to allow it to be referenced by another page, so each replay message could have a page reference if the original message is in a different page. Switching to another page may not be what user wants, so we could ask him to confirm the action. Or we could simply show a message warning the original message is in a different page and do nothing. What do you think?

gfd2020 commented 9 months ago

I don't think it's a bug, but a limitation. Actually, I thought in an approach to make it work: we could set an ID (or UUID) into each chat page metadata to allow it to be referenced by another page, so each replay message could have a page reference if the original message is in a different page. Switching to another page may not be what user wants, so we could ask him to confirm the action. Or we could simply show a message warning the original message is in a different page and do nothing. What do you think?

Your idea is good to put an ID on the page. I think the best thing is to just warn the user and do nothing. Remembering that deleted quoted messages also do not execute the link. But would the message be an alert in javascript?

lfcnassif commented 9 months ago

But would the message be an alert in javascript?

Yes, could be, with the proper localization, I think the Map already does this...

gfd2020 commented 9 months ago

But would the message be an alert in javascript?

Yes, could be, with the proper localization, I think the Map already does this...

Hi @lfcnassif . I make it work but internal viewer javafx doesnt show alerts. In browser it will show as bellow:

image

lfcnassif commented 9 months ago

The javascript method alert('xxx') doesn't work? Sorry, I didn't remember that... Another approach is calling a java function from javascript side to display a java dialog, like done by HtmlLinkViewer.AttachmentHandler inner class, which shows a message dialog when a clicked attachment is not found.

gfd2020 commented 9 months ago

I get this template and make it work on iped. It is a html, css, javascript inner modal alert. Very cool. https://www.cssscript.com/alert-confirm-modal-prompt/

It has confirm options too ... What do you think?

image

lfcnassif commented 9 months ago

Seems very nice!