Closed gfd2020 closed 10 months ago
Thank you @gfd2020!
Hi @gfd2020, the medium size regression test didn't finish yet, maybe they are expected, but I got many traces like this in the log:
java.sql.SQLException: Column quoted not found. no such column
at iped.parsers.sqlite.SQLiteUndeleteTableResultSetAdapter.getMappedColumnName(SQLiteUndeleteTableResultSetAdapter.java:76) ~[iped-parsers-impl-4.2-snapshot.jar:?]
at iped.parsers.sqlite.SQLiteUndeleteTableResultSetAdapter.getInt(SQLiteUndeleteTableResultSetAdapter.java:377) ~[iped-parsers-impl-4.2-snapshot.jar:?]
at iped.parsers.whatsapp.ExtractorAndroid.createMessageFromDBRow(ExtractorAndroid.java:531) ~[iped-parsers-impl-4.2-snapshot.jar:?]
at iped.parsers.whatsapp.ExtractorAndroid.extractMessages(ExtractorAndroid.java:383) [iped-parsers-impl-4.2-snapshot.jar:?]
at iped.parsers.whatsapp.ExtractorAndroid.extractChatList(ExtractorAndroid.java:219) [iped-parsers-impl-4.2-snapshot.jar:?]
at iped.parsers.whatsapp.Extractor.getChatList(Extractor.java:34) [iped-parsers-impl-4.2-snapshot.jar:?]
at iped.parsers.whatsapp.WhatsAppParser.extractChatList(WhatsAppParser.java:537) [iped-parsers-impl-4.2-snapshot.jar:?]
at iped.parsers.whatsapp.WhatsAppParser.parseDB(WhatsAppParser.java:407) [iped-parsers-impl-4.2-snapshot.jar:?]
at iped.parsers.whatsapp.WhatsAppParser.parseAndCheckIfIsMainDb(WhatsAppParser.java:469) [iped-parsers-impl-4.2-snapshot.jar:?]
at iped.parsers.whatsapp.WhatsAppParser.parse(WhatsAppParser.java:250) [iped-parsers-impl-4.2-snapshot.jar:?]
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) [tika-core-2.4.0-p1.jar:2.4.0]
at iped.parsers.standard.StandardParser.parse(StandardParser.java:245) [iped-parsers-impl-4.2-snapshot.jar:?]
at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247) [iped-engine-4.2-snapshot.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
Hi @gfd2020, the medium size regression test didn't finish yet, maybe they are expected, but I got many traces like this in the log:
Hi @lfcnassif . This is odd. I have a DB without message_quotes table and work fine. Did you sync the last version of PR?
EDIT: It seems that the database has the message_quotes table but when searching for deleted data it doesn't... I think I have to map it in MESSAGES_TABLE_COL_MAP
Can you share a single database with this case?
Did you sync the last version of PR?
Yes.
Can you share a single database with this case?
Yes, I'll search for it soon and send to you.
Hi @gfd2020, I merged the older #1889 and, as I anticipated, there are merge conflicts to resolve here. When you are done, please push all your changes and let me know, so I'll try to resolve them.
hi @lfcnassif , just finished the changes. I think that now is robust.
@gfd2020 please don't push more commits, I'm trying to resolve conflicts right now, 6f14e31 caused more conflicts after I resolved the previous ones...
Just pushed to avoid more merge conflicts, please pull the changes before making more changes. I'm running a regression test to check if I introduced errors while merging, it was more difficult than I thought...
I'm running a regression test to check if I introduced errors while merging, it was more difficult than I thought...
It finished. There are some exceptions like this in the log:
org.apache.tika.exception.TikaException: WAExtractorException Exception
at iped.parsers.whatsapp.WhatsAppParser.parseDB(WhatsAppParser.java:414)
at iped.parsers.whatsapp.WhatsAppParser.parseAndCheckIfIsMainDb(WhatsAppParser.java:469)
at iped.parsers.whatsapp.WhatsAppParser.parse(WhatsAppParser.java:250)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
at iped.parsers.standard.StandardParser.parse(StandardParser.java:245)
at iped.engine.task.MakePreviewTask$1.run(MakePreviewTask.java:181)
Caused by: iped.parsers.whatsapp.WAExtractorException: java.sql.SQLException: no such column: 'forwarded'
at iped.parsers.whatsapp.ExtractorAndroid.extractChatList(ExtractorAndroid.java:256)
at iped.parsers.whatsapp.Extractor.getChatList(Extractor.java:34)
at iped.parsers.whatsapp.WhatsAppParser.extractChatList(WhatsAppParser.java:537)
at iped.parsers.whatsapp.WhatsAppParser.parseDB(WhatsAppParser.java:407)
... 5 more
Caused by: java.sql.SQLException: no such column: 'forwarded'
at org.sqlite.jdbc3.JDBC3ResultSet.findColumn(JDBC3ResultSet.java:49)
at org.sqlite.jdbc3.JDBC3ResultSet.getInt(JDBC3ResultSet.java:400)
at iped.parsers.whatsapp.ExtractorAndroid.createMessageFromDBRow(ExtractorAndroid.java:549)
at iped.parsers.whatsapp.ExtractorAndroid.extractMessages(ExtractorAndroid.java:359)
at iped.parsers.whatsapp.ExtractorAndroid.extractChatList(ExtractorAndroid.java:217)
... 8 more
It doesn't seem related to the merge since ExtractorAndroid class didn't cause merge conflicts.
I would also appreciate if you could take a look if I messed up the quoted messages logic after the merge.
It doesn't seem related to the merge since ExtractorAndroid class didn't cause merge conflicts.
The cause is whatsapp quote messages PR. I just find the issue. I will make the fix commit.
I would also appreciate if you could take a look if I messed up the quoted messages logic after the merge.
Hi, @lfcnassif I make the fix commit. Please, ran the regression tests again.
In my tests, it seems that they were correct after the merge. I would like to ask you not to publish this feature for now. I wanted to double-check the queries in case of carved messages. And it may take a while.
Semi-off. It seems that in iOS conversations, there is a division of chats by groups and users. However, it seems that sometimes, some groups send messages to users, so it does not appear in conversations. I noticed this when seeing some users' message quotes.
PS: See issue 1923, please. Can we take advantage of this PR and implement it?
I make the fix commit. Please, ran the regression tests again.
Sure! I'll run the small regression test again. When this is really finished, I'll run the large one.
I would like to ask you not to publish this feature for now. I wanted to double-check the queries in case of carved messages. And it may take a while.
Don't worry, this probably will be put into 4.2, to be released hopefully in November.
PS: See issue 1923, please. Can we take advantage of this PR and implement it?
I think so. Since it also deals with the same classes, it would avoid more merge conflicts.
I'll run the small regression test again.
The small regression test on 22 DBs seems good now, thank you!
I left the large test on 300 DBs running last night, it finished with 77 timeouts. But I think master also gives a few timeouts, I'll run with master and compare results.
There is some kind of concurrency at SQLite level, when a process accesses many different databases. I was able to reproduce the issue in a standalone program, but couldn't find a solution or a workaround. I had to reduce the number of IPED threads to run tests with several hundred WhatsApp databases.
I had to reduce the number of IPED threads to run tests with several hundred WhatsApp databases.
How many did you use @tc-wleite?
I used only 8 threads and increased parsing timeout, as I was trying to compare extracted messages (not speed). It was enough to avoid timeouts in my tests, but didn't try other setups.
I left the large test on 300 DBs running last night, it finished with 77 timeouts. But I think master also gives a few timeouts, I'll run with master and compare results.
Processing with master, I got just 1 timeout, using all 48 cores from my computer. Obviously, it recovered much more messages. I'll run the test again with this PR. And actually there are 170 DBs in this data set, not 300.
Processing with master, I got just 1 timeout, using all 48 cores from my computer. Obviously, it recovered much more messages. I'll run the test again with this PR. And actually there are 170 DBs in this data set, not 300.
Does this test you did consider the optimizations you integrated in this PR?
Does this test you did consider the optimizations you integrated in this PR?
Yes, it is running again.
Yes, it is running again.
Until now, it triggered 6 timeouts. It is taking a long time to parse a 4GB DB. I took 2 thread dumps yesterday in the first test, and 2 more now, all 4 dumps show the execution is/was at this point:
java.lang.Thread.State: RUNNABLE
at org.sqlite.core.NativeDB.step(Native Method)
- locked <0x0000001048053878> (a org.sqlite.core.NativeDB)
at org.sqlite.core.DB.execute(DB.java:851)
- locked <0x0000001048053878> (a org.sqlite.core.NativeDB)
at org.sqlite.jdbc3.JDBC3PreparedStatement.executeQuery(JDBC3PreparedStatement.java:77)
at iped.parsers.whatsapp.ExtractorAndroid.extractMessages(ExtractorAndroid.java:357)
at iped.parsers.whatsapp.ExtractorAndroid.extractChatList(ExtractorAndroid.java:217)
at iped.parsers.whatsapp.Extractor.getChatList(Extractor.java:34)
at iped.parsers.whatsapp.WhatsAppParser.extractChatList(WhatsAppParser.java:537)
at iped.parsers.whatsapp.WhatsAppParser.parseDB(WhatsAppParser.java:407)
at iped.parsers.whatsapp.WhatsAppParser.parseAndCheckIfIsMainDb(WhatsAppParser.java:469)
at iped.parsers.whatsapp.WhatsAppParser.parse(WhatsAppParser.java:250)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
at iped.parsers.standard.StandardParser.parse(StandardParser.java:245)
at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.13/Unknown Source)
at java.util.concurrent.FutureTask.run(java.base@11.0.13/Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/Unknown Source)
at java.lang.Thread.run(java.base@11.0.13/Unknown Source)
Seems the query for quoted messages is the current bottleneck, is it possible to optimize it?
It is taking a long time to parse a 4GB DB
It timed out, probably in the same point I posted above. It was also being processed alone, so I think there is no SQLite native synchronization bottleneck in this specific case.
Seems the query for quoted messages is the current bottleneck, is it possible to optimize it?
I think so. All timouts are in old whatsapp? what type? NO_THUMBS_TABLE or THUMBS_TABLE?
All timouts are in old whatsapp?
Took a look at some timed out samples, seems most of them are from the new Android WhatsApp schema.
Until now, it triggered 6 timeouts.
It finished, this time there were a total 56 timeouts from 170 DBs.
It finished, this time there were a total 56 timeouts from 170 DBs.
Ok. I am taking a look at the whatsapp android new db query. I think it can be optimized.
Just push a fix to optimize android New. Android old is a little more tricky.
Just push a fix to optimize android New.
Great! I'll kick off another run.
Just push Android Old DB. Please, run full tests later ... thank you.
Hi @gfd2020! Just to let you know, with last commits, just one timeout happened! Tested twice. It also happens with master on the same DB, I will take a closer look into it. The parsing speed was 15% slower in average, what seems fine to me since the parser is doing more work. Thank you very much for the optimizations!
Another detail, I got a bit more messages than master, is it expected? Interestingly the last 2 runs with this PR recovered a different number of messages between them. I'll check if they came from the timed out DB. If yes, that should be fine, if not, that would be weird.
Hi @gfd2020! Just to let you know, with last commits, just one timeout happened! Tested twice. It also happens with master on the same DB, I will take a closer look into it. The parsing speed was 15% slower in average, what seems fine to me since the parser is doing more work. Thank you very much for the optimizations!
Oh! That's good news.
Another detail, I got a bit more messages than master, is it expected? Interestingly the last 2 runs with this PR recovered a different number of messages between them. I'll check if they came from the timed out DB. If yes, that should be fine, if not, that would be weird.
This is really strange, as they should be the same messages, just with added replies...
But, What do you mean more messages? More instant messages or more chats? Because if there are more chats, it is normal, as there will be more html for the responses, so there may be more partitioned chats.
More instant messages...
More instant messages...
That is strange. In my test ( 4 DBs) all have the same. Can you filter and find the strange one?
Can you filter and find the strange one?
Yes, I'll take a closer look soon.
I will try to implement interactive messages in this PR as well. I'll need a little more time.
I will try to implement interactive messages in this PR as well. I'll need a little more time.
Wouldn't it be better to implement it in a separate PR after this is merged to don't delay this feature? I was waiting for the check below to continue and to finish the review:
I would like to ask you not to publish this feature for now. I wanted to double-check the queries in case of carved messages. And it may take a while.
Let me know if you still plan to check it or not, or if you already did.
And sorry for not answering your last question, I forgot it. I'll try to check it tomorrow or in the next days.
Wouldn't it be better to implement it in a separate PR after this is merged to don't delay this feature? I was waiting for the check below to continue and to finish the review:
Ok. I leave it for another PR.
Let me know if you still plan to check it or not, or if you already did.
Not yet. I'm still on vacation this week. But I'll take a look until Friday.
And sorry for not answering your last question, I forgot it. I'll try to check it tomorrow or in the next days.
Please. See which types of dbs (android or iphone) did not have the same number of instant messages.
Hi @lfcnassif , Unfortunately I wasn't able to review last week. I'm already looking at it now. Did you redo that duplicate message test?
Did you redo that duplicate message test?
Sorry @gfd2020, still not yet. I plan to do this week...
Did you redo that duplicate message test?
Sorry @gfd2020, still not yet. I plan to do this week...
Ok @lfcnassif . Take your time.
I finished the sqls review. They seem to be ok ( I did a small update to get 2 more fields on android new). There are some raw texts in reportgenerator. I would do the localization but I don't know how to do it in all languages. Do I only do Portuguese and English?
I finished the sqls review. They seem to be ok ( I did a small update to get 2 more fields on android new).
Thanks @gfd2020!
There are some raw texts in reportgenerator. I would do the localization but I don't know how to do it in all languages. Do I only do Portuguese and English?
Just copy the English localization to the non Portuguese ones and add the [TBT] suffix to them, to flag they should be sent to foreign collaborators for translation later.
Hi @lfcnassif . I just finished localization fix. I think that is ready do review ( except for the duplicate instant messages ...).
PS: If the quoted message is on another chat page, it will not be able to be marked when clicked. I think it's a small bug that won't occur very often and if fixed it will take a lot of effort.
Hi @lfcnassif . I just finished localization fix.
Thanks @gfd2020!
PS: If the quoted message is on another chat page, it will not be able to be marked when clicked. I think it's a small bug that won't occur very often and if fixed it will take a lot of effort.
I don't think it's a bug, but a limitation. Actually, I thought in an approach to make it work: we could set an ID (or UUID) into each chat page metadata to allow it to be referenced by another page, so each replay message could have a page reference if the original message is in a different page. Switching to another page may not be what user wants, so we could ask him to confirm the action. Or we could simply show a message warning the original message is in a different page and do nothing. What do you think?
I don't think it's a bug, but a limitation. Actually, I thought in an approach to make it work: we could set an ID (or UUID) into each chat page metadata to allow it to be referenced by another page, so each replay message could have a page reference if the original message is in a different page. Switching to another page may not be what user wants, so we could ask him to confirm the action. Or we could simply show a message warning the original message is in a different page and do nothing. What do you think?
Your idea is good to put an ID on the page. I think the best thing is to just warn the user and do nothing. Remembering that deleted quoted messages also do not execute the link. But would the message be an alert in javascript?
But would the message be an alert in javascript?
Yes, could be, with the proper localization, I think the Map already does this...
But would the message be an alert in javascript?
Yes, could be, with the proper localization, I think the Map already does this...
Hi @lfcnassif . I make it work but internal viewer javafx doesnt show alerts. In browser it will show as bellow:
The javascript method alert('xxx') doesn't work? Sorry, I didn't remember that... Another approach is calling a java function from javascript side to display a java dialog, like done by HtmlLinkViewer.AttachmentHandler inner class, which shows a message dialog when a clicked attachment is not found.
I get this template and make it work on iped. It is a html, css, javascript inner modal alert. Very cool. https://www.cssscript.com/alert-confirm-modal-prompt/
It has confirm options too ... What do you think?
Seems very nice!
Implementation of WhatsApp quote messages. Android Old DB - OK ( carved quotes fully implemented ) IPHONE - OK ( carved quotes NOT fully implemented ) Android New DB - OK ( carved quotes fully implemented )