sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.
Other
940 stars 218 forks source link

Old WhatsApp databases parsing affected by forwarded message feature [4.1.3 regression] #1774

Closed lfcnassif closed 1 year ago

lfcnassif commented 1 year ago

@thalespr reported this to me and sent a log with several entries like this:

java.sql.SQLException: Column forwarded not found. no such column
    at iped.parsers.sqlite.SQLiteUndeleteTableResultSetAdapter.getMappedColumnName(SQLiteUndeleteTableResultSetAdapter.java:76) ~[iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.sqlite.SQLiteUndeleteTableResultSetAdapter.getInt(SQLiteUndeleteTableResultSetAdapter.java:377) ~[iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.ExtractorAndroid.createMessageFromDBRow(ExtractorAndroid.java:504) ~[iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.ExtractorAndroid.extractMessages(ExtractorAndroid.java:387) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.ExtractorAndroid.extractChatList(ExtractorAndroid.java:213) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.Extractor.getChatList(Extractor.java:34) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.WhatsAppParser.extractChatList(WhatsAppParser.java:529) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.WhatsAppParser.parseDB(WhatsAppParser.java:399) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.WhatsAppParser.parseAndCheckIfIsMainDb(WhatsAppParser.java:461) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.parsers.whatsapp.WhatsAppParser.parse(WhatsAppParser.java:243) [iped-parsers-impl-4.2-snapshot.jar:?]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) [tika-core-2.4.0-p1.jar:2.4.0]
    at iped.parsers.standard.StandardParser.parse(StandardParser.java:245) [iped-parsers-impl-4.2-snapshot.jar:?]
    at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247) [iped-engine-4.2-snapshot.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
    at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
    at java.lang.Thread.run(Unknown Source) [?:?]

I was able to reproduce, chats are rendered without messages. It seems to happen when the parser tries to recover deleted messages, disabling it doesn't avoid the issue, but results in no chats decoded. Unfortunately this wasn't detected on #1287 huge regression test with the huge 300 DBs data set since I disabled deleted message recovery because of OOME issues when running the test on all DBs at the same time...

lfcnassif commented 1 year ago

Anyone could take a look at this? I won't have time this week since I'm in charge of the whole unit. I can send the sample privately.

wladimirleite commented 1 year ago

I will be back on Friday. If nobody takes care of this before that, I will look into it (please, send me the triggering database, if that is the case).

lfcnassif commented 1 year ago

I will be back on Friday. If nobody take care of this before that, I will look into it (please, send me the triggering database, if that is the case).

Thanks @tc-wleite, but don't worry, since this seems to affect just very old databases, please enjoy your vacation.

gfd2020 commented 1 year ago

Anyone could take a look at this? I won't have time this week since I'm in charge of the whole unit. I can send the sample privately.

The only way I could handle this error was by testing if the column exists. If it exists, do the normal select, if it doesn't exist, put the collumn with zero value, for example in the select: ' ... 0 as forwarded FROM ....'. Sorry I should have warned you about this, I forgot. Since the regression tests passed, I didn't even bother checking the code.

Just make a test, just like orthers tests ...

private boolean databaseHasForwarded(Connection conn) throws SQLException {
    boolean result = false;
    DatabaseMetaData md = conn.getMetaData();
    try (ResultSet rs = md.getColumns(null, null, "messages", "forwarded")) {
        if (rs.next()) {
            result = true;
        }
    }
    return result;
}
lfcnassif commented 1 year ago

Not sure, but I'll try to take a look at this in the (later) evening or at night...

hauck-jvsh commented 1 year ago

Maybe we can just add a try catch in the createMessageFromDBRow (ExtractorAndroid.java:504)

 try {
        m.setForwarded(rs.getInt("forwarded") > 0);
    } catch (SQLException e) {
        m.setForwarded(false);
    }
wladimirleite commented 1 year ago

As far as I remember, there is an auxiliary function that can be used (checkIfColumnExists).

lfcnassif commented 1 year ago

I'm reopening this to don't forget: @thalespr reported commit ffbcac0 may miss the thumbnails table with some old DBs.

lfcnassif commented 1 year ago

I wanted to create a PR with commit https://github.com/sepinf-inc/IPED/commit/f8832ba65ef9e06f59df921ae43adc5b77eeca13 for review, but I pushed it to master non intentionally. I would appreciate if anyone could take a look at it, since the last attempt to fix this caused a regression to some old DBs, missing the thumbnails table.

lfcnassif commented 1 year ago

Does someone have some time to take a look and test commit f8832ba so we can include it in v4.1.4 safely?

wladimirleite commented 1 year ago

Does someone have some time to take a look and test commit f8832ba so we can include it in v4.1.4 safely?

Hi @lfcnassif! I can test it later today, if nobody else is available.

paulobreim commented 1 year ago

Does someone have some time to take a look and test commit f8832ba so we can include it in v4.1.4 safely?

I can, but I don´t have the WA files to test.

lfcnassif commented 1 year ago

Hi @lfcnassif! I can test it later today, if nobody else is available.

Thank you @tc-wleite! I made a few tests, but I would appreciate a second opinion given the previous regression. But don't hurry, 4.1.4 can wait next week.

I can, but I don´t have the WA files to test.

Thanks @paulobreim. But since @tc-wleite knows the code and has samples, I think it would be easier for him, when he has some spare time.

wladimirleite commented 1 year ago

Thank you @tc-wleite! I made a few tests, but I would appreciate a second opinion given the previous regression. But don't hurry, 4.1.4 can wait next week.

I just collected ~10 older databases (from 2014 to 2020). I will run some tests and report here later.

wladimirleite commented 1 year ago

Tests finished and everything seems fine. I also included more recent databases in the tests. Only one of the older databases triggered the issue, when processing with 4.1.3. By the way, sorry for overlooking this while implementing the forwarded messages feature.

lfcnassif commented 1 year ago

Thank you for all tests @tc-wleite! Don't worry, I didn't detect this regression in my large test with 300 DBs either...

gfd2020 commented 1 year ago

I have an old Whatsapp database (2017) and it is not working on the master branch. No chat retrieved.

Not Working: 4.0.2 4.1.2 4.1.3 4.1.4 Master-branch (downloaded yesterday)

Working ( all chats retrieved.) : 3.15.6 3.17-rc1 3.18.14

wladimirleite commented 1 year ago

Hi @gfd2020! Can you share this database privately? Did you check the processing log for errors related to this file? Errors related to missing columns/tables usually can be found in the log.

lfcnassif commented 1 year ago

If confirmed, I think we should open another issue, since it seems a 4.0 regression not related to forwarded column.

gfd2020 commented 1 year ago

Hi @gfd2020! Can you share this database privately? Did you check the processing log for errors related to this file? Errors related to missing columns/tables usually can be found in the log.

I found the problem. If you order the msgstore.db file to be processed directly, IPED does not parse the chats, but if it is inside a directory and you order it to be processed, it works. Is this behavior expected?

Example: iped.exe -d C:\dir\msgtore.db -o out ( does not work) iped.exe -d C:\dir -o out ( works)

PS: The old versions (3.x) don't have this problem.

lfcnassif commented 1 year ago

Not sure, but I think this was intentional, to avoid parsing the whole root evidence into UI when clicking on it, since it can be huge.

wladimirleite commented 1 year ago

Not sure, but I think this was intentional, to avoid parsing the whole root evidence into UI when clicking on it, since it can be huge.

Current command line help says:

    -d, -data
      input data (can be used multiple times): folder, DD, 001, E01 images
      (+AFF on Linux), ISO, physical drive, or *.iped file (with tagged files
      to export and reindex)

Are there situations in which processing a single file of other types is useful? If not, to avoid confusion, it would be possible to reject, during the initial check of input parameters, other regular files as input. AD1, UFDR and probably some archive types should also be accepted (7z, zip, rar, gz, tar).

lfcnassif commented 1 year ago

Current command line help says:

    -d, -data
      input data (can be used multiple times): folder, DD, 001, E01 images
      (+AFF on Linux), ISO, physical drive, or *.iped file (with tagged files
      to export and reindex)

Hum... this is outdated and we should add Ex01, VHD, VHDX, VMDK, AD1, UFDR to the help.

Are there situations in which processing a single file of other types is useful?

I already did what @gfd2020 tried in the past to test processing of a single test file when it was located together with other files in the same folder into my regression test corpus. But in real use cases, I don't see an advantage doing this. Actually, the single file used to be rendered using a hard drive icon in the Evidence Tree, what is wrong, not sure how it is rendered today.

wladimirleite commented 1 year ago

Actually, the single file used to be rendered using a hard drive icon in the Evidence Tree, what is wrong, not sure how it is rendered today.

It uses the associated icon (not fixed as a hard drive anymore).

gfd2020 commented 1 year ago

Sorry for the confusion. I didn't know about this IPED behavior, as I use the 3.x version more, I thought it would be the same.