sepinf-inc / IPED

IPED Digital Forensic Tool. It is open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

WhatsApp parsing timeout can break parsing of other WA databases #1679

Closed lfcnassif closed 1 year ago

lfcnassif commented 1 year ago

While testing #1651, processing hundreds of different WA databases together, one of them caused a timeout. After that, parsing of a different WA DB failed with the trace below:

org.apache.tika.exception.TikaException: WAExtractorException Exception
    at iped.parsers.whatsapp.WhatsAppParser.mergeParsedDBsAndOutputResults(WhatsAppParser.java:728)
    at iped.parsers.whatsapp.WhatsAppParser.parse(WhatsAppParser.java:257)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at iped.parsers.standard.StandardParser.parse(StandardParser.java:245)
    at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: java.nio.channels.ClosedByInterruptException
    at iped.parsers.whatsapp.Message.getThumbData(Message.java:221)
    at iped.parsers.whatsapp.ReportGenerator.printMessage(ReportGenerator.java:430)
    at iped.parsers.whatsapp.ReportGenerator.lambda$generateNextChatHtml$0(ReportGenerator.java:147)
    at iped.parsers.whatsapp.ReportGenerator$1.lookup(ReportGenerator.java:633)
    at org.apache.commons.text.StringSubstitutor.resolveVariable(StringSubstitutor.java:1148)
    at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1514)
    at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1389)
    at org.apache.commons.text.StringSubstitutor.replace(StringSubstitutor.java:893)
    at iped.parsers.whatsapp.ReportGenerator.printMessageFile(ReportGenerator.java:644)
    at iped.parsers.whatsapp.ReportGenerator.generateNextChatHtml(ReportGenerator.java:125)
    at iped.parsers.whatsapp.WhatsAppParser.createReport(WhatsAppParser.java:281)
    at iped.parsers.whatsapp.WhatsAppParser.mergeParsedDBsAndOutputResults(WhatsAppParser.java:719)
    ... 9 more
Caused by: java.nio.channels.ClosedByInterruptException
    at java.base/java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown Source)
    at java.base/sun.nio.ch.FileChannelImpl.endBlocking(Unknown Source)
    at java.base/sun.nio.ch.FileChannelImpl.readInternal(Unknown Source)
    at java.base/sun.nio.ch.FileChannelImpl.read(Unknown Source)
    at iped.parsers.whatsapp.Message.getThumbData(Message.java:218)
    ... 20 more

After that, many exceptions like the following are thrown:

org.apache.tika.exception.TikaException: WAExtractorException Exception
    at iped.parsers.whatsapp.WhatsAppParser.mergeParsedDBsAndOutputResults(WhatsAppParser.java:728)
    at iped.parsers.whatsapp.WhatsAppParser.parse(WhatsAppParser.java:257)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at iped.parsers.standard.StandardParser.parse(StandardParser.java:245)
    at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: java.nio.channels.ClosedChannelException
    at iped.parsers.whatsapp.Message.getThumbData(Message.java:221)
    at iped.parsers.whatsapp.ReportGenerator.printMessage(ReportGenerator.java:430)
    at iped.parsers.whatsapp.ReportGenerator.lambda$generateNextChatHtml$0(ReportGenerator.java:147)
    at iped.parsers.whatsapp.ReportGenerator$1.lookup(ReportGenerator.java:633)
    at org.apache.commons.text.StringSubstitutor.resolveVariable(StringSubstitutor.java:1148)
    at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1514)
    at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1389)
    at org.apache.commons.text.StringSubstitutor.replace(StringSubstitutor.java:893)
    at iped.parsers.whatsapp.ReportGenerator.printMessageFile(ReportGenerator.java:644)
    at iped.parsers.whatsapp.ReportGenerator.generateNextChatHtml(ReportGenerator.java:125)
    at iped.parsers.whatsapp.WhatsAppParser.createReport(WhatsAppParser.java:281)
    at iped.parsers.whatsapp.WhatsAppParser.mergeParsedDBsAndOutputResults(WhatsAppParser.java:719)
    ... 9 more
Caused by: java.nio.channels.ClosedChannelException
    at java.base/sun.nio.ch.FileChannelImpl.ensureOpen(Unknown Source)
    at java.base/sun.nio.ch.FileChannelImpl.read(Unknown Source)
    at iped.parsers.whatsapp.Message.getThumbData(Message.java:218)
    ... 20 more

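When a timeout happens, the parsing thread is interrupted. If that thread is reading or writing the thumb cache file at that moment, the interrupt closes the file channel (the ClosedByInterruptException in the first trace), and every later access to the same channel fails with ClosedChannelException (the second trace). We should reopen the channel if the exceptions above are thrown.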
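A minimal sketch of the reopen idea, not the actual IPED code: `ReopeningThumbCache`, `append` and `read` are hypothetical names used only for illustration. It assumes a single cache file accessed through one `FileChannel`, and it reopens that channel whenever a previous interrupt has left it closed.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical wrapper around a shared cache file (illustration only).
public class ReopeningThumbCache {
    private final Path file;
    private FileChannel channel;

    public ReopeningThumbCache(Path file) throws IOException {
        this.file = file;
        this.channel = open();
    }

    private FileChannel open() throws IOException {
        return FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    // Reopen the channel if an interrupted thread closed it earlier.
    private void ensureOpen() throws IOException {
        if (!channel.isOpen()) {
            channel = open();
        }
    }

    // Appends data at the end of the file and returns its start position.
    public synchronized long append(byte[] data) throws IOException {
        ensureOpen();
        long start = channel.size();
        ByteBuffer buf = ByteBuffer.wrap(data);
        long pos = start;
        while (buf.hasRemaining()) {
            pos += channel.write(buf, pos);
        }
        return start;
    }

    // Reads exactly 'length' bytes starting at 'position'.
    public synchronized byte[] read(long position, int length) throws IOException {
        ensureOpen();
        ByteBuffer buf = ByteBuffer.allocate(length);
        long pos = position;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new EOFException("Unexpected end of cache file");
            }
            pos += n;
        }
        return buf.array();
    }
}
```

In this sketch the interrupted thread still gets its ClosedByInterruptException, which is expected because that parse has timed out anyway. Only later callers benefit from the reopen, so one timeout no longer breaks the parsing of the other databases.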

lfcnassif commented 1 year ago

Another, safer approach would be to use a separate thumb cache file for each WA database being parsed. For now I'll keep the cache file static and reopen it if it is closed.
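For comparison, a rough sketch of the per-database alternative, again with hypothetical names (`PerDatabaseThumbCache`, the `wa-thumbs-` temp file prefix) that are not taken from the IPED code base. Each instance owns its own temporary file and channel, so an interrupt during one parse can only close that parse's channel.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical cache with one temp file per parsed database (illustration only).
public class PerDatabaseThumbCache implements AutoCloseable {
    private final Path file;
    private final FileChannel channel;

    public PerDatabaseThumbCache() throws IOException {
        this.file = Files.createTempFile("wa-thumbs-", ".cache");
        this.channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    public Path file() {
        return file;
    }

    // Appends data at the end of the file and returns its start position.
    public long append(byte[] data) throws IOException {
        long start = channel.size();
        ByteBuffer buf = ByteBuffer.wrap(data);
        long pos = start;
        while (buf.hasRemaining()) {
            pos += channel.write(buf, pos);
        }
        return start;
    }

    // Reads exactly 'length' bytes starting at 'position'.
    public byte[] read(long position, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        long pos = position;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new EOFException("Unexpected end of cache file");
            }
            pos += n;
        }
        return buf.array();
    }

    // Closes the channel (a no-op if already closed) and removes the temp file.
    @Override
    public void close() throws IOException {
        try {
            channel.close();
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A parser could create one instance per database in a try-with-resources block. The trade-off is more temporary files and no sharing of thumbnails across databases, in exchange for full isolation between parses.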