sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

Parsing exception when searching for chat attachments #1989

Closed lfcnassif closed 7 months ago

lfcnassif commented 7 months ago

To prepare for the upgrade to the new telegram-decoder plugin, I crawled 68 Telegram DBs from our past cases database. Running master on them, I got the following exception with one of those DBs:

java.lang.RuntimeException: iped.exception.QueryNodeException: INVALID_SYNTAX_CANNOT_PARSE: Syntax Error, cannot parse name:"XXXXXXXX "XXXXXXXX\".MP4" && size:1094856: Lexical error at line 1, column 76.  Encountered: <EOF> after : "\" && size:1094856" 
    at iped.engine.search.IPEDSearcher.setQuery(IPEDSearcher.java:107)
    at iped.engine.search.IPEDSearcher.<init>(IPEDSearcher.java:66)
    at iped.engine.search.ItemSearcher.getResult(ItemSearcher.java:71)
    at iped.engine.search.ItemSearcher.searchIterable(ItemSearcher.java:46)
    at iped.engine.search.ItemSearcher.search(ItemSearcher.java:37)
    at iped.parsers.util.Util.getItems(Util.java:263)
    at iped.parsers.telegram.Extractor.getFileFromQuery(Extractor.java:446)
    at iped.parsers.telegram.Extractor.loadDocument(Extractor.java:383)
    at iped.parsers.telegram.Extractor.extractMessages(Extractor.java:261)
    at iped.parsers.telegram.TelegramParser.parseTelegramDBAndroid(TelegramParser.java:179)
    at iped.parsers.telegram.TelegramParser.parse(TelegramParser.java:482)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at iped.parsers.standard.StandardParser.parse(StandardParser.java:245)
    at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: iped.exception.QueryNodeException: INVALID_SYNTAX_CANNOT_PARSE: Syntax Error, cannot parse name:"XXXXXXXX "XXXXXXXX\".MP4" && size:1094856: Lexical error at line 1, column 76.  Encountered: <EOF> after : "\" && size:1094856" 
    at iped.engine.search.QueryBuilder.getQuery(QueryBuilder.java:376)
    at iped.engine.search.QueryBuilder.getQuery(QueryBuilder.java:333)
    at iped.engine.search.IPEDSearcher.setQuery(IPEDSearcher.java:104)
    ... 18 more
Caused by: INVALID_SYNTAX_CANNOT_PARSE: Syntax Error, cannot parse name:"XXXXXXXX "XXXXXXXX\".MP4" && size:1094856: Lexical error at line 1, column 76.  Encountered: <EOF> after : "\" && size:1094856" 
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.parse(StandardSyntaxParser.java:98)
    at org.apache.lucene.queryparser.flexible.core.QueryParserHelper.parse(QueryParserHelper.java:214)
    at org.apache.lucene.queryparser.flexible.standard.StandardQueryParser.parse(StandardQueryParser.java:280)
    at iped.engine.search.QueryBuilder.getQuery(QueryBuilder.java:369)
    ... 20 more
Caused by: org.apache.lucene.queryparser.flexible.standard.parser.TokenMgrError: Lexical error at line 1, column 76.  Encountered: <EOF> after : "\" && size:1094856"
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParserTokenManager.getNextToken(StandardSyntaxParserTokenManager.java:2133)
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.jj_scan_token(StandardSyntaxParser.java:1971)
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.jj_3R_11(StandardSyntaxParser.java:1405)
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.jj_3_3(StandardSyntaxParser.java:1571)
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.jj_2_3(StandardSyntaxParser.java:1181)
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.Clause(StandardSyntaxParser.java:258)
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.ModClause(StandardSyntaxParser.java:249)
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.ConjQuery(StandardSyntaxParser.java:186)
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.DisjQuery(StandardSyntaxParser.java:163)
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.Query(StandardSyntaxParser.java:124)
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.TopLevelQuery(StandardSyntaxParser.java:114)
    at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.parse(StandardSyntaxParser.java:92)
    ... 23 more

While debugging, I found an attachment name that is not being escaped properly. It contains a double quote, which is escaped correctly, but also a 0x84 char, which the Lucene escape function does not handle properly. This results in the exception above, which causes many conversations to be missed.

This can affect parsing of other chat app databases.
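
To illustrate the kind of defensive escaping that could avoid this, here is a minimal sketch. It is not the fix adopted in IPED and not Lucene's own escape function. The class `QueryEscapeSketch` and its methods `escapeTerm` and `buildAttachmentQuery` are hypothetical names. The sketch assumes that replacing control chars (which include 0x84) with a space does not change what the `name` field matches, and that backslash-escaping the usual Lucene special characters is enough for everything else.

```java
final class QueryEscapeSketch {

    // Characters treated as special by Lucene's query syntax (assumed set for this sketch).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Hypothetical helper: escapes special chars and neutralizes control chars.
    public static String escapeTerm(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (Character.isISOControl(c)) {
                // isISOControl covers 0x00-0x1F and 0x7F-0x9F, so 0x84 lands here.
                sb.append(' ');
            } else {
                if (SPECIAL.indexOf(c) >= 0) {
                    sb.append('\\');
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a query with the same shape as the one in the stack trace above.
    public static String buildAttachmentQuery(String name, long size) {
        return "name:\"" + escapeTerm(name) + "\" && size:" + size;
    }

    public static void main(String[] args) {
        // Made-up attachment name containing a double quote and a 0x84 char.
        String name = "clip \"final" + (char) 0x84 + "\".MP4";
        System.out.println(buildAttachmentQuery(name, 1094856L));
    }
}
```

Whether dropping the control char is acceptable depends on how the `name` field is analyzed at index time, so a real fix would need to be checked against the index configuration.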