sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

Aborting OutOfMemoryError caused by too many results from ItemSearcher called from UFEDChatParser #2038

Closed lfcnassif closed 4 months ago

lfcnassif commented 6 months ago

A user reported an OOME with an 80GB heap. Analyzing a smaller 32GB heap that I asked for, I see many parsing threads using up to 1GB each: [image]

Taking a look at them, most of the heap is being used by large ArrayList<Item> objects: [image]

It took me a while to find where those large Item lists come from. Looking at the stack traces of those threads, the lists are returned by ItemSearcher.search(query) calls from UFEDChatParser:

ParsingThread-20
  at java.util.Collections$SynchronizedMap.get(Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
  at java.util.Collections$UnmodifiableMap.get(Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
  at iped.engine.task.index.IndexItem.getItem(Lorg/apache/lucene/document/Document;Liped/engine/data/IPEDSource;Z)Liped/data/IItem; (IndexItem.java:940)
  at iped.engine.data.IPEDSource.getItemByLuceneID(I)Liped/data/IItem; (IPEDSource.java:493)
  at iped.engine.data.IPEDSource.getItemByID(I)Liped/data/IItem; (IPEDSource.java:503)
  at iped.engine.search.ItemSearcher$1$1.next()Liped/data/IItemReader; (ItemSearcher.java:62)
  at iped.engine.search.ItemSearcher$1$1.next()Ljava/lang/Object; (ItemSearcher.java:51)
  at iped.engine.search.ItemSearcher.search(Ljava/lang/String;)Ljava/util/List; (ItemSearcher.java:37)
  at iped.parsers.ufed.UFEDChatParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (UFEDChatParser.java:112)
  at org.apache.tika.parser.CompositeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (CompositeParser.java:298)
  at iped.parsers.standard.StandardParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (StandardParser.java:245)
  at iped.engine.io.ParsingReader$BackgroundParsing.run()V (ParsingReader.java:247)
  at java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; (Unknown Source)
  at java.util.concurrent.FutureTask.run()V (Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (Unknown Source)
  at java.lang.Thread.run()V (Unknown Source)

Maybe there is some problem with the query, but where possible it would be better to replace ItemSearcher.search(query), which returns an ArrayList holding every result, with the safer ItemSearcher.searchIterable(query), which returns an Iterable.
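To illustrate why the Iterable variant is safer, here is a minimal sketch of the two access patterns. It does not use the real IPED classes: FakeItem, load(), and both search methods are hypothetical stand-ins, and the hit count is just a parameter. The eager method keeps every loaded item in one list before returning, while the lazy one loads each item only when the caller asks for it, so an item can be collected as soon as the caller drops its reference.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class LazySearchSketch {

    // Hypothetical stand-in for an IPED item; not the real iped.data.IItemReader.
    static final class FakeItem {
        final int id;
        FakeItem(int id) { this.id = id; }
    }

    // Counts how many FakeItem objects have been created so far.
    static int loaded = 0;

    // Hypothetical loader, standing in for reading an item from the index.
    static FakeItem load(int id) {
        loaded++;
        return new FakeItem(id);
    }

    // Eager variant: every hit is loaded and held in one list before returning.
    static List<FakeItem> search(int hits) {
        List<FakeItem> result = new ArrayList<>();
        for (int id = 0; id < hits; id++) {
            result.add(load(id));
        }
        return result;
    }

    // Lazy variant: each hit is loaded only when next() is called.
    static Iterable<FakeItem> searchIterable(int hits) {
        return () -> new Iterator<FakeItem>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < hits;
            }

            @Override
            public FakeItem next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return load(next++);
            }
        };
    }

    // Number of items loaded when the caller stops at targetId, eager variant.
    static int loadedByEagerFind(int hits, int targetId) {
        loaded = 0;
        for (FakeItem item : search(hits)) {
            if (item.id == targetId) break;
        }
        return loaded;
    }

    // Same lookup with the lazy variant: only the items actually visited are loaded.
    static int loadedByLazyFind(int hits, int targetId) {
        loaded = 0;
        for (FakeItem item : searchIterable(hits)) {
            if (item.id == targetId) break;
        }
        return loaded;
    }

    public static void main(String[] args) {
        System.out.println(loadedByEagerFind(1000, 2)); // prints 1000
        System.out.println(loadedByLazyFind(1000, 2));  // prints 3
    }
}
```

The counter only shows how many items were created. The memory benefit in a real run depends on the caller not storing the iterated items in a collection of its own.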

lfcnassif commented 6 months ago

The user reported that the commit above fixed this OOME issue, so I'll merge it into master soon.

wladimirleite commented 4 months ago

@lfcnassif, while checking an issue related to the HTML generated by UFEDChatParser (reported by another user to @felipecampanini), I observed different behavior between master and 4.1.5. Analyzing the situation, I think I found a small bug in your fix that can cause an infinite loop in for (IItemReader subitem = subItems.next(); subItems.hasNext();), since subItems.next() is executed only once, right?

wladimirleite commented 4 months ago

To clarify: it causes an infinite loop if more than one item was found, and a lost message if the query returned a single item, since hasNext() returns false after that single next() call.

lfcnassif commented 4 months ago

Thank you @wladimirleite, and sorry for my mistake! subitem = subItems.next(); should also be placed after the last semicolon in the for clause. But your solution is much better!
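For reference, the loop shapes discussed above can be compared on a plain java.util.Iterator. This is only a sketch on strings, not the UFEDChatParser code and not necessarily the fix that was merged. The class name, method names, and the MAX_STEPS cap are made up for the demo, and the empty-input guards are added so the sketch never throws. It shows that calling next() only in the init clause loses a single item and never advances when there are several, that moving next() into the update clause still drops the last item, and that the usual hasNext()/next() idiom visits every item.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorLoopSketch {

    // Safety cap so the non-advancing variant terminates in this demo.
    static final int MAX_STEPS = 10;

    // next() only in the init clause: the iterator never advances inside the loop.
    static List<String> nextOnlyInInit(List<String> source) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = source.iterator();
        if (!it.hasNext()) return seen; // demo-only guard for empty input
        int steps = 0;
        for (String s = it.next(); it.hasNext();) {
            seen.add(s);
            if (++steps >= MAX_STEPS) break; // without this cap the loop never ends
        }
        return seen;
    }

    // next() in init and update clauses: terminates, but the last item is never processed.
    static List<String> nextInInitAndUpdate(List<String> source) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = source.iterator();
        if (!it.hasNext()) return seen; // demo-only guard for empty input
        for (String s = it.next(); it.hasNext(); s = it.next()) {
            seen.add(s);
        }
        return seen;
    }

    // Usual idiom: check hasNext() first, then fetch with next() inside the body.
    static List<String> hasNextThenNext(List<String> source) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = source.iterator();
        while (it.hasNext()) {
            String s = it.next();
            seen.add(s);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> one = Arrays.asList("a");
        List<String> three = Arrays.asList("a", "b", "c");

        System.out.println(nextOnlyInInit(one));          // prints []
        System.out.println(nextOnlyInInit(three).size()); // prints 10 (stopped by the cap)
        System.out.println(nextInInitAndUpdate(three));   // prints [a, b]
        System.out.println(hasNextThenNext(three));       // prints [a, b, c]
    }
}
```

When the source is an Iterable rather than a bare Iterator, an enhanced for loop (for (String s : iterable)) expands to the same hasNext()/next() pattern and avoids the manual bookkeeping.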