Wow, that's a ton of memory... I just tested this feature with medium-sized cases of 2-3 million items, my fault...
IPED should be able to handle cases up to ~135 million items. Beyond that, the main table's internal height exceeds the maximum value of an Integer (although that is another issue; we can simply warn the user to apply some filter, as they probably don't want to analyze 135 million items in the table at the same time).
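As a rough check of where that limit comes from, assuming Swing's default table row height of 16 px (the actual row height in IPED may differ):

// The total table height in pixels must fit in a 32-bit int, so with a
// 16 px row height the row count tops out near 134 million:
long maxRows = (long) Integer.MAX_VALUE / 16; // 2147483647 / 16 = 134217727
// i.e. roughly the ~135 million items mentioned above.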
Is there anything that can be done about this?
Not sure if cache building could be faster, but consuming all available memory for sure is not desired/expected.
Could you take a heap dump to help identify the memory-hungry class?
and it takes a bit less than 8 GB to load everything.
Out of curiosity, which version has this heap usage? And the others? I expect 4.0.x to use less memory than 3.18.x because of the Lucene 5.x to 9.x upgrade.
I think this is high priority; version 4.1 should be as stable as previous versions. @patrickdalla could you take a look at this?
and it takes a bit less than 8 GB to load everything.
Out of curiosity, which version has this heap usage? And the others? I expect 4.0.x to use less memory than 3.18.x because of the Lucene 5.x to 9.x upgrade.
The 8 GB I mentioned is with 4.2-snapshot. I don't have the cases processed with the older versions anymore, but I remember I had to use -Xmx with 3.18.x, going well beyond 32 GB. So 4.x is indeed using much less memory to load the case.
Not sure if cache building could be faster, but consuming all available memory for sure is not desired/expected.
The first call seems to be synchronous. I have no idea if this is mandatory, but it seems like a long wait compared to other initialization steps. Not sure if the time spent was already affected by the memory shortage; maybe reducing the memory consumption will speed things up.
Have you taken a heap dump to help identify the memory-hungry class?
No, but I can take one tomorrow.
The first call seems to be synchronous. I have no idea if this is mandatory
It was just done to try to speed up the Day cache, so it doesn't share CPU resources with the other caches being built, as the Day scale is the default and is rendered first. Maybe that is not needed...
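A minimal sketch of that ordering, with hypothetical names standing in for the real code (buildCache and the period labels are placeholders):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CacheStartupSketch {
    // Stand-in for the real per-granularity cache building routine.
    static void buildCache(String period) { /* ... */ }

    public static void main(String[] args) {
        // Build the default Day scale synchronously: it doesn't share CPU
        // with the other caches and is ready when the chart first renders.
        buildCache("Day");
        // Then build the remaining granularities concurrently.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String p : List.of("Hour", "Year", "Month", "Quarter", "Week", "Minute", "Second")) {
            pool.submit(() -> buildCache(p));
        }
        pool.shutdown();
    }
}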
I think this is high priority; version 4.1 should be as stable as previous versions. @patrickdalla could you take a look at this?
Right.
@tc-wleite The first time a case is opened, this cache is built, but on subsequent openings it is loaded from disk. As the log says, it was successfully built. So, does this excessive memory consumption occur only the first time, or on every opening? If subsequent openings are acceptable, maybe the solution could be to find a way to create this cache at index/processing time. As the log also states, the other time granularity caches start but do not finish. So, the UI block could have been caused by these threads.
In fact, what is being created is a time period index. I don't remember well why this was not made to run at processing time. I think there was some constraint about timezone changes.
As the log says, it was successfully built. So, does this excessive memory consumption occur only the first time, or on every opening?
I am not sure about subsequent executions after the first cache is built. As it was taking a while to open the case and I had to access it just to export some files, I disabled the cache creation. I will test and report back here.
This is the memory consumption when I opened the case again:
I don't remember well why this was not made to run at processing time
AFAIK it was to keep backwards compatibility with 4.0.x cases and also to work transparently with multicases.
So, the UI block could have been caused by these threads.
Usually, when there isn't enough heap memory and GC pressure is high, the UI freezes for a while before an OutOfMemoryError is thrown.
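If that's what's happening, the standard HotSpot flags below (not IPED-specific) make the JVM write a dump automatically when the OutOfMemoryError is finally thrown, which would also help with the heap dump requested above:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=C:\dumps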
The first call seems to be synchronous.
If all of them are changed to run sequentially, do they finish? Could you check what is the final heap usage?
The first call seems to be synchronous.
If all of them are changed to run sequentially, do they finish? Could you check what is the final heap usage?
I am going to verify.
If all of them are changed to run sequentially, do they finish? Could you check what is the final heap usage?
I changed the code to make all cache building steps run sequentially, but they didn't finish after a bit more than 2 hours. There were a few out-of-memory exceptions in the log, and it doesn't seem that enough memory is released when the first steps finish.
Execution log:
2023-03-21 18:44:13 [INFO] [app.ui.ColumnsManager] Loading default columns
2023-03-21 18:44:13 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Day]]...
2023-03-21 18:47:53 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build time cache of [[class org.jfree.data.time.Day]]: 219272ms
2023-03-21 18:47:53 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Hour]]...
2023-03-21 18:56:11 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build time cache of [[class org.jfree.data.time.Hour]]: 498703ms
2023-03-21 18:56:12 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Year]]...
Exception in thread "pool-14-thread-1" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Unknown Source)
at java.base/java.util.ArrayList.grow(Unknown Source)
at java.base/java.util.ArrayList.grow(Unknown Source)
at java.base/java.util.ArrayList.add(Unknown Source)
at java.base/java.util.ArrayList.add(Unknown Source)
at iped.app.timelinegraph.cache.EventTimestampCache.run(EventTimestampCache.java:150)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Exception in thread "pool-14-thread-2" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Unknown Source)
at java.base/java.util.ArrayList.grow(Unknown Source)
at java.base/java.util.ArrayList.grow(Unknown Source)
at java.base/java.util.ArrayList.add(Unknown Source)
at java.base/java.util.ArrayList.add(Unknown Source)
at iped.app.timelinegraph.cache.EventTimestampCache.run(EventTimestampCache.java:150)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
2023-03-21 19:07:30 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build time cache of [[class org.jfree.data.time.Year]]: 678652ms
2023-03-21 19:07:35 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Month]]...
Exception in thread "Thread-2" Exception in thread "D3D Screen Updater" java.lang.OutOfMemoryError: Java heap space
Exception in thread "JavaFX Application Thread" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-15-thread-1" Exception in thread "Disposer" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
Exception in thread "TimerQueue" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-15-thread-2" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-15-thread-3" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-15-thread-4" java.lang.OutOfMemoryError: Java heap space
2023-03-21 20:12:49 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build time cache of [[class org.jfree.data.time.Month]]: 3913660ms
2023-03-21 20:12:49 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Quarter]]...
Exception in thread "pool-16-thread-1" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-16-thread-2" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-16-thread-3" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-16-thread-4" java.lang.OutOfMemoryError: Java heap space
2023-03-21 20:15:18 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build time cache of [[class org.jfree.data.time.Quarter]]: 148867ms
2023-03-21 20:15:18 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Week]]...
Exception in thread "pool-17-thread-1" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-17-thread-2" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-17-thread-3" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-17-thread-4" java.lang.OutOfMemoryError: Java heap space
2023-03-21 20:16:59 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build time cache of [[class org.jfree.data.time.Week]]: 101224ms
2023-03-21 20:16:59 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class iped.jfextensions.model.Minute]]...
Exception in thread "pool-18-thread-1" java.lang.OutOfMemoryError: Java heap space
Thank you for testing @tc-wleite!
and it doesn't seem that enough memory is released when the first steps finish.
This seems bad; maybe the final (not just the temporary) memory needed is excessive...
I'm working on this.
The use of primitives could help, but the problem is that all the time period counting data is loaded and kept in memory, which is what should be solved.
To use bitsets in this case, we would need one bitset per (TimePeriod, event) tuple. So, lots of bitsets (most of them small in size). It seems that the use of bitsets can worsen the memory usage issue. See the memory profile below of this simple test: after the 5 s pause, memory usage increases a lot, even though the SparseBitSets hold the same values as the array lists.
import java.util.ArrayList;
import java.util.List;

import com.zaxxer.sparsebits.SparseBitSet; // assuming the com.zaxxer:SparseBitSet library

// Phase 1: build 1M small ArrayList<Integer>, each with up to 100 random
// values in [0, 1000000).
List<Object> list = new ArrayList<>();
List<Object> list2 = new ArrayList<>();
for (int i = 0; i < 1000000; i++) {
    ArrayList<Integer> b = new ArrayList<>();
    for (int j = 0; j < Math.random() * 100; j++) {
        b.add((int) Math.floor(Math.random() * 1000000));
    }
    list.add(b);
}

// Pause so the two phases can be told apart in the memory profiler.
try {
    Thread.sleep(5000);
} catch (InterruptedException e) {
}

// Phase 2: copy the same values into one SparseBitSet per list entry.
for (int i = 0; i < 1000000; i++) {
    SparseBitSet b = new SparseBitSet();
    List<Integer> l = (List<Integer>) list.get(i);
    for (int j = 0; j < l.size(); j++) {
        b.set(l.get(j));
    }
    list2.add(b);
}

// Read both lists so they stay reachable until the end of the test.
int sizes = list.size() + list2.size();
With Java's original java.util.BitSet it was worse, leading to OOM.
The use of primitives could help, but the problem is that all the time period counting data is loaded and kept in memory, which is what should be solved.
Possibly you have already thought about this, but would keeping just the currently used TimePeriod scale (Day, Hour...) in memory be enough? Or would it be too slow to load on demand?
I just ask you to try to be careful about the changes, since I plan to include them in 4.1.2, thank you!
One thing that could help is taking a heap dump and checking what is really memory-hungry, to avoid wasting time optimizing unrelated things.
After some tests, I noticed that the ItemIds lists of every shown item were being kept in memory, to be used when the user right-clicks a bar to select or mark its respective items. After removing these items from memory, I could navigate the timeline chart panel in a case with 600K items while restricting the maximum heap size to 512 MB (without this limit, the JVM seems to postpone GC). There was a small increase in response time. I think I can change the logic that selects/highlights the items of the selected bar to avoid keeping these ItemIds in memory; a specific Lucene query based on the selected bar should make that possible. This is not enough to solve the whole issue, as it does not affect timeline index creation, but it will be done.
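A minimal sketch of that idea, assuming timestamps are searchable as a LongPoint field (the field name "timeStamp" and this indexing scheme are assumptions for illustration, not IPED's actual schema):

import java.io.IOException;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.jfree.data.time.RegularTimePeriod;

// Resolve the items under a clicked bar on demand with a range query over
// the bar's period, instead of keeping per-bar ItemIds lists in memory.
static TopDocs itemsInBar(IndexSearcher searcher, RegularTimePeriod bar) throws IOException {
    Query q = LongPoint.newRangeQuery("timeStamp",
            bar.getFirstMillisecond(), bar.getLastMillisecond());
    return searcher.search(q, Integer.MAX_VALUE);
}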
Thank you @patrickdalla for investigating this! @tc-wleite if you could take a heap dump of your huge case and send it to me privately, I can take a look to try to confirm what is consuming all the heap in your case. I usually use the Eclipse Memory Analyzer plugin for this; it can handle huge dumps that JVisualVM can't, and it has very nice features for investigating issues like this.
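For reference, a dump of the running instance can be captured with the JDK's own jmap and then opened in MAT:

jmap -dump:live,format=b,file=heap.hprof <iped-pid>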
I found another point where references to loaded entries were not released for GC. After fixing it, I could start IPED for this case with 1.9M items (>600K of them timestamped) limiting heap memory to 512 MB; all indexes were successfully created, and I made some chart navigations, leading to the following memory usage chart:
I pushed all the commits to branch TimelineInteractionFixes. I think some optimizations can still be made, but at least what could be classified as a "bug" has been solved.
Thank you very much @patrickdalla!
would keeping just the currently used TimePeriod scale (Day, Hour...) in memory be enough?
Have you tried this? Was it bad for UI responsiveness?
If @tc-wleite's case still triggers OOME with a 32 GB heap, maybe we can expose a simple option in conf/AnalysisConfig.txt to disable the feature for huge cases like that.
maybe we can expose a simple option in conf/AnalysisConfig.txt to disable the feature for huge cases like that
@tc-wleite have you tried to disable the Timeline tab in case/iped/conf/ResultSetViewersConf.xml?
@tc-wleite have you tried to disable the Timeline tab in case/iped/conf/ResultSetViewersConf.xml?
No, I wasn't familiar with that configuration. I will try to use it.
It worked fine, thanks @lfcnassif! It is good to have the option to disable the tab if necessary.
Hi @patrickdalla, can we close this since #1608 was merged, or do you intend to improve it further?
Well, #1608 only improved memory utilization while interacting with the timeline. Memory utilization during timeline index creation can still be improved. I think this issue could be kept open.
Hi everybody,
I ran some initial tests on persisting intermediary results to free memory, and it was still hitting OOM. After some analysis, I found that by removing the "parsedDateCache" hashmap (an internal cache that avoids repeated Lucene lookups of ord values that were already looked up, and the corresponding string-to-date parsing), I could run the index creation of my case with 3M timestamped items on a JVM with -Xmx1024M, which wasn't possible with -Xmx2048M without removing this cache. This means the cache was preventing garbage collection of already-persisted info. It did, however, double the index creation time for the Day granularity. For finer granularities, such as Second, the time didn't increase as much, which suggests the cache's benefit did not scale linearly.
I changed parsedDateCache to a long array instead of a Date ArrayList, populated at the beginning with a TermsEnum. It reduced the time by 30%, but it still holds a lot of memory, causing OOM for a JVM with less than 2 GB of max heap.
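A minimal sketch of that long[]-per-ord idea (the method name, the use of SortedSetDocValues, and the ISO-8601 date format are assumptions for illustration, not the actual IPED code):

import java.io.IOException;
import java.time.Instant;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Parse each distinct term once up front and index the resulting epoch
// millis by its ordinal, replacing a per-lookup map of ord to Date.
static long[] buildParsedDateCache(SortedSetDocValues values) throws IOException {
    long[] cache = new long[(int) values.getValueCount()];
    TermsEnum te = values.termsEnum();
    int ord = 0;
    for (BytesRef term = te.next(); term != null; term = te.next()) {
        cache[ord++] = Instant.parse(term.utf8ToString()).toEpochMilli();
    }
    return cache;
}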
Well, it seems that with the long[] as the cache, the intermediary persistence worked.
I removed a method that cloned the BytesRef returned by the ord lookup. This further improved performance. I think I had created it because, when assessing some strategies, I partitioned the iteration of field values between threads. That is not done anymore; one field is processed by only one thread, so the clone is no longer needed to avoid concurrency problems.
After all the commits, the index creation time for my case was reduced by more than 50% when there is enough available memory.
I created a rule that flushes intermediary results when available memory is below 40 MB. When an intermediary flush occurs, a merge is needed at the end, which increases index creation time a lot.
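The flush trigger can be as simple as the sketch below (the 40 MB threshold matches the rule described; the method name is hypothetical):

// Estimate free heap as max heap minus currently used heap; flush and
// clear intermediary results when it drops below the threshold.
static final long FLUSH_THRESHOLD = 40L * 1024 * 1024; // 40 MB

static boolean shouldFlush() {
    Runtime rt = Runtime.getRuntime();
    long used = rt.totalMemory() - rt.freeMemory();
    long available = rt.maxMemory() - used;
    return available < FLUSH_THRESHOLD;
}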
I created #TimelineFlushedIndexCreation. Could you try with that big case @tc-wleite ?
Could you try with that big case @tc-wleite ?
Sure! I will try to do that tomorrow.
I created #TimelineFlushedIndexCreation. Could you try with that big case @tc-wleite ?
Opening the large case (it has 210M items), memory consumption (and speed) seems much better now!
2023-04-22 11:10:35 [INFO] [app.ui.AppMain] Indexador e Processador de Evidências Digitais 4.2-snapshot
2023-04-22 11:10:35 [INFO] [app.ui.AppMain] Java Version: 11.0.13
2023-04-22 11:10:35 [INFO] [app.ui.AppMain] Java Home: C:\Java\jdk-11.0.13
2023-04-22 11:10:35 [INFO] [app.ui.AppMain] Java VM Name: OpenJDK 64-Bit Server VM
2023-04-22 11:10:35 [INFO] [engine.config.Configuration] Loading configuration from D:\SARD\iped
2023-04-22 11:10:39 [INFO] [app.ui.App] Starting...
2023-04-22 11:10:42 [INFO] [iped.viewers.IcePDFViewer] IcePDFViewer took 451ms to be initialized.
2023-04-22 11:10:45 [INFO] [app.ui.App$1] GUI created
2023-04-22 11:10:45 [INFO] [application.connection.LocalOfficeConnection] Opening local OpenOffice.org connection.
2023-04-22 11:10:45 [INFO] [engine.datasource.SleuthkitReader] Sleuthkit version 4.12.0-iped-patch detected.
2023-04-22 11:10:46 [INFO] [iped.viewers.LibreOfficeViewer] LibreOffice running.
2023-04-22 11:10:46 [INFO] [iped.viewers.LibreOfficeViewer$1] Constructing LibreOffice frame...
2023-04-22 11:10:46 [INFO] [iped.viewers.LibreOfficeViewer$1] LibreOffice frame ok
2023-04-22 11:11:05 [INFO] [sleuthkit.datamodel.SleuthkitCase] Time to initialize parent node cache: 19,974 ms
2023-04-22 11:11:06 [INFO] [sleuthkit.datamodel.SleuthkitCase] sqlite-jdbc version 3.34.0 loaded in native mode
2023-04-22 11:11:06 [INFO] [engine.data.IPEDSource] Opening index D:\SARD\iped\index
2023-04-22 11:11:06 [INFO] [engine.lucene.ConfiguredFSDirectory] Using MMapDirectory to open index...
2023-04-22 11:11:06 [INFO] [engine.data.IPEDSource] Index opened
2023-04-22 11:11:06 [INFO] [engine.data.IPEDSource] Creating LuceneId to ID mapping...
2023-04-22 11:11:58 [INFO] [engine.data.Bookmarks] Loading state from file D:\SARD\iped\bookmarks.iped
2023-04-22 11:11:59 [INFO] [engine.data.IPEDMultiSource] Opening MultiReader...
2023-04-22 11:11:59 [INFO] [engine.data.IPEDMultiSource] MultiReader opened
2023-04-22 11:12:32 [INFO] [engine.data.IPEDMultiSource] Loaded 1 cases.
2023-04-22 11:12:32 [INFO] [app.ui.UICaseDataLoader] Loading Columns
2023-04-22 11:12:32 [INFO] [app.ui.ColumnsManager] Loading default columns
2023-04-22 11:12:33 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Day]]...
2023-04-22 11:17:05 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build timeline index of [[class org.jfree.data.time.Day]]: 271970ms
2023-04-22 11:17:05 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Year]]...
2023-04-22 11:17:05 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Hour]]...
2023-04-22 11:17:05 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Month]]...
2023-04-22 11:17:05 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Quarter]]...
2023-04-22 11:17:05 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Week]]...
2023-04-22 11:17:05 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class iped.jfextensions.model.Minute]]...
2023-04-22 11:17:05 [INFO] [timelinegraph.cache.IndexTimeStampCache] Starting to load/build time cache of [[class org.jfree.data.time.Second]]...
2023-04-22 11:17:08 [INFO] [app.ui.UICaseDataLoader] Listing all items
2023-04-22 11:17:08 [INFO] [app.ui.FileProcessor] Opening iped/help/Help_pt-BR.htm
2023-04-22 11:17:08 [INFO] [app.ui.UICaseDataLoader] Listing all items Finished
2023-04-22 11:17:08 [INFO] [app.ui.UICaseSearcherFilter] Searching for query *:*
2023-04-22 11:17:18 [INFO] [engine.search.QueryBuilder] Expanded query terms: []
2023-04-22 11:21:01 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build timeline index of [[class org.jfree.data.time.Year]]: 235756ms
2023-04-22 11:21:24 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build timeline index of [[class org.jfree.data.time.Quarter]]: 258616ms
2023-04-22 11:21:42 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build timeline index of [[class org.jfree.data.time.Month]]: 276852ms
2023-04-22 11:22:09 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build timeline index of [[class org.jfree.data.time.Week]]: 304007ms
2023-04-22 11:23:07 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build timeline index of [[class org.jfree.data.time.Hour]]: 361842ms
2023-04-22 11:24:07 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build timeline index of [[class iped.jfextensions.model.Minute]]: 422011ms
2023-04-22 11:26:52 [INFO] [timelinegraph.cache.IndexTimeStampCache] Time to build timeline index of [[class org.jfree.data.time.Second]]: 586847ms
Thank you.
About the intermediary persistence decision: for testing, I made a rule that when the remaining memory is below 40 MB, it stops any new memory consumption until the intermediary result is flushed to free more memory.
Internally, this scheme works, but as it competes for memory with other modules, it leaves little memory for them, which can cause OOMs in those modules. Can anyone suggest a better scheme, maybe one to be used globally by other modules as well?
Edited: I think a max memory/objects per module approach would be better.
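To make that idea concrete, one hypothetical shape for it (nothing like this exists in the code; the class name and semantics are invented): each module reserves bytes from a fixed quota and flushes when a reservation is denied.

import java.util.concurrent.atomic.AtomicLong;

class ModuleMemoryBudget {
    private final AtomicLong remaining;

    ModuleMemoryBudget(long maxBytes) {
        remaining = new AtomicLong(maxBytes);
    }

    // Returns false when the quota is exhausted; the caller should flush
    // intermediary results, release(), and then retry.
    boolean tryReserve(long bytes) {
        long cur;
        do {
            cur = remaining.get();
            if (cur < bytes) {
                return false;
            }
        } while (!remaining.compareAndSet(cur, cur - bytes));
        return true;
    }

    void release(long bytes) {
        remaining.addAndGet(bytes);
    }
}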
Maybe some ideas can be borrowed from Lucene; they have different approaches for similar things. For example, we increased the memory buffer of each indexing thread from 16 MB to 64 MB. Some refs: https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/index/IndexWriter.html https://lucene.apache.org/core/8_7_0//core/org/apache/lucene/index/MergePolicy.html https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/index/MergeScheduler.html
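For reference, the knob being referred to is presumably IndexWriterConfig#setRAMBufferSizeMB; below is a sketch of standard Lucene usage, not IPED's actual configuration code:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;

IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
// Default is 16 MB; buffered documents are flushed to a new segment once
// the in-memory buffer exceeds this budget.
config.setRAMBufferSizeMB(64);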
Well, after some tests and detail corrections, I think it is good enough: I could open and navigate the timeline of a case with 3.4M timestamped items in a JVM limited to 768 MB of heap. I included a rule not to start the time period index creations in parallel if there are a lot of items and free memory is not large. Creation times also improved a lot, except when memory is low and intermediary flushes to disk have to be done.
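Roughly, the new rule looks like the sketch below (the thresholds and method name here are invented for illustration; the real values differ):

// Start the per-granularity index builds in parallel only when the case is
// small enough or there is plenty of free heap; otherwise run sequentially.
static boolean buildInParallel(long timestampedItems) {
    Runtime rt = Runtime.getRuntime();
    long free = rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    return timestampedItems < 1_000_000 || free > 2L * 1024 * 1024 * 1024;
}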
I have a very large case here, processed with the master branch, with ~125 million items (many evidence files processed with the "fastmode" profile). I had similar cases processed with 3.18.x and 4.0.x, and they opened relatively quickly (~2 minutes). With the latest version, it is taking ~15 minutes, with a long wait at "Starting to load/build time cache of [[class org.jfree.data.time.Day]]", as can be seen in the log below.
After that, the analysis UI becomes visible but unusable (any mouse/keyboard action makes the application "Not Responding" in Windows). It seems all the JVM memory is consumed and the garbage collector runs continuously (screenshot of VisualVM below). Disabling the load/build of the timeline cache (in the source code) makes the loading process fast again, and it takes a bit less than 8 GB to load everything.
Is there anything that can be done about this? Although such large cases are not common, new IPED versions may create more items (depending on the processing options). Maybe a limit could be placed on the maximum number of items for which this cache building process runs, or an option to disable it could be added, to be used when the user needs to open a very large case.