Open BlindDuck opened 9 months ago
Thank you very much for the report. The error message indicates that the process is running out of memory, the Java Heap to be exact. Could you please enter the following commands on your system and paste the output here?
$ grep Mem /proc/meminfo
$ /usr/lib64/jvm/jre-11-openjdk/bin/java -XX:+PrintFlagsFinal -version | grep HeapSize
I have reproduced the error message on three different systems and installed also the resent openJDK.
system1: grep Mem /proc/meminfo MemTotal: 30740012 kB MemFree: 4689416 kB MemAvailable: 17962028 kB
/usr/lib64/jvm/jre-11-openjdk/bin/java -XX:+PrintFlagsFinal -version | grep HeapSize size_t ErgoHeapSizeLimit = 0 {product} {default} size_t HeapSizePerGCThread = 43620760 {product} {default} size_t InitialHeapSize = 492830720 {product} {ergonomic} size_t LargePageHeapSizeThreshold = 134217728 {product} {default} size_t MaxHeapSize = 7870611456 {product} {ergonomic} uintx NonNMethodCodeHeapSize = 5836300 {pd product} {ergonomic} uintx NonProfiledCodeHeapSize = 122910970 {pd product} {ergonomic} uintx ProfiledCodeHeapSize = 122910970 {pd product} {ergonomic} size_t ShenandoahSoftMaxHeapSize = 0 {manageable} {default} openjdk version "11.0.17" 2022-10-18 OpenJDK Runtime Environment (build 11.0.17+0-suse-150000.3.86.2-x8664) OpenJDK 64-Bit Server VM (build 11.0.17+0-suse-150000.3.86.2-x8664, mixed mode)
/opt/jdk-21.0.2/bin/java -XX:+PrintFlagsFinal -version | grep HeapSize size_t ErgoHeapSizeLimit = 0 {product} {default} size_t HeapSizePerGCThread = 43620760 {product} {default} size_t InitialHeapSize = 494927872 {product} {ergonomic} size_t LargePageHeapSizeThreshold = 134217728 {product} {default} size_t MaxHeapSize = 7872708608 {product} {ergonomic} size_t MinHeapSize = 8388608 {product} {ergonomic} uintx NonNMethodCodeHeapSize = 5839372 {pd product} {ergonomic} uintx NonProfiledCodeHeapSize = 122909434 {pd product} {ergonomic} uintx ProfiledCodeHeapSize = 122909434 {pd product} {ergonomic} size_t SoftMaxHeapSize = 7872708608 {manageable} {ergonomic} openjdk version "21.0.2" 2024-01-16 OpenJDK Runtime Environment (build 21.0.2+13-58) OpenJDK 64-Bit Server VM (build 21.0.2+13-58, mixed mode, sharing)
system2: grep Mem /proc/meminfo MemTotal: 65562564 kB MemFree: 4183688 kB MemAvailable: 42483060 kB
/usr/lib64/jvm/jre-11-openjdk/bin/java -XX:+PrintFlagsFinal -version | grep HeapSize size_t ErgoHeapSizeLimit = 0 {product} {default} size_t HeapSizePerGCThread = 43620760 {product} {default} size_t InitialHeapSize = 1052770304 {product} {ergonomic} size_t LargePageHeapSizeThreshold = 134217728 {product} {default} size_t MaxHeapSize = 16785604608 {product} {ergonomic} uintx NonNMethodCodeHeapSize = 5836300 {pd product} {ergonomic} uintx NonProfiledCodeHeapSize = 122910970 {pd product} {ergonomic} uintx ProfiledCodeHeapSize = 122910970 {pd product} {ergonomic} size_t ShenandoahSoftMaxHeapSize = 0 {manageable} {default} openjdk version "11.0.17" 2022-10-18 OpenJDK Runtime Environment (build 11.0.17+0-suse-150000.3.86.2-x8664) OpenJDK 64-Bit Server VM (build 11.0.17+0-suse-150000.3.86.2-x8664, mixed mode)
/opt/jdk-21.0.2/bin/java -XX:+PrintFlagsFinal -version | grep HeapSize size_t ErgoHeapSizeLimit = 0 {product} {default} size_t HeapSizePerGCThread = 43620760 {product} {default} size_t InitialHeapSize = 1056964608 {product} {ergonomic} size_t LargePageHeapSizeThreshold = 134217728 {product} {default} size_t MaxHeapSize = 16785604608 {product} {ergonomic} size_t MinHeapSize = 8388608 {product} {ergonomic} uintx NonNMethodCodeHeapSize = 5839372 {pd product} {ergonomic} uintx NonProfiledCodeHeapSize = 122909434 {pd product} {ergonomic} uintx ProfiledCodeHeapSize = 122909434 {pd product} {ergonomic} size_t SoftMaxHeapSize = 16785604608 {manageable} {ergonomic} openjdk version "21.0.2" 2024-01-16 OpenJDK Runtime Environment (build 21.0.2+13-58) OpenJDK 64-Bit Server VM (build 21.0.2+13-58, mixed mode, sharing)
system3 rasberry Pi 4b: sudo grep Mem /proc/meminfo MemTotal: 3930868 kB MemFree: 1572248 kB MemAvailable: 2408712 kB
/usr/bin/java -XX:+PrintFlagsFinal -version | grep HeapSize size_t ErgoHeapSizeLimit = 0 {product} {default} size_t HeapSizePerGCThread = 33554432 {product} {default} size_t InitialHeapSize = 62914560 {product} {ergonomic} size_t LargePageHeapSizeThreshold = 134217728 {product} {default} size_t MaxHeapSize = 1006632960 {product} {ergonomic} uintx NonNMethodCodeHeapSize = 5242880 {pd product} {default} uintx NonProfiledCodeHeapSize = 0 {pd product} {ergonomic} uintx ProfiledCodeHeapSize = 0 {pd product} {ergonomic} openjdk version "11.0.21" 2023-10-17 OpenJDK Runtime Environment (build 11.0.21+9-post-Raspbian-1deb10u1) OpenJDK Server VM (build 11.0.21+9-post-Raspbian-1deb10u1, mixed mode)
All JVMs on the three systems behave as expected. The JVMs (OpenJDK 11 or 21.02.2) set the maximum heap size of 1/4 of physical memory on each machine. In case of the weakest computer in the list, the 4 GiB Raspi, the JVM from the OpenJDK selects 1 GiB for the MaxHeapSize value for example. Well, 1 GiB of allowed Java Heap should be plenty enough memory for Jacksum to do the job!
I did some quick tests and limit the max Java Heap to 1 GB on my box, but I cannot reproduce the issue for now. If you gather a Heap Dump from a faulty run, I could have a look at the dump.
You can allowing the JVM to write a heap dump (a .hprof file) by setting an appropriate JVM option. Please put that option after the java
command and just before the -jar
option. Example:
$ java -XX:+HeapDumpOnOutOfMemoryError -jar jacksum-3.7.0.jar ...
After how many seconds/minutes/hours is that reproducible?
for system1 after round about 1h the erros occours: $ /usr/lib64/jvm/jre-11-openjdk/bin/java -XX:+HeapDumpOnOutOfMemoryError -jar /usr/local/bin/jacksum-3.7.0.jar --threads-hashing max --threads-reading max -a md5 --encoding hex --output-file-overwrite /tmp/jacksum-3.7.0-hashes.txt --error-file /tmp/jacksum.130322.error.txt --file-list-format list --file-list /tmp/jacksum-3.7.0-filelist.txt --verbose summary,warnings --header --path-relative-to /home/OldHome/ /home/OldHome/
Sun Feb 4 01:27:28 CET 2024 File PID: 130322 /tmp/jacksum.130322.output.txt java.lang.OutOfMemoryError: Java heap space Dumping heap to java_pid130326.hprof ... Heap dump file created [8929970313 bytes in 59.886 secs]
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-0"
8929970313 4. Feb 02:38 java_pid130326.hprof "File size too big: 25 MB are allowed, 561 MB were attempted to upload." 342787332 Feb 4 02:38 java_pid130326.hprof.xz with https://wetransfer.com: https://we.tl/t-LRxp41yuxk
$ /opt/jdk-21.0.2/bin/java -XX:+HeapDumpOnOutOfMemoryError -jar /usr/local/bin/jacksum-3.7.0.jar --threads-hashing max --threads-reading max -a md5 --encoding hex --output-file-overwrite /tmp/jacksum-3.7.0-hashes.txt --error-file /tmp/jacksum.134719.error.txt --file-list-format list --file-list /tmp/jacksum-3.7.0-filelist.txt --verbose summary,warnings --header --path-relative-to /home/OldHome/ /home/OldHome/
Sun Feb 4 09:55:01 CET 2024 File PID: 134719 /tmp/jacksum.134719.output.txt java.lang.OutOfMemoryError: Java heap space Dumping heap to java_pid134723.hprof ... Heap dump file created [8931432106 bytes in 49.558 secs]
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-0"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "pool-1-thread-4" ^C^C^COpenJDK 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGINT to handler- the VM may need to be forcibly terminated OpenJDK 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGINT to handler- the VM may need to be forcibly terminate
8931432106 Feb 4 11:10 java_pid134723.hprof 334900120 Feb 4 11:10 java_pid134723.hprof.xz with https://wetransfer.com: https://we.tl/t-9IzPHxIc1D
system2 with jdk21 did not throw an error till now, routhly the same start time.
Thank you for gathering the Java Heap Dumps. Much appreciated.
I have loaded and analyzed the Java Heap dumps using MAT. 99,97 % (1st dump) resp. 99,96 % (2nd dump) of the Java heap is consumed by an instance of a LinkedBlockingQueue called the workQueue which is a member of a java.util.concurrent.ThreadPoolExececutor instance. That queue is used for holding tasks and handing off to worker threads.
In Jacksum the instance is being created by calling Executors.newFixedThreadPool(cores); see also
In your cases the producer seems to be much faster than the consumers and the queue have been flooded with 8.3 millions tasks. Each task also contain the path to each file that needs to be hashed, and there are many paths with more than 2 k String length. So it gets continuously flooded until the Out Of Memory Error occured and Jacksum didn't even get a chance to work on the remaining tasks.
By default the LinkedBlockingQueue is unbounded which is the reason for the issue I think.
I found a description of that problem (and some potential mitigations) very well described at https://medium.com/@amardeepbhowmick92/task-queuing-in-executors-newfixedthreadpool-31bc8c24b4d2
So it seems that fortunately it is not a memory leak by definition, because all data is still needed to do the job, but unfortunately, due to unforseen, but likely conditions (clients are still busy, and producer is unstoppable hardworking to produce work for others) it could lead to unexpected high memory consumption which leads to an OOM in worst case.
I still cannot reproduce the issue with any of my hardware, but I have created a potential fix for that, so that the producer will need to help with the work before it can delegate work to others if they are all busy.
The fix is incluced in the binary attached to this note. Would be great if you could test it and see whether it can avoid the OOM. In theory it should work. Please let's see whether it works in real life as well.
It is not allowed to upload .jar to notes, so I have packed it using zip as a workaround. jacksum-3.8.0-SNAPSHOT-20240206.jar.zip
Great! Starting test runs...
Unfortunately it does not work out, sorry.
java -version openjdk version "11.0.17" 2022-10-18 OpenJDK Runtime Environment (build 11.0.17+0-suse-150000.3.86.2-x8664) OpenJDK 64-Bit Server VM (build 11.0.17+0-suse-150000.3.86.2-x8664, mixed mode)
system1: Wed Feb 7 11:05:37 CET 2024 File PID: 11550 /tmp/jacksum.11550.output.txt java.lang.OutOfMemoryError: Java heap space Dumping heap to java_pid11554.hprof ... Heap dump file created [8914258100 bytes in 61.682 secs]
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-0" ^Ccat: /tmp/jacksum.11550.txt: No such file or directory Wed Feb 7 12:57:30 CET 2024
8914258100 Feb 7 12:16 java_pid11554.hprof 330988948 Feb 7 12:16 java_pid11554.hprof.xz https://we.tl/t-m7wdxtSmng
yes, it looks like my change had no effect at all. It still looks the same as before the change. I am going to have a deeper look into it at the weekend. Thanks for your patience!
Sorry for causing that much efford and disturbing your weekend.
In the meanwhile system2 was also finished
system2: /usr/lib64/jvm/jre-11-openjdk/bin/java -XX:+HeapDumpOnOutOfMemoryError -jar /usr/local/bin/jacksum-3.8.0-SNAPSHOT-20240206.jar --threads-hashing max --threads-reading max -a md5 --encoding hex --output-file-overwrite /tmp/jacksum-3.8.0-SNAPSHOT-20240206-hashes.txt --error-file /tmp/jacksum.9780.error.txt --file-list-format list --file-list /tmp/jacksum-3.8.0-SNAPSHOT-20240206-filelist.txt --verbose summary,warnings --header --path-relative-to XY XY
Wed Feb 7 16:49:56 CET 2024 File PID: 9780 /tmp/jacksum.9780.output.txt java.lang.OutOfMemoryError: Java heap space Dumping heap to java_pid9784.hprof ... Heap dump file created [19016693628 bytes in 28.497 secs]
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-0"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "pool-1-thread-6"
19016693628 Feb 7 18:15 java_pid9784.hprof 740732288 Feb 7 18:15 java_pid9784.hprof.xz https://we.tl/t-N55uWYGNkm
Hi hi! Might I rais my hand in request for a bug fix?
My apologies @BlindDuck, I haven't got round to it yet. I will update this ticket soon.
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-0"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "pool-1-thread-4"
Steps to reproduce the behavior:
The directory contains 1309,52GB 7020091 files 281204 subdirs the parent dir has 2480,08GB 14922770 files 672472 subdirs
an other dir contains 1289,23GB 1287811 files 115790 subdirs
2/4. See the output/error: " Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-0"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "pool-1-thread-4" "
Expected behavior: a smooth run through as on the other processed directories. i.e /usr/lib jacksum.95988.report.txt jacksum.95988.output.txt jacksum.95988.error.txt jacksum.95932.output.txt jacksum.95932.error.txt
env.txt