ucagenomix / sicelore-2.1


java.util.concurrent.CancellationException in step3 #4

Open yuntianf opened 1 year ago

yuntianf commented 1 year ago

Hi, I ran into a java.util.concurrent.CancellationException during step 3 and I can't figure out how to solve it.

### Step 3 - UMI assignment to Nanopore SAM records ###

version=2.1
build.date=2023-01-24T09:31:20Z
READING config FILE: config.xml
Trying current working directory: /home/softwares/sicelore-2.1/config.xml
Not found trying application root:/home/softwares/sicelore-2.1/Jar/config.xml
O.K. Found and using /home/softwares/sicelore-2.1/Jar/config.xml
Log written to /home/real/umis/passedParsed.bam.log
READING umi clustering edit distances FILE: umiClusteringEditDistances.xml
Trying current working directory: /home/softwares/sicelore-2.1/umiClusteringEditDistances.xml
Not found trying application root:/home/softwares/sicelore-2.1/Jar/umiClusteringEditDistances.xml
O.K. Found and using /home/softwares/sicelore-2.1/Jar/umiClusteringEditDistances.xml
................11:07:44.740 [ForkJoinPool-10-worker-3] FATAL com.rw.umifinder.analyzers.clustering.UmiClustering - One Clustering Job failed:

java.util.concurrent.CancellationException: null
        at java.util.concurrent.ForkJoinTask.uncheckedThrow(ForkJoinTask.java:602) ~[?:?]
        at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567) ~[?:?]
        at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670) ~[?:?]
        at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159) ~[?:?]
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173) ~[?:?]
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) ~[?:?]
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) ~[?:?]
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:679) ~[?:?]
        at com.rw.umifinder.analyzers.clustering.ClusterOne_MyClustering.clusterLocal(ClusterOne_MyClustering.java:196) ~[NanoporeBC_UMI_finder-2.1.jar:?]
        at com.rw.umifinder.analyzers.clustering.ClusterOne_MyClustering.call(ClusterOne_MyClustering.java:72) ~[NanoporeBC_UMI_finder-2.1.jar:?]
        at com.rw.umifinder.analyzers.clustering.ClusterOne_MyClustering.call(ClusterOne_MyClustering.java:37) ~[NanoporeBC_UMI_finder-2.1.jar:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) ~[guava-31.1-jre.jar:?]
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74) ~[guava-31.1-jre.jar:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) ~[guava-31.1-jre.jar:?]
        at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1423) ~[?:?]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387) ~[?:?]
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1311) ~[?:?]
        at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1841) ~[?:?]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1806) ~[?:?]
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177) ~[?:?]
11:08:00.657 [ForkJoinPool-10-worker-22] FATAL com.rw.umifinder.analyzers.clustering.UmiClustering - One Clustering Job failed:

java.util.concurrent.CancellationException: null
        (identical stack trace to the one above)

To help you reproduce this error, here is the output bam from step 2: https://drive.google.com/drive/folders/11P4r24DxEHM1tCm66vzPn0KXCoXEi0N6?usp=sharing. Any thoughts or comments are appreciated!

Thanks

RainerWaldmann commented 1 year ago

Sorry for the late reply. I don't get any error with your bam file.

SAM records scanned:    50965
SAM records with BC:    26241
SAM records with UMI:   7709
SAM records with readseq UMI:   18532

Looks like your bam is from 5' barcoded data. Did you add --fivePbc to the command line? I can share the output from the UMI assignment if you are interested.
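If it helps, the change is just appending the flag to whatever step-3 command you already run; a minimal sketch, with the jar path taken from your log and a placeholder for your existing arguments:

```bash
# Sketch only: keep your existing step-3 arguments and append --fivePbc
# for 5' barcoded libraries (jar path taken from the log above).
java -jar /home/softwares/sicelore-2.1/Jar/NanoporeBC_UMI_finder-2.1.jar <your existing step-3 arguments> --fivePbc
```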

Rainer

kwglam commented 1 year ago

Hi, I think I encountered a similar issue here:

version=2.1
build.date=2023-01-24T09:31:20Z
READING config FILE: config.xml
Trying current working directory: /home/config.xml
Not found trying application root:/home/packages/sicelore-2.1/Jar/config.xml
O.K. Found and using /home/packages/sicelore-2.1/Jar/config.xml
Log written to /home/umiparsed/passedParsed_chr6.bam.log
READING umi clustering edit distances FILE: umiClusteringEditDistances.xml
Trying current working directory: /home/umiparsed/umiClusteringEditDistances.xml
Not found trying application root:/home/packages/sicelore-2.1/Jar/umiClusteringEditDistances.xml
O.K. Found and using /home/packages/sicelore-2.1/Jar/umiClusteringEditDistances.xml
................
2,237,448 SAM records processed in PT-1M-55.823493154S
Umis found: 1,035,350 (46.27 %)
Umis found by clustering: 1,035,350 (46.27 %)
................
4,481,162 SAM records processed in PT-2M-30.789021144S
Umis found: 2,000,564 (44.64 %)
Umis found by clustering: 2,000,564 (44.64 %)
................
5,778,547 SAM records processed in PT-2M-57.992996819S
Umis found: 2,961,601 (51.25 %)
Umis found by clustering: 2,961,601 (51.25 %)
................09:28:17.113 [ForkJoinPool-10-worker-145] FATAL com.rw.umifinder.analyzers.clustering.UmiClustering - One Clustering Job failed:

java.util.concurrent.CancellationException: null
        (identical stack trace to the one in the first post)
09:50:10.882 [ForkJoinPool-10-worker-41] FATAL com.rw.umifinder.analyzers.clustering.UmiClustering - One Clustering Job failed:

java.lang.OutOfMemoryError: Java heap space
        at com.rw.clustering.ClusteringEditDistanceBase.getTransposedEditDistance(ClusteringEditDistanceBase.java:129) ~[NanoporeBC_UMI_finder-2.1.jar:?]
        at com.rw.clustering.ClusteringEditDistanceBase.generateDistanceMatrixParalell(ClusteringEditDistanceBase.java:198) ~[NanoporeBC_UMI_finder-2.1.jar:?]
        at com.rw.clustering.ClusteringEditDistanceBase.generateDistanceMatrix(ClusteringEditDistanceBase.java:167) ~[NanoporeBC_UMI_finder-2.1.jar:?]
        at com.rw.clustering.DistanceMatrix.<init>(DistanceMatrix.java:77) ~[NanoporeBC_UMI_finder-2.1.jar:?]
        at com.rw.umifinder.analyzers.clustering.ClusterOne_MyClustering.call(ClusterOne_MyClustering.java:70) ~[NanoporeBC_UMI_finder-2.1.jar:?]
        at com.rw.umifinder.analyzers.clustering.ClusterOne_MyClustering.call(ClusterOne_MyClustering.java:37) ~[NanoporeBC_UMI_finder-2.1.jar:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) ~[guava-31.1-jre.jar:?]
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74) ~[guava-31.1-jre.jar:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) ~[guava-31.1-jre.jar:?]
        at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1423) ~[?:?]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387) ~[?:?]
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1311) ~[?:?]
        at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1841) ~[?:?]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1806) ~[?:?]
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177) ~[?:?]

I ran this step with chromosome-split bam files. The error occurred only when running on some chromosomes, like chr6 and chr10. Any thoughts on how to fix it? I can send you the bam file if needed. Thanks a lot!

RainerWaldmann commented 1 year ago

Hi, could you share the bam file? Rainer

kwglam commented 1 year ago

> Hi, could you share the bam file? Rainer

Sure. Any chance I can send you a link via email? Thanks!

RainerWaldmann commented 1 year ago

My email address is waldmann@ipmc.cnrs.fr. Rainer

RainerWaldmann commented 1 year ago

I don't get an error with your bam

================ DONE =======================
SAM records scanned:    14365703
SAM records with BC:    10769173
SAM records with UMI:   8181084
SAM records with readseq UMI:   2588089

I overlooked one line in the exception you got: java.lang.OutOfMemoryError: Java heap space.

What is the chunk size in the config.xml, and how much RAM is available on the computer you are using?

Rainer

kwglam commented 1 year ago

> I overlooked one line in the exception you got: java.lang.OutOfMemoryError: Java heap space. What is the chunk size in the config.xml, and how much RAM is available on the computer you are using?

I was using 2,000,000 for the chunk size, and I set -Xmx300g for the run. My computer has 700G of memory. BTW, I am using Java 19. When I used Java 12, I received the error message "java.lang.UnsupportedClassVersionError: com/rw/parsermain/Main has been compiled by a more recent version of the Java Runtime (class file version 58.0)", so I believe we need a version > 13. Thanks!
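For anyone hitting the same UnsupportedClassVersionError, a quick check of the runtime actually in use (class file version 58 corresponds to Java 14, so the runtime must be at least that):

```bash
# Print the Java runtime on the PATH; it must report 14 or newer,
# since the jar was compiled for class file version 58 (= Java 14).
java -version
```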

RainerWaldmann commented 1 year ago

I ran it with a chunk size of 250k and 300G of RAM. You might either reduce the chunk size (e.g. to 500,000) or increase the allocated RAM. If you have 700G of RAM, a chunk size of 1M and increasing the RAM might be the best option.

RAM requirement depends on the dataset. If there is a gene with an enormous number of reads that is expressed in most cells, the matrices for UMI clustering get big (the distance matrix grows roughly quadratically with the number of reads on a gene), and this needs quite some RAM. If RAM is limited, reducing the chunk size that is kept in RAM will reduce memory consumption. Even with a chunk size of 100k it works fine; only the UMI clustering of very highly expressed genes (e.g. actin) might be somewhat affected.

Actually you need at least Java 13. We need to revise the readme.

kwglam commented 1 year ago

Thanks, Rainer. I ran the bam file that I sent to you with a chunk size of 250k and 300G of RAM. It did start to generate a bam, but it then stalled for more than a day without writing anything further. When I monitored it in real time, the program was holding the RAM I had assigned to it, but the CPUs were not doing any work. May I ask which Java version you are using?

RainerWaldmann commented 1 year ago

I am using Java 14; this should not make a difference. When a Java program starts running out of RAM, it gets more and more busy with garbage collection and no longer makes progress. As I mentioned, the memory requirement for UMI clustering depends on how many reads were generated for a specific genomic region (gene). In our hands, with a 250k chunk size and 300G of RAM, we never had RAM problems; however, the RAM requirement might be higher in some cases.
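If you want to confirm that the stall is garbage-collection thrashing rather than real work, you can watch the running process with the standard JDK jstat tool (the pid below is a placeholder for your step-3 process):

```bash
# Sample heap occupancy and cumulative GC time every 10 s; an old-generation
# column (O) pinned near 100% with GCT climbing steadily means the heap is
# effectively exhausted and the program is spending its time in GC.
jstat -gcutil <pid_of_step3_java_process> 10000
```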

Is there something special about your sample? Did you do targeted sequencing? Targeted sequencing is an extreme case where a huge number of reads map to one genomic region, resulting in huge distance matrices.

Allocate all the RAM you have on your machine (e.g. 680G) and run it with a 250k chunk size. If there is still not enough RAM, reduce the chunk size to 50k.
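As a rough sketch, assuming nothing about your command changes except the heap setting (the trailing placeholder stands for your existing step-3 arguments):

```bash
# Give the JVM most of the machine's 700G; keep the chunk size at 250k in
# config.xml, and only drop it to 50k if the heap is still exhausted.
java -Xmx680g -jar /home/packages/sicelore-2.1/Jar/NanoporeBC_UMI_finder-2.1.jar <your existing step-3 arguments>
```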

kwglam commented 1 year ago

Thanks for your reply. I tried running the script again with different Java versions last week, using a 250k chunk size and 300G of RAM. Interestingly, with Java 14 instead of 19 the program ran perfectly fine without any errors, whether I used the whole bam file or the chromosome-split bam files. I don't know the exact cause of the problem, but it appears that Java 19 clashes with the program.

RainerWaldmann commented 1 year ago

Currently I am using Java 18, and I also tried Java 19.0.2; I don't see any problems. It looks like with your sample and settings, RAM becomes limiting under Java 19; there might be slight differences between Java versions. As I mentioned before, some datasets (in particular targeted sequencing) have a high number of reads for a given genomic region. I changed the clustering code: abnormally huge clustering matrices are now split to protect against high RAM usage. I am currently testing this and will release it soon.

kwglam commented 1 year ago

Thanks for spending time on this. I look forward to seeing the update. Thx!