Open yuntianf opened 1 year ago
Sorry for the late reply. I don't get any error with your bam file.
SAM records scanned: 50965
SAM records with BC: 26241
SAM records with UMI: 7709
SAM records with readseq UMI: 18532
Looks like your bam is from 5' barcoded data. Did you add --fivePbc to the command line? I can share the output from the UMI assignment if you are interested.
Rainer
Hi, I think I encountered a similar issue here:
version=2.1 build.date=2023-01-24T09:31:20Z
READING config FILE: config.xml
Trying current working directory: /home/config.xml
Not found trying application root:/home/packages/sicelore-2.1/Jar/config.xml
O.K. Found and using /home/packages/sicelore-2.1/Jar/config.xml
Log written to /home/umiparsed/passedParsed_chr6.bam.log
READING umi clustering edit distances FILE: umiClusteringEditDistances.xml
Trying current working directory: /home/umiparsed/umiClusteringEditDistances.xml
Not found trying application root:/home/packages/sicelore-2.1/Jar/umiClusteringEditDistances.xml
O.K. Found and using /home/packages/sicelore-2.1/Jar/umiClusteringEditDistances.xml
...
2,237,448 SAM records processed in PT-1M-55.823493154S  Umis found: 1,035,350 (46.27 %)  Umis found by clustering: 1,035,350 (46.27 %)
...
4,481,162 SAM records processed in PT-2M-30.789021144S  Umis found: 2,000,564 (44.64 %)  Umis found by clustering: 2,000,564 (44.64 %)
...
5,778,547 SAM records processed in PT-2M-57.992996819S  Umis found: 2,961,601 (51.25 %)  Umis found by clustering: 2,961,601 (51.25 %)
...
09:28:17.113 [ForkJoinPool-10-worker-145] FATAL com.rw.umifinder.analyzers.clustering.UmiClustering - One Clustering Job failed:
java.util.concurrent.CancellationException: null
    at java.util.concurrent.ForkJoinTask.uncheckedThrow(ForkJoinTask.java:602) ~[?:?]
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567) ~[?:?]
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173) ~[?:?]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) ~[?:?]
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) ~[?:?]
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:679) ~[?:?]
    at com.rw.umifinder.analyzers.clustering.ClusterOne_MyClustering.clusterLocal(ClusterOne_MyClustering.java:196) ~[NanoporeBC_UMI_finder-2.1.jar:?]
    at com.rw.umifinder.analyzers.clustering.ClusterOne_MyClustering.call(ClusterOne_MyClustering.java:72) ~[NanoporeBC_UMI_finder-2.1.jar:?]
    at com.rw.umifinder.analyzers.clustering.ClusterOne_MyClustering.call(ClusterOne_MyClustering.java:37) ~[NanoporeBC_UMI_finder-2.1.jar:?]
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) ~[guava-31.1-jre.jar:?]
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74) ~[guava-31.1-jre.jar:?]
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) ~[guava-31.1-jre.jar:?]
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1423) ~[?:?]
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387) ~[?:?]
    at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1311) ~[?:?]
    at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1841) ~[?:?]
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1806) ~[?:?]
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177) ~[?:?]
09:50:10.882 [ForkJoinPool-10-worker-41] FATAL com.rw.umifinder.analyzers.clustering.UmiClustering - One Clustering Job failed:
java.lang.OutOfMemoryError: Java heap space
    at com.rw.clustering.ClusteringEditDistanceBase.getTransposedEditDistance(ClusteringEditDistanceBase.java:129) ~[NanoporeBC_UMI_finder-2.1.jar:?]
    at com.rw.clustering.ClusteringEditDistanceBase.generateDistanceMatrixParalell(ClusteringEditDistanceBase.java:198) ~[NanoporeBC_UMI_finder-2.1.jar:?]
    at com.rw.clustering.ClusteringEditDistanceBase.generateDistanceMatrix(ClusteringEditDistanceBase.java:167) ~[NanoporeBC_UMI_finder-2.1.jar:?]
    at com.rw.clustering.DistanceMatrix.
I ran this step with chromosome-split bam files. The error occurred only when running on some chromosomes, like chr6 and chr10. Any thoughts on how to fix it? I can send you the bam file if needed. Thanks a lot!!
Hi, Could you share the bam file? Rainer
Sure. Any chance I can send you a link via email? Thanks!
My email address: waldmann@ipmc.cnrs.fr Rainer
I don't get an error with your bam.
================ DONE =======================
SAM records scanned: 14365703
SAM records with BC: 10769173
SAM records with UMI: 8181084
SAM records with readseq UMI: 2588089
I overlooked one line in the exception you got: java.lang.OutOfMemoryError: Java heap space.
What is the chunk size in the config.xml and the available RAM of the computer you are using? Rainer
I was using 2000000 for the chunk size.
I ran it with a chunk size of 250k and 300G of RAM.
You might either reduce the chunk size or run it with more RAM.
Thanks, Rainer. I ran the bam file that I sent you with a chunk size of 250k and 300G of RAM. It did start to generate a bam, but then it stalled again for more than a day without producing anything. When I checked with a real-time view, the program was occupying the RAM I had assigned to it, but the CPUs were idle. May I ask which Java version you are using?
I am using Java 14. This should not make a difference. When a Java program starts running out of RAM, it gets more and more busy with garbage collection and stops making progress. As I mentioned, the memory requirement for clustering UMIs depends on how many reads were generated for a specific genomic region (gene). In our hands, with a 250k chunk size and 300G of RAM, we never had RAM problems. However, the RAM requirement might be higher in some cases.
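One quick way to tell whether the heap is the bottleneck is to check how much memory the JVM actually received and how much of it is in use. A minimal diagnostic sketch (`HeapCheck` is an illustrative name, not part of sicelore):

```java
// HeapCheck.java -- print the JVM heap ceiling and current usage.
// Illustrative only; not part of sicelore. Run with e.g. `java -Xmx300g HeapCheck`.
public class HeapCheck {
    // Bytes of heap currently in use (reserved minus free).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx; when usedBytes() sits near it,
        // the JVM spends most of its time in garbage collection.
        System.out.printf("max=%.1fG reserved=%.1fG used=%.1fG%n",
                rt.maxMemory() / 1e9, rt.totalMemory() / 1e9, usedBytes() / 1e9);
    }
}
```

If `used` stays pinned near `max` while CPUs look idle, the process is in the garbage-collection stall described above.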
Is there something special about your sample? Did you do targeted sequencing? Targeted sequencing is an extreme case where a huge number of reads map to one genomic region, resulting in huge distance matrices.
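To make the scaling concrete: clustering UMIs by edit distance involves a pairwise distance matrix, so memory grows quadratically with the number of UMIs on a region. An int matrix for 250,000 UMIs on a single region alone would need 250,000² × 4 bytes ≈ 250 GB. A minimal sketch of the idea, using plain Levenshtein distance (this is not sicelore's actual clustering code):

```java
// Sketch: pairwise edit-distance matrix for a set of UMIs.
// The int[n][n] allocation is the O(n^2) memory cost that blows up
// when one genomic region (e.g. a targeted-sequencing amplicon)
// carries a huge number of reads.
import java.util.List;

public class UmiDistance {
    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    static int[][] distanceMatrix(List<String> umis) {
        int n = umis.size();
        int[][] d = new int[n][n];   // the quadratic allocation
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                d[i][j] = d[j][i] = levenshtein(umis.get(i), umis.get(j));
        return d;
    }

    public static void main(String[] args) {
        int[][] d = distanceMatrix(List.of("AACGT", "AATGT", "GGGGG"));
        System.out.println(d[0][1] + " " + d[0][2]);
    }
}
```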
Allow all the RAM you have on your machine (e.g. 680G) and run it with a 250k chunk size. If there is still not enough RAM, reduce the chunk size to 50k.
Thanks for your reply. I tried running the script again with different Java versions last week, using a 250k chunk size and 300G of RAM. Interestingly, with Java 14 instead of 19, the program ran perfectly fine without any errors, whether I used the whole bam file or chr-split bam files. I don't know exactly what the cause of the problem is, but it appears that Java 19 does clash with the program.
Currently I am using Java 18. I tried it with Java 19.0.2 and I don't see any problems. It looks like with your sample and settings, RAM becomes limiting with Java 19; there might be slight differences between Java versions. As I mentioned before, some datasets (in particular targeted sequencing) have a high number of reads for a given genomic region. I changed the clustering code: I am now splitting abnormally huge clustering matrices to protect against high RAM usage. I am currently testing this and will release it soon.
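The splitting idea can be sketched as capping how many UMIs go into any one distance matrix: partition the UMI set into fixed-size blocks and build a matrix per block. This is an illustrative simplification, not the actual sicelore implementation (which may also merge results across blocks):

```java
// Sketch of capping distance-matrix size: partition a large UMI set
// into blocks of at most `maxBlock` elements, so the largest matrix
// is maxBlock x maxBlock instead of n x n.
// Illustrative only -- not the actual sicelore fix.
import java.util.ArrayList;
import java.util.List;

public class UmiBlocks {
    static <T> List<List<T>> partition(List<T> items, int maxBlock) {
        List<List<T>> blocks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += maxBlock) {
            // subList returns a view; copy it if blocks outlive `items`.
            blocks.add(items.subList(i, Math.min(i + maxBlock, items.size())));
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<Integer> umis = new ArrayList<>();
        for (int i = 0; i < 10; i++) umis.add(i);
        // prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
        System.out.println(partition(umis, 4));
    }
}
```

With 10 elements and `maxBlock = 4`, the worst-case matrix shrinks from 10×10 to 4×4; the same capping bounds the quadratic memory term for abnormally read-dense regions.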
Thanks for spending time on this. I look forward to seeing the update. Thx!
Hi, I ran into a java.util.concurrent.CancellationException during step 3 and couldn't figure out how to solve it. To help you reproduce this error, here is the output bam from step 2: https://drive.google.com/drive/folders/11P4r24DxEHM1tCm66vzPn0KXCoXEi0N6?usp=sharing Any thoughts or comments are appreciated!
Thanks