hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0
181 stars 56 forks source link

cobalt error on long read data #407

Closed jiadong324 closed 1 year ago

jiadong324 commented 1 year ago

Dear developer,

I am running cobalt (v1.14) on tumor-normal paired long read data, and the errors are shown below. Is that due to the low coverage of normal sample (~20X)?

Looking forward to your reply, thanks!

Read Count Complete
13:31:42.574 INFO  [main] - calculating sample ratios for A-tumor
13:31:42.574 INFO  [main] - Applying ratio gc normalization
13:31:46.727 INFO  [main] - median read count: 3.77260592509782, sparse consolidation count: 100
13:31:46.848 INFO  [main] - using 25629 sparse consolidated buckets, from 2560357 raw ratios
13:31:47.084 INFO  [main] - Persisting tumor gc read count to ./cobalt
13:31:47.095 INFO  [main] - calculating sample ratios for A-normal
13:31:47.095 INFO  [main] - Applying ratio gc normalization
13:31:51.049 INFO  [main] - using 25629 sparse consolidated buckets, from 2560359 raw ratios
13:31:51.097 ERROR [main] - [Thread[main,5,main]]: uncaught exception: java.util.NoSuchElementException
java.util.NoSuchElementException
        at java.base/java.util.ArrayList$Itr.next(ArrayList.java:1000)
        at com.google.common.collect.AbstractMapBasedMultimap$WrappedCollection$WrappedIterator.next(AbstractMapBasedMultimap.java:473)
        at com.hartwig.hmftools.cobalt.lowcov.LowCoverageRatioBuilder.populateLowCoverageRatio(LowCoverageRatioBuilder.java:73)
        at com.hartwig.hmftools.cobalt.lowcov.LowCoverageRatioBuilder.<init>(LowCoverageRatioBuilder.java:34)
        at com.hartwig.hmftools.cobalt.ratio.RatioSupplier$SampleRatios.<init>(RatioSupplier.java:109)
        at com.hartwig.hmftools.cobalt.ratio.RatioSupplier$GermlineRatios.<init>(RatioSupplier.java:132)
        at com.hartwig.hmftools.cobalt.ratio.RatioSupplier.tumorNormalPair(RatioSupplier.java:222)
        at com.hartwig.hmftools.cobalt.CobaltApplication.run(CobaltApplication.java:138)
        at com.hartwig.hmftools.cobalt.CobaltApplication.main(CobaltApplication.java:87)
p-priestley commented 1 year ago

How long are your reads? We have not tested COBALT at all with long reads.

jiadong324 commented 1 year ago

I found this study https://www.nature.com/articles/s41587-022-01468-y uses hmftools for long read data, so that I want to have a try. For my data, the average length for both tumor and normal sample is 15kb. Since it successfully calculate A-tumor, I think read length might not be the issue.

p-priestley commented 1 year ago

I was not aware of that study. 15kb is very long since we count the read starts in each 1kb bins so I think it could cause problems, but you are right that it may not be the cause here.

Can you provide your exact command that you ran on the bam please?

jiadong324 commented 1 year ago

Here is the command:

java -jar ~/Biotools/hmftools/cobalt_v1.14.jar -threads 6 -reference A-normal -reference_bam ./A-normal.bam -tumor A-tumor -tumor_bam ./A-tumor.bam -output_dir ./A/cobalt -gc_profile ./GC_profile.1000bp.38.cnp

BTW, I can successfully run Amber with the same data.

hongwingl commented 1 year ago

Hi, this is a bug in cobalt, I will put out a fix

jiadong324 commented 1 year ago

@hongwingl Thanks! Looking forward to your update.

hongwingl commented 1 year ago

I made a release that fixes this bug: https://github.com/hartwigmedical/hmftools/releases/tag/cobalt-v1.14.1

jiadong324 commented 1 year ago

@hongwingl Great! Thanks for your help!

jiadong324 commented 1 year ago

@hongwingl It works right now. But I have another error running purple:

failed to load ref genome: org.apache.commons.cli.ParseException: Supplied ref genome must have associated sequence dictionary

What is the sequence dictionary?

hongwingl commented 1 year ago

It is looking for a .dict file that gives the collection of sequence information of the fasta file. There are many tools you can use to create it. Such as samtools-dict

jiadong324 commented 1 year ago

Thanks! But I also have the same error as mentioned in https://github.com/hartwigmedical/hmftools/issues/405. The GC profile is GC_profile.1000bp.38.cnp, and the reference file is the same one as I used for the read alignment.

hongwingl commented 1 year ago

Can I close this? Looks like has been fixed in #405?