Closed mattloose closed 11 months ago
Hi Matt, what's the full command you are running?
I took the previous command:
dorado basecaller -x cuda:all -r dorado-0.3.1-linux-x64/models/dna_r10.4.1_e8.2_400bps_sup\@v4.2.0 ../path/to/pod5 --modified-bases 5mCG_5hmCG > calls.bam
and switched it to:
dorado basecaller -x cuda:all -r dorado-0.3.1-linux-x64/models/dna_r10.4.1_e8.2_400bps_sup\@v4.2.0 ../path/to/pod5 --modified-bases 5mCG_5hmCG --resume-from calls.bam > calls_continued.bam
Hi @mattloose - can you post the SAM header from your first basecaller command?
We have a parsing bug that doesn't handle optional arguments placed before the positional arguments in the original command line. One workaround is to update the header manually, copy over the remaining records, and then resume from that BAM.
Basically, the CL key in the @PG line of your BAM header will have
dorado basecaller -x cuda:all -r dorado-0.3.1-linux-x64/models/dna_r10.4.1_e8.2_400bps_sup\@v4.2.0
but the code is expecting it to be
dorado basecaller dorado-0.3.1-linux-x64/models/dna_r10.4.1_e8.2_400bps_sup\@v4.2.0 <data> <optional args>
Something like this should do the trick:
samtools view -H calls.bam > tmp.sam
Edit tmp.sam to (1) remove the @PG line added by samtools view and (2) move the optional args that appear before the model name in the CL key to the end. Then:
samtools reheader tmp.sam calls.bam > calls_fixed.bam
After that, you can resume from calls_fixed.bam.
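The manual header edit described above could also be scripted. The following is only a sketch, not part of dorado or samtools: `fix_header` is a hypothetical helper, and it assumes the basecaller's @PG line carries a `PN:dorado` tag and that header fields are tab-separated, as in a standard SAM header. The corrected CL string is supplied by hand, since only you know the original command.

```shell
# fix_header IN_HEADER OUT_HEADER NEW_CL
# Hypothetical helper: drops the @PG line added by `samtools view` and
# replaces the CL value on the @PG line tagged PN:dorado (an assumption).
fix_header() {
  # Pass the new CL through the environment so awk does not
  # interpret backslashes in it (as it would with -v).
  NEW_CL="$3" awk -F'\t' -v OFS='\t' '
    /^@PG/ && /\tPN:samtools(\t|$)/ { next }
    /^@PG/ && /\tPN:dorado(\t|$)/ {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^CL:/) $i = "CL:" ENVIRON["NEW_CL"]
    }
    { print }
  ' "$1" > "$2"
}

# Possible usage, with the samtools steps quoted from the post above:
#   samtools view -H calls.bam > tmp.sam
#   fix_header tmp.sam fixed.sam 'dorado basecaller <model> <data> <optional args>'
#   samtools reheader fixed.sam calls.bam > calls_fixed.bam
```

Check the resulting header by eye before running `samtools reheader`, since the `PN:dorado` match is a guess about how the line is tagged.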
OK - will test today!
I have the same problem. Could you please explain in further detail how to do the modification to the header?
After running
samtools view -H calls.bam > tmp.sam
the header in "tmp.sam" looks like this:
@PG ID:samtools PN:samtools VN:1.17 CL:samtools view -H incomplete.bam
I modify the header like this:
@PG ID:samtools PN:samtools VN:1.17 CL:dorado basecaller dorado-0.3.1-linux-x64/models/dna_r10.4.1_e8.2_400bps_sup\@v4.2.0
After trying to apply the fix, with:
samtools reheader tmp.sam calls.bam > calls_fixed.bam
I get the following error:
samtools reheader: input file 'calls.bam' must be BAM or CRAM
I think I may not be modifying the header correctly. Any suggestions? Thank you!
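One possible cause of the `must be BAM or CRAM` message is that samtools does not recognise the input as BAM at all, for example because the file is empty or is not BGZF-compressed. The sketch below checks only the first four bytes for the BGZF magic number (`1f 8b 08 04`). `looks_like_bgzf` is a hypothetical helper name, and passing this check does not prove the file is a complete, valid BAM.

```shell
# looks_like_bgzf FILE
# Hypothetical helper: succeeds if FILE starts with the BGZF magic bytes.
looks_like_bgzf() {
  magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b0804" ]
}

# Possible usage:
#   looks_like_bgzf calls.bam && echo "BGZF header present" || echo "not BGZF"
```

If samtools is available, `samtools quickcheck -v calls.bam` is a more thorough test, as it also reports files that are truncated or missing the end-of-file block.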
Hi @PJV-Ecu - this issue is fixed in version 0.3.2. Could you update the binary and test again?
Dear @tijyojwad - thanks for providing support. My first version of Dorado is 0.3.2+d8660a3. I reinstalled just in case.
The power went out in the middle of the basecalling process (to my dismay), and I have been trying to recover the invested hours with the "--resume-from" option. I am unable to recover the .bam file header with the provided instructions. Please let me know if you have any further advice. Thank you!
I tried again from scratch and the process was killed after 2 days of processing. I cannot restart from the incomplete .bam file, as the error persists:
[2023-07-30 14:22:39.967] [error] Required key CL not found in header of calls.bam
I'm using Dorado version 0.3.2+d8660a3
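The `Required key CL not found` message suggests the header has no @PG line with a CL field from the basecaller. A quick check on a dumped header, as a sketch only: `has_basecaller_cl` is a hypothetical helper, and treating every @PG line not tagged `PN:samtools` as the basecaller's is an assumption.

```shell
# has_basecaller_cl HEADER_FILE
# Hypothetical helper: succeeds if some @PG line other than the one
# written by samtools has a CL: field.
has_basecaller_cl() {
  awk -F'\t' '
    /^@PG/ {
      is_samtools = 0; has_cl = 0
      for (i = 2; i <= NF; i++) {
        if ($i == "PN:samtools") is_samtools = 1
        if ($i ~ /^CL:/) has_cl = 1
      }
      if (has_cl && !is_samtools) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Possible usage:
#   samtools view -H calls.bam > tmp.sam
#   has_basecaller_cl tmp.sam || echo "no basecaller CL in header"
```

If the only @PG line in the dump is the one samtools added, as in the header posted earlier in this thread, the check fails.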
Hi @PJV-Ecu - can you post your original and resume cmd? I'm trying some tests locally and I'm able to resume (simplex basecalling)
Dear tijyojwad,
Thanks for the provided options and insights. Unfortunately, I have checked with my coauthors and they are adamant about not releasing a whole unpublished bacterial genome.
I have looked into the recommendation provided by @vellamike here:
https://github.com/nanoporetech/dorado/issues/320#issuecomment-1664709541
which consists of (literal copy):
Is this advisable and consistent with Dorado's algorithm?
Would you advise trying version 0.3.4?
https://cdn.oxfordnanoportal.com/software/analysis/dorado/preview/dorado-0.3.4-rc1-linux-x64.tar.gz
Thank you
Hi @PJV-Ecu - indeed that advice still holds true and is the recommended setup to make your basecalling runs more robust. In case any of those split runs fail, you have to resume or re-basecall a much smaller file compared to the whole dataset.
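One way to read "split runs" is to group the input files into batches and basecall each batch separately, so that a failure only costs one batch. The sketch below is an illustration under that reading, not the exact setup recommended in the linked comment. `split_into_batches`, the `batch_NNN` directory names, and the use of symlinks are all assumptions.

```shell
# split_into_batches SRC_DIR OUT_DIR BATCH_SIZE
# Hypothetical helper: symlinks the .pod5 files in SRC_DIR into
# OUT_DIR/batch_000, OUT_DIR/batch_001, ... with BATCH_SIZE files each.
split_into_batches() {
  src=$(cd "$1" && pwd)   # absolute path so the symlinks stay valid
  out=$2
  size=$3
  i=0
  for f in "$src"/*.pod5; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    batch=$(printf '%s/batch_%03d' "$out" $((i / size)))
    mkdir -p "$batch"
    ln -s "$f" "$batch/"
    i=$((i + 1))
  done
}

# Possible usage (model path and options as in your original command):
#   split_into_batches ../path/to/pod5 batches 100
#   for b in batches/batch_*; do
#     dorado basecaller <model> "$b" > "$b.bam"
#   done
```

A batch that fails can then be rerun on its own, without touching the BAM files already written for the other batches.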
As for the resume feature, without your particular repro case it's hard to debug what's going on. Perhaps you could try resume with another unrestricted dataset, and if it doesn't work you could share that?
Closing due to inactivity. @PJV-Ecu we released a new version of dorado (v0.4.0) in case you are still running into issues and want to give it a try.
Running --resume-from BAMFILE.BAM gives a filesystem error - cannot make canonical path: No such file or directory [-x]
Am I doing something wrong?
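The `[-x]` in this message looks consistent with the parsing issue described earlier in the thread, where the resume logic expects the model path right after `dorado basecaller` in the CL field. To see which token sits in that position, something like the following may help. It is a sketch only: `model_token` is a hypothetical helper, and it assumes the first @PG line not written by samtools is the basecaller's.

```shell
# model_token HEADER_FILE
# Hypothetical helper: prints the third word of the CL value on the
# first @PG line that does not come from samtools.
model_token() {
  awk -F'\t' '
    /^@PG/ && !/PN:samtools/ {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^CL:/) {
          split(substr($i, 4), words, " ")   # strip the "CL:" prefix
          print words[3]
          exit
        }
    }
  ' "$1"
}

# Possible usage:
#   samtools view -H BAMFILE.BAM > tmp.sam
#   model_token tmp.sam
```

If this prints an option such as `-x` rather than a model path, the header workaround given earlier in the thread, or a newer dorado release, may apply.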