nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

[Windows] Dorado --resume-from fails to open interrupted bam #1028

Closed fidibidi closed 1 month ago

fidibidi commented 1 month ago

Issue Report

Please describe the issue:

Via PowerShell: a Dorado run was interrupted a couple of days into basecalling, leaving an unfinished BAM file. I was hoping to resume from this file, but attempts to use the --resume-from option have failed.

Steps to reproduce the issue:

Essentially, just run the command again with --resume-from added. I've wondered whether I'm simply pathing to the file incorrectly, but attempts at giving an absolute path to the file haven't worked either:

C:\Users\ONT\A1815.local.bam .\A1815.local.bam

It is perhaps worth noting that the generated BAM file is rather large (238 GB), which I don't think should be the case.

Run environment:

then to resume:

dorado basecaller hac,5mCG_5hmCG F:\Data\081524_P2_A1815\081524_P2_A1815\20240815_1132_P2S-00718-A_PAY91898_867b13e9/pod5 --reference C:\Users\ONT\Documents\GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --resume-from A1815.local.bam > F:\Data\081524_P2_A1815\bam\A1815.bam

Logs

[2024-09-19 12:10:46.337] [info] Running: "basecaller" "hac,5mCG_5hmCG" "F:\Data\081524_P2_A1815\081524_P2_A1815\20240815_1132_P2S-00718-A_PAY91898_867b13e9/pod5" "--reference" "C:\Users\ONT\Documents\GCA_000001405.15_GRCh38_no_alt_analysis_set.fna" "--resume-from" "A1815.local.bam"
[2024-09-19 12:10:46.405] [info] - downloading dna_r10.4.1_e8.2_400bps_hac@v5.0.0 with httplib
[2024-09-19 12:10:46.870] [info] - downloading dna_r10.4.1_e8.2_400bps_hac@v5.0.0_5mCG_5hmCG@v1 with httplib
[2024-09-19 12:10:47.275] [info] Normalised: chunksize 10000 -> 9996
[2024-09-19 12:10:47.276] [info] Normalised: overlap 500 -> 498
[2024-09-19 12:10:47.277] [info] > Creating basecall pipeline
[2024-09-19 12:10:54.926] [info] cuda:0 using chunk size 9996, batch size 1152
[2024-09-19 12:10:55.517] [info] cuda:0 using chunk size 4998, batch size 1408
[2024-09-19 12:11:38.079] [info] > Inspecting resume file...
[2024-09-19 12:11:43.053] [error] finalise() not called on a HtsFile.
[2024-09-19 12:11:43.054] [error] Could not open file: A1815.local.bam
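
A quick way to tell whether a file that fails with "Could not open file" is still BGZF-compressed BAM data is to look at its first bytes. The following is a minimal sketch, not part of dorado: the helper name `sniff_bam_header` is hypothetical. It relies on two facts: BAM files start with the BGZF/gzip magic bytes `1f 8b 08 04`, and text written as UTF-16LE with a byte-order mark starts with `ff fe`. Whether the file in this report was actually re-encoded as text by the shell is an assumption the thread does not confirm. The check also only inspects the header, so a truncated but otherwise intact BAM would still be reported as `bgzf`.

```python
import sys

# First four bytes of every BGZF block (gzip magic, deflate, FEXTRA flag).
BGZF_MAGIC = bytes([0x1F, 0x8B, 0x08, 0x04])
# Byte-order mark that prefixes UTF-16LE text files.
UTF16_LE_BOM = bytes([0xFF, 0xFE])


def sniff_bam_header(path):
    """Classify a file by its leading bytes (hypothetical helper, not a dorado API)."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head == BGZF_MAGIC:
        return "bgzf"        # looks like the start of a BAM/BGZF stream
    if head[:2] == UTF16_LE_BOM:
        return "utf16-text"  # looks like text re-encoded by a shell redirect
    return "unknown"


if __name__ == "__main__":
    # Print one classification per file given on the command line.
    for name in sys.argv[1:]:
        print(name, sniff_bam_header(name))
```

Saved under a name of your choosing, for example `sniff.py`, it could be run as `python sniff.py A1815.local.bam`.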

fidibidi commented 1 month ago

Also worth noting: I have referenced other similar issues, #604 and #427.

This appears to be an issue with running dorado via PowerShell. I ran a small test via CMD in which I interrupted a run, and I was able to successfully resume from the incomplete BAM file.

I think it would be very helpful for future users to have this stated in the README.md for running dorado on Windows machines. It could save many days of troubleshooting.
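
For anyone who wants to avoid shell redirection entirely, one option is to launch the command from a small script that binds its standard output directly to a file opened in binary mode. This is a minimal sketch under stated assumptions: the function name `run_with_binary_stdout` is hypothetical, and the likely cause it works around (Windows PowerShell 5.1 treating the output of `>` from a native program as text and re-encoding it) is a known PowerShell behaviour that this thread does not explicitly confirm as the culprit. Running from CMD, as described above, remains the workaround that was actually tested.

```python
import subprocess
import sys


def run_with_binary_stdout(cmd, out_path):
    """Run cmd and write its stdout byte-for-byte to out_path.

    Hypothetical helper: the child process writes straight to the file
    handle, so no shell ever decodes or re-encodes the stream.
    """
    with open(out_path, "wb") as out:
        completed = subprocess.run(cmd, stdout=out, check=False)
    return completed.returncode


if __name__ == "__main__":
    # Usage: python run_binary.py OUTPUT_FILE COMMAND [ARGS...]
    if len(sys.argv) < 3:
        sys.exit("usage: run_binary.py OUTPUT_FILE COMMAND [ARGS...]")
    sys.exit(run_with_binary_stdout(sys.argv[2:], sys.argv[1]))
```

Assuming the script is saved as `run_binary.py` (a made-up name), the basecalling command from this report would be passed after the output file name, with the same model, pod5 directory, `--reference` and `--resume-from` arguments, and without the `>` redirect.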