nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

dorado demux Failed to open file "...calls.bam" : Exec format error #604

Closed: DntBScrdDv closed this issue 7 months ago

DntBScrdDv commented 9 months ago

Hi all, as above.

I ran dorado basecaller via Windows PowerShell and then tried to run dorado demux, but got the error above...

.\dorado.exe basecaller hac ...\all\pod5 --kit-name SQK-RBK110-96 > ...\calls.bam

I don't have access to a cluster so had to run on my laptop. It took 17 hours, so I really don't want to have to re-run the basecaller - is there any way to recover this?

Thanks

DntBScrdDv commented 9 months ago

Hmmm... it seems the problem might be PowerShell... if I run a small subset on PowerShell or CMD, it works with CMD but not PowerShell...

Why is this? How can I recover the previous .bam file?

HalfPhoton commented 9 months ago

Hi @DntBScrdDv ,

Is the existing bam file corrupted?

Can you use --resume-from ?

Kind regards, Rich

DntBScrdDv commented 9 months ago

Hi @HalfPhoton, thanks for the reply.

Yes, I tried using samtools to inspect it and got the same errors.

Trying resume in either cmd or PowerShell gives:

[error] Could not open file: D:...pod5\calls_old.bam

I ended up re-running the whole analysis via cmd, and it worked fine. Interestingly, the final bam file produced via cmd is almost exactly half the size of the unusable file produced when running dorado via PowerShell.

I'm guessing that it's therefore a problem with running dorado via PowerShell. Though again I don't know why.

Thanks,
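One possible explanation for the size difference, offered as an assumption and not confirmed anywhere in this thread: in Windows PowerShell 5.1 the `>` operator behaves like Out-File, whose default encoding is UTF-16LE ("Unicode"), so redirected output is handled as text and written at two bytes per character. A minimal Python sketch for checking what the first bytes of an output file look like follows; the function names are hypothetical and the checks are rough heuristics, not a validator.

```python
# Heuristic check of a file's first bytes (a sketch, not a BAM/SAM validator).
GZIP_MAGIC = b"\x1f\x8b"   # BAM is BGZF-compressed, which starts with the gzip magic bytes
UTF16LE_BOM = b"\xff\xfe"  # byte-order mark that precedes UTF-16LE text


def classify_header(head: bytes) -> str:
    """Guess what kind of file produced these leading bytes."""
    if head.startswith(GZIP_MAGIC):
        return "gzip/BGZF (plausible BAM)"
    if head.startswith(UTF16LE_BOM):
        return "UTF-16LE text (possibly re-encoded by the shell)"
    if head.startswith(b"@"):
        return "plain text starting with '@' (plausible SAM header)"
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first few bytes of a file in binary mode and classify them."""
    with open(path, "rb") as fh:
        return classify_header(fh.read(4))
```

If a file that should be a BAM starts with the UTF-16LE byte-order mark instead of the gzip magic bytes, that would be consistent with the shell having re-encoded the output, though it would not prove it.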

HalfPhoton commented 9 months ago

Hi @DntBScrdDv ,

Thanks for updating us on your progress with this issue.

We'll take a look at what's happening when using PowerShell.

Kind regards, Rich

DntBScrdDv commented 9 months ago

Many thanks, @HalfPhoton ,

Let me know if you want me to send the files.

All the best,

:)

HalfPhoton commented 7 months ago

Hi @DntBScrdDv, does explicitly setting the output encoding of the pipe operator help?

dorado ... --emit-sam | out-file -encoding ASCII .sam

samuelmontgomery commented 7 months ago

I am getting the same issue when running dorado basecaller via PowerShell (after 4.5 days of modified basecalling... sigh). I never had an issue on CMD, but it doesn't display the progress bar properly, so I foolishly switched.

@HalfPhoton I'll try on a subset of data and see how it goes.

samuelmontgomery commented 7 months ago

I can confirm that basecalling the same subset in PowerShell:

dorado.exe basecaller sup,6mA,5mC_5hmC --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5 --emit-sam > .\test_sam.sam

results in an Exec format error when trying to demux the reads, whereas

dorado.exe basecaller sup,6mA,5mC_5hmC --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5 --emit-sam | out-file -encoding ASCII .\test_ascii.sam

results in a working file that can be demuxed.

I have not had any issues running in CMD, so this seems to be PowerShell-specific, and it isn't affected by running as admin.
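For a SAM file already written with `>` in PowerShell, re-running may not be the only option. Assuming the file is simply the SAM text stored as UTF-16 (an assumption, not verified in this thread), a sketch like the following could rewrite it as ASCII. The function name and file names are hypothetical. This only applies to `--emit-sam` text output; a binary BAM that has passed through a text pipeline is unlikely to be recoverable this way.

```python
def recode_utf16_sam(src: str, dst: str) -> int:
    """Rewrite a UTF-16 encoded SAM file as ASCII with LF line endings.

    Returns the number of lines written. Raises UnicodeEncodeError if the
    decoded text contains non-ASCII characters, which would suggest the file
    is not plain SAM text stored as UTF-16.
    """
    count = 0
    # encoding="utf-16" honours the byte-order mark at the start of the file.
    with open(src, "r", encoding="utf-16", newline="") as fin, \
         open(dst, "w", encoding="ascii", newline="\n") as fout:
        for line in fin:
            # Normalise CRLF (or a bare CR) to a single LF.
            fout.write(line.rstrip("\r\n") + "\n")
            count += 1
    return count


# Example with hypothetical file names:
# recode_utf16_sam("test_sam.sam", "test_sam_ascii.sam")
```

The output is written to a new file so the original is left untouched in case the assumption about its encoding is wrong.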

DntBScrdDv commented 7 months ago

Thanks for the message - I've managed to find space on an ancient cluster we have access to so have not had to retry on my laptop since... I will let you know if I do, though @samuelmontgomery seems to suggest this option works?

jonhultqvist commented 7 months ago

I have experienced the same issue when basecalling using dorado duplex and dorado "simplex" in PowerShell. Switching to CMD fixed the problem.

DntBScrdDv commented 7 months ago

I'm going to close this issue, as the solution proposed by @HalfPhoton seems to work. Probably worth including in the instructions for running dorado on PowerShell though...
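Another way to sidestep shell redirection entirely, sketched here as a suggestion and not as something tested in this thread, is to launch the command from a small Python wrapper that writes the program's stdout bytes straight to a file. The helper name `run_to_file` is hypothetical, and the input directory and output file name in the usage comment are placeholders.

```python
import subprocess
from typing import Sequence


def run_to_file(argv: Sequence[str], out_path: str) -> int:
    """Run a command and write its raw stdout bytes to out_path.

    The file is opened in binary mode and handed to the child process
    directly, so no shell or text encoding is involved. stderr is left
    attached to the console. Returns the command's exit code.
    """
    with open(out_path, "wb") as out:
        return subprocess.run(list(argv), stdout=out, check=False).returncode


# Example (placeholder paths; flags as used earlier in this thread):
# run_to_file(
#     ["dorado", "basecaller", "hac", "pod5", "--kit-name", "SQK-RBK110-96"],
#     "calls.bam",
# )
```

Because the arguments are passed as a list and no shell is invoked, the result should not depend on whether the script is started from CMD or PowerShell.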