nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Failed to read bam records with unclassified files after demux #746

Closed: RxLoutre closed this issue 2 months ago

RxLoutre commented 2 months ago

Issue Report

Please describe the issue:

Hello! I am testing dorado v0.6.0 so I can include it in my basecalling pipeline. To convert BAM to FASTQ for both simplex and multiplex runs, I use samtools bam2fq. For multiplex runs, bam2fq runs after dorado demux, which I call with a sample sheet.
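The conversion step described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual pipeline: the helper names (`fastq_path_for`, `bam2fq_command`, `convert_all`) are hypothetical, and it assumes every demux output BAM sits in one directory and that each FASTQ is written next to its BAM, mirroring the `samtools bam2fq in.bam > in.fastq` command in the log below.

```python
import subprocess
from pathlib import Path


def fastq_path_for(bam_path):
    """Return the FASTQ path next to a BAM, swapping the .bam suffix for .fastq."""
    return Path(bam_path).with_suffix(".fastq")


def bam2fq_command(bam_path):
    """Build the samtools bam2fq invocation for one BAM (FASTQ goes to stdout)."""
    return ["samtools", "bam2fq", str(bam_path)]


def convert_all(bam_dir):
    """Convert every BAM in bam_dir to FASTQ and return the FASTQ paths written.

    Hypothetical driver: assumes all demux outputs share one directory.
    """
    written = []
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        fastq = fastq_path_for(bam)
        with open(fastq, "w") as out:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so a failed conversion is not silently skipped.
            subprocess.run(bam2fq_command(bam), stdout=out, check=True)
        written.append(fastq)
    return written
```

Using `check=True` makes a failing file stop the run, which at least surfaces the problem; a production pipeline might instead log the failure and continue with the remaining samples.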

It worked smoothly with dorado v0.5.0, but with dorado v0.6.0, samtools bam2fq cannot parse the unclassified BAM and fails with the following error:

```
samtools bam2fq .test/20200101_0000_P2S-00867-A_PAQ90736_abcdefgh/sushi/bam/20200101_0000_P2S-00867-A_PAQ90736_abcdefgh_nobarcode_unclassified.bam > .test/20200101_0000_P2S-00867-A_PAQ90736_abcdefgh/sushi/bam/20200101_0000_P2S-00867-A_PAQ90736_abcdefgh_nobarcode_unclassified.fastq
samtools bam2fq: Failed to read bam record
samtools bam2fq: Error writing to FASTx files.: Numerical result out of range
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 1 reads
```

Strangely enough, however, it worked for the other samples.

Do you have any idea what could be going wrong?

Thank you for your guidance.

Roxane

Steps to reproduce the issue:

Please list any steps to reproduce the issue.

Run environment:

Logs

Not sure if that is applicable here

tijyojwad commented 2 months ago

Hi @RxLoutre, thanks for reporting! I'll have a look at it.

tijyojwad commented 2 months ago

Hi @RxLoutre, I believe we've narrowed down the issue and will fix it in an upcoming patch release.

Just to confirm - were you running demux on an aligned BAM?
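One way to answer this question is to inspect the BAM header: aligned BAMs normally carry `@SQ` reference-sequence lines, while a basecalled-only BAM usually has none. The sketch below uses that heuristic; it is an assumption-based check rather than anything dorado provides, and the function names are hypothetical. It relies on `samtools view -H`, which prints only the header.

```python
import subprocess


def header_has_references(header_text):
    """Return True if a SAM/BAM header lists reference sequences (@SQ lines).

    Heuristic: aligned BAMs normally include @SQ lines; unaligned ones usually do not.
    """
    return any(line.startswith("@SQ\t") for line in header_text.splitlines())


def bam_looks_aligned(bam_path):
    """Read the header with `samtools view -H` and apply the @SQ heuristic."""
    result = subprocess.run(
        ["samtools", "view", "-H", str(bam_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return header_has_references(result.stdout)
```

Checking the header is cheap because it avoids reading any records; counting mapped reads with `samtools view -c -F 4` would be a stricter alternative.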

A temporary workaround would be to run demux with the --no-trim option.

RxLoutre commented 2 months ago

Thanks @tijyojwad for the super quick fix! :)

I am not sure. I don't think I had turned on alignment for this subset of the data, but I cannot say for certain. I could try with an unaligned BAM to check.

Can you give me more details on the consequences of using --no-trim? I have to make a release of our own pipeline, and I am not sure I want to wait for the dorado patch release, so I might as well use --no-trim if it does not have too many unwanted side effects.

Thank you,

Roxane

RxLoutre commented 2 months ago

Hmm, after thinking about it more and reading the help for --no-trim, I don't think I want to enable this option. I definitely want adapter sequences removed from our reads. I will wait for the patch, and in the meantime I will simply not convert the unclassified reads to FASTQ!
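The interim workaround of skipping the unclassified reads could look like the filter below. This is a hypothetical sketch, not code from the thread: it assumes the unclassified bin is the only demux output whose file name (without the `.bam` suffix) ends in `unclassified`, as in the file name in the original report.

```python
from pathlib import Path


def bams_to_convert(bam_paths, skip_unclassified=True):
    """Return the demux output BAMs to convert, optionally dropping the unclassified bin.

    Assumption: the unclassified file's stem ends with "unclassified",
    e.g. "..._nobarcode_unclassified.bam".
    """
    selected = []
    for path in map(Path, bam_paths):
        if skip_unclassified and path.stem.endswith("unclassified"):
            continue  # leave the unclassified reads as BAM for now
        selected.append(path)
    return selected
```

Keeping `skip_unclassified` as a parameter makes it easy to turn the conversion back on once a release containing the fix is in use.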

Best

tijyojwad commented 2 months ago

We're planning to release the patch by the end of this week!

tijyojwad commented 2 months ago

Hi @RxLoutre, we released dorado v0.6.1 yesterday with this issue fixed. You can find the binaries here: https://github.com/nanoporetech/dorado?tab=readme-ov-file#installation