nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Dorado failing on some datasets #553

Open tnn111 opened 9 months ago

tnn111 commented 9 months ago

I'm seeing runs with truncated bam/log files. The version is 0.5.1 and the command being run is:

dorado duplex --verbose sup,5mC_5hmC,6mA xad.pod5 > xad.bam 2> xad.log &

[2023-12-29 07:46:27.334] [debug] Sort channel 2985
[2023-12-29 07:46:27.336] [debug] Sorted channel 2985
[2023-12-29 07:46:42.574] [debug] Sort channel 2986
[2023-12-29 07:46:42.575] [debug] Sorted channel 2986
[2023-12-29 07:46:45.491] [debug] Sort channel 2987
[2023-12-29 07:46:45.491] [debug] Sorted channel 2987
[2023-12-29 07:46:55.143] [debug] Sort channel 2988
[2023-12-29 07:46:55.144] [debug] Sorted channel 2988
srun: error: nid001705: task 0: Segmentation fault
srun: Terminating StepId=19811533.0
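For anyone triaging many runs like this, here is a minimal Python sketch for flagging the ones that died. It rests on two assumptions that are not confirmed anywhere in this thread: that a complete BAM ends with the standard 28-byte BGZF end-of-file block described in the SAM/BAM specification (so a truncated file will normally lack it), and that a crashed run leaves the text "Segmentation fault" in its log, as in the excerpt above. The function names are hypothetical helpers, not part of dorado.

```python
from pathlib import Path

# 28-byte empty BGZF block that terminates a complete BAM file (SAM/BAM spec).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
)


def bam_has_eof_marker(bam_path):
    """Return True if the file ends with the BGZF end-of-file block."""
    path = Path(bam_path)
    if path.stat().st_size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), 2)  # 2 = seek relative to end of file
        return handle.read() == BGZF_EOF


def log_reports_segfault(log_path):
    """Return True if the log contains the crash text seen in this report."""
    text = Path(log_path).read_text(errors="replace")
    return "Segmentation fault" in text


def classify_run(bam_path, log_path):
    """Label a run 'failed' if the BAM looks truncated or the log shows a crash."""
    if log_reports_segfault(log_path) or not bam_has_eof_marker(bam_path):
        return "failed"
    return "ok"
```

Calling `classify_run("xad.bam", "xad.log")` on the run above would be expected to return `"failed"` because of the segfault line in the log. A passing check only means the file was closed cleanly; it says nothing about the quality of the basecalls.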

I am redoing datasets that were successfully basecalled to completion using dorado 0.4.x.

Suggestions?

tnn111 commented 9 months ago

PS: The problem is definitely data dependent. It happens 10-20% of the time.

vellamike commented 9 months ago

Thanks for reporting this, and sorry about the issue. We've had a couple of other reports of it and are working on a resolution.

tnn111 commented 9 months ago

Hi Mike,

I appreciate all the work you are all doing to make the models and the software better. Thank you and Happy New Year!

What I would really like when you resolve the issue is to know if this affected the quality of the data that was successfully basecalled or not? I’m not asking so I can hold anyone responsible; that would be ridiculous and wrong. But I’m a scientist and I’m trying to reprocess something like 20 Tbases (way over 100 PromethION flow cells by now) of metagenomic Nanopore sequencing to get more accurate results and methylation data and while I’d cringe and cry if I have to redo all of the ones that didn’t give an error when I ran them, I will do so if there’s a chance that the results will be significantly better as a result. I’ll take full responsibility for any decisions I make; I simply want them to be as informed as possible.

Thanks!

Torben

On Dec 29, 2023, at 12:17, Mike Vella @.***> wrote:

Thanks for reporting this, and sorry about the issue. We've had a couple of other reports of it and are working on a resolution.


ymcki commented 9 months ago

What about 0.5.0? I am using 0.5.0 now. So far no errors for eight samples.

vellamike commented 9 months ago

Hi @tnn111

I am working on solving this issue (FYI this is a duplicate of !514).

I have looked into the cause and can say that the issue does not affect the quality of data that was successfully basecalled. It occurs only rarely, with some reads. We will have a fix soon, and there is no need to redo data that has run successfully.

vellamike commented 9 months ago

Hi @ymcki - the issue is also present in 0.5.0 but it's quite rare, so I'm not surprised that you've not seen any errors.

tnn111 commented 9 months ago

Hi Mike,

That’s great. I’ll test it out as soon as you release a fix.

The best estimate I have is that it occurs 10-20% of the time when I do duplex basecalling and simultaneous methylation detection for metagenomic reads. This estimate is based on more than 1 Tbase of data (5% of a very large metagenomic data set I am working with). This is of course still consistent with very rare reads, but in terms of the datasets affected, it looks pretty common. All of my data is from PromethION flow cells and I break the pod5 files up into 8 runs for each flow cell. Based on the numbers I have now, only ~10% made it through with all the runs completing.
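As a rough consistency check on these numbers, here is a small Python sketch. It assumes, and this is an assumption not stated in the thread, that each of the 8 runs for a flow cell fails independently with the same probability, and that the "~10%" refers to flow cells whose 8 runs all completed. Under that model the chance that all runs finish is (1 - p)^8, and the formula can be inverted to get the per-run failure rate implied by an observed all-complete fraction.

```python
def prob_all_runs_complete(p_fail, n_runs=8):
    """Probability that every one of n_runs independent runs completes."""
    return (1.0 - p_fail) ** n_runs


def implied_failure_rate(frac_all_complete, n_runs=8):
    """Per-run failure rate implied by the fraction of flow cells fully completed."""
    return 1.0 - frac_all_complete ** (1.0 / n_runs)


print(f"{prob_all_runs_complete(0.10):.2f}")  # 0.43
print(f"{prob_all_runs_complete(0.20):.2f}")  # 0.17
print(f"{implied_failure_rate(0.10):.2f}")    # 0.25
```

Under these assumptions, a 10-20% per-run failure rate would leave roughly 17-43% of flow cells fully completed, and a ~10% fully-completed fraction corresponds to a per-run failure rate near 25%. Either way, a crash that is rare per read can still hit most flow cells once the data is split into several runs.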

Thanks, Torben

On Jan 4, 2024, at 07:19, Mike Vella @.***> wrote:

Hi @tnn111, I am working on solving this issue (FYI this is a duplicate of !514).

I have looked into the cause and can say that the issue does not affect the quality of data that was successfully basecalled. It occurs only rarely, with some reads. We will have a fix soon, and there is no need to redo data that has run successfully.
