nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/
Other
488 stars 59 forks

trim interval core dumped error #1020

Open valery-shap opened 5 days ago

valery-shap commented 5 days ago

Issue Report

Please describe the issue:

I'm trying to classify reads into their barcode groups during basecalling, as part of the same command, but the process is aborted with the following error:

terminate called after throwing an instance of 'std::invalid_argument'
what(): Trim interval 108-107 is invalid for sequence ATGTCCTGTACTTGGTTGGTTTATTGAAGCGGTATTTAACCACAAAGTTGTCGGTGTCTTTGTGGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGGCTTGGCAAGCAGGCACACGAAAAACGCGAAAGCGTTTCACGATAAATGCGAAAACCACAAAGACACCGACAACTTTC
Aborted (core dumped)

Steps to reproduce the issue:

I'm now basecalling with the "--no-trim" flag and the process is not aborted. I read #539, which said this would be fixed in the next version. I checked v0.6.2 with the same data and it works without errors. Also, when I basecall with --no-trim and then pass the data to dorado trim (version 0.8.0), there is no error either. However, the number of demuxed reads after basecalling differs from the number of trimmed reads:

4330985 reads demuxed @ classifications/s: 1.170174e+03
starting adapter/primer trimming
Simplex reads basecalled: 4231096
finished adapter/primer trimming

Run environment:

Logs

rowi2024 commented 4 days ago

I am having the same issue. I just upgraded to dorado 0.8.0 (Linux version; same as above) to take advantage of the new methylation calling models. I was not having this issue with my previous version, dorado 0.5.3. I prefer to do trimming and demuxing together, so I do not want to use the --no-trim option.

With 0.8.0 I'm also getting warning messages about my GPUs that I don't get with 0.5.3: "Unable to find chunk benchmarks for GPU "Tesla T4", model ... and chunk size 1728. Full benchmarking will run for this device, which may take some time."

Please advise.

propan2one commented 3 days ago

Hi, I'm having the same problem as @rowi2024 when re-analyzing data previously basecalled with dorado v0.7.1. Both the

Thanks

malton-ont commented 2 days ago

Hi all,

Thank you all for reporting this. It does appear that a regression has slipped into dorado 0.8.0, where the regions identified for adapter/primer trimming and for barcode trimming occasionally combine into a final trimming region that is nonsensical. We'll aim to get this patched for the next release.
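To illustrate the kind of failure described here, the following is a minimal Python sketch, not dorado's actual code. It assumes each trimming step produces a "keep" interval and that the final region is their intersection; the `Interval` class, the `combine` and `validate` helpers, and the example coordinates (other than the 108-107 result from the reported error) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # index of the first base to keep
    end: int    # index one past the last base to keep


def combine(a: Interval, b: Interval) -> Interval:
    """Intersect two keep-intervals (assumed combination rule, for illustration)."""
    return Interval(max(a.start, b.start), min(a.end, b.end))


def validate(iv: Interval, seq_len: int) -> Interval:
    """Reject intervals that fall outside the read or end before they start."""
    if iv.start < 0 or iv.end > seq_len or iv.start > iv.end:
        raise ValueError(
            f"Trim interval {iv.start}-{iv.end} is invalid "
            f"for sequence of length {seq_len}"
        )
    return iv


# Hypothetical inputs: one step keeps bases from 108 onwards,
# the other keeps bases up to 107, so nothing is left to keep.
adapter_keep = Interval(108, 180)
barcode_keep = Interval(0, 107)
final = combine(adapter_keep, barcode_keep)

try:
    validate(final, seq_len=180)
except ValueError as err:
    print(err)
```

When the two regions do not overlap, the intersection has a start greater than its end, which is the shape of the 108-107 interval in the error above.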

For now I'm afraid the workaround would be to basecall and then demux separately, as this will separate the two trimming steps so the illegal overlap does not occur.
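As a rough sketch of that two-step workaround, the Python snippet below only builds and prints the two command lines; it does not run them. The model name, input directory, kit name, and output paths are placeholders that do not come from this thread, and the `workaround_commands` helper is hypothetical. Whether to pass `--no-trim` at the basecalling step (as the original poster did) is left as an option.

```python
import shlex


def workaround_commands(model, pod5_dir, kit, calls_bam="calls.bam",
                        out_dir="demuxed", no_trim=False):
    """Build the basecall and demux commands as separate steps."""
    # Step 1: basecall without --kit-name, so no barcode classification
    # or barcode trimming happens during basecalling.
    basecall = ["dorado", "basecaller", model, pod5_dir]
    if no_trim:
        # Optional: the flag the original poster used to avoid the abort.
        basecall.append("--no-trim")
    # Step 2: classify (and trim barcodes) in a separate demux run.
    demux = ["dorado", "demux", "--kit-name", kit,
             "--output-dir", out_dir, calls_bam]
    return basecall, demux


# Placeholder values, for illustration only.
basecall, demux = workaround_commands("hac", "pod5/", "SQK-NBD114-24")
print(shlex.join(basecall), ">", "calls.bam")  # basecaller writes BAM to stdout
print(shlex.join(demux))
```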

As for the "Unable to find chunk benchmarks" message - this is expected. Dorado 0.8.0 introduced pre-computed batch size benchmarks for specific hardware so that we can skip the batch size detection. These benchmarks are not exhaustive, so different hardware and/or chunk sizes may still require the benchmark step to be performed. This was already happening in previous versions; we now just include a warning to explain why basecalling has not started immediately, since this step can sometimes take a long time.

rowi2024 commented 2 days ago

Thank you for addressing this issue. I’ll check for the updates.

In the meantime, could you please confirm that the two commands below are the correct commands I should run to basecall, demux and trim my data? Also:

Also, thanks very much for explaining about the benchmarking error! This makes sense.

malton-ont commented 2 days ago

@rowi2024,

Yes, those commands look correct.