nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Read splitting #805

Closed guangzhaocs closed 1 month ago

guangzhaocs commented 1 month ago

I previously asked a related question (getting more reads from Dorado). The old issue is here: https://github.com/nanoporetech/dorado/issues/701

So, can I prevent the splitting with some parameter (e.g., --no-trim)? Or how can I combine the split reads to get the full output corresponding to the original reads? (My data is RNA.)

I cannot find any documentation on the read splitting of concatemers. Could you provide more information about this?

Thanks a lot

tijyojwad commented 1 month ago

Hi @guangzhaocs - there's no way to prevent read splitting in dorado. What is your use case for not wanting them to be split?

You could roughly stitch them back together using the pi:Z and sp:i tags. The pi tag holds the ID of the original read, and sp is the split point, so sorting by split point and concatenating will approximately recover the unsplit read.

More details on the split read tags at https://github.com/nanoporetech/dorado/blob/release-v0.6/documentation/SAM.md#split-read-tags
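
A minimal sketch of the stitching described above, not an official dorado tool. It assumes the reads are available as SAM text lines (for example from `samtools view` on the dorado output), that split reads carry the `pi:Z` and `sp:i` tags, and that reads without a `pi` tag were never split. The function names and the `descending` option are hypothetical. The result is only approximate, and for RNA the concatenation order may need to be reversed, which is worth checking on your own data.

```python
from collections import defaultdict


def parse_sam_line(line):
    """Return (read_id, sequence, tags) for one SAM record line."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # optional tags start at column 12
        name, typ, value = field.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    return fields[0], fields[9], tags


def stitch_split_reads(sam_lines, descending=False):
    """Group subreads by parent ID (pi) and join them in split-point (sp) order.

    Reads without a pi tag are assumed to be unsplit and are kept as-is.
    Set descending=True to join in reverse split-point order (hypothetical
    option, e.g. if the order turns out to be reversed for RNA).
    """
    groups = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        read_id, seq, tags = parse_sam_line(line)
        parent = tags.get("pi", read_id)
        groups[parent].append((tags.get("sp", 0), seq))

    stitched = {}
    for parent, parts in groups.items():
        parts.sort(key=lambda part: part[0], reverse=descending)
        stitched[parent] = "".join(seq for _, seq in parts)
    return stitched
```

Usage would look roughly like `stitch_split_reads(open("calls.sam"))`, where `calls.sam` is a placeholder file name. Base qualities and any sequence lost at the split boundaries are not restored by this approach.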

guangzhaocs commented 1 month ago

Thanks very much. I do not have other questions.

tijyojwad commented 1 month ago

Note that by default the adapters in reads are also trimmed off, so each split read will have its adapters removed. To get most of the original read back, you need to run with --no-trim.