nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

--resume mode #72

Closed Kirk3gaard closed 1 year ago

Kirk3gaard commented 1 year ago

Hi, in case Dorado crashes during basecalling, it would be convenient to have a --resume option to avoid basecalling reads several times.

Best regards, Rasmus

vellamike commented 1 year ago

Hi @Kirk3gaard ,

The way this might work is if Dorado is passed a BAM/SAM file of the completed calls.

Something like:

dorado basecall ... --resume crashed.sam

Is that along the lines of what you would like?
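
To make the idea concrete: the reads already present in the partial output are the ones a restart could skip. Below is a minimal Python sketch of that bookkeeping, assuming the crashed output is plain-text SAM. The function name `completed_read_ids` is hypothetical and this is not Dorado's implementation. Note that a final line cut off inside the quality string would still have 11 fields and would be counted, so a real implementation would need a stricter check.

```python
def completed_read_ids(sam_lines):
    """Collect read IDs (QNAME, the first SAM column) from SAM text lines.

    Header lines start with '@' and are skipped. A record with fewer than
    the 11 mandatory SAM fields is assumed to be a line truncated by the
    crash and is ignored, so that read would be basecalled again.
    """
    done = set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # blank line or header
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # incomplete record, do not trust it
        done.add(fields[0])
    return done
```

With a file on disk this could be called as `completed_read_ids(open("crashed.sam"))`.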

Kirk3gaard commented 1 year ago

Hi @vellamike

Yeah, I think that should do the trick 👍. However, for people not looking at modifications, it might make sense to have a solution that works with the FASTQ output option as well.
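
For the FASTQ case, the finished read IDs could be recovered from the record headers instead. This is a rough Python sketch, assuming four-line FASTQ records (no wrapped sequences) and that the read ID is the first whitespace-delimited token after `@`. The name `fastq_read_ids` is made up for illustration.

```python
from itertools import islice


def fastq_read_ids(fastq_lines):
    """Return the IDs of all complete four-line FASTQ records.

    Parsing stops at the first incomplete or inconsistent record, which is
    what a crash in the middle of a write would most likely leave behind.
    """
    ids = set()
    it = iter(fastq_lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input or truncated final record
        header, seq, plus, qual = (s.rstrip("\n") for s in record)
        if not header.startswith("@") or not plus.startswith("+"):
            break  # lost record alignment, stop rather than guess
        if len(seq) != len(qual):
            break  # quality string cut short
        name = header[1:].split()
        if not name:
            break  # empty header line
        ids.add(name[0])
    return ids
```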

olawa commented 1 year ago

Hi,

This would indeed be good to have. Also, having an option to take a list of read IDs to either basecall or skip could be useful.

billytcl commented 1 year ago

+1 on the resume feature! It's really useful for HPC sessions that have a time limit.

vellamike commented 1 year ago

This feature unfortunately didn't make 0.3.0 but is definitely planned.

tijyojwad commented 1 year ago

> Also having an option to take a list of read id:s to either basecall or skip could be useful.

@olawa this particular feature is already available through the --read-ids option in dorado. This is of course not checkpointing, but it could be useful for rerunning on a subset of the original dataset.
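
As a manual workaround along those lines, the list of reads still to be called is the full set of read IDs minus the ones already in the partial output. A small Python sketch follows. The helper name `read_id_list_text` is hypothetical, and the one-ID-per-line file format for --read-ids is an assumption that should be checked against `dorado basecaller --help`.

```python
def read_id_list_text(all_ids, done_ids):
    """Build the text of a read ID list: every ID not yet basecalled.

    IDs are sorted so repeated runs give the same file, and written one
    per line (assumed format for a read ID list file).
    """
    remaining = sorted(set(all_ids) - set(done_ids))
    return "".join(rid + "\n" for rid in remaining)
```

The returned text can be written to a file, for example `open("remaining_ids.txt", "w").write(text)`, and that file passed to --read-ids for the rerun.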

tijyojwad commented 1 year ago

Resume mode is now available for simplex in dorado v0.3.1. Details are available at https://github.com/nanoporetech/dorado#simplex-basecalling

jdemeul commented 3 months ago

Would this be an option for duplex mode too? As this mode is a good bit slower than simplex, it would be doubly good to have the --resume option there.