nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Running on a subset of files; --input_file_list parameter needed #392

Closed: ulbivin closed this issue 11 months ago

ulbivin commented 11 months ago

Hi,

Is there a way to run dorado on a subset of pod5 files, or on specific pod5 files? If not, could you please add an option similar to guppy's --input_file_list parameter?

tijyojwad commented 11 months ago

Hi @ulbivin - in the next release (expected in a week or so) we are adding support to run basecalling on a specific pod5 file.

One way to run on a subset of pod5s is to group them into subfolders within a parent directory and point dorado to the relevant subfolder. Then if you want to run dorado on the whole dataset, you can point to the parent directory and use the -r (recursive) option to iterate over all pod5s.
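For anyone wanting to script the subfolder approach described above, here is a minimal sketch. The helper `group_pod5s`, the list file `subset_a.txt`, and the subfolder name `subset_a` are made up for illustration and are not part of dorado; only `dorado basecaller` and the `-r` option come from this thread. Note that the helper moves files, so try it on a copy first.

```shell
# Hypothetical helper (not part of dorado): move the pod5 files named in a
# list file (one file name per line) into a subfolder of the parent directory.
group_pod5s() {
    parent=$1   # directory that currently holds the pod5 files
    subset=$2   # name of the subfolder to create inside $parent
    list=$3     # text file with one pod5 file name per line
    mkdir -p "$parent/$subset" || return 1
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue    # skip blank lines
        mv -- "$parent/$name" "$parent/$subset/" || return 1
    done < "$list"
}

# Example usage (model name taken from this thread; other names are placeholders):
#   group_pod5s pod5s subset_a subset_a.txt
#   dorado basecaller dna_r10.4.1_e8.2_400bps_hac@v4.2.0 pod5s/subset_a > calls_subset_a.bam
#   dorado basecaller dna_r10.4.1_e8.2_400bps_hac@v4.2.0 pod5s -r > calls_all.bam
```

The first dorado call only sees the files in the subfolder, while the second uses `-r` on the parent directory to pick up every pod5, including the ones that were moved.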

iiSeymour commented 11 months ago

@ulbivin dorado v0.4.0 can now take a specific pod5 file as input:

$ ls pod5s/
1.pod5 2.pod5 3.pod5
$ dorado basecaller dna_r10.4.1_e8.2_400bps_hac@v4.2.0 pod5s/1.pod5 > calls_1.bam
$ dorado basecaller dna_r10.4.1_e8.2_400bps_hac@v4.2.0 pod5s/2.pod5 > calls_2.bam
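Since dorado accepts a single pod5 file, something close to guppy's --input_file_list can be approximated with a small loop. This is only a sketch: `basecall_from_list` and `subset.txt` are hypothetical names, not dorado features, and it assumes one BAM per input file is acceptable. It also starts one dorado process per file, so the model is loaded again each time.

```shell
# Hypothetical wrapper (not a dorado feature): basecall each pod5 file named
# in a list file, one dorado invocation per file.
basecall_from_list() {
    model=$1    # model name, e.g. dna_r10.4.1_e8.2_400bps_hac@v4.2.0
    list=$2     # text file with one pod5 path per line
    outdir=$3   # directory that receives one BAM per input file
    mkdir -p "$outdir" || return 1
    while IFS= read -r pod5 || [ -n "$pod5" ]; do
        [ -n "$pod5" ] || continue    # skip blank lines
        out="$outdir/calls_$(basename "$pod5" .pod5).bam"
        # </dev/null keeps dorado from reading the list that feeds the loop
        dorado basecaller "$model" "$pod5" > "$out" < /dev/null || return 1
    done < "$list"
}

# Example usage (subset.txt is a placeholder listing pod5s/1.pod5 and pod5s/2.pod5):
#   basecall_from_list dna_r10.4.1_e8.2_400bps_hac@v4.2.0 subset.txt calls
```

With the list above, this would write calls/calls_1.bam and calls/calls_2.bam, matching the naming used in the commands shown earlier.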