nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Issue about merge multiple pod5 files #743

Closed: xiangpingyu closed this issue 2 months ago

xiangpingyu commented 2 months ago

Dear developers,

We obtain multiple pod5 files from a single run. Should we merge these files into one pod5 file before proceeding with the Dorado analysis? For example, would we use the command $ dorado basecaller sup m6A merged.pod5 > calls.bam to perform this analysis? Or could we use the command $ dorado basecaller sup m6A ./ > calls.bam without the merge step?

Thank you for your guidance.

Sophia

tijyojwad commented 2 months ago

You can run it without merging as well. You'll need to add -r (for recursive) to the command line so that dorado also searches subdirectories for POD5 files.

i.e. $ dorado basecaller sup m6A ./ -r > calls.bam

provided all your files are in ./
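Before basecalling, it can help to check which POD5 files a recursive scan would pick up and how much data that is (which also answers the dataset-size question that comes up later in this thread). A minimal Python sketch using only the standard library; `find_pod5_files` is a hypothetical helper, not part of dorado, and it only approximates dorado's own file discovery (it ignores FAST5 files, for instance):

```python
from __future__ import annotations

from pathlib import Path


def find_pod5_files(root: str, recursive: bool = True) -> list[Path]:
    """Hypothetical helper: list .pod5 files under `root`.

    recursive=False looks only at files directly inside `root` (roughly
    dorado without -r); recursive=True also descends into subdirectories
    (roughly dorado with -r).
    """
    pattern = "**/*.pod5" if recursive else "*.pod5"
    return sorted(p for p in Path(root).glob(pattern) if p.is_file())


if __name__ == "__main__":
    top_level = find_pod5_files(".", recursive=False)
    all_files = find_pod5_files(".", recursive=True)
    # Total on-disk size of every POD5 file found, in GiB.
    total_gib = sum(p.stat().st_size for p in all_files) / 1024**3
    print(f"top level only:           {len(top_level)} POD5 files")
    print(f"including subdirectories: {len(all_files)} POD5 files, {total_gib:.2f} GiB")
```

If the two counts differ, some files live in subdirectories, and that is the case -r is meant to handle.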

xiangpingyu commented 2 months ago

@tijyojwad thank you! If I want to run this command on the GPU rather than the CPU, which parameter do I need to add?

Best, Sophia

tijyojwad commented 2 months ago

Hi @xiangpingyu, if you have an NVIDIA GPU, dorado should run on it by default. What do you see when you run nvidia-smi?

In general, you can choose which GPU to use with -x; for example, -x "cuda:0" runs it on the first GPU in the nvidia-smi list.
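To confirm that a GPU is visible and to find the index to pass to -x, the nvidia-smi check can be scripted. A sketch in Python under these assumptions: nvidia-smi is on the PATH, its `--query-gpu=index,name --format=csv,noheader` query is used, and `list_gpus`, `parse_gpu_csv` and `device_arg` are hypothetical helper names, not part of dorado:

```python
from __future__ import annotations

import subprocess


def parse_gpu_csv(text: str) -> list[tuple[int, str]]:
    """Parse lines like '0, NVIDIA A100-SXM4-80GB' into (index, name) pairs."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        index, name = (field.strip() for field in line.split(",", 1))
        gpus.append((int(index), name))
    return gpus


def list_gpus() -> list[tuple[int, str]]:
    """Return GPUs reported by nvidia-smi, or [] if it cannot be run or fails."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=index,name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return []
    return parse_gpu_csv(out)


def device_arg(index: int) -> str:
    """Build the value passed to dorado's -x option, e.g. 'cuda:0'."""
    return f"cuda:{index}"


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("nvidia-smi reported no usable NVIDIA GPU")
    for index, name in gpus:
        print(f"{device_arg(index)}\t{name}")
```

An empty result means nvidia-smi could not be run or reported an error, which usually points to a driver or hardware problem rather than a dorado option.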

xiangpingyu commented 2 months ago

@tijyojwad I'm currently running the basecaller with the -x "cuda:0" option because the initial run seemed very slow. Overnight, it appears to have processed only about 3% of the data. Is this a normal processing speed? What speed should I typically expect when running the basecaller? Thank you!
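One quick way to judge a speed like this is to extrapolate linearly from the progress so far. A minimal sketch, assuming throughput stays roughly constant and taking "overnight" to mean a hypothetical 12 hours (the thread does not say); `estimate_total_hours` and `estimate_remaining_hours` are hypothetical helpers:

```python
def estimate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linear extrapolation: total time = elapsed time / fraction completed."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in the range (0, 100]")
    return elapsed_hours * 100 / percent_done


def estimate_remaining_hours(elapsed_hours: float, percent_done: float) -> float:
    """Projected time still needed after `elapsed_hours`."""
    return estimate_total_hours(elapsed_hours, percent_done) - elapsed_hours


# Hypothetical numbers: 12 hours elapsed, 3% processed.
total = estimate_total_hours(12, 3)          # 400.0
remaining = estimate_remaining_hours(12, 3)  # 388.0
print(f"projected total: {total:.0f} h (~{total / 24:.1f} days), remaining: {remaining:.0f} h")
```

A projection of more than two weeks for a single run is usually worth investigating: dataset size, GPU model and utilization, and whether the data is read over a network drive all affect it.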

tijyojwad commented 2 months ago

How much data do you have (e.g. size of POD5 dataset, or number of reads/average read length), and what GPU are you using? Is your data on a local disk or are you reading over a network drive? Do you have multiple GPUs?

xiangpingyu commented 2 months ago

@tijyojwad I noticed that the GPU was not functioning properly, but it has now been fixed.

Thank you!