nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Unable to specify local copy of duplex model for dorado duplex #676

Closed: johnstonmj closed this issue 6 months ago

johnstonmj commented 6 months ago

Issue Report

Please describe the issue:

I'd like to be able to explicitly set and use a local copy of the duplex model used by dorado duplex. Each invocation downloads the required duplex model, so I am accumulating many .temp_dorado_model-123456 directories. I have downloaded the required simplex model and can specify it, but I cannot find a documented option for specifying the duplex model, and I would like the same functionality.

Ideally: dorado duplex --simplex_model ./models/dna_r10.4.1_e8.2_400bps_hac@v4.3.0 --duplex_model ./models/dna_r10.4.1_e8.2_5khz_stereo@v1.2 some_input.pod5 > some_output.bam

Using a local copy of the duplex model would prevent re-downloading the same file on each invocation. This would speed up each run, save storage space, and reduce bandwidth use.

Steps to reproduce the issue:

Running the command: dorado duplex dna_r10.4.1_e8.2_400bps_hac@v4.3.0 some_input.pod5 > some_output.bam

Begins with a download of: downloading dna_r10.4.1_e8.2_5khz_stereo@v1.2 with httplib

Run environment:

Logs

N/A

HalfPhoton commented 6 months ago

Hi @johnstonmj, if a stereo (duplex) model is in the same directory as the simplex model, it will be reused.

# Previously downloaded both models 
$ ls local_models/
dna_r10.4.1_e8.2_5khz_stereo@v1.2 dna_r10.4.1_e8.2_400bps_hac@v4.3.0

# Run duplex
$ dorado duplex ./local_models/dna_r10.4.1_e8.2_400bps_hac@v4.3.0/ tests/data/duplex/pod5/duplex.pod5 > tmp.bam
[2024-03-07 20:35:50.062] [info] > No duplex pairs file provided, pairing will be performed automatically
... # No download here
[2024-03-07 20:36:18.704] [info] > Basecalled @ Bases/s: 6.575190e+03

# Rename stereo model with .bak suffix
$ mv local_models/dna_r10.4.1_e8.2_5khz_stereo@v1.2 local_models/dna_r10.4.1_e8.2_5khz_stereo@v1.2.bak
$ ls local_models/
dna_r10.4.1_e8.2_5khz_stereo@v1.2.bak dna_r10.4.1_e8.2_400bps_hac@v4.3.0

# Downloads model
$ dorado duplex ./local_models/dna_r10.4.1_e8.2_400bps_hac@v4.3.0/ tests/data/duplex/pod5/duplex.pod5 > tmp.bam
[2024-03-07 20:38:18.092] [info] > No duplex pairs file provided, pairing will be performed automatically
[2024-03-07 20:38:18.094] [info] Assuming cert location is /etc/ssl/cert.pem
[2024-03-07 20:38:18.095] [info]  - downloading dna_r10.4.1_e8.2_5khz_stereo@v1.2 with foundation
...
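
The check above can also be scripted as a pre-flight step. This is a minimal sketch, not part of dorado: `has_sibling_model` is a hypothetical helper, and the stereo model name is copied from the log lines in this thread rather than looked up automatically. It only tests whether a directory with the stereo model's name sits next to the simplex model directory, which is the condition described above for skipping the download.

```shell
#!/usr/bin/env bash
# Sketch: confirm the stereo model sits beside the simplex model before
# running dorado duplex, so that no download should be needed.

# has_sibling_model SIMPLEX_DIR STEREO_NAME  (hypothetical helper)
# Succeeds if SIMPLEX_DIR exists and a directory named STEREO_NAME
# exists in the same parent directory.
has_sibling_model() {
    local simplex_dir="${1%/}"   # drop a trailing slash, if any
    local stereo_name="$2"
    local parent
    parent="$(dirname "$simplex_dir")"
    [ -d "$simplex_dir" ] && [ -d "$parent/$stereo_name" ]
}

# Example paths taken from this thread; adjust to your own layout.
simplex="./local_models/dna_r10.4.1_e8.2_400bps_hac@v4.3.0"
stereo="dna_r10.4.1_e8.2_5khz_stereo@v1.2"

if has_sibling_model "$simplex" "$stereo"; then
    echo "stereo model found next to simplex model; no download expected"
else
    echo "stereo model missing; dorado duplex will download it" >&2
fi
```

Running this before `dorado duplex` in a batch script makes it easy to spot a renamed or missing stereo model directory before the run starts.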

johnstonmj commented 6 months ago

Perfect! Exactly what I wanted. Thanks @HalfPhoton