nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Dorado correct tool #956

Closed ylaforgue59 closed 3 weeks ago

ylaforgue59 commented 1 month ago

Hello everyone,

I recently ran a job with dorado correct. Here is my script:

    dorado correct \
        --threads 24 \
        --device cuda:0 \
        --batch-size 128 \
        --verbose \
        --model-path $path_model/herro-v1 \
        $path_dorado_input/barcode56.fastq \
        > $path_analysis_results/barcode56/barcode56_corrected_reads.fasta \
        2> $path_logs/barcode56/barcode56.log

I gave Dorado 15k reads as input (median read quality: 22.4), but I only got 27 corrected reads in the output.

Here is the log file: barcode56.log

How can I increase the number of reads in output?

HalfPhoton commented 1 month ago

Hi @ylaforgue59,

Does your dataset have a short read distribution?

If your reads are short, many of them may be failing the alignment stage, which has some minimum requirements, specifically min_chain_score=4000.
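A quick way to check this is to count how many reads meet a given length threshold. The sketch below is only a rough check under some assumptions: the FASTQ has exactly four lines per record (no wrapped sequences), and read length is used as a stand-in for the chain score, which is not the same thing. `count_reads_min_len` is a hypothetical helper name, not part of dorado.

```shell
# Hypothetical helper (not a dorado command): count reads whose sequence
# length is at least MIN_LEN in a 4-line-per-record FASTQ file.
# Usage: count_reads_min_len FILE MIN_LEN
count_reads_min_len() {
    awk -v min="$2" '
        NR % 4 == 2 {                      # second line of each record is the sequence
            total++
            if (length($0) >= min) kept++
        }
        END { printf "%d of %d reads are >= %d bp\n", kept, total, min }
    ' "$1"
}
```

For example, `count_reads_min_len barcode56.fastq 4000` reports how many reads are at least 4000 bases long; a read shorter than that is unlikely to reach a chain score of 4000.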

Kind regards, Rich

ylaforgue59 commented 1 month ago

Hi @HalfPhoton,

Here is the NanoPlot report: NanoStats.txt

HalfPhoton commented 1 month ago

Ah ok, your dataset is not suitable for HERRO error correction, whose stated recommendation is:

length of ≥ 10000bp is recommended.

Your stats show that the majority of your reads are too short. Short reads will not be considered because min_chain_score=4000 and the HERRO model uses a window size of 4096 bases.
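To see what would remain after applying the 10000 bp recommendation, the reads can be filtered by length before correction. This is a minimal sketch, assuming a FASTQ with exactly four lines per record; `filter_fastq_min_len` and the output file name in the usage note are hypothetical, and filtering does not make a short-read dataset suitable, it only shows how many usable reads are left.

```shell
# Hypothetical helper (not a dorado command): write to stdout only the
# FASTQ records whose sequence is at least MIN_LEN bases long.
# Usage: filter_fastq_min_len FILE MIN_LEN
filter_fastq_min_len() {
    awk -v min="$2" '
        { rec[NR % 4] = $0 }               # buffer lines: 1=header, 2=sequence, 3=plus, 0=quality
        NR % 4 == 0 && length(rec[2]) >= min {
            print rec[1]
            print rec[2]
            print rec[3]
            print rec[0]
        }
    ' "$1"
}
```

A typical call would be `filter_fastq_min_len barcode56.fastq 10000 > barcode56.min10k.fastq`, after which the filtered file can be inspected or passed to dorado correct.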

Kind regards, Rich

ylaforgue59 commented 1 month ago

I thought I had "long reads" lol

Ok, I have another project (bacterial WGS) with a median read length of 25k. I could test HERRO on that project.

Can you recommend another read-correction tool for ONT reads shorter than 10 kb?

Kind regards, Yoan

HalfPhoton commented 1 month ago

That's probably a question best directed towards the Nanopore Community forum.

Kind regards, Rich