Open youngnd opened 1 month ago
Hi @youngnd - is dorado correct your full command? You need to pass in a FASTQ file to correct as well:
dorado correct <reads.fastq(.gz)>
We will add better error checking to catch and report this.
Thanks for adding this to dorado, I'm hoping for greatly improved results.
I'm getting issues with the download too. I'll add curl to the container (I have it on my server but not in the container), but I imagine it will still fail due to the rabid proxy.
So please give us a direct download link for the herro model. Then we can use the dorado correct argument to link to the downloaded model path, which will likely work well on a cluster.
Thanks !
sing exec image.sif dorado correct
[2024-05-24 14:01:57.876] [info] Running: "correct"
[2024-05-24 14:01:57.900] [info] - downloading herro-v1 with httplib
[2024-05-24 14:03:18.002] [error] Failed to download herro-v1: Could not establish connection
[2024-05-24 14:03:18.003] [info] - downloading herro-v1 with curl
sh: 1: curl: not found
[2024-05-24 14:03:18.007] [error] Failed to download herro-v1: ret=32512, errno=0
[2024-05-24 14:03:18.007] [error] Could not download model: herro-v1
Edit - after adding curl to the container, I get a segfault and a could-not-connect message, but the file seems to be there. Is this ok?
27M │ ├── herro.pt
sing exec image.sif dorado correct
[2024-05-24 15:20:36.174] [info] Running: "correct"
[2024-05-24 15:20:36.192] [info] - downloading herro-v1 with httplib
[2024-05-24 15:21:56.289] [error] Failed to download herro-v1: Could not establish connection
[2024-05-24 15:21:56.289] [info] - downloading herro-v1 with curl
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 22.3M 100 22.3M 0 0 3469k 0 0:00:06 0:00:06 --:--:-- 3501k
Segmentation fault (core dumped)
~/programs/herro$ dust
32K ┌── .temp_dorado_model-3fecf49c731bcdaf│ █ │ 0%
24K │ ┌── config.toml │ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒█ │ 0%
27M │ ├── herro.pt │ ████████████████████████████████████████████████████████████████████████████████████ │ 50%
27M │ ┌─┴ herro-v1 │ ████████████████████████████████████████████████████████████████████████████████████ │ 50%
27M ├─┴ .temp_dorado_model-6a608efaca49cf7a│ ████████████████████████████████████████████████████████████████████████████████████ │ 50%
24K │ ┌── config.toml │ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒█ │ 0%
27M │ ├── herro.pt │ ████████████████████████████████████████████████████████████████████████████████████ │ 50%
27M │ ┌─┴ herro-v1 │ ████████████████████████████████████████████████████████████████████████████████████ │ 50%
27M ├─┴ .temp_dorado_model-dcca95896244d3b7│ ████████████████████████████████████████████████████████████████████████████████████ │ 50%
55M ┌─┴ .
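One way to approach the "is this ok?" question is to check that a model folder holds the two files visible in the listing above (config.toml and herro.pt) and that neither is empty. This is only a minimal sketch: treating those two files as a complete herro-v1 download is an assumption drawn from the listing, not from dorado documentation, and the function name is hypothetical.

```python
from pathlib import Path

# Files seen inside the herro-v1 folder in the listing above. Treating these
# as the complete set is an assumption, not something confirmed by dorado.
EXPECTED_FILES = ("config.toml", "herro.pt")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected files that are absent or empty in model_dir."""
    model_dir = Path(model_dir)
    missing = []
    for name in expected:
        path = model_dir / name
        # A zero-byte file usually indicates an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling missing_model_files on the herro-v1 folder returns an empty list when both files are present and non-empty. It does not verify that the files are valid, only that they exist.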
Hi @colindaven - the model has now downloaded correctly.
You need to run dorado correct with a path to a reads.fastq file to correct.
Note that dorado correct doesn't work with piped data - it needs to be given a FASTX file with reads.
Thanks to ONT support (Steven), this worked for me:
dorado download --model herro-v1
Then specify the model path manually as an argument (-m model-path) when you run the command:
dorado correct -m herro-v1 reads.fastq.gz > corrected.fasta
Note: it was unable to index gzipped files prepared using pigz (default). It apparently needs bgzipped files.
That's a good point. Will clarify in the docs that the input file needs to be bgzipped.
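For anyone unsure whether a .gz file was written by bgzip or by plain gzip/pigz, the two can be told apart from the first few bytes: a BGZF block is a gzip member with the FEXTRA flag set and a "BC" extra subfield. A minimal sketch, assuming the "BC" subfield comes first in the extra field (which is how bgzip writes it); the function name is hypothetical:

```python
import struct


def is_bgzf(path):
    """Return True if the file starts with a BGZF (bgzip) block header."""
    with open(path, "rb") as handle:
        header = handle.read(18)
    if len(header) < 18:
        return False
    # gzip magic (1f 8b), deflate method (08), FEXTRA flag set (bit 2 of FLG).
    if header[0:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
        return False
    xlen = struct.unpack("<H", header[10:12])[0]
    subfield_len = struct.unpack("<H", header[14:16])[0]
    # BGZF stores a 'BC' extra subfield carrying a 2-byte block size.
    return xlen >= 6 and header[12:14] == b"BC" and subfield_len == 2
```

A file compressed with plain gzip or pigz would return False here, which matches the indexing failure described above.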
I am also getting an index error: [error] Could not create/load index for FASTx file. Below is the code I am running:
set -e
herro_model=~/dorado-0.7.0-linux-x64/bin/herro-v1
input=/data/AQ924/dorado_v5/output/AQ924_v5.fa.bgz
output=/data/AQ924/dorado_clean/output/AQ924_clean.fasta
log=/data/AQ924/dorado_clean/output/AQ924_clean.log
mkdir -p /data/AQ924/dorado_clean/output
echo "Running Dorado"
nohup ~/dorado-0.7.0-linux-x64/bin/dorado correct -m $herro_model $input -v > $output 2> $log &
I started by re-basecalling the pod5s for this project with the newest dorado v5 release, with output in BAM format.
I then converted the BAM to FASTA using samtools fasta -c 6 AQ924_v5.bam > AQ924_v5.fa.bgz
Any help would be greatly appreciated.
Hi @fen2323 - is the folder where the input is located writable by the process?
Can you also try to run dorado correct on the fastq without compression?
@tijyojwad Yes, the folder is writable. After re-running dorado basecaller with the --emit-fastq option, it is now working. It will not work on the fastq file that was converted from the original basecaller .bam output.
@tijyojwad Another question, using the newest version of dorado, should I redo basecalling for both the passed and failed pod5 files? Will it then automatically perform the filtering based on quality of the basecalling model I select? Or should I just stick to the pod5 passed from the original run?
@fen2323
It will not work on the fastq file that was converted from the original basecaller .bam output.
hmm interesting, we haven't tested this path. I find that surprising though - is it not working on the uncompressed fastq or the compressed one (what you reported in your script)?
Dorado doesn't do any default filtering on Q scores. If the goal is to rescue more reads from the original dataset (i.e. convert some fail reads to pass reads), then I would run basecalling on the whole dataset and set your own filtering parameters for Dorado to use. But if you just want to use the pass reads, then re-basecalling just that folder should be enough.
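To make "set your own filtering parameters" concrete, here is a rough sketch of a mean Q score cutoff applied to a FASTQ quality string after basecalling. It is not Dorado's own filter: it assumes Phred+33 encoding and the convention of averaging per-base error probabilities rather than the Q values themselves, and the function names and the default cutoff of 10 are hypothetical.

```python
import math


def mean_qscore(quality_string, offset=33):
    """Mean read Q score, averaged over per-base error probabilities."""
    if not quality_string:
        return 0.0
    # Convert each Phred character to an error probability before averaging.
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def passes(quality_string, min_q=10.0):
    """True if the read's mean Q score is at least min_q (hypothetical cutoff)."""
    return mean_qscore(quality_string) >= min_q
```

Averaging in probability space means a few very poor bases pull the read's score down much more than a plain average of Q values would.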
@tijyojwad
It will only work if the fastq is the direct output of dorado (using the --emit-fastq option). I tried both the uncompressed and the compressed fastq and got the same index error on both. I tried both bedtools and samtools to convert from bam to fastq.
Thank you for answering my other question too.
Hi @fen2323
I've been able to do the following and generated a corrected output
$ dorado basecaller <model> <pod5> > output.bam
$ samtools bam2fq output.bam > output.fastq
$ dorado correct output.fastq > corrected.fastq
I just realized the issue with your command: dorado correct needs a FASTQ (our documentation needs to be updated to reflect that), but samtools fasta writes FASTA. So you'll need to specify
samtools fastq -c 6 AQ924_v5.bam > AQ924_v5.fq.bgz
(although when I ran it I just got a fastq, not a compressed fastq)
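A quick way to catch a FASTA/FASTQ mix-up before running dorado correct is to look at the first character of the file: FASTQ records start with "@" and FASTA records start with ">". A minimal sketch, assuming the file begins directly with a record and is either uncompressed or gzip/bgzip compressed; the function name is hypothetical:

```python
import gzip


def sniff_fastx(path):
    """Return 'fastq', 'fasta', or 'unknown' based on the first record marker."""
    with open(path, "rb") as raw:
        magic = raw.read(2)
    # gzip and bgzip files both start with the gzip magic bytes 1f 8b.
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rb") as handle:
        first = handle.read(1)
    if first == b"@":
        return "fastq"
    if first == b">":
        return "fasta"
    return "unknown"
```

The check goes by content rather than by file extension, so a FASTA file would be reported as "fasta" whatever it is named.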
@tijyojwad
I have some more data to run. I will try it again as you have shown and see if I can get it to work. Thank you.
I can get dorado correct working now with the new version and with dorado in a singularity container with jobs started by nextflow. I was having trouble since I forgot the --nv parameter to allow singularity to access the GPU, but all good now.
Thanks for packaging herro up in dorado.
@tijyojwad Do you know if it is possible to run basecalling and dorado correct at the same time on a PromethION tower? Second question, is it possible to run dorado correct and utilize GPU vs CPU?
Issue Report
Please describe the issue:
I downloaded dorado 0.7.0 and tried to use the correct option:
./dorado correct
[2024-05-24 09:46:55.603] [info] Running: "correct"
[2024-05-24 09:46:55.604] [info] Assuming cert location is /etc/ssl/certs/ca-bundle.crt
[2024-05-24 09:46:55.607] [info] - downloading herro-v1 with httplib
Segmentation fault (core dumped)
I know that herro recently switched their web address for model downloads. Could it be something to do with this?
I expected the correct options to be displayed so I could error-correct my reads with herro after basecalling my 10.4 flow cell data.
Steps to reproduce the issue:
Downloaded the software twice and re-ran in user and admin mode with the same outcome:
sudo ./dorado correct
[2024-05-24 09:46:55.603] [info] Running: "correct"
[2024-05-24 09:46:55.604] [info] Assuming cert location is /etc/ssl/certs/ca-bundle.crt
[2024-05-24 09:46:55.607] [info] - downloading herro-v1 with httplib
Run environment:
Logs
sudo ./dorado correct
[2024-05-24 09:46:55.603] [info] Running: "correct"
[2024-05-24 09:46:55.604] [info] Assuming cert location is /etc/ssl/certs/ca-bundle.crt
[2024-05-24 09:46:55.607] [info] - downloading herro-v1 with httplib