diego-rt closed this 5 months ago
Chiming in here with a similar error
`(base) minknow@betsy:~/Desktop/p2_runs$ ~/Desktop/dorado-0.7.0-linux-x64/bin/dorado correct STL_F2_isoline_pooled.5k.fq > STL_F2_isoline_pooled.5k.herro.fa
[2024-05-22 09:15:14.690] [info] Running: "correct" "STL_F2_isoline_pooled.5k.fq"
[2024-05-22 09:15:14.691] [warning] Unknown certs location for current distribution. If you hit download issues, use the envvar 'SSL_CERT_FILE' to specify the location manually.
[2024-05-22 09:15:14.693] [info] - downloading herro-v1 with httplib
[2024-05-22 09:18:52.537] [info] > starting correction
terminate called after throwing an instance of 'c10::DynamicLibraryError'
what(): Error in dlopen for library libnvrtc.so.11.2and libnvrtc-672ee683.so.11.2
Exception raised from DynamicLibrary at /pytorch/pyold/aten/src/ATen/DynamicLibrary.cpp:35 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x71952ea389b7 in /home/minknow/Desktop/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #1:
Aborted (core dumped) `
Hi, thanks for reporting this. I wasn't able to reproduce it locally, but all the machines I'm testing on have CUDA 11 installed. I'm looking into it now. Do your machines have CUDA installed, by any chance?
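For anyone wanting to check this on their own machine, here is a minimal sketch that looks for the library named in the error (`libnvrtc.so.11.2`). The helper name `find_lib` is made up for illustration. It assumes a Linux system where CUDA libraries are exposed either through `LD_LIBRARY_PATH` (as environment modules typically do) or through the system loader cache; it is not a dorado feature.

```shell
# Hypothetical helper: print the path of a shared library if the loader could find it.
find_lib() {
    lib="$1"
    # Directories added by environment modules usually end up in LD_LIBRARY_PATH.
    old_ifs="$IFS"; IFS=":"
    for dir in $LD_LIBRARY_PATH; do
        if [ -e "$dir/$lib" ]; then
            IFS="$old_ifs"
            echo "$dir/$lib"
            return 0
        fi
    done
    IFS="$old_ifs"
    # Fall back to the system loader cache, if ldconfig is available.
    if command -v ldconfig >/dev/null 2>&1; then
        ldconfig -p | awk -v lib="$lib" '$1 == lib { print $NF; found = 1; exit } END { exit !found }'
        return
    fi
    return 1
}

if find_lib "libnvrtc.so.11.2"; then
    echo "libnvrtc.so.11.2 is visible to the loader"
else
    echo "libnvrtc.so.11.2 not found; a CUDA 11 toolkit may be missing"
fi
```

If the library is reported as missing, loading a CUDA 11 module or installing the toolkit is what resolved the error for the reporters in this thread.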
What is the expected runtime for dorado correct? I've got around 2 terabases worth of ultra long sequencing and I'm wondering what is the feasibility of error correcting it all. I have access to compute nodes with 4x A100 but even with that I'm wondering whether it would take days or rather weeks?
Dorado correct needs to run an all-vs-all mapping job to get overlap information, so for the most part `dorado correct` is CPU-limited for now. The more threads there are, the faster it will run. The GPU portion is smaller, so 1 or 2 A100s will be sufficient to keep the inference part busy once all-vs-all gets going.
In our benchmarking, a whole genome LSK dataset with about 12M reads took around 3 days to run.
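As a rough back-of-the-envelope reading of that benchmark (about 12M reads in about 3 days), the implied average throughput can be worked out as below. This is only arithmetic on the quoted figures, and the helper name is made up for illustration. Real throughput will depend on read length, thread count, and hardware, so it should not be extrapolated blindly to other datasets.

```shell
# Hypothetical helper: average reads per second for a run of a given length.
reads_per_second() {
    # $1: number of reads, $2: runtime in days
    echo $(( $1 / ($2 * 24 * 3600) ))
}

reads_per_second 12000000 3   # integer division; prints 46
```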
Hey @tijyojwad
Thanks for the quick reply! I'm using cuda/12.2.0 and you are right that when trying with cuda/11.3.1 it actually gets through:
[diego.terrones@clip-g4-1 dorado_2]$ ml build-env/f2021
Inactive Modules:
1) cuda/12.2.0
The following have been reloaded with a version change:
1) build-env/f2022 => build-env/f2021
[diego.terrones@clip-g4-1 dorado_2]$ ml cuda/11.3.1
Activating Modules:
1) cuda/11.3.1
[diego.terrones@clip-g4-1 dorado_2]$ dorado-0.7.0-linux-x64/bin/dorado correct -v -t 8 final.ont.fastq.gz > herro.fasta
[2024-05-22 23:32:44.446] [info] Running: "correct" "-v" "-t" "8" "final.ont.fastq.gz"
[2024-05-22 23:32:44.447] [debug] > aligner threads 8, corrector threads 4, writer threads 1
[2024-05-22 23:32:44.467] [info] Assuming cert location is /etc/ssl/certs/ca-bundle.crt
[2024-05-22 23:32:44.470] [info] - downloading herro-v1 with httplib
[2024-05-22 23:32:45.327] [debug] Usable memory for dev cuda:0: 30.4 GB
[2024-05-22 23:32:45.327] [debug] Using batch size 32 on device cuda:0
[2024-05-22 23:32:45.327] [debug] Usable memory for dev cuda:0: 30.4 GB
[2024-05-22 23:32:45.327] [debug] Using batch size 32 on device cuda:0
[2024-05-22 23:32:45.327] [debug] Starting process thread for cuda:0!
[2024-05-22 23:32:45.327] [debug] Starting process thread for cuda:0!
[2024-05-22 23:32:45.327] [debug] Starting decode thread!
[2024-05-22 23:32:45.328] [debug] Starting decode thread!
[2024-05-22 23:32:45.328] [debug] Starting decode thread!
[2024-05-22 23:32:45.328] [debug] Starting decode thread!
[2024-05-22 23:32:45.330] [debug] Looking for idx final.ont.fastq.gz.fai
[2024-05-22 23:32:45.331] [debug] > Map parameters input by user: dbg print qname=false and aln seq=false.
[2024-05-22 23:32:45.331] [debug] Initialized index options.
[2024-05-22 23:32:45.331] [debug] Loading index...
[2024-05-22 23:32:45.383] [debug] Loading model on cuda:0...
[2024-05-22 23:32:45.383] [debug] Loading model on cuda:0...
[2024-05-22 23:32:45.608] [debug] Loaded model on cuda:0!
[2024-05-22 23:32:45.611] [debug] Loaded model on cuda:0!
[2024-05-22 23:32:49.882] [debug] Loaded index with 3403 target seqs
[2024-05-22 23:32:49.927] [debug] Loaded mm2 index.
[2024-05-22 23:32:49.927] [info] > starting correction
[2024-05-22 23:32:49.927] [debug] Align with index 0
[2024-05-22 23:36:52.353] [debug] Pushing 2215 records for correction
[2024-05-22 23:38:14.311] [info] > Corrected reads written: 2215
[2024-05-22 23:38:14.311] [info] > finished correction
Hi @diego-rt - thank you so much for testing so quickly! So I think that narrows down the issue - we need to statically compile those deps into our package (or at least ship the dependencies). I will look into this ASAP.
I installed the nvidia toolkit and now it appears to be working. It's not making an output file yet but it hasn't crashed. I think it's past the previous error now.
The default index size is 8G, i.e. it loads 8 gigabases worth of reads for the index and keeps loading in 8G increments as it processes the whole file.
To get some outputs faster (although it makes the overall run slower for larger outputs), you can also lower the index size by setting `-i 80M`, which will set the index size to 80 megabases. This is useful for sanity checking, but I would recommend using the default for full genome runs.
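A minimal sketch of such a sanity-check run, using the `-i` option described above. The wrapper name `quick_correct`, the `DORADO` variable, and the file names are assumptions for illustration; only `dorado correct` and `-i 80M` come from this thread.

```shell
# DORADO may point at a specific binary, e.g. dorado-0.7.0-linux-x64/bin/dorado.
DORADO="${DORADO:-dorado}"

# Hypothetical wrapper: run dorado correct with a small index for a quick check.
quick_correct() {
    # $1: input reads, $2: index size (defaults to the 80M suggested above)
    "$DORADO" correct -i "${2:-80M}" "$1"
}

# Example with hypothetical file names:
#   quick_correct reads.fastq > corrected.fasta
```

For a full genome run, leaving the index size at its default is the recommendation given above, so the wrapper is only meant for quick checks.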
Hi @diego-rt - the missing library issue has now been resolved with dorado v0.7.1
For a whole genome run, I would expect `dorado correct` to run for several days on a machine with 96+ CPUs and a couple of GPUs. Note that `dorado correct` is mainly CPU-bottlenecked right now on the alignment phase, so your GPUs may be idle for most of the run. We're working on splitting alignment and inference into separate steps so users can better utilize their resources.
Hey @tijyojwad
Thanks a lot for the update! Alright, I will wait for the split of alignment from inference to be released then. Hopefully it will also be possible to generate alignment batches so that it can be best parallelised in an HPC environment?
Thanks a lot!
Hey there,
First of all, congratulations on the exciting new release!
What is the expected runtime for `dorado correct`? I've got around 2 terabases worth of ultra long sequencing and I'm wondering what is the feasibility of error correcting it all. I have access to compute nodes with 4x A100 but even with that I'm wondering whether it would take days or rather weeks?
On the other hand, I'm testing out `dorado correct` on some UL data spanning a 7 Mb region sequenced at ~30x. In total it is around 213 Mbp of data. However, it seems to be crashing when run on an A100 using 8 CPU threads, and also on an RTX 6000. This is the output: