nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

dorado correct runtimes #831

Closed diego-rt closed 5 months ago

diego-rt commented 5 months ago

Hey there,

First of all, congratulations on the exciting new release!

What is the expected runtime for dorado correct? I've got around 2 terabases of ultra-long sequencing data and I'm wondering how feasible it would be to error-correct it all. I have access to compute nodes with 4x A100, but even with that I'm wondering whether it would take days or rather weeks?

In the meantime, I'm testing out dorado correct on some UL data spanning a 7 Mb region sequenced at ~30x, around 213 Mbp of data in total. However, it crashes when run on an A100 using 8 CPU threads, and also on an RTX 6000. This is the output:

[diego.terrones@clip-g4-2 dorado_2]$ singularity exec --nv docker://docker.artifactory.imp.ac.at/tanakalab/docker-dorado:0.7.0 dorado correct -v -t 8 final.ont.fastq > herro.fasta
INFO:    Using cached SIF image
[2024-05-22 20:33:43.766] [info] Running: "correct" "-v" "-t" "8" "final.ont.fastq"
[2024-05-22 20:33:43.766] [debug] > aligner threads 8, corrector threads 4, writer threads 1
[2024-05-22 20:33:43.776] [info]  - downloading herro-v1 with httplib
[2024-05-22 20:33:45.174] [debug] Usable memory for dev cuda:0: 30.4 GB
[2024-05-22 20:33:45.174] [debug] Using batch size 32 on device cuda:0
[2024-05-22 20:33:45.174] [debug] Usable memory for dev cuda:0: 30.4 GB
[2024-05-22 20:33:45.174] [debug] Using batch size 32 on device cuda:0
[2024-05-22 20:33:45.174] [debug] Starting process thread for cuda:0!
[2024-05-22 20:33:45.174] [debug] Starting process thread for cuda:0!
[2024-05-22 20:33:45.175] [debug] Starting decode thread!
[2024-05-22 20:33:45.175] [debug] Looking for idx final.ont.fastq.fai
[2024-05-22 20:33:45.175] [debug] Starting decode thread!
[2024-05-22 20:33:45.175] [debug] Starting decode thread!
[2024-05-22 20:33:45.175] [debug] Starting decode thread!
[2024-05-22 20:33:45.178] [debug] > Map parameters input by user: dbg print qname=false and aln seq=false.
[2024-05-22 20:33:45.178] [debug] Initialized index options.
[2024-05-22 20:33:45.178] [debug] Loading index...
[2024-05-22 20:33:45.231] [debug] Loading model on cuda:0...
[2024-05-22 20:33:45.231] [debug] Loading model on cuda:0...
[2024-05-22 20:33:45.464] [debug] Loaded model on cuda:0!
[2024-05-22 20:33:45.479] [debug] Loaded model on cuda:0!
[2024-05-22 20:33:56.873] [debug] Loaded index with 3403 target seqs
[2024-05-22 20:33:56.935] [debug] Loaded mm2 index.
[2024-05-22 20:33:56.935] [info] > starting correction
[2024-05-22 20:33:56.935] [debug] Align with index 0
[2024-05-22 20:37:27.207] [debug] Pushing 2215 records for correction                                                                                                                
terminate called after throwing an instance of 'c10::DynamicLibraryError'                                                                                                            
  what():  Error in dlopen for library libnvrtc.so.11.2and libnvrtc-672ee683.so.11.2
Exception raised from DynamicLibrary at /pytorch/pyold/aten/src/ATen/DynamicLibrary.cpp:35 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fe397d869b7 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #1: <unknown function> + 0x39f139f (0x7fe390d3f39f in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #2: <unknown function> + 0x89889e2 (0x7fe395cd69e2 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #3: <unknown function> + 0x8988e32 (0x7fe395cd6e32 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #4: torch::jit::fuser::cuda::codegenOutputQuery(cudaDeviceProp const*, int&, int&, bool&) + 0x37 (0x7fe397ccff97 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #5: torch::jit::tensorexpr::CudaCodeGen::CompileToNVRTC(std::string const&, std::string const&) + 0x5e (0x7fe397cdf3de in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #6: torch::jit::tensorexpr::CudaCodeGen::Initialize() + 0x1f57 (0x7fe397ce6bf7 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #7: <unknown function> + 0xa9a5268 (0x7fe397cf3268 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #8: torch::jit::tensorexpr::CreateCodeGen(std::string const&, std::shared_ptr<torch::jit::tensorexpr::Stmt>, std::vector<torch::jit::tensorexpr::CodeGen::BufferArg, std::allocator<torch::jit::tensorexpr::CodeGen::BufferArg> > const&, c10::Device, std::string const&) + 0x9b (0x7fe394f755ab in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #9: torch::jit::tensorexpr::TensorExprKernel::compile() + 0x1ec7 (0x7fe39509a217 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #10: torch::jit::tensorexpr::TensorExprKernel::TensorExprKernel(std::shared_ptr<torch::jit::Graph> const&, std::string const&, std::unordered_map<c10::Symbol, std::function<torch::jit::tensorexpr::Tensor (std::vector<c10::variant<torch::jit::tensorexpr::BufHandle, torch::jit::tensorexpr::VarHandle, double, long, bool, std::vector<torch::jit::tensorexpr::BufHandle, std::allocator<torch::jit::tensorexpr::BufHandle> >, std::vector<double, std::allocator<double> >, std::vector<long, std::allocator<long> >, std::string, c10::monostate>, std::allocator<c10::variant<torch::jit::tensorexpr::BufHandle, torch::jit::tensorexpr::VarHandle, double, long, bool, std::vector<torch::jit::tensorexpr::BufHandle, std::allocator<torch::jit::tensorexpr::BufHandle> >, std::vector<double, std::allocator<double> >, std::vector<long, std::allocator<long> >, std::string, c10::monostate> > > const&, std::vector<torch::jit::tensorexpr::ExprHandle, std::allocator<torch::jit::tensorexpr::ExprHandle> > const&, std::vector<torch::jit::tensorexpr::ExprHandle, std::allocator<torch::jit::tensorexpr::ExprHandle> > const&, c10::optional<c10::ScalarType> const&, c10::Device)>, std::hash<c10::Symbol>, std::equal_to<c10::Symbol>, std::allocator<std::pair<c10::Symbol const, std::function<torch::jit::tensorexpr::Tensor (std::vector<c10::variant<torch::jit::tensorexpr::BufHandle, torch::jit::tensorexpr::VarHandle, double, long, bool, std::vector<torch::jit::tensorexpr::BufHandle, std::allocator<torch::jit::tensorexpr::BufHandle> >, std::vector<double, std::allocator<double> >, std::vector<long, std::allocator<long> >, std::string, c10::monostate>, std::allocator<c10::variant<torch::jit::tensorexpr::BufHandle, torch::jit::tensorexpr::VarHandle, double, long, bool, std::vector<torch::jit::tensorexpr::BufHandle, std::allocator<torch::jit::tensorexpr::BufHandle> >, std::vector<double, std::allocator<double> >, std::vector<long, std::allocator<long> >, std::string, c10::monostate> > > const&, std::vector<torch::jit::tensorexpr::ExprHandle, std::allocator<torch::jit::tensorexpr::ExprHandle> > const&, std::vector<torch::jit::tensorexpr::ExprHandle, std::allocator<torch::jit::tensorexpr::ExprHandle> > const&, c10::optional<c10::ScalarType> const&, c10::Device)> > > >, std::vector<long, std::allocator<long> >, bool, std::unordered_map<torch::jit::Value const*, std::vector<torch::jit::StrideInput, std::allocator<torch::jit::StrideInput> >, std::hash<torch::jit::Value const*>, std::equal_to<torch::jit::Value const*>, std::allocator<std::pair<torch::jit::Value const* const, std::vector<torch::jit::StrideInput, std::allocator<torch::jit::StrideInput> > > > >) + 0x708 (0x7fe39509abc8 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #11: <unknown function> + 0x7a27d78 (0x7fe394d75d78 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #12: <unknown function> + 0x7a238bc (0x7fe394d718bc in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #13: <unknown function> + 0x7a89263 (0x7fe394dd7263 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #14: <unknown function> + 0x7a893b1 (0x7fe394dd73b1 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #15: <unknown function> + 0x7a7cd1c (0x7fe394dcad1c in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #16: <unknown function> + 0x7a86c53 (0x7fe394dd4c53 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #17: <unknown function> + 0x7a876fd (0x7fe394dd56fd in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #18: <unknown function> + 0x7a8794f (0x7fe394dd594f in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #19: <unknown function> + 0x7a87a20 (0x7fe394dd5a20 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #20: <unknown function> + 0x7a87274 (0x7fe394dd5274 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #21: <unknown function> + 0x7a876fd (0x7fe394dd56fd in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #22: <unknown function> + 0x7a8794f (0x7fe394dd594f in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #23: <unknown function> + 0x7a88248 (0x7fe394dd6248 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #24: torch::jit::Code::Code(std::shared_ptr<torch::jit::Graph> const&, std::string, unsigned long) + 0x52 (0x7fe394dc84c2 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #25: <unknown function> + 0x7ab1781 (0x7fe394dff781 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #26: torch::jit::ProfilingGraphExecutorImpl::getOptimizedPlanFor(std::vector<c10::IValue, std::allocator<c10::IValue> >&, c10::optional<unsigned long>) + 0xa81 (0x7fe394dfef81 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #27: torch::jit::ProfilingGraphExecutorImpl::getPlanFor(std::vector<c10::IValue, std::allocator<c10::IValue> >&, c10::optional<unsigned long>) + 0x79 (0x7fe394dff529 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #28: <unknown function> + 0x7a6ba8a (0x7fe394db9a8a in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #29: torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::string, c10::IValue, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, c10::IValue> > > const&) const + 0x14e (0x7fe3949f658e in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #30: /Apps/dorado/dorado-0.7.0-linux-x64/bin/dorado() [0x8a9ba1]
frame #31: /Apps/dorado/dorado-0.7.0-linux-x64/bin/dorado() [0x89e7ed]
frame #32: /Apps/dorado/dorado-0.7.0-linux-x64/bin/dorado() [0x8a01d0]
frame #33: <unknown function> + 0x1196e380 (0x7fe39ecbc380 in /Apps/dorado/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #34: <unknown function> + 0x94ac3 (0x7fe38c03fac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #35: clone + 0x44 (0x7fe38c0d0a04 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted
faulk-lab commented 5 months ago

Chiming in here with a similar error:

(base) minknow@betsy:~/Desktop/p2_runs$ ~/Desktop/dorado-0.7.0-linux-x64/bin/dorado correct STL_F2_isoline_pooled.5k.fq > STL_F2_isoline_pooled.5k.herro.fa
[2024-05-22 09:15:14.690] [info] Running: "correct" "STL_F2_isoline_pooled.5k.fq"
[2024-05-22 09:15:14.691] [warning] Unknown certs location for current distribution. If you hit download issues, use the envvar 'SSL_CERT_FILE' to specify the location manually.
[2024-05-22 09:15:14.693] [info]  - downloading herro-v1 with httplib
[2024-05-22 09:18:52.537] [info] > starting correction
terminate called after throwing an instance of 'c10::DynamicLibraryError'
  what():  Error in dlopen for library libnvrtc.so.11.2and libnvrtc-672ee683.so.11.2
Exception raised from DynamicLibrary at /pytorch/pyold/aten/src/ATen/DynamicLibrary.cpp:35 (most recent call first):
[stack trace omitted: frames #0-#35 match the trace in the previous comment, with addresses in /home/minknow/Desktop/dorado-0.7.0-linux-x64/bin/../lib/libdorado_torch_lib.so]

Aborted (core dumped)

tijyojwad commented 5 months ago

Hi, thanks for reporting this. I wasn't able to reproduce it locally, but all the machines I'm testing on have CUDA 11 installed. I'm looking into it now. Do your machines have CUDA installed, by any chance?
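
For reference, here is one way to check which CUDA driver, toolkit, and libnvrtc builds a machine exposes (generic diagnostic commands, not specific to dorado):

# driver version and the highest CUDA runtime it supports
nvidia-smi
# CUDA toolkit compiler version, if a toolkit is installed
nvcc --version
# libnvrtc libraries visible to the dynamic linker
ldconfig -p | grep libnvrtc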

> What is the expected runtime for dorado correct? I've got around 2 terabases worth of ultra long sequencing and I'm wondering what is the feasibility of error correcting it all. I have access to compute nodes with 4x A100 but even with that I'm wondering whether it would take days or rather weeks?

Dorado correct needs to run an all-vs-all mapping job to get overlap information, so for the most part dorado correct is CPU-limited for now, and the more threads there are, the faster that stage will run. The GPU portion is smaller, so 1 or 2 A100s will be sufficient to keep the inference part busy once the all-vs-all mapping gets going.
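
As a rough sketch (flags as used elsewhere in this thread; the input path is a placeholder), a run that gives the alignment phase as many CPU threads as possible would look like:

# give the all-vs-all alignment phase all available cores
dorado correct -v -t $(nproc) reads.fastq > corrected.fasta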

In our benchmarking, a whole-genome LSK dataset with about 12M reads took around 3 days to run.

diego-rt commented 5 months ago

Hey @tijyojwad

Thanks for the quick reply! I'm using cuda/12.2.0, and you are right that with cuda/11.3.1 it actually gets through:

[diego.terrones@clip-g4-1 dorado_2]$ ml build-env/f2021

Inactive Modules:
  1) cuda/12.2.0

The following have been reloaded with a version change:
  1) build-env/f2022 => build-env/f2021

[diego.terrones@clip-g4-1 dorado_2]$ ml cuda/11.3.1

Activating Modules:
  1) cuda/11.3.1

[diego.terrones@clip-g4-1 dorado_2]$ dorado-0.7.0-linux-x64/bin/dorado correct -v -t 8 final.ont.fastq.gz > herro.fasta
[2024-05-22 23:32:44.446] [info] Running: "correct" "-v" "-t" "8" "final.ont.fastq.gz"
[2024-05-22 23:32:44.447] [debug] > aligner threads 8, corrector threads 4, writer threads 1
[2024-05-22 23:32:44.467] [info] Assuming cert location is /etc/ssl/certs/ca-bundle.crt
[2024-05-22 23:32:44.470] [info]  - downloading herro-v1 with httplib
[2024-05-22 23:32:45.327] [debug] Usable memory for dev cuda:0: 30.4 GB
[2024-05-22 23:32:45.327] [debug] Using batch size 32 on device cuda:0
[2024-05-22 23:32:45.327] [debug] Usable memory for dev cuda:0: 30.4 GB
[2024-05-22 23:32:45.327] [debug] Using batch size 32 on device cuda:0
[2024-05-22 23:32:45.327] [debug] Starting process thread for cuda:0!
[2024-05-22 23:32:45.327] [debug] Starting process thread for cuda:0!
[2024-05-22 23:32:45.327] [debug] Starting decode thread!
[2024-05-22 23:32:45.328] [debug] Starting decode thread!
[2024-05-22 23:32:45.328] [debug] Starting decode thread!
[2024-05-22 23:32:45.328] [debug] Starting decode thread!
[2024-05-22 23:32:45.330] [debug] Looking for idx final.ont.fastq.gz.fai
[2024-05-22 23:32:45.331] [debug] > Map parameters input by user: dbg print qname=false and aln seq=false.
[2024-05-22 23:32:45.331] [debug] Initialized index options.
[2024-05-22 23:32:45.331] [debug] Loading index...
[2024-05-22 23:32:45.383] [debug] Loading model on cuda:0...
[2024-05-22 23:32:45.383] [debug] Loading model on cuda:0...
[2024-05-22 23:32:45.608] [debug] Loaded model on cuda:0!
[2024-05-22 23:32:45.611] [debug] Loaded model on cuda:0!
[2024-05-22 23:32:49.882] [debug] Loaded index with 3403 target seqs
[2024-05-22 23:32:49.927] [debug] Loaded mm2 index.
[2024-05-22 23:32:49.927] [info] > starting correction
[2024-05-22 23:32:49.927] [debug] Align with index 0
[2024-05-22 23:36:52.353] [debug] Pushing 2215 records for correction                                                                                                                        
[2024-05-22 23:38:14.311] [info] > Corrected reads written: 2215
[2024-05-22 23:38:14.311] [info] > finished correction
tijyojwad commented 5 months ago

Hi @diego-rt - thank you so much for testing so quickly! That narrows down the issue: we need to statically compile those dependencies into our package (or at least ship them). I will look into this ASAP.
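
In the meantime, a possible workaround (assuming a CUDA 11.x toolkit is already installed; the path below is only an example) is to make its libnvrtc visible before launching dorado, which is effectively what @diego-rt's module swap does:

# example only: point the loader at an existing CUDA 11.x installation
export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64:$LD_LIBRARY_PATH
dorado correct -v -t 8 final.ont.fastq > herro.fasta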

faulk-lab commented 5 months ago

I installed the NVIDIA CUDA toolkit and now it appears to be working. It's not producing an output file yet, but it hasn't crashed, so I think it's past the previous error now.

tijyojwad commented 5 months ago

The default index size is 8G, i.e. it loads 8 gigabases' worth of reads for the index, and it keeps loading in 8G increments as it processes the whole file.

To get some output faster (although it makes the overall run slower for larger datasets), you can also lower the index size by setting -i 80M, which will set the index size to 80 megabases. This is useful for sanity checking, but I would recommend using the default for full-genome runs.
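
For example, a quick sanity-check run with the reduced index size might look like this (same flags as earlier in this thread; the input path is a placeholder):

# small index for a quick sanity check; keep the default 8G index for full-genome runs
dorado correct -v -t 8 -i 80M reads.fastq > corrected.fasta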

tijyojwad commented 5 months ago

Hi @diego-rt - the missing library issue has now been resolved with dorado v0.7.1

For a whole-genome run, I would expect dorado correct to run for several days on a machine with 96+ CPUs and a couple of GPUs. Note that dorado correct is mainly CPU-bottlenecked right now in the alignment phase, so your GPUs may be idle for most of the run. We're working on splitting alignment and inference into separate steps so users can better utilize their resources.

diego-rt commented 5 months ago

Hey @tijyojwad

Thanks a lot for the update! Alright, I will wait for the alignment/inference split to be released then. Hopefully it will also be possible to generate alignment batches so that the work can be parallelised across an HPC environment?

Thanks a lot!