nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

CUDA Error #589

Closed krobik26 closed 7 months ago

krobik26 commented 9 months ago

I keep trying to run the basecalling with:

./dorado basecaller dna_r10.4.1_e8.2_400bps_hac@v4.2.0 /PATH/pod5 --kit-name SQK-NBD114-24 > /PATH/output.fastq

Every time I run this I either get:

CUDA device requested but no devices found.

or if I add the argument -x cuda:0:

[2024-01-19 11:49:04.147] [error] No CUDA GPUs are available
Exception raised from device_count_ensure_non_zero at /pytorch/pyold/c10/cuda/CUDAFunctions.cpp:120 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x150a97054a77 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #1: c10::detail::torchCheckFail(char const, char const, unsigned int, char const*) + 0x68 (0x150a905d91f4 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #2: c10::cuda::device_count_ensure_non_zero() + 0x5d (0x150a9701eccd in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #3: + 0x89859ed (0x150a94fa19ed in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #4: + 0xa61aca6 (0x150a96c36ca6 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #5: + 0xa61ad30 (0x150a96c36d30 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #6: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0xf3 (0x150a919b2b03 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #7: + 0x56c3c2e (0x150a91cdfc2e in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #8: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x1ba (0x150a919f48da in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #9: + 0x48988b9 (0x150a90eb48b9 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #10: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x1b38 (0x150a91220278 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #11: + 0x588e21b (0x150a91eaa21b in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0xf5 (0x150a91691605 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #13: + 0x56c7603 (0x150a91ce3603 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #14: at::_ops::_to_copy::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x1f9 (0x150a917179c9 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #15: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x11b (0x150a91216e5b in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #16: + 0x5a5b131 (0x150a92077131 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #17: at::_ops::to_dtype_layout::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x114 (0x150a91827b74 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #18: + 0x56c773e (0x150a91ce373e in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #19: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x20e (0x150a9189630e in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #20: void torch::nn::Module::to_impl<c10::Device&, bool&>(c10::Device&, bool&) + 0x1e0 (0x150a94602200 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #21: torch::nn::Module::to(c10::Device, bool) + 0x1c (0x150a945fb2dc in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #22: void torch::nn::Module::to_impl<c10::Device&, bool&>(c10::Device&, bool&) + 0xd0 (0x150a946020f0 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #23: torch::nn::Module::to(c10::Device, bool) + 0x1c (0x150a945fb2dc in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #24: void torch::nn::Module::to_impl<c10::Device&, bool&>(c10::Device&, bool&) + 0xd0 (0x150a946020f0 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #25: torch::nn::Module::to(c10::Device, bool) + 0x1c (0x150a945fb2dc in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #26: ./dorado() [0x906036]
frame #27: ./dorado() [0x9037a7]
frame #28: ./dorado() [0x8f94c6]
frame #29: ./dorado() [0x89476b]
frame #30: ./dorado() [0x89482b]
frame #31: + 0xfe67 (0x150a8b91be67 in /lib64/libpthread.so.0)
frame #32: ./dorado() [0x894e3f]
frame #33: ./dorado() [0x899080]
frame #34: + 0x1196e440 (0x150a9df8a440 in /gpfs/gibbs/project/neugebauer/kr728/dorado/bin/../lib/libdorado_torch_lib.so)
frame #35: + 0x81ca (0x150a8b9141ca in /lib64/libpthread.so.0)
frame #36: clone + 0x43 (0x150a8aa39e73 in /lib64/libc.so.6)

I tried running, as per the guide: export LD_LIBRARY_PATH=/PATH/dorado-x.y.z-linux-x64/lib:$LD_LIBRARY_PATH

But these errors still keep coming up. Also, is my original command correct if I want to basecall and classify by barcode into FASTQ files? Thanks!

tijyojwad commented 9 months ago

Are you able to run any other CUDA application? I would suggest trying the NVIDIA CUDA samples first to see if they are runnable.
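A quicker first check than building the CUDA samples is to ask the driver whether it can see a GPU from the same shell that runs dorado. The sketch below is only a suggestion: `check_gpu` is a hypothetical helper name, and it assumes the standard `nvidia-smi` tool that ships with the NVIDIA driver. The /gpfs path in the trace suggests a shared cluster filesystem; if that is the case (an assumption), the command may be running on a node that has no GPU allocated.

```shell
#!/bin/sh
# check_gpu: hypothetical helper that reports whether this shell can see an NVIDIA GPU.
check_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # -L lists each GPU the driver can see, one per line.
        nvidia-smi -L || echo "nvidia-smi is installed but could not list any GPU"
    else
        echo "nvidia-smi not found on PATH: no NVIDIA driver tools on this machine"
    fi
    # Schedulers and users sometimes hide GPUs through this variable.
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
}

check_gpu
```

If no GPU is listed, or CUDA_VISIBLE_DEVICES is set to an empty value, the "No CUDA GPUs are available" error would be expected regardless of how dorado is invoked.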

is my original code correct if I would like to basecall and classify by barcodes into fastq files

Your command is correct for classifying. The output of that command will be a BAM file (even though it is redirected to a file named output.fastq) with a classification for each read and with the barcodes trimmed from the reads. If you want to split the reads into per-barcode files, you will need to use the dorado demux command as outlined here - https://github.com/nanoporetech/dorado#in-line-with-basecalling-1
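As a rough sketch of that two-step workflow: the function names, the `DORADO` variable, and all paths are hypothetical, and the `demux` flags shown (`--output-dir`, `--no-classify`, `--emit-fastq`) are taken from the dorado README as I understand it, so check `dorado demux --help` for your installed version before relying on them.

```shell
#!/bin/sh
# Path to the dorado binary (assumption: run from the dorado bin directory).
DORADO=${DORADO:-./dorado}

# Step 1: basecall and classify. The BAM written to stdout carries a barcode call per read.
# $1 = pod5 directory, $2 = output BAM file
basecall_classify() {
    "$DORADO" basecaller dna_r10.4.1_e8.2_400bps_hac@v4.2.0 "$1" \
        --kit-name SQK-NBD114-24 > "$2"
}

# Step 2: split the already-classified reads into one file per barcode.
# --no-classify reuses the barcode calls from step 1; --emit-fastq writes FASTQ instead of BAM.
# $1 = classified BAM file, $2 = output directory
demux_to_fastq() {
    "$DORADO" demux --output-dir "$2" --no-classify --emit-fastq "$1"
}
```

Usage would then be `basecall_classify /PATH/pod5 /PATH/calls.bam` followed by `demux_to_fastq /PATH/calls.bam /PATH/demux`, with the paths replaced by your own.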

HalfPhoton commented 7 months ago

Closing as stale as there has been no reply. Please re-open if needed.