caspargross opened this issue 1 month ago
Hi @caspargross,
Your duplex command is malformed:
dorado duplex ... --modified-bases-models <mods_model> <simplex_model> <data>
This ordering causes the simplex model to be consumed by the --modified-bases-models argument, which accepts multiple inputs.
To resolve, please put the simplex model and data before the optional arguments.
dorado duplex <simplex_model> <data> ... --modified-bases-models <mods_model>
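For illustration, a filled-in version of the corrected ordering might look like the sketch below; the model names, data directory, and output path are placeholders rather than your actual files.
# Hypothetical example: simplex model and data come first (positional), the
# optional --modified-bases-models argument afterwards. All names below are
# placeholders - substitute your own downloaded models and pod5 directory.
dorado duplex \
    dna_r10.4.1_e8.2_400bps_hac@v4.1.0 \
    pod5_dir/ \
    --modified-bases-models dna_r10.4.1_e8.2_400bps_hac@v4.1.0_5mCG_5hmCG@v2 \
    > duplex_calls.bam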
As for the substr error, we'll take a look at this internally as it's come up in a couple of places.
Kind regards, Rich
Hi @HalfPhoton,
Thanks for your help. I changed the code as you suggested, but it did not solve the issue. However, I found out that the basecalling works fine using only one GPU. It seems to me that there is a problem with CUDA resource allocation or deallocation when running duplex basecalling on multiple devices. For me, the issue can be worked around by parallelizing multiple jobs that each use only a single CUDA device. Best regards, Caspar
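To make that workaround concrete, here is a minimal shell sketch of the one-GPU-per-job pattern, assuming two GPUs and a directory of per-channel pod5 files; the model name and paths are placeholders, and a Snakemake pipeline would express the same idea as one job per pod5 file with a GPU resource of 1.
#!/usr/bin/env bash
# Minimal sketch: split the per-channel pod5 files across two GPUs so that
# each dorado duplex process only ever sees a single CUDA device.
# Model name and paths are placeholders.
set -euo pipefail
model=dna_r10.4.1_e8.2_400bps_hac@v4.1.0

run_on_gpu() {
    local gpu="$1"; shift
    for pod5 in "$@"; do
        dorado duplex "$model" "$pod5" --device "cuda:${gpu}" \
            > "$(basename "$pod5" .pod5).duplex.bam"
    done
}

files=(/tmp/split_pod5/*.pod5)
half=$(( ${#files[@]} / 2 ))
run_on_gpu 0 "${files[@]:0:half}" &   # first half, sequentially on cuda:0
run_on_gpu 1 "${files[@]:half}" &     # second half, sequentially on cuda:1
wait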
I am reopening this issue because the problem is still not solved. I tried the following points:
Here is one of the failing pod5 files: https://cloud.imgag.de/s/bgC6CPg7ciTf9df
Logfile:
[2024-06-06 15:49:52.953] [info] Running: "duplex" "bin/dorado_models/dna_r10.4.1_e8.2_400bps_hac@v4.1.0" "/tmp/dorado_duplex.HG001_TUE_01/channel_mod-65.pod5" "--device" "cuda:1" "--min-qscore" "9" "--verbose" "--batchsize" "5000" "--chunksize" "9216"
[2024-06-06 15:49:52.988] [info] > No duplex pairs file provided, pairing will be performed automatically
[2024-06-06 15:49:52.989] [debug] > Reads to process: 255715
[2024-06-06 15:49:53.246] [info] Normalised: chunksize 9216 -> 9215
[2024-06-06 15:49:53.247] [info] Normalised: chunksize 9216 -> 9215
[2024-06-06 15:49:53.433] [debug] cuda:1 memory available: 84.15GB
[2024-06-06 15:49:53.433] [debug] cuda:1 memory limit 74.74GB
[2024-06-06 15:49:53.433] [debug] cuda:1 maximum safe estimated batch size at chunk size 9215 is 11648
[2024-06-06 15:49:53.433] [info] cuda:1 using chunk size 9215, batch size 5056
[2024-06-06 15:49:53.433] [debug] cuda:1 Model memory 21.54GB
[2024-06-06 15:49:53.433] [debug] cuda:1 Decode memory 10.83GB
[2024-06-06 15:49:54.170] [debug] cuda:1 memory available: 83.82GB
[2024-06-06 15:49:54.170] [debug] cuda:1 memory limit 40.91GB
[2024-06-06 15:49:54.170] [debug] cuda:1 maximum safe estimated batch size at chunk size 9215 is 13248
[2024-06-06 15:49:54.170] [info] cuda:1 using chunk size 9215, batch size 5056
[2024-06-06 15:49:54.170] [debug] cuda:1 Model memory 11.93GB
[2024-06-06 15:49:54.170] [debug] cuda:1 Decode memory 3.67GB
[2024-06-06 15:49:54.560] [error] finalise() not called on a HtsFile.
[2024-06-06 15:49:54.561] [error] CUDA error: invalid argument
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /pytorch/pyold/c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fe0837aa9b7 in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fe07cd2f115 in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fe083774958 in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #3: <unknown function> + 0x897dd13 (0x7fe0816efd13 in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #4: <unknown function> + 0x4530bc1 (0x7fe07d2a2bc1 in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #5: at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) + 0x14 (0x7fe07d29c604 in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #6: at::detail::empty_cpu(c10::ArrayRef<long>, c10::ScalarType, bool, c10::optional<c10::MemoryFormat>) + 0x40 (0x7fe07d29c650 in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #7: at::detail::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x34 (0x7fe07d29c6a4 in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #8: at::native::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x1f (0x7fe07d997e0f in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #9: <unknown function> + 0x5703e31 (0x7fe07e475e31 in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #10: <unknown function> + 0x5703e8b (0x7fe07e475e8b in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #11: at::_ops::empty_memory_format::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0xe7 (0x7fe07e1086e7 in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #12: <unknown function> + 0x56c718f (0x7fe07e43918f in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #13: at::_ops::empty_memory_format::call(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x1b2 (0x7fe07e148922 in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #14: bin/dorado-0.7.1-linux-x64/bin/dorado() [0x8b2b7f]
frame #15: bin/dorado-0.7.1-linux-x64/bin/dorado() [0xa3c03f]
frame #16: bin/dorado-0.7.1-linux-x64/bin/dorado() [0xa3aac6]
frame #17: bin/dorado-0.7.1-linux-x64/bin/dorado() [0x87492d]
frame #18: bin/dorado-0.7.1-linux-x64/bin/dorado() [0x82739a]
frame #19: bin/dorado-0.7.1-linux-x64/bin/dorado() [0x8310ab]
frame #20: <unknown function> + 0x114df (0x7fe0789144df in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #21: bin/dorado-0.7.1-linux-x64/bin/dorado() [0x826569]
frame #22: bin/dorado-0.7.1-linux-x64/bin/dorado() [0x837e00]
frame #23: <unknown function> + 0x1196e380 (0x7fe08a6e0380 in /mnt/storage3/projects/research/2024_lonGER/bin/dorado-0.7.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #24: <unknown function> + 0x8609 (0x7fe07890b609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #25: clone + 0x43 (0x7fe077d65353 in /lib/x86_64-linux-gnu/libc.so.6)
Execution command:
CUDA_LAUNCH_BLOCKING=1 \
bin/dorado-0.7.1-linux-x64/bin/dorado duplex \
bin/dorado_models/dna_r10.4.1_e8.2_400bps_hac@v4.1.0 \
/tmp/dorado_duplex.HG001_TUE_01/channel_mod-65.pod5 \
--device 'cuda:1' \
--min-qscore 9 \
--verbose \
--batchsize 5000 \
--chunksize 9216 \
> /tmp/dorado_duplex.HG001_TUE_01/channel_mod-65.bam \
2> logs/dorado_duplex.HG001_TUE_01.channel_mod-65.log
Update: It works with --chunksize 5000 and --batchsize 5000. However, I saw it mentioned somewhere that changing the batchsize has an impact on the basecalling results.
I needed to reduce the chunksize further and further to basecall the last channel subsets. For the last file I had to use chunksize 2000 and batchsize 5000. GPU memory usage is only at 16% with those settings.
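As a stop-gap, this trial-and-error can be automated; the sketch below simply retries the same command with progressively smaller --chunksize values until one attempt succeeds. The candidate sizes, model, and paths are taken from the command above, but treat them as placeholders.
# Retry the same duplex command with progressively smaller --chunksize values
# until one attempt exits successfully (values and paths are placeholders).
model=bin/dorado_models/dna_r10.4.1_e8.2_400bps_hac@v4.1.0
pod5=/tmp/dorado_duplex.HG001_TUE_01/channel_mod-65.pod5
bam=/tmp/dorado_duplex.HG001_TUE_01/channel_mod-65.bam

for chunksize in 9216 5000 3000 2000; do
    if dorado duplex "$model" "$pod5" \
            --device cuda:1 --min-qscore 9 --verbose \
            --batchsize 5000 --chunksize "$chunksize" \
            > "$bam" 2> "logs/chunksize_${chunksize}.log"; then
        echo "Succeeded with --chunksize $chunksize"
        break
    fi
    echo "Failed with --chunksize $chunksize, retrying with a smaller value" >&2
done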
Issue Report
Please describe the issue:
I am trying to run dorado duplex with the recommended approach of splitting up the pod5 into individual channels. The error occurs while running dorado duplex on a single channel .pod5. I use a 4 kHz HAC model and also want to call modifications. All steps are run as part of a Snakemake pipeline. The first job runs fine and produces a BAM file, but has an error in the logfile and throws soft lockup kernel errors. All subsequent jobs, using the same run command and profiles, stop with a CUDA error. The second CUDA error seems to be similar to the one reported by other people here (#849, #842). I don't know if the first and second errors are related.
Steps to reproduce the issue:
Run multiple dorado duplex basecalling runs on split per-channel .pod5 files.
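For completeness, the per-channel split itself can be sketched with the pod5 package CLI roughly as below; the flag names follow my recollection of the pod5 subset documentation, so please verify them against pod5 subset --help for your installed version.
# Sketch of splitting a pod5 file into per-channel files with the pod5 CLI
# (file names are placeholders; verify the flags against your pod5 version).
pod5 view reads.pod5 --include "read_id, channel" --output summary.tsv
pod5 subset reads.pod5 --summary summary.tsv --columns channel \
    --output /tmp/split_by_channel/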
Run environment:
The split .pod5 files are stored in the /tmp folder.
Logs
The first job in the pipeline has the following error but still produces a BAM file:
Along with the following kernel errors:
All downstream jobs then fail with the following errors: