CUDA error at /builds/ofan/ont_core_cpp/ont_core/common/cuda_common.cpp:232: CUDA_ERROR_INVALID_DEVICE

Hello, I followed the instructions and ran Megalodon (2.4.1) with guppy_basecall_server (6.0.1+652ffd1) with GPU support, but it always reports an error. Others have encountered similar issues, but I could not solve my problem from their discussions, so I am asking for additional help. Below is the script I ran:

megalodon \
    ~/01.data/01.ONT_data/02.ONT_test/AW/FAST5_PASS \
    --guppy-params "-d /public/home/zenglingsen/04.software/03.Guppy/rerio/basecall_models/" \
    --guppy-server-path /public/home/zenglingsen/04.software/03.Guppy/ont-guppy/bin/guppy_basecall_server \
    --guppy-config res_dna_r941_prom_modbases_5mC_CpG_v001.cfg \
    --outputs basecalls mappings mod_mappings mods per_read_mods \
    --reference /public/home/zenglingsen/01.data/03.Reference/GCF_000003025.6_Sscrofa11.1_genomic.fna \
    --mod-motif m CG 0 \
    --output-directory /public/home/zenglingsen/01.data/01.ONT_data/02.ONT_test/AW/FAST5_PASS/out \
    --overwrite \
    --devices 0 \
    --processes 8

The error file showed:

[10:17:00] Running Megalodon version 2.4.1
[10:17:00] Loading guppy basecalling backend

ERROR: Guppy server initialization failed. See guppy logs in [--output-directory] for more details.
    Try running the guppy server initialization command found in log.txt in order to pinpoint the source of this issue.

This is the content of log.txt:

[10:17:00] Running Megalodon version 2.4.1
DBG 10:17:00 : Command: """/public/home/zenglingsen/04.software/02.Anaconda/Or/envs/pytorch/bin/megalodon /public/home/zenglingsen/01.data/01.ONT_data/02.ONT_test/AW/FAST5_PASS --guppy-params -d /public/home/zenglingsen/04.software/03.Guppy/rerio/basecall_models/ --guppy-server-path /public/home/zenglingsen/04.software/03.Guppy/ont-guppy/bin/guppy_basecall_server --guppy-config res_dna_r941_prom_modbases_5mC_CpG_v001.cfg --outputs basecalls mappings mod_mappings mods per_read_mods --reference /public/home/zenglingsen/01.data/03.Reference/GCF_000003025.6_Sscrofa11.1_genomic.fna --mod-motif m CG 0 --output-directory /public/home/zenglingsen/01.data/01.ONT_data/02.ONT_test/AW/FAST5_PASS/out --overwrite --devices 1 --processes 8""" --- MainProcess-MainThread megalodon.py:1793
[10:17:00] Loading guppy basecalling backend
DBG 10:17:00 : Guppy version: "6.0.1" --- MainProcess-MainThread backends.py:939
DBG 10:17:00 : Pyguppy version: "6.0.1" --- MainProcess-MainThread backends.py:940
DBG 10:17:00 : guppy server init command: "/public/home/zenglingsen/04.software/03.Guppy/ont-guppy/bin/guppy_basecall_server -p auto -l /public/home/zenglingsen/01.data/01.ONT_data/02.ONT_test/AW/FAST5_PASS/out/guppy_log -c res_dna_r941_prom_modbases_5mC_CpG_v001.cfg --quiet --post_out -x cuda:1 -d /public/home/zenglingsen/04.software/03.Guppy/rerio/basecall_models/" --- MainProcess-MainThread backends.py:1018
DBG 10:17:01 : Found guppy log file: /public/home/zenglingsen/01.data/01.ONT_data/02.ONT_test/AW/FAST5_PASS/out/guppy_log/guppy_basecall_server_log-2023-08-08_10-17-01.log --- MainProcess-MainThread backends.py:1033

ERROR: Guppy server initialization failed. See guppy logs in [--output-directory] for more details.
    Try running the guppy server initialization command found in log.txt in order to pinpoint the source of this issue.
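
Following the hint in the error message, the server init command can be pulled out of log.txt so it can be re-run by hand. Below is a minimal Python sketch, assuming the log line keeps the `guppy server init command: "..."` format seen in log.txt and that the device is passed with `-x` as it is there. The function names are hypothetical and are not part of Megalodon.

```python
import re
import shlex

# Matches the quoted command on the "guppy server init command" debug line.
INIT_CMD_RE = re.compile(r'guppy server init command: "([^"]+)"')


def find_guppy_init_command(log_text):
    """Return the guppy server init command found in the log text, or None."""
    match = INIT_CMD_RE.search(log_text)
    return match.group(1) if match else None


def requested_device(command):
    """Return the value that follows -x in the command, or None if absent."""
    tokens = shlex.split(command)
    if "-x" in tokens:
        position = tokens.index("-x")
        if position + 1 < len(tokens):
            return tokens[position + 1]
    return None
```

Reading log.txt into a string and passing it to `find_guppy_init_command` gives the exact command to paste into a shell on the GPU node; `requested_device` then shows which device string the server was actually asked to use.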

This is the guppy_server log file:

2023-08-08 10:17:01.473438 [guppy/message] ONT Guppy basecall server software version 6.0.1+652ffd1, client-server API version 10.0.0
log path: /public/home/zenglingsen/01.data/01.ONT_data/02.ONT_test/AW/FAST5_PASS/out/guppy_log
chunk size: 2000
chunks per runner: 512
max queued reads: 2000
num basecallers: 4
num socket threads: 2
max returned events: 50000
gpu device: cuda:1
kernel path:
runners per device: 4
2023-08-08 10:17:01.475945 [guppy/info] crashpad_handler not supported on this platform.
2023-08-08 10:17:01.478127 [guppy/info] Listening on port ipc:///tmp/3763-2667-29cb-f0a3.
2023-08-08 10:17:01.607464 [guppy/error] CUDA error at /builds/ofan/ont_core_cpp/ont_core/common/cuda_common.cpp:232: CUDA_ERROR_INVALID_DEVICE. Error initialising basecall server using port: ipc://auto. Aborting.
2023-08-08 10:17:01.608638 [guppy/message] The basecall server has shut down successfully.
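
The server log records `gpu device: cuda:1`, and the CUDA driver API documents CUDA_ERROR_INVALID_DEVICE as a device ordinal that does not correspond to a valid CUDA device. One check that may help narrow this down is to compare the requested index with the GPUs that `nvidia-smi -L` lists on the node. The following is a rough Python sketch, assuming `nvidia-smi -L` prints one `GPU N: ...` line per device and that the device spec is a plain `cuda:N`. The helper names are hypothetical.

```python
import re
import subprocess


def count_gpus(nvidia_smi_list_output):
    """Count the 'GPU N: ...' lines in the output of `nvidia-smi -L`."""
    return sum(
        1
        for line in nvidia_smi_list_output.splitlines()
        if re.match(r"GPU \d+:", line.strip())
    )


def cuda_index(device_spec):
    """Return N for a plain 'cuda:N' spec, or None for any other form."""
    match = re.fullmatch(r"cuda:(\d+)", device_spec)
    return int(match.group(1)) if match else None


def device_in_range(device_spec, gpu_count):
    """True if the spec is 'cuda:N' and N is below the number of listed GPUs."""
    index = cuda_index(device_spec)
    return index is not None and index < gpu_count


def visible_gpu_count():
    """Run `nvidia-smi -L` on this node and count the GPUs it lists."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return count_gpus(result.stdout)
```

On the GPU node, `device_in_range("cuda:1", visible_gpu_count())` would return False if only one GPU is listed. On a scheduler-managed partition, `CUDA_VISIBLE_DEVICES` may restrict or renumber the devices a job can see, so this count is only a first approximation.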

Finally, this is the GPU information for the partition:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I have tried the steps suggested in related discussions, such as "Guppy can't run on GPU with trained model" (#46) and "error while running guppy 4.4.1 on GPU mode" (#6), but none of them worked.

Any advice or comments would be greatly appreciated. Thank you very much.
