nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Error running dorado basecaller 0.7.0 using model rna004_130bps_sup@v5.0.0: double free or corruption (fasttop) #824

heruiyang opened this issue 4 months ago (status: open).

Issue Report

Please describe the issue:

Running dorado basecaller 0.7.0 with modification calling (m6A and pseU) using the direct RNA model rna004_130bps_sup@v5.0.0 fails shortly after the first POD5 file is loaded, with the error: Error in `dorado': double free or corruption (fasttop): 0x00007f03240926c0
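The backtrace in the log is easier to read once its frames are split apart and the mangled C++ names are demangled. Read that way, the innermost frames suggest that glibc caught the double free in a `free()` reached from cuDNN's `cask_cudnn::ImplicitGemmShader<...>::run`, called via `at::native::cudnn_convolution` from `torch::nn::Conv1dImpl::forward`. In other words, it happened during a 1-D convolution in the model. The helper below is a minimal sketch for that. It assumes glibc's `OBJECT(SYMBOL+OFFSET)[ADDRESS]` frame format and uses binutils' `c++filt` only if it is on `PATH`. The function names `parse_frame` and `demangle` are hypothetical, not part of dorado.

```python
import re
import shutil
import subprocess
from typing import List, NamedTuple, Optional

# Frame format in glibc's abort backtrace: OBJECT(SYMBOL+OFFSET)[ADDRESS].
# The "(SYMBOL+OFFSET)" part can be absent ("dorado[0xa6959b]") and SYMBOL
# can be empty ("/lib64/libc.so.6(+0x81329)[0x7f040375f329]").
FRAME_RE = re.compile(
    r"^(?P<obj>[^(\[]+)"
    r"(?:\((?P<sym>[^+)]*)\+(?P<off>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


class Frame(NamedTuple):
    obj: str                 # binary or shared-object path
    symbol: Optional[str]    # mangled symbol name, None if not shown
    offset: Optional[int]    # offset from the symbol (or object start)
    address: int             # absolute return address


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one backtrace line; return None for headers or other text."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    off = m.group("off")
    return Frame(
        obj=m.group("obj"),
        symbol=m.group("sym") or None,
        offset=int(off, 16) if off else None,
        address=int(m.group("addr"), 16),
    )


def demangle(symbols: List[str]) -> List[str]:
    """Demangle C++ names with binutils' c++filt; return them unchanged
    if c++filt is not installed."""
    if not symbols or shutil.which("c++filt") is None:
        return list(symbols)
    result = subprocess.run(
        ["c++filt"], input="\n".join(symbols),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Two frames copied from the backtrace in this issue.
    lines = [
        "/lib64/libc.so.6(+0x81329)[0x7f040375f329]",
        "/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so"
        "(_ZN5torch2nn10Conv1dImpl7forwardERKN2at6TensorE+0x3a0)[0x7f040d4daa40]",
    ]
    frames = [f for f in (parse_frame(ln) for ln in lines) if f is not None]
    for f in frames:
        print(f.obj.rsplit("/", 1)[-1], f.symbol, hex(f.address))
    print(demangle([f.symbol for f in frames if f.symbol]))
```

With `c++filt` available, the second symbol should come out as `torch::nn::Conv1dImpl::forward(at::Tensor const&)`. The same approach applies to the cuDNN frames higher up the stack.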

Steps to reproduce the issue:

Run dorado basecaller 0.7.0 with the command shown on the first line of the log below (the `Running:` line).
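To rerun the exact invocation, the arguments can be rebuilt from the `Running:` line in the log. The sketch below keeps the logged argument order and paths. It assumes dorado's default of writing calls to stdout. The output file name `calls.bam` and the helper `build_command` are hypothetical, because the issue does not show where the output was redirected.

```python
import shlex
import shutil
import subprocess
from typing import List

# Model and input paths as they appear in the "Running:" line of the log;
# replace them with local paths.
MODEL = ("/gpfs/commons/groups/vickovic_lab/rhe/projects/012/"
         "nanopore_data_processing/basecalling_test/models/"
         "rna004_130bps_sup@v5.0.0")
POD5_DIR = ("/gpfs/commons/groups/vickovic_lab/rhe/projects/012/"
            "nanopore_data_processing/data/dwm_test_nanopore_directRNA/"
            "E055/C1/dummy_dir/pod5_pass")


def build_command(model: str = MODEL, pod5_dir: str = POD5_DIR) -> List[str]:
    """Arguments in the same order as the logged invocation."""
    return [
        "dorado", "basecaller",
        "--recursive", "--verbose",
        "--device", "cuda:all",
        "--modified-bases", "m6A", "pseU",
        "--no-trim", "--emit-moves",
        "--min-qscore", "8",
        model, pod5_dir,
    ]


if __name__ == "__main__":
    cmd = build_command()
    print(shlex.join(cmd))  # copy-pasteable shell form of the command
    if shutil.which("dorado") is not None:
        # "calls.bam" is a hypothetical output name; calls go to stdout.
        with open("calls.bam", "wb") as out:
            subprocess.run(cmd, stdout=out, check=False)
```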

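Run environment (not filled in by the reporter; the values below are inferred from the log):

- Dorado version: 0.7.0 (/nfs/sw/dorado/dorado-0.7.0)
- Models: rna004_130bps_sup@v5.0.0, with rna004_130bps_sup@v5.0.0_m6A@v1 and rna004_130bps_sup@v5.0.0_pseU@v1 for modification calling
- GPU: cuda:0 with 16.20GB memory available (GPU model not stated)
- NVIDIA driver 545.23.08 and CUDA 12.3 (libcuda.so.545.23.08 and libnvrtc.so.12.3.103 in the memory map)
- OS: Linux with glibc 2.17 (libc-2.17.so in the memory map)
- Source data: POD5 files from a direct RNA run (pod5_pass directory)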
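Driver, CUDA and glibc versions can be read from the memory map that glibc prints after the backtrace. Examples are `libnvrtc.so.12.3.103`, `libcuda.so.545.23.08` and `libc-2.17.so`. The minimal sketch below lists the distinct shared objects in such a dump. It assumes the standard `/proc/<pid>/maps` column layout (address, perms, offset, dev, inode, optional pathname). The helper name `mapped_libraries` is hypothetical.

```python
import os
import re
from typing import Iterable, List

SO_RE = re.compile(r"\.so(\.|$)")  # "libfoo.so" or "libfoo.so.1.2.3"


def mapped_libraries(maps_lines: Iterable[str]) -> List[str]:
    """Return basenames of shared objects in /proc/<pid>/maps-style lines,
    de-duplicated and in first-seen order."""
    seen: dict = {}
    for line in maps_lines:
        # Columns: address perms offset dev inode [pathname]
        parts = line.split(maxsplit=5)
        if len(parts) < 6:
            continue  # anonymous mapping: no pathname column
        name = os.path.basename(parts[5].rstrip())
        if SO_RE.search(name):
            seen.setdefault(name, None)
    return list(seen)


if __name__ == "__main__":
    # A few lines copied from the memory map in this issue.
    sample = [
        "7f0401445000-7f0401527000 r--p 00000000 fd:00 345028537"
        "                  /usr/lib64/libcuda.so.545.23.08",
        "7f04036de000-7f04038a2000 r-xp 00000000 fd:00 268845397"
        "                  /usr/lib64/libc-2.17.so",
        "6ad90000-6cfc7000 rw-p 00000000 00:00 0 ",
    ]
    print(mapped_libraries(sample))
```

Keeping only basenames loses the install prefix. That prefix is sometimes useful too: for example, it shows that `libnvrtc` is loaded from `/usr/local/cuda-12.3` rather than from the dorado bundle.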

Logs

[2024-05-21 16:22:00.667] [info] Running: "basecaller" "--recursive" "--verbose" "--device" "cuda:all" "--modified-bases" "m6A" "pseU" "--no-trim" "--emit-moves" "--min-qscore" "8" "/gpfs/commons/groups/vickovic_lab/rhe/projects/012/nanopore_data_processing/basecalling_test/models/rna004_130bps_sup@v5.0.0" "/gpfs/commons/groups/vickovic_lab/rhe/projects/012/nanopore_data_processing/data/dwm_test_nanopore_directRNA/E055/C1/dummy_dir/pod5_pass"
[2024-05-21 16:22:01.007] [debug] - matching modification model found: rna004_130bps_sup@v5.0.0_m6A@v1
[2024-05-21 16:22:01.007] [debug] - matching modification model found: rna004_130bps_sup@v5.0.0_pseU@v1
[2024-05-21 16:22:01.015] [info] Normalised: overlap 500 -> 492
[2024-05-21 16:22:01.015] [info] > Creating basecall pipeline
[2024-05-21 16:22:01.015] [debug] CRFModelConfig { qscale:1.200000 qbias:2.000000 stride:6 bias:1 clamp:0 out_features:4096 state_len:5 outsize:4096 blank_score:0.000000 scale:1.000000 num_features:1 sample_rate:4000 mean_qscore_start_pos:60 SignalNormalisationParams { strategy:pa StandardisationScalingParams { standardise:1 mean:80.875900 stdev:17.269760}} BasecallerParams { chunk_size:18432 overlap:492 batch_size:0} convs: { 0: ConvParams { insize:1 size:64 winlen:5 stride:1 activation:swish} 1: ConvParams { insize:64 size:64 winlen:5 stride:1 activation:swish} 2: ConvParams { insize:64 size:128 winlen:9 stride:3 activation:swish} 3: ConvParams { insize:128 size:128 winlen:9 stride:2 activation:swish} 4: ConvParams { insize:128 size:512 winlen:5 stride:2 activation:swish}} model_type: tx { crf_encoder: CRFEncoderParams { insize:512 n_base:4 state_len:5 scale:5.000000 blank_score:2.000000 expand_blanks:1 permute:1} transformer: TxEncoderParams { d_model:512 nhead:8 depth:18 dim_feedforward:2048 deepnorm_alpha:2.449490}}}
[2024-05-21 16:22:01.246] [info]  - BAM format does not support `U`, so RNA output files will include `T` instead of `U` for all file types.
[2024-05-21 16:22:04.862] [debug] cuda:0 memory available: 16.20GB
[2024-05-21 16:22:04.862] [debug] cuda:0 memory limit 15.20GB
[2024-05-21 16:22:04.862] [debug] cuda:0 maximum safe estimated batch size at chunk size 18432 is 96
[2024-05-21 16:22:04.862] [debug] cuda:0 maximum safe estimated batch size at chunk size 9216 is 224
[2024-05-21 16:22:04.862] [debug] Auto batchsize cuda:0: testing up to 224 in steps of 32
[2024-05-21 16:22:05.241] [debug] Auto batchsize cuda:0: 32, time per chunk 4.168609 ms
[2024-05-21 16:22:05.546] [debug] Auto batchsize cuda:0: 64, time per chunk 2.162592 ms
[2024-05-21 16:22:05.840] [debug] Auto batchsize cuda:0: 96, time per chunk 1.382219 ms
[2024-05-21 16:22:06.145] [debug] Auto batchsize cuda:0: 128, time per chunk 1.140312 ms
[2024-05-21 16:22:06.459] [debug] Auto batchsize cuda:0: 160, time per chunk 0.895174 ms
[2024-05-21 16:22:06.810] [debug] Auto batchsize cuda:0: 192, time per chunk 0.858096 ms
[2024-05-21 16:22:07.211] [debug] Auto batchsize cuda:0: 224, time per chunk 0.850757 ms
[2024-05-21 16:22:07.214] [debug] Largest batch size for cuda:0: 224, time per chunk 0.850757 ms
[2024-05-21 16:22:07.214] [debug] Final batch size for cuda:0[0]: 96
[2024-05-21 16:22:07.214] [debug] Final batch size for cuda:0[1]: 224
[2024-05-21 16:22:07.214] [info] cuda:0 using chunk size 18432, batch size 96
[2024-05-21 16:22:07.214] [debug] cuda:0 Model memory 10.28GB
[2024-05-21 16:22:07.214] [debug] cuda:0 Decode memory 1.25GB
[2024-05-21 16:22:07.920] [info] cuda:0 using chunk size 9216, batch size 224
[2024-05-21 16:22:07.920] [debug] cuda:0 Model memory 11.99GB
[2024-05-21 16:22:07.920] [debug] cuda:0 Decode memory 1.46GB
[2024-05-21 16:22:08.988] [debug] BasecallerNode chunk size 18432
[2024-05-21 16:22:08.988] [debug] BasecallerNode chunk size 9216
[2024-05-21 16:22:09.002] [debug] Load reads from file /gpfs/commons/groups/vickovic_lab/rhe/projects/012/nanopore_data_processing/data/dwm_test_nanopore_directRNA/E055/C1/dummy_dir/pod5_pass/PAU02007_pass_01d58b16_057f5581_5.pod5
*** Error in `dorado': double free or corruption (fasttop): 0x00007f03240926c0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81329)[0x7f040375f329]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN10cask_cudnn18ImplicitGemmShaderINS_18ImplicitGemmParamsILi4ELi512EEEE3runERNS_7RunInfoEPvS6_PKvS8_S8_S8_S8_S8_P11CUstream_st+0x3a4)[0x7f04152c8694]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN5cudnn3cnn5infer16InferNdSubEngineILb1EL19cudnnTensorFormat_t0ELS3_0ELS3_0EL15cudnnDataType_t0ELb0ELi70ELNS1_9subtree_tE0EN10cask_cudnn11ConvolutionENS6_10ShaderListINS6_10ConvShaderES7_EES9_E27execute_internal_fprop_implEP12cudnnContextP11CUstream_stPKvSH_SH_SH_SH_SH_mPvSI_j+0x78f)[0x7f0413d5cd6f]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN5cudnn3cnn5infer16InferNdSubEngineILb1EL19cudnnTensorFormat_t0ELS3_0ELS3_0EL15cudnnDataType_t0ELb0ELi70ELNS1_9subtree_tE0EN10cask_cudnn11ConvolutionENS6_10ShaderListINS6_10ConvShaderES7_EES9_E21execute_internal_implERKNS_7backend11VariantPackEP11CUstream_st+0x9d)[0x7f0413d5d49d]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN5cudnn3cnn15EngineInterface7executeERKNS_7backend11VariantPackEP11CUstream_st+0xd5)[0x7f041401e415]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN5cudnn3cnn15EngineContainerIL24cudnnBackendEngineName_t34EE21execute_internal_implERKNS_7backend11VariantPackEP11CUstream_st+0x10)[0x7f041386d7f0]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN5cudnn3cnn15EngineInterface7executeERKNS_7backend11VariantPackEP11CUstream_st+0xd5)[0x7f041401e415]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZNK5cudnn3cnn26AutoTransformationExecutor16execute_pipelineERNS0_15EngineInterfaceERKNS_7backend11VariantPackEP11CUstream_st+0x9e)[0x7f0412a78eee]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZNK5cudnn3cnn22BatchPartitionExecutorclERNS0_15EngineInterfaceEPS2_RKNS_7backend11VariantPackEP11CUstream_st+0xb7)[0x7f0412a7b997]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN5cudnn3cnn28GeneralizedConvolutionEngineINS0_15EngineContainerIL24cudnnBackendEngineName_t34EEEE21execute_internal_implERKNS_7backend11VariantPackEP11CUstream_st+0x12b)[0x7f04139a4a4b]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN5cudnn3cnn15EngineInterface7executeERKNS_7backend11VariantPackEP11CUstream_st+0xd5)[0x7f041401e415]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN5cudnn7backend7executeEP12cudnnContextRNS0_13ExecutionPlanERNS0_11VariantPackE+0x1349)[0x7f0413847d19]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(cudnnBackendExecute+0x111)[0x7f04138480c1]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(+0x89b584f)[0x7f040de5e84f]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(+0x89bf27b)[0x7f040de6827b]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(+0x89c00db)[0x7f040de690db]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(+0x89a54ca)[0x7f040de4e4ca]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN2at6native17cudnn_convolutionERKNS_6TensorES3_N3c108ArrayRefIlEES6_S6_lbbb+0x96)[0x7f040de4eb16]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(+0xa632127)[0x7f040fadb127]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(+0xa6321e0)[0x7f040fadb1e0]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN2at4_ops17cudnn_convolution4callERKNS_6TensorES4_N3c108ArrayRefIlEES7_S7_lbbb+0x23d)[0x7f040a9d0e9d]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN2at6native12_convolutionERKNS_6TensorES3_RKN3c108optionalIS1_EENS4_8ArrayRefIlEESA_SA_bSA_lbbbb+0x1505)[0x7f0409d8fcf5]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(+0x58a7496)[0x7f040ad50496]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(+0x58a7517)[0x7f040ad50517]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN2at4_ops12_convolution4callERKNS_6TensorES4_RKN3c108optionalIS2_EENS5_8ArrayRefIlEENSA_INS5_6SymIntEEESB_bSD_lbbbb+0x29b)[0x7f040a5730fb]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN2at6native11convolutionERKNS_6TensorES3_RKN3c108optionalIS1_EENS4_8ArrayRefIlEESA_SA_bSA_l+0x21d)[0x7f0409d83d3d]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(+0x58a6f55)[0x7f040ad4ff55]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(+0x58a6fbf)[0x7f040ad4ffbf]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN2at4_ops11convolution4callERKNS_6TensorES4_RKN3c108optionalIS2_EENS5_8ArrayRefIlEENSA_INS5_6SymIntEEESB_bSD_l+0x223)[0x7f040a572443]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN2at6native6conv1dERKNS_6TensorES3_RKN3c108optionalIS1_EENS4_8ArrayRefIlEESA_SA_l+0x1c5)[0x7f0409d86f35]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(+0x5a57b31)[0x7f040af00b31]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN2at4_ops6conv1d4callERKNS_6TensorES4_RKN3c108optionalIS2_EENS5_8ArrayRefIlEESB_SB_l+0x20c)[0x7f040a9ce98c]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(_ZN5torch2nn10Conv1dImpl7forwardERKN2at6TensorE+0x3a0)[0x7f040d4daa40]
dorado[0xa6959b]
dorado[0xa6ea98]
dorado[0xa5c370]
dorado[0xa5c4d8]
dorado[0xa58fbb]
/nfs/sw/dorado/dorado-0.7.0/bin/../lib/libdorado_torch_lib.so(+0x1196e380)[0x7f0416e17380]
/lib64/libpthread.so.0(+0x7ea5)[0x7f0404c4aea5]
/lib64/libc.so.6(clone+0x6d)[0x7f04037dcb0d]
======= Memory map: ========
00400000-0042d000 r--p 00000000 00:27 28147497692461608                  /nfs/sw/dorado/dorado-0.7.0/bin/dorado
0042d000-10cea000 r-xp 0002d000 00:27 28147497692461608                  /nfs/sw/dorado/dorado-0.7.0/bin/dorado
10cea000-5be9e000 r--p 108ea000 00:27 28147497692461608                  /nfs/sw/dorado/dorado-0.7.0/bin/dorado
5be9f000-5c257000 r--p 5ba9e000 00:27 28147497692461608                  /nfs/sw/dorado/dorado-0.7.0/bin/dorado
5c257000-6ad90000 rw-p 5be56000 00:27 28147497692461608                  /nfs/sw/dorado/dorado-0.7.0/bin/dorado
6ad90000-6cfc7000 rw-p 00000000 00:00 0 
6ec20000-804a2000 rw-p 00000000 00:00 0                                  [heap]
200000000-200400000 ---p 00000000 00:00 0 
200400000-200600000 rw-s 00000000 00:05 45118                            /dev/nvidia1
200600000-205e00000 rw-s 00000000 00:05 33007                            /dev/nvidiactl
205e00000-206e00000 ---p 00000000 00:00 0 
206e00000-207000000 rw-s 00000000 00:05 33007                            /dev/nvidiactl
207000000-207200000 rw-s 00000000 00:05 33007                            /dev/nvidiactl
207200000-207400000 rw-s 00000000 00:05 33007                            /dev/nvidiactl
207400000-207600000 rw-s 207400000 00:05 14551                           /dev/nvidia-uvm
207600000-207800000 rw-s 00000000 00:05 33007                            /dev/nvidiactl
207800000-207a00000 ---p 00000000 00:00 0 
207a00000-207c00000 rw-s 00000000 00:05 33007                            /dev/nvidiactl
207c00000-207e00000 ---p 00000000 00:00 0 
207e00000-208000000 rw-s 00000000 00:04 430173391                        /dev/zero (deleted)
208000000-208400000 ---p 00000000 00:00 0 
208400000-208600000 rw-s 00000000 00:05 33007                            /dev/nvidiactl
208600000-208800000 ---p 00000000 00:00 0 
208800000-208a00000 rw-s 00000000 00:05 33007                            /dev/nvidiactl
208a00000-300200000 ---p 00000000 00:00 0 
10000000000-10004000000 ---p 00000000 00:00 0 
7efef4000000-7efef706c000 rw-p 00000000 00:00 0 
7efef706c000-7efef8000000 ---p 00000000 00:00 0 
7efef8000000-7efef8021000 rw-p 00000000 00:00 0 
7efef8021000-7efefc000000 ---p 00000000 00:00 0 
7efefc000000-7efefc021000 rw-p 00000000 00:00 0 
7efefc021000-7eff00000000 ---p 00000000 00:00 0 
7eff00000000-7eff000b9000 rw-p 00000000 00:00 0 
7eff000b9000-7eff04000000 ---p 00000000 00:00 0 
7eff04000000-7eff04021000 rw-p 00000000 00:00 0 
7eff04021000-7eff08000000 ---p 00000000 00:00 0 
7eff08000000-7eff08021000 rw-p 00000000 00:00 0 
7eff08021000-7eff0c000000 ---p 00000000 00:00 0 
7eff0c000000-7eff0c021000 rw-p 00000000 00:00 0 
7eff0c021000-7eff10000000 ---p 00000000 00:00 0 
7eff10000000-7eff10036000 rw-p 00000000 00:00 0 
7eff10036000-7eff14000000 ---p 00000000 00:00 0 
7eff14000000-7eff18000000 ---p 00000000 00:00 0 
7eff18000000-7eff18021000 rw-p 00000000 00:00 0 
7eff18021000-7eff1c000000 ---p 00000000 00:00 0 
7eff1c000000-7eff1ffec000 rw-p 00000000 00:00 0 
7eff1ffec000-7eff20000000 ---p 00000000 00:00 0 
7eff20000000-7f0044000000 ---p 00000000 00:00 0 
7f0046000000-7f0068000000 ---p 00000000 00:00 0 
7f0068000000-7f0068021000 rw-p 00000000 00:00 0 
7f0068021000-7f006c000000 ---p 00000000 00:00 0 
7f006c000000-7f006c021000 rw-p 00000000 00:00 0 
7f006c021000-7f0070000000 ---p 00000000 00:00 0 
7f0070000000-7f0072fe3000 rw-p 00000000 00:00 0 
7f0072fe3000-7f0074000000 ---p 00000000 00:00 0 
7f0074000000-7f0074021000 rw-p 00000000 00:00 0 
7f0074021000-7f0078000000 ---p 00000000 00:00 0 
7f0078000000-7f007b4b7000 rw-p 00000000 00:00 0 
7f007b4b7000-7f007c000000 ---p 00000000 00:00 0 
7f007c000000-7f007e9c5000 rw-p 00000000 00:00 0 
7f007e9c5000-7f0080000000 ---p 00000000 00:00 0 
7f0080000000-7f0080116000 rw-p 00000000 00:00 0 
7f0080116000-7f0084000000 ---p 00000000 00:00 0 
7f0084000000-7f0086c4d000 rw-p 00000000 00:00 0 
7f0086c4d000-7f0088000000 ---p 00000000 00:00 0 
7f0088000000-7f00880c4000 rw-p 00000000 00:00 0 
7f00880c4000-7f008c000000 ---p 00000000 00:00 0 
7f008c000000-7f008c021000 rw-p 00000000 00:00 0 
7f008c021000-7f0090000000 ---p 00000000 00:00 0 
7f0090000000-7f00900c5000 rw-p 00000000 00:00 0 
7f00900c5000-7f0094000000 ---p 00000000 00:00 0 
7f0096000000-7f018a000000 ---p 00000000 00:00 0 
7f018c000000-7f018c021000 rw-p 00000000 00:00 0 
7f018c021000-7f0190000000 ---p 00000000 00:00 0 
7f0190000000-7f0190021000 rw-p 00000000 00:00 0 
7f0190021000-7f0194000000 ---p 00000000 00:00 0 
7f0194000000-7f0194021000 rw-p 00000000 00:00 0 
7f0194021000-7f0198000000 ---p 00000000 00:00 0 
7f0198000000-7f0199894000 rw-p 00000000 00:00 0 
7f0199894000-7f019c000000 ---p 00000000 00:00 0 
7f019c000000-7f019c089000 rw-p 00000000 00:00 0 
7f019c089000-7f01a0000000 ---p 00000000 00:00 0 
7f01a0000000-7f01a1adb000 rw-p 00000000 00:00 0 
7f01a1adb000-7f01a4000000 ---p 00000000 00:00 0 
7f01a4000000-7f01a5c8f000 rw-p 00000000 00:00 0 
7f01a5c8f000-7f01a8000000 ---p 00000000 00:00 0 
7f01a8000000-7f01a8103000 rw-p 00000000 00:00 0 
7f01a8103000-7f01ac000000 ---p 00000000 00:00 0 
7f01ac000000-7f01ad7ec000 rw-p 00000000 00:00 0 
7f01ad7ec000-7f01b0000000 ---p 00000000 00:00 0 
7f01b0000000-7f01b3ff4000 rw-p 00000000 00:00 0 
7f01b3ff4000-7f01b4000000 ---p 00000000 00:00 0 
7f01b45ba000-7f01bc000000 r--p 00000000 00:2c 17252029939                /gpfs/commons/groups/vickovic_lab/rhe/projects/012/nanopore_data_processing/data/dwm_test_nanopore_directRNA/E055/C1/dummy_dir/pod5_pass/PAU02007_pass_01d58b16_057f5581_5.pod5
7f01bc000000-7f01bc021000 rw-p 00000000 00:00 0 
7f01bc021000-7f01c0000000 ---p 00000000 00:00 0 
7f01c2000000-7f0218e00000 ---p 00000000 00:00 0 
7f0218e00000-7f0219200000 rw-s 00000000 00:04 430176279                  /dev/zero (deleted)
7f0219200000-7f0219600000 rw-s 00000000 00:04 430176280                  /dev/zero (deleted)
7f0219600000-7f0219a00000 rw-s 00000000 00:04 430176281                  /dev/zero (deleted)
7f0219a00000-7f0219e00000 rw-s 00000000 00:04 430176282                  /dev/zero (deleted)
7f0219e00000-7f02efa00000 ---p 00000000 00:00 0 
7f02efa00000-7f02eff01000 rw-s 00000000 00:04 430173402                  /dev/zero (deleted)
7f02eff01000-7f02fc000000 ---p 00000000 00:00 0 
7f02fc000000-7f02ff809000 rw-p 00000000 00:00 0 
7f02ff809000-7f0300000000 ---p 00000000 00:00 0 
7f0300000000-7f0301c98000 rw-p 00000000 00:00 0 
7f0301c98000-7f0304000000 ---p 00000000 00:00 0 
7f0304000000-7f0307d09000 rw-p 00000000 00:00 0 
7f0307d09000-7f0308000000 ---p 00000000 00:00 0 
7f0308000000-7f030bc07000 rw-p 00000000 00:00 0 
7f030bc07000-7f030c000000 ---p 00000000 00:00 0 
7f030c000000-7f030fd32000 rw-p 00000000 00:00 0 
7f030fd32000-7f0310000000 ---p 00000000 00:00 0 
7f0310000000-7f0313f07000 rw-p 00000000 00:00 0 
7f0313f07000-7f0314000000 ---p 00000000 00:00 0 
7f0314000000-7f0317801000 rw-p 00000000 00:00 0 
7f0317801000-7f0318000000 ---p 00000000 00:00 0 
7f0318000000-7f031bf01000 rw-p 00000000 00:00 0 
7f031bf01000-7f031c000000 ---p 00000000 00:00 0 
7f031c000000-7f031fe01000 rw-p 00000000 00:00 0 
7f031fe01000-7f0320000000 ---p 00000000 00:00 0 
7f0320000000-7f0323c01000 rw-p 00000000 00:00 0 
7f0323c01000-7f0324000000 ---p 00000000 00:00 0 
7f0324000000-7f03240b5000 rw-p 00000000 00:00 0 
7f03240b5000-7f0328000000 ---p 00000000 00:00 0 
7f0328000000-7f032bf0b000 rw-p 00000000 00:00 0 
7f032bf0b000-7f032c000000 ---p 00000000 00:00 0 
7f032c000000-7f032c063000 rw-p 00000000 00:00 0 
7f032c063000-7f0330000000 ---p 00000000 00:00 0 
7f0330000000-7f0331800000 ---p 00000000 00:00 0 
7f0331800000-7f0331a00000 rw-s 00000000 00:04 430173396                  /dev/zero (deleted)
7f0331a00000-7f0331c00000 rw-s 00000000 00:04 430173399                  /dev/zero (deleted)
7f0331c00000-7f0350000000 ---p 00000000 00:00 0 
7f0352000000-7f035b000000 ---p 00000000 00:00 0 
7f035b000000-7f035c000000 rw-s 00000000 00:04 430173397                  /dev/zero (deleted)
7f035c000000-7f0360000000 ---p 00000000 00:00 0 
7f0363ffc000-7f0363ffd000 ---p 00000000 00:00 0 
7f0363ffd000-7f03647fd000 rwxp 00000000 00:00 0 
7f03647fd000-7f03647fe000 ---p 00000000 00:00 0 
7f03647fe000-7f0364ffe000 rwxp 00000000 00:00 0 
7f0364ffe000-7f0364fff000 ---p 00000000 00:00 0 
7f0364fff000-7f03657ff000 rwxp 00000000 00:00 0 
7f03657ff000-7f0365800000 ---p 00000000 00:00 0 
7f0365800000-7f0366000000 rwxp 00000000 00:00 0 
7f0366000000-7f036b000000 ---p 00000000 00:00 0 
7f036b000000-7f036c000000 rw-s 00000000 00:04 430173398                  /dev/zero (deleted)
7f036c7f9000-7f036c7fa000 ---p 00000000 00:00 0 
7f036c7fa000-7f036cffa000 rwxp 00000000 00:00 0 
7f036cffa000-7f036cffb000 ---p 00000000 00:00 0 
7f036cffb000-7f036d7fb000 rwxp 00000000 00:00 0 
7f036d7fb000-7f036d7fc000 ---p 00000000 00:00 0 
7f036d7fc000-7f036dffc000 rwxp 00000000 00:00 0 
7f036dffc000-7f036dffd000 ---p 00000000 00:00 0 
7f036dffd000-7f036e7fd000 rwxp 00000000 00:00 0 
7f036e7fd000-7f036e7fe000 ---p 00000000 00:00 0 
7f036e7fe000-7f036effe000 rwxp 00000000 00:00 0 
7f036effe000-7f036efff000 ---p 00000000 00:00 0 
7f036efff000-7f036f7ff000 rwxp 00000000 00:00 0 
7f036f7ff000-7f036f800000 ---p 00000000 00:00 0 
7f036f800000-7f0370000000 rwxp 00000000 00:00 0 
7f0370000000-7f037d000000 ---p 00000000 00:00 0 
7f037d000000-7f037e000000 rw-s 00000000 00:04 430173400                  /dev/zero (deleted)
7f037e000000-7f0380e00000 ---p 00000000 00:00 0 
7f0380e00000-7f0381e00000 rw-s 00000000 00:04 430173401                  /dev/zero (deleted)
7f0381e00000-7f0382000000 ---p 00000000 00:00 0 
7f03827fd000-7f03827fe000 ---p 00000000 00:00 0 
7f03827fe000-7f0382ffe000 rwxp 00000000 00:00 0 
7f0382ffe000-7f0382fff000 ---p 00000000 00:00 0 
7f0382fff000-7f03837ff000 rwxp 00000000 00:00 0 
7f03837ff000-7f0383800000 ---p 00000000 00:00 0 
7f0383800000-7f0384000000 rwxp 00000000 00:00 0 
7f0384000000-7f03a5200000 ---p 00000000 00:00 0 
7f03a5200000-7f03a5400000 rw-s 00000000 00:04 430173395                  /dev/zero (deleted)
7f03a5400000-7f03b8000000 ---p 00000000 00:00 0 
7f03b8000000-7f03b8021000 rw-p 00000000 00:00 0 
7f03b8021000-7f03bc000000 ---p 00000000 00:00 0 
7f03bc7fd000-7f03bc7fe000 ---p 00000000 00:00 0 
7f03bc7fe000-7f03bcffe000 rwxp 00000000 00:00 0 
7f03bcffe000-7f03bcfff000 ---p 00000000 00:00 0 
7f03bcfff000-7f03bd7ff000 rwxp 00000000 00:00 0 
7f03bd7ff000-7f03bd800000 ---p 00000000 00:00 0 
7f03bd800000-7f03be000000 rwxp 00000000 00:00 0 
7f03be000000-7f03bf800000 ---p 00000000 00:00 0 
7f03bf800000-7f03bfa00000 rw-s 00000000 00:04 430173394                  /dev/zero (deleted)
7f03bfa00000-7f03d0000000 ---p 00000000 00:00 0 
7f03d07fd000-7f03d07fe000 ---p 00000000 00:00 0 
7f03d07fe000-7f03d0ffe000 rwxp 00000000 00:00 0 
7f03d0ffe000-7f03d0fff000 ---p 00000000 00:00 0 
7f03d0fff000-7f03d17ff000 rwxp 00000000 00:00 0 
7f03d17ff000-7f03d1800000 ---p 00000000 00:00 0 
7f03d1800000-7f03d2000000 rwxp 00000000 00:00 0 
7f03d2000000-7f03d4400000 ---p 00000000 00:00 0 
7f03d4400000-7f03d4600000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03d4600000-7f03d4800000 rw-s 00000000 00:04 430173388                  /dev/zero (deleted)
7f03d4800000-7f03d4a00000 rw-s 00000000 00:04 430173389                  /dev/zero (deleted)
7f03d4a00000-7f03d5000000 ---p 00000000 00:00 0 
7f03d5000000-7f03d5200000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03d5200000-7f03d5400000 rw-s 00000000 00:04 430173392                  /dev/zero (deleted)
7f03d5400000-7f03d5769000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03d5769000-7f03dc000000 ---p 00000000 00:00 0 
7f03dc000000-7f03dc021000 rw-p 00000000 00:00 0 
7f03dc021000-7f03e0000000 ---p 00000000 00:00 0 
7f03e028b000-7f03e028c000 ---p 00000000 00:00 0 
7f03e028c000-7f03e0a8c000 rwxp 00000000 00:00 0 
7f03e0a8c000-7f03e0a8d000 ---p 00000000 00:00 0 
7f03e0a8d000-7f03e128d000 rwxp 00000000 00:00 0 
7f03e128d000-7f03e128e000 ---p 00000000 00:00 0 
7f03e128e000-7f03e1a8e000 rwxp 00000000 00:00 0 
7f03e1a8e000-7f03e1a8f000 ---p 00000000 00:00 0 
7f03e1a8f000-7f03e228f000 rwxp 00000000 00:00 0 
7f03e228f000-7f03e2290000 ---p 00000000 00:00 0 
7f03e2290000-7f03e2a90000 rwxp 00000000 00:00 0 
7f03e2a90000-7f03e2a91000 ---p 00000000 00:00 0 
7f03e2a91000-7f03e3291000 rwxp 00000000 00:00 0 
7f03e3291000-7f03e3292000 ---p 00000000 00:00 0 
7f03e3292000-7f03e3a92000 rwxp 00000000 00:00 0 
7f03e3a92000-7f03e3a93000 ---p 00000000 00:00 0 
7f03e3a93000-7f03e4293000 rwxp 00000000 00:00 0 
7f03e4293000-7f03e4294000 ---p 00000000 00:00 0 
7f03e4294000-7f03e4a94000 rwxp 00000000 00:00 0 
7f03e4d9f000-7f03e4da0000 ---p 00000000 00:00 0 
7f03e4da0000-7f03e55a0000 rwxp 00000000 00:00 0 
7f03e55a0000-7f03e55a1000 ---p 00000000 00:00 0 
7f03e55a1000-7f03e5da1000 rwxp 00000000 00:00 0 
7f03e5da1000-7f03e5da2000 ---p 00000000 00:00 0 
7f03e5da2000-7f03e65a2000 rwxp 00000000 00:00 0 
7f03e65a2000-7f03e65a3000 ---p 00000000 00:00 0 
7f03e65a3000-7f03e6da3000 rwxp 00000000 00:00 0 
7f03e6da3000-7f03e8000000 rw-p 00000000 00:00 0 
7f03e8000000-7f03e8021000 rw-p 00000000 00:00 0 
7f03e8021000-7f03ec000000 ---p 00000000 00:00 0 
7f03ec07e000-7f03ec07f000 ---p 00000000 00:00 0 
7f03ec07f000-7f03ec87f000 rwxp 00000000 00:00 0 
7f03ec87f000-7f03ec880000 ---p 00000000 00:00 0 
7f03ec880000-7f03ed080000 rwxp 00000000 00:00 0 
7f03ed080000-7f03ed081000 ---p 00000000 00:00 0 
7f03ed081000-7f03ed881000 rwxp 00000000 00:00 0 
7f03ed881000-7f03ed882000 ---p 00000000 00:00 0 
7f03ed882000-7f03ee082000 rwxp 00000000 00:00 0 
7f03ee082000-7f03ee895000 rw-p 00000000 00:00 0 
7f03ee895000-7f03ee896000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee896000-7f03ee897000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee897000-7f03ee898000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee898000-7f03ee899000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee899000-7f03ee89a000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee89a000-7f03ee89b000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee89b000-7f03ee89c000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee89c000-7f03ee89d000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee89d000-7f03ee89e000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee89e000-7f03ee89f000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee89f000-7f03ee8a0000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8a0000-7f03ee8a1000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8a1000-7f03ee8a2000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8a2000-7f03ee8a3000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8a3000-7f03ee8a4000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8a4000-7f03ee8a5000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8a5000-7f03ee8a6000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8a6000-7f03ee8a7000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8a7000-7f03ee8a8000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8a8000-7f03ee8a9000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8a9000-7f03ee8aa000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8aa000-7f03ee8ab000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8ab000-7f03ee8ac000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8ac000-7f03ee8ad000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8ad000-7f03ee8ae000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8ae000-7f03ee8af000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8af000-7f03ee8b0000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8b0000-7f03ee8b1000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8b1000-7f03ee8b2000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8b2000-7f03ee8b3000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8b3000-7f03ee8b4000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8b4000-7f03ee8b5000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8b5000-7f03ee8b6000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8b6000-7f03ee8b7000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8b7000-7f03ee8b8000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8b8000-7f03ee8b9000 rw-s 00000000 00:05 33007                      /dev/nvidiactl
7f03ee8b9000-7f03ef8b6000 ---p 00000000 00:00 0 
7f03f0000000-7f03f0021000 rw-p 00000000 00:00 0 
7f03f0021000-7f03f4000000 ---p 00000000 00:00 0 
7f03f43bb000-7f03f463b000 rw-p 00000000 00:00 0 
7f03f463b000-7f03f463c000 ---p 00000000 00:00 0 
7f03f463c000-7f03f4e3c000 rwxp 00000000 00:00 0 
7f03f4e3c000-7f03f503c000 rw-s 00000000 00:04 430173390                  /dev/zero (deleted)
7f03f503c000-7f03f605d000 ---p 00000000 00:00 0 
7f03f605d000-7f03f68bd000 rw-p 00000000 00:00 0 
7f03f68bd000-7f03fce00000 r--p 00000000 fd:00 402654364                  /usr/lib/locale/locale-archive
7f03fce00000-7f03fd600000 rw-p 00000000 00:00 0 
7f03fd64c000-7f03fd74d000 rw-p 00000000 00:00 0 
7f03fd74d000-7f0400a5d000 r-xp 00000000 fd:00 469828070                  /usr/local/cuda-12.3/targets/x86_64-linux/lib/libnvrtc.so.12.3.103
7f0400a5d000-7f0400c5c000 ---p 03310000 fd:00 469828070                  /usr/local/cuda-12.3/targets/x86_64-linux/lib/libnvrtc.so.12.3.103
7f0400c5c000-7f0401263000 r--p 0330f000 fd:00 469828070                  /usr/local/cuda-12.3/targets/x86_64-linux/lib/libnvrtc.so.12.3.103
7f0401263000-7f0401347000 rw-p 03916000 fd:00 469828070                  /usr/local/cuda-12.3/targets/x86_64-linux/lib/libnvrtc.so.12.3.103
7f0401347000-7f0401445000 rw-p 00000000 00:00 0 
7f0401445000-7f0401527000 r--p 00000000 fd:00 345028537                  /usr/lib64/libcuda.so.545.23.08
7f0401527000-7f0401a47000 r-xp 000e2000 fd:00 345028537                  /usr/lib64/libcuda.so.545.23.08
7f0401a47000-7f0402f3d000 r--p 00602000 fd:00 345028537                  /usr/lib64/libcuda.so.545.23.08
7f0402f3d000-7f0402f3e000 ---p 01af8000 fd:00 345028537                  /usr/lib64/libcuda.so.545.23.08
7f0402f3e000-7f0402f57000 r--p 01af8000 fd:00 345028537                  /usr/lib64/libcuda.so.545.23.08
7f0402f57000-7f040305d000 rw-p 01b11000 fd:00 345028537                  /usr/lib64/libcuda.so.545.23.08
7f040305d000-7f04030c9000 rw-p 00000000 00:00 0 
7f04030c9000-7f04030d0000 r-xp 00000000 00:27 2814749788565789           /nfs/sw/dorado/dorado-0.7.0/lib/libaec.so.0.0.10
7f04030d0000-7f04032cf000 ---p 00007000 00:27 2814749788565789           /nfs/sw/dorado/dorado-0.7.0/lib/libaec.so.0.0.10
7f04032cf000-7f04032d0000 r--p 00006000 00:27 2814749788565789           /nfs/sw/dorado/dorado-0.7.0/lib/libaec.so.0.0.10
7f04032d0000-7f04032d1000 rw-p 00007000 00:27 2814749788565789           /nfs/sw/dorado/dorado-0.7.0/lib/libaec.so.0.0.10
7f04032d1000-7f04032d3000 r-xp 00000000 00:27 46724846155381001          /nfs/sw/dorado/dorado-0.7.0/lib/libsz.so.2.0.1
7f04032d3000-7f04034d2000 ---p 00002000 00:27 46724846155381001          /nfs/sw/dorado/dorado-0.7.0/lib/libsz.so.2.0.1
7f04034d2000-7f04034d3000 r--p 00001000 00:27 46724846155381001          /nfs/sw/dorado/dorado-0.7.0/lib/libsz.so.2.0.1
7f04034d3000-7f04034d4000 rw-p 00002000 00:27 46724846155381001          /nfs/sw/dorado/dorado-0.7.0/lib/libsz.so.2.0.1
7f04034d4000-7f04034dc000 r-xp 00000000 00:27 1125899928393792           /nfs/sw/dorado/dorado-0.7.0/lib/libnvToolsExt.so.1.0.0
7f04034dc000-7f04036dc000 ---p 00008000 00:27 1125899928393792           /nfs/sw/dorado/dorado-0.7.0/lib/libnvToolsExt.so.1.0.0
7f04036dc000-7f04036dd000 r--p 00008000 00:27 1125899928393792           /nfs/sw/dorado/dorado-0.7.0/lib/libnvToolsExt.so.1.0.0
7f04036dd000-7f04036de000 rw-p 00009000 00:27 1125899928393792           /nfs/sw/dorado/dorado-0.7.0/lib/libnvToolsExt.so.1.0.0
7f04036de000-7f04038a2000 r-xp 00000000 fd:00 268845397                  /usr/lib64/libc-2.17.so
7f04038a2000-7f0403aa1000 ---p 001c4000 fd:00 268845397                  /usr/lib64/libc-2.17.so
7f0403aa1000-7f0403aa5000 r--p 001c3000 fd:00 268845397                  /usr/lib64/libc-2.17.so
7f0403aa5000-7f0403aa7000 rw-p 001c7000 fd:00 268845397                  /usr/lib64/libc-2.17.so
7f0403aa7000-7f0403aac000 rw-p 00000000 00:00 0 
7f0403aac000-7f0403ac1000 r-xp 00000000 fd:00 269218767                  /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f0403ac1000-7f0403cc0000 ---p 00015000 fd:00 269218767                  /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f0403cc0000-7f0403cc1000 r--p 00014000 fd:00 269218767                  /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f0403cc1000-7f0403cc2000 rw-p 00015000 fd:00 269218767                  /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f0403cc2000-7f0403dc3000 r-xp 00000000 fd:00 268845405                  /usr/lib64/libm-2.17.so
7f0403dc3000-7f0403fc2000 ---p 00101000 fd:00 268845405                  /usr/lib64/libm-2.17.so
7f0403fc2000-7f0403fc3000 r--p 00100000 fd:00 268845405                  /usr/lib64/libm-2.17.so
7f0403fc3000-7f0403fc4000 rw-p 00101000 fd:00 268845405                  /usr/lib64/libm-2.17.so
7f0403fc4000-7f04040ad000 r-xp 00000000 fd:00 268846125                  /usr/lib64/libstdc++.so.6.0.19
7f04040ad000-7f04042ad000 ---p 000e9000 fd:00 268846125                  /usr/lib64/libstdc++.so.6.0.19
7f04042ad000-7f04042b5000 r--p 000e9000 fd:00 268846125                  /usr/lib64/libstdc++.so.6.0.19
7f04042b5000-7f04042b7000 rw-p 000f1000 fd:00 268846125                  /usr/lib64/libstdc++.so.6.0.19
7f04042b7000-7f04042cc000 rw-p 00000000 00:00 0 
7f04042cc000-7f04042e1000 r-xp 00000000 fd:00 269714997                  /usr/lib64/libz.so.1.2.7
7f04042e1000-7f04044e0000 ---p 00015000 fd:00 269714997                  /usr/lib64/libz.so.1.2.7
7f04044e0000-7f04044e1000 r--p 00014000 fd:00 269714997                  /usr/lib64/libz.so.1.2.7
7f04044e1000-7f04044e2000 rw-p 00015000 fd:00 269714997                  /usr/lib64/libz.so.1.2.7
7f04044e2000-7f040459c000 r-xp 00000000 00:27 29273397599330239          /nfs/sw/dorado/dorado-0.7.0/lib/libzstd.so.1.5.5
7f040459c000-7f040479b000 ---p 000ba000 00:27 29273397599330239          /nfs/sw/dorado/dorado-0.7.0/lib/libzstd.so.1.5.5
7f040479b000-7f040479c000 r--p 000b9000 00:27 29273397599330239          /nfs/sw/dorado/dorado-0.7.0/lib/libzstd.so.1.5.5
7f040479c000-7f040479d000 rw-p 000ba000 00:27 29273397599330239          /nfs/sw/dorado/dorado-0.7.0/lib/libzstd.so.1.5.5
7f040479d000-7f0404a37000 r-xp 00000000 00:27 56294995363541919          /nfs/sw/dorado/dorado-0.7.0/lib/libhdf5.so.8.0.1
7f0404a37000-7f0404c36000 ---p 0029a000 00:27 56294995363541919          /nfs/sw/dorado/dorado-0.7.0/lib/libhdf5.so.8.0.1
7f0404c36000-7f0404c3b000 r--p 00299000 00:27 56294995363541919          /nfs/sw/dorado/dorado-0.7.0/lib/libhdf5.so.8.0.1
7f0404c3b000-7f0404c42000 rw-p 0029e000 00:27 56294995363541919          /nfs/sw/dorado/dorado-0.7.0/lib/libhdf5.so.8.0.1
7f0404c42000-7f0404c43000 rw-p 00000000 00:00 0 
7f0404c43000-7f0404c5a000 r-xp 00000000 fd:00 268845786                  /usr/lib64/libpthread-2.17.so
7f0404c5a000-7f0404e59000 ---p 00017000 fd:00 268845786                  /usr/lib64/libpthread-2.17.so
7f0404e59000-7f0404e5a000 r--p 00016000 fd:00 268845786                  /usr/lib64/libpthread-2.17.so
7f0404e5a000-7f0404e5b000 rw-p 00017000 fd:00 268845786                  /usr/lib64/libpthread-2.17.so
7f0404e5b000-7f0404e5f000 rw-p 00000000 00:00 0 
7f0404e5f000-7f0405068000 r-xp 00000000 00:27 70368744199081642          /nfs/sw/dorado/dorado-0.7.0/lib/libiomp5.so
7f0405068000-7f0405268000 ---p 00209000 00:27 70368744199081642          /nfs/sw/dorado/dorado-0.7.0/lib/libiomp5.so
7f0405268000-7f040526a000 r--p 00209000 00:27 70368744199081642          /nfs/sw/dorado/dorado-0.7.0/lib/libiomp5.so
7f040526a000-7f0405274000 rw-p 0020b000 00:27 70368744199081642          /nfs/sw/dorado/dorado-0.7.0/lib/libiomp5.so
7f0405274000-7f04052a1000 rw-p 00000000 00:00 0 
7f04052a1000-7f04052a8000 r-xp 00000000 fd:00 268845798                  /usr/lib64/librt-2.17.so
7f04052a8000-7f04054a7000 ---p 00007000 fd:00 268845798                  /usr/lib64/librt-2.17.so
7f04054a7000-7f04054a8000 r--p 00006000 fd:00 268845798                  /usr/lib64/librt-2.17.so
7f04054a8000-7f04054a9000 rw-p 00007000 fd:00 268845798                  /usr/lib64/librt-2.17.so
7f04054a9000-7f0408dcf000 r--p 00000000 00:27 33214047273398690          /nfs/sw/dorado/dorado-0.7.0/lib/libdorado_torch_lib.so
7f0408dcf000-7f0416e18000 r-xp 03926000 00:27 33214047273398690          /nfs/sw/dorado/dorado-0.7.0/lib/libdorado_torch_lib.so
7f0416e18000-7f047047d000 r--p 1196f000 00:27 33214047273398690          /nfs/sw/dorado/dorado-0.7.0/lib/libdorado_torch_lib.so
7f047047d000-7f047047e000 ---p 6afd4000 00:27 33214047273398690          /nfs/sw/dorado/dorado-0.7.0/lib/libdorado_torch_lib.so
7f047047e000-7f047092c000 r--p 6afd4000 00:27 33214047273398690          /nfs/sw/dorado/dorado-0.7.0/lib/libdorado_torch_lib.so
7f047092c000-7f0470fb9000 rw-p 6b482000 00:27 33214047273398690          /nfs/sw/dorado/dorado-0.7.0/lib/libdorado_torch_lib.so
7f0470fb9000-7f0471b5d000 rw-p 00000000 00:00 0 
7f0471b5d000-7f0471b5f000 r-xp 00000000 fd:00 268845403                  /usr/lib64/libdl-2.17.so
7f0471b5f000-7f0471d5f000 ---p 00002000 fd:00 268845403                  /usr/lib64/libdl-2.17.so
7f0471d5f000-7f0471d60000 r--p 00002000 fd:00 268845403                  /usr/lib64/libdl-2.17.so
7f0471d60000-7f0471d61000 rw-p 00003000 fd:00 268845403                  /usr/lib64/libdl-2.17.so
7f0471d61000-7f0471d83000 r-xp 00000000 fd:00 268845390                  /usr/lib64/ld-2.17.so
7f0471e1b000-7f0471ebd000 rw-p 00000000 00:00 0 
7f0471ebf000-7f0471f63000 rw-p 00000000 00:00 0 
7f0471f6f000-7f0471f70000 rw-p 00000000 00:00 0 
7f0471f70000-7f0471f71000 r--s 00000000 00:05 45118                      /dev/nvidia1
7f0471f71000-7f0471f81000 -w-s 00000000 00:05 45118                      /dev/nvidia1
7f0471f81000-7f0471f82000 rw-p 00000000 00:00 0 
7f0471f82000-7f0471f83000 r--p 00021000 fd:00 268845390                  /usr/lib64/ld-2.17.so
7f0471f83000-7f0471f84000 rw-p 00022000 fd:00 268845390                  /usr/lib64/ld-2.17.so
7f0471f84000-7f0471f85000 rw-p 00000000 00:00 0 
7ffed2d7f000-7ffed2d9f000 rwxp 00000000 00:00 0                          [stack]
7ffed2d9f000-7ffed2da3000 rw-p 00000000 00:00 0 
7ffed2dbb000-7ffed2dbd000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
tijyojwad commented 4 months ago

Hi @heruiyang - based on the logs (thanks for posting those!) it looks like it could be coming from reading the data file.

Can you try the same run on CPU (with -x cpu) to rule out the GPU as the cause? To keep the run short, you can add --max-reads 10 so it only basecalls the first 10 reads.

Do you have any other pod5 to try with as well?

gringer commented 4 months ago

I can't get it working with our direct RNA run either when calling modified bases:

(base) gringer@musculus:/mnt/p2_temp/2024-05-21_dRNA_seq_MSF001$ x=2024-05-21_dRNA_seq_MSF001; time ~/install/ont/dorado/bin/dorado basecaller -v --no-trim -b 160 -r --device cuda:0 --emit-sam sup,m6A,pseU ${x} | samtools view -b -@ 12 > called_${x}.bam
[2024-05-23 16:55:07.307] [info] Running: "basecaller" "-v" "--no-trim" "-b" "160" "-r" "--device" "cuda:0" "--emit-sam" "sup,m6A,pseU" "2024-05-21_dRNA_seq_MSF001"
[2024-05-23 16:55:07.315] [info]  - downloading rna004_130bps_sup@v5.0.0 with httplib
[2024-05-23 16:55:11.679] [info]  - downloading rna004_130bps_sup@v5.0.0_m6A@v1 with httplib
[2024-05-23 16:55:12.166] [info]  - downloading rna004_130bps_sup@v5.0.0_pseU@v1 with httplib
[2024-05-23 16:55:12.393] [info] Normalised: overlap 500 -> 492
[2024-05-23 16:55:12.393] [info] > Creating basecall pipeline
[2024-05-23 16:55:12.393] [debug] CRFModelConfig { qscale:1.200000 qbias:2.000000 stride:6 bias:1 clamp:0 out_features:4096 state_len:5 outsize:4096 blank_score:0.000000 scale:1.000000 num_features:1 sample_rate:4000 mean_qscore_start_pos:60 SignalNormalisationParams { strategy:pa StandardisationScalingParams { standardise:1 mean:80.875900 stdev:17.269760}} BasecallerParams { chunk_size:18432 overlap:492 batch_size:160} convs: { 0: ConvParams { insize:1 size:64 winlen:5 stride:1 activation:swish} 1: ConvParams { insize:64 size:64 winlen:5 stride:1 activation:swish} 2: ConvParams { insize:64 size:128 winlen:9 stride:3 activation:swish} 3: ConvParams { insize:128 size:128 winlen:9 stride:2 activation:swish} 4: ConvParams { insize:128 size:512 winlen:5 stride:2 activation:swish}} model_type: tx { crf_encoder: CRFEncoderParams { insize:512 n_base:4 state_len:5 scale:5.000000 blank_score:2.000000 expand_blanks:1 permute:1} transformer: TxEncoderParams { d_model:512 nhead:8 depth:18 dim_feedforward:2048 deepnorm_alpha:2.449490}}}
[2024-05-23 16:55:12.398] [info]  - BAM format does not support `U`, so RNA output files will include `T` instead of `U` for all file types.
[2024-05-23 16:55:13.276] [debug] cuda:0 memory available: 11.89GB
[2024-05-23 16:55:13.276] [debug] cuda:0 memory limit 10.89GB
[2024-05-23 16:55:13.276] [debug] cuda:0 maximum safe estimated batch size at chunk size 18432 is 64
[2024-05-23 16:55:13.276] [warning] cuda:0: Requested batch size 160 exceeds maximum safe estimated batch size 64.
[2024-05-23 16:55:13.276] [debug] cuda:0 maximum safe estimated batch size at chunk size 9216 is 160
[2024-05-23 16:55:13.276] [info] cuda:0 using chunk size 18432, batch size 64
[2024-05-23 16:55:13.276] [debug] cuda:0 Model memory 6.85GB
[2024-05-23 16:55:13.276] [debug] cuda:0 Decode memory 0.83GB
[2024-05-23 16:55:13.839] [info] cuda:0 using chunk size 9216, batch size 160
[2024-05-23 16:55:13.839] [debug] cuda:0 Model memory 8.56GB
[2024-05-23 16:55:13.839] [debug] cuda:0 Decode memory 1.04GB
[2024-05-23 16:55:14.632] [debug] BasecallerNode chunk size 18432
[2024-05-23 16:55:14.632] [debug] BasecallerNode chunk size 9216
[2024-05-23 16:55:14.634] [debug] Load reads from file 2024-05-21_dRNA_seq_MSF001/20240521_1323_MN16602_FAX67050_bc42b286/pod5/FAX67050_bc42b286_09d3c77b_10.pod5
malloc(): unaligned tcache chunk detected
20s<04h:11m:48s] Basecalling

real    0m41.253s
user    0m46.517s
sys     0m9.813s

This error doesn't happen (at least within the first couple of minutes of basecalling) if I run only the m6A model (sup,m6A), only the pseU model (sup,pseU), or no modification model at all.

After I've finished calling with pseU, I'll try subsetting the reads from that file to find one that causes the issue.
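The combinations compared in this thread (simplex only, each modification model alone, and both together) can be laid out as a small test matrix. The Python sketch below only builds the argument lists and does not run anything; the flags are the ones used in the commands in this thread plus the --max-reads cap suggested for quick trials, while the helper name build_commands and the dRNA_test directory are hypothetical.

```python
import shlex

# Model complexes compared in this thread: simplex only, each modification
# model on its own, and both modification models together.
MODEL_COMPLEXES = ["sup", "sup,m6A", "sup,pseU", "sup,m6A,pseU"]


def build_commands(pod5_dir, device="cuda:0", max_reads=None):
    """Return one dorado basecaller argv list per model complex.

    Nothing is executed here; build_commands is a hypothetical helper.
    """
    commands = []
    for model in MODEL_COMPLEXES:
        argv = ["dorado", "basecaller", "--device", device, "--no-trim", "-r"]
        if max_reads is not None:
            # Optional cap so each trial finishes quickly.
            argv += ["--max-reads", str(max_reads)]
        # --emit-sam takes no value; the model complex and data path are positional.
        argv += ["--emit-sam", model, pod5_dir]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    for argv in build_commands("dRNA_test", max_reads=10):
        print(shlex.join(argv))
```

Running the four commands one after another on the same input isolates whether the crash needs both modification models loaded at once.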

tijyojwad commented 4 months ago

Ah, thanks for the insight @gringer! That gives us something to look at. @heruiyang, do you see something similar, where leaving out pseU (or running without any mods) resolves your issue?

HalfPhoton commented 4 months ago

Hi @gringer and @heruiyang, can you please share your system information / GPU info? Can you also please try sup,m6A,pseU with a reduced --chunksize 9216? If that helps, it would point to excessive memory consumption.

gringer commented 4 months ago

A smaller chunk size seems to have fixed most of the problem: it no longer errors out in the first minute, but it still exited before completing all the reads:

(base) gringer@musculus:/mnt/p2_temp/2024-05-21_dRNA_seq_MSF001$ x=2024-05-21_dRNA_seq_MSF001; time ~/install/ont/dorado/bin/dorado basecaller --reference reference/RCS_spike_merged.fa -v --chunksize 9216 --no-trim -b 160 -r --device cuda:0 --emit-sam sup,m6A,pseU ${x} | samtools view -b -@ 12 > called_debug_${x}.bam
[2024-05-24 12:50:01.331] [info] Running: "basecaller" "--reference" "reference/RCS_spike_merged.fa" "-v" "--chunksize" "9216" "--no-trim" "-b" "160" "-r" "--device" "cuda:0" "--emit-sam" "sup,m6A,pseU" "2024-05-21_dRNA_seq_MSF001"
[2024-05-24 12:50:01.339] [info]  - downloading rna004_130bps_sup@v5.0.0 with httplib
[2024-05-24 12:50:07.144] [info]  - downloading rna004_130bps_sup@v5.0.0_m6A@v1 with httplib
[2024-05-24 12:50:08.995] [info]  - downloading rna004_130bps_sup@v5.0.0_pseU@v1 with httplib
[2024-05-24 12:50:10.679] [info] Normalised: overlap 500 -> 492
[2024-05-24 12:50:10.679] [info] > Creating basecall pipeline
[2024-05-24 12:50:10.679] [debug] CRFModelConfig { qscale:1.200000 qbias:2.000000 stride:6 bias:1 clamp:0 out_features:4096 state_len:5 outsize:4096 blank_score:0.000000 scale:1.000000 num_features:1 sample_rate:4000 mean_qscore_start_pos:60 SignalNormalisationParams { strategy:pa StandardisationScalingParams { standardise:1 mean:80.875900 stdev:17.269760}} BasecallerParams { chunk_size:9216 overlap:492 batch_size:160} convs: { 0: ConvParams { insize:1 size:64 winlen:5 stride:1 activation:swish} 1: ConvParams { insize:64 size:64 winlen:5 stride:1 activation:swish} 2: ConvParams { insize:64 size:128 winlen:9 stride:3 activation:swish} 3: ConvParams { insize:128 size:128 winlen:9 stride:2 activation:swish} 4: ConvParams { insize:128 size:512 winlen:5 stride:2 activation:swish}} model_type: tx { crf_encoder: CRFEncoderParams { insize:512 n_base:4 state_len:5 scale:5.000000 blank_score:2.000000 expand_blanks:1 permute:1} transformer: TxEncoderParams { d_model:512 nhead:8 depth:18 dim_feedforward:2048 deepnorm_alpha:2.449490}}}
[2024-05-24 12:50:10.684] [info]  - BAM format does not support `U`, so RNA output files will include `T` instead of `U` for all file types.
[2024-05-24 12:50:11.559] [debug] cuda:0 memory available: 11.89GB
[2024-05-24 12:50:11.559] [debug] cuda:0 memory limit 10.89GB
[2024-05-24 12:50:11.559] [debug] cuda:0 maximum safe estimated batch size at chunk size 9216 is 160
[2024-05-24 12:50:11.559] [debug] cuda:0 maximum safe estimated batch size at chunk size 4608 is 352
[2024-05-24 12:50:11.559] [info] cuda:0 using chunk size 9216, batch size 160
[2024-05-24 12:50:11.559] [debug] cuda:0 Model memory 8.56GB
[2024-05-24 12:50:11.559] [debug] cuda:0 Decode memory 1.04GB
[2024-05-24 12:50:12.253] [info] cuda:0 using chunk size 4608, batch size 160
[2024-05-24 12:50:12.253] [debug] cuda:0 Model memory 4.28GB
[2024-05-24 12:50:12.253] [debug] cuda:0 Decode memory 0.52GB
[2024-05-24 12:50:12.667] [debug] > Map parameters input by user: dbg print qname=false and aln seq=false.
[2024-05-24 12:50:12.668] [debug] Loaded index with 2 target seqs
[2024-05-24 12:50:12.668] [debug] BasecallerNode chunk size 9216
[2024-05-24 12:50:12.668] [debug] BasecallerNode chunk size 4608
[2024-05-24 12:50:12.670] [debug] Load reads from file 2024-05-21_dRNA_seq_MSF001/20240521_1323_MN16602_FAX67050_bc42b286/pod5/FAX67050_bc42b286_09d3c77b_10.pod5
[2024-05-24 12:57:54.128] [debug] Load reads from file 2024-05-21_dRNA_seq_MSF001/20240521_1323_MN16602_FAX67050_bc42b286/pod5/FAX67050_bc42b286_09d3c77b_5.pod5
[2024-05-24 13:14:01.782] [debug] Load reads from file 2024-05-21_dRNA_seq_MSF001/20240521_1323_MN16602_FAX67050_bc42b286/pod5/FAX67050_bc42b286_09d3c77b_2.pod5
[2024-05-24 13:30:34.355] [debug] Load reads from file 2024-05-21_dRNA_seq_MSF001/20240521_1323_MN16602_FAX67050_bc42b286/pod5/FAX67050_bc42b286_09d3c77b_1.pod5
[██████▋                       ] 22% [46m:17s<02h:41m:25s] Basecalling
real    46m50.346s
user    99m16.700s
sys     13m48.290s

I then ran it on just the file it was reading when it exited, and it quit almost straight away. This looks like it could be a different issue...?

(base) gringer@musculus:/mnt/p2_temp/2024-05-21_dRNA_seq_MSF001$ mkdir dRNA_test
(base) gringer@musculus:/mnt/p2_temp/2024-05-21_dRNA_seq_MSF001$ cp 2024-05-21_dRNA_seq_MSF001/20240521_1323_MN16602_FAX67050_bc42b286/pod5/FAX67050_bc42b286_09d3c77b_1.pod5 dRNA_test
(base) gringer@musculus:/mnt/p2_temp/2024-05-21_dRNA_seq_MSF001$ x=dRNA_test; time ~/install/ont/dorado/bin/dorado basecaller --reference reference/RCS_spike_merged.fa -v --chunksize 9216 --no-trim -b 160 -r --device cuda:0 --emit-sam sup,m6A,pseU ${x} | samtools view -b -@ 12 > called_debug_${x}.bam
[2024-05-24 13:57:54.453] [info] Running: "basecaller" "--reference" "reference/RCS_spike_merged.fa" "-v" "--chunksize" "9216" "--no-trim" "-b" "160" "-r" "--device" "cuda:0" "--emit-sam" "sup,m6A,pseU" "dRNA_test"
[2024-05-24 13:57:54.457] [info]  - downloading rna004_130bps_sup@v5.0.0 with httplib
[2024-05-24 13:57:59.168] [info]  - downloading rna004_130bps_sup@v5.0.0_m6A@v1 with httplib
[2024-05-24 13:57:59.672] [info]  - downloading rna004_130bps_sup@v5.0.0_pseU@v1 with httplib
[2024-05-24 13:57:59.888] [info] Normalised: overlap 500 -> 492
[2024-05-24 13:57:59.888] [info] > Creating basecall pipeline
[2024-05-24 13:57:59.888] [debug] CRFModelConfig { qscale:1.200000 qbias:2.000000 stride:6 bias:1 clamp:0 out_features:4096 state_len:5 outsize:4096 blank_score:0.000000 scale:1.000000 num_features:1 sample_rate:4000 mean_qscore_start_pos:60 SignalNormalisationParams { strategy:pa StandardisationScalingParams { standardise:1 mean:80.875900 stdev:17.269760}} BasecallerParams { chunk_size:9216 overlap:492 batch_size:160} convs: { 0: ConvParams { insize:1 size:64 winlen:5 stride:1 activation:swish} 1: ConvParams { insize:64 size:64 winlen:5 stride:1 activation:swish} 2: ConvParams { insize:64 size:128 winlen:9 stride:3 activation:swish} 3: ConvParams { insize:128 size:128 winlen:9 stride:2 activation:swish} 4: ConvParams { insize:128 size:512 winlen:5 stride:2 activation:swish}} model_type: tx { crf_encoder: CRFEncoderParams { insize:512 n_base:4 state_len:5 scale:5.000000 blank_score:2.000000 expand_blanks:1 permute:1} transformer: TxEncoderParams { d_model:512 nhead:8 depth:18 dim_feedforward:2048 deepnorm_alpha:2.449490}}}
[2024-05-24 13:57:59.888] [info]  - BAM format does not support `U`, so RNA output files will include `T` instead of `U` for all file types.
[2024-05-24 13:58:00.780] [debug] cuda:0 memory available: 11.89GB
[2024-05-24 13:58:00.780] [debug] cuda:0 memory limit 10.89GB
[2024-05-24 13:58:00.780] [debug] cuda:0 maximum safe estimated batch size at chunk size 9216 is 160
[2024-05-24 13:58:00.780] [debug] cuda:0 maximum safe estimated batch size at chunk size 4608 is 352
[2024-05-24 13:58:00.780] [info] cuda:0 using chunk size 9216, batch size 160
[2024-05-24 13:58:00.780] [debug] cuda:0 Model memory 8.56GB
[2024-05-24 13:58:00.780] [debug] cuda:0 Decode memory 1.04GB
[2024-05-24 13:58:01.472] [info] cuda:0 using chunk size 4608, batch size 160
[2024-05-24 13:58:01.472] [debug] cuda:0 Model memory 4.28GB
[2024-05-24 13:58:01.472] [debug] cuda:0 Decode memory 0.52GB
[2024-05-24 13:58:01.881] [debug] > Map parameters input by user: dbg print qname=false and aln seq=false.
[2024-05-24 13:58:01.882] [debug] Loaded index with 2 target seqs
[2024-05-24 13:58:01.883] [debug] BasecallerNode chunk size 9216
[2024-05-24 13:58:01.883] [debug] BasecallerNode chunk size 4608
[2024-05-24 13:58:01.884] [debug] Load reads from file dRNA_test/FAX67050_bc42b286_09d3c77b_1.pod5
terminate called recursively
] 0% [00m:01s<14m:42s] Basecalling
terminate called after throwing an instance of 'c10::Error'

real    0m24.887s
user    0m8.420s
sys     0m4.285s

gringer commented 4 months ago

This seems to be an issue with the very last read in the pod5 file. When I subset the file to 200, 2,000, or 10,000 reads, dorado calls it with no issues; it only runs into problems when I call all reads, and then it exits almost immediately. As mentioned previously, the basecaller also has no problem with single-modification calling or with no modification calling; the issue only appears when both modification models are run together.

FWIW, the pod5 file that is causing problems is 377 MB and seems to represent two hours' worth of sequencing, or possibly one hour: only a few reads have a sample start time of less than 60 minutes, and most are between 60 and 120 minutes. This was from a MinION direct RNA run.
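To check how a file's reads are spread over run time, a read's start sample can be converted to minutes by dividing by the sample rate (4000 Hz for this RNA004 model, per the sample_rate field in the CRFModelConfig log line) and then by 60; for example, a start sample of 14,400,000 is 3,600 s, or 60 minutes. This minimal sketch assumes you have already extracted each read's start sample from the pod5 file (for example with the pod5 Python package; that step is not shown), and the helper names start_minutes and hour_bins are hypothetical.

```python
from collections import Counter

# RNA004 sample rate, as reported by sample_rate in the CRFModelConfig log line.
SAMPLE_RATE_HZ = 4000


def start_minutes(start_sample, sample_rate=SAMPLE_RATE_HZ):
    """Convert a read's start sample index into minutes since acquisition start."""
    return start_sample / sample_rate / 60.0


def hour_bins(start_samples, sample_rate=SAMPLE_RATE_HZ):
    """Count reads per whole hour of run time (0 = first hour, 1 = second, ...)."""
    return Counter(int(start_minutes(s, sample_rate) // 60) for s in start_samples)


if __name__ == "__main__":
    # Hypothetical start samples: one read at 30 min, two between 60 and 120 min.
    samples = [7_200_000, 16_000_000, 25_000_000]
    print(hour_bins(samples))
```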

HalfPhoton commented 4 months ago

Are you able to narrow down the problematic reads further and share a few of them privately with us? We can do a deeper dive on this internally and get back to you with a solution.

Kind regards, Rich
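One way to narrow down the problematic reads is to bisect over prefixes of the file's read list, which needs only about log2(n) dorado runs rather than one per read. This is a minimal Python sketch under stated assumptions: run_fails is a hypothetical callback (for example, one that writes the given reads to a new pod5 file and reruns the failing sup,m6A,pseU command on it), and the failure is assumed to be deterministic and triggered by a single read, so that once a prefix fails, every longer prefix also fails.

```python
from typing import Callable, Sequence


def first_failing_read(
    read_ids: Sequence[str],
    run_fails: Callable[[Sequence[str]], bool],
) -> str:
    """Return the read whose inclusion first makes a prefix of read_ids fail.

    Assumes failures are deterministic and monotone: once a prefix fails,
    every longer prefix also fails, and the empty prefix passes.
    """
    if not read_ids:
        raise ValueError("no reads to search")
    if not run_fails(read_ids):
        raise ValueError("the full set of reads does not fail")
    # Invariant: the prefix of length lo passes and the prefix of length hi fails.
    lo, hi = 0, len(read_ids)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if run_fails(read_ids[:mid]):
            hi = mid
        else:
            lo = mid
    # Prefix hi fails but prefix hi - 1 passes, so read hi - 1 is the trigger.
    return read_ids[hi - 1]


if __name__ == "__main__":
    reads = [f"read_{i}" for i in range(10_000)]
    bad = "read_9999"  # hypothetical: a crash triggered only by the last read

    def fake_run_fails(subset):
        # Stand-in for "write subset to a pod5 file and rerun dorado".
        return bad in subset

    print(first_failing_read(reads, fake_run_fails))
```

If the crash instead depends on which reads end up batched together, the monotone assumption breaks and the result should be treated only as a lead.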

gringer commented 4 months ago

@HalfPhoton Yes, I can do this. I've got signal data from 10 reads that I can share; the pod5 file size is about 360 kB. Please let me know how to share these files with you.

One of these reads is longer than the failing read.

gringer commented 4 months ago

Possibly related:

HalfPhoton commented 4 months ago

@gringer, thanks for collecting these reads. I'll send instructions on how to share the pod5 data with us shortly.

HalfPhoton commented 4 months ago

@gringer, if you're happy to share these reads publicly, you can share them here on GitHub.

gringer commented 4 months ago

Sorry, I can't share these reads publicly on GitHub. The issue doesn't seem to happen with the RCS/enolase reads we have from the same sequencing run, so I don't think sharing those reads would be useful.