sklages commented 1 year ago

I do have an issue with dorado built from source, currently 0.1.1+b1f85dc, but previous versions are affected as well.

On our A10, A100 and V100 I always end up getting a coredump (Segmentation fault) on PROM datasets using modified base calling in step read processing:

DATA=/path/to/podfile

dorado basecaller \
  $(pwd)/dna_r9.4.1_e8_fast@v3.4 \
  $DATA \
  --modified-bases 5mCG \
  --verbose > basecalls.sam

[2023-02-06 13:49:57.548] [debug] - matching modification model found: 
dna_r9.4.1_e8_fast@v3.4_5mCG@v0
[2023-02-06 13:49:57.548] [info] > Creating basecall pipeline
[2023-02-06 13:50:01.445] [debug] - available GPU memory 41GB
[2023-02-06 13:50:05.882] [debug] - selected batchsize 4032
> Reads processed: 5300Speicherzugriffsfehler (core dumped)

or

DATA=/path/to/podfile

dorado basecaller \
  $(pwd)/dna_r9.4.1_e8_fast@v3.4 \
  $DATA \
  --modified-bases-models $(pwd)/dna_r9.4.1_e8_fast@v3.4_5mCG@v0 \
  --verbose > basecalls.sam

[2023-02-06 13:59:12.683] [info] > Creating basecall pipeline
[2023-02-06 13:59:23.457] [debug] - available GPU memory 41GB
[2023-02-06 13:59:28.846] [debug] - selected batchsize 4096
> Reads processed: 24900Speicherzugriffsfehler (core dumped)

"Problem cards":

NVIDIA A100-PCIE-40GB
Tesla V100-PCIE-16GB
NVIDIA A10 (24G)

No problems with "standard" cards, e.g.:

NVIDIA GeForce RTX 2080 (8G, 12G (Ti))

It doesn't matter if supplied with fast, hac or sup models. Pre-compiled binary from github generates the same error. All systems use Driver Version: 510.60.02 and CUDA Version: 11.6.

POD5 files have been created like so:

RUN_ID=20210726_XXX_FLO-PRO002_SQK-LSK109_XXX
pod5 \
  convert \
  fast5 \
  /path/to/$RUN_ID/reads/ \ 
  ${RUN_ID}.pod5

Converting 324 fast5 files..
23800 reads,     3.9 GSamples,   1/324 files,    522.2 MB/s
<..>
1293604 reads,   263.2 GSamples,         324/324 files,  472.8 MB/s
Conversion complete: 263162067678 samples

Creating a POD5 file from a small subset fast5 files (here 4) or so runs fine:

[2023-02-06 15:53:47.026] [info] > Creating basecall pipeline
[2023-02-06 15:53:50.555] [debug] - available GPU memory 41GB
[2023-02-06 15:53:55.601] [debug] - selected batchsize 3984
[2023-02-06 15:54:27.245] [info] > Reads basecalled: 16000
[2023-02-06 15:54:27.246] [info] > Samples/s: 4.900372e+07
[2023-02-06 15:54:27.246] [info] > Finished

Creating one POD5 per one fast5 and providing a directory with 10 such small POD5 files coredumps ..

So I'm pretty clueless where to start looking for the problem. I did see #67 .. I guess I'm missing something very fundamental ...

Any idea?

iiSeymour commented 1 year ago

On our A10, A100 and V100 I always end up getting a coredump (Segmentation fault) on PROM datasets using modified base calling

@sklages do you also see this without modified base calling?

Creating one POD5 per one fast5 and providing a directory with 10 such small POD5 files coredumps ..

Can you share this data here https://nanoporetech.ent.box.com/f/7c4375e2b71b48258ebed29f198b89ab?

sklages commented 1 year ago

On our A10, A100 and V100 I always end up getting a coredump (Segmentation fault) on PROM datasets using modified base calling

@sklages do you also see this without modified base calling?

With or without makes no (big) difference.

# modified bases, small dataset
€ dorado basecaller $(pwd)/dna_r9.4.1_e8_fast@v3.4 $DATA/ --modified-bases-models $(pwd)/dna_r9.4.1_e8_fast@v3.4_5mCG@v0 --verbose > /dev/null
[2023-02-06 20:54:18.867] [info] > Creating basecall pipeline
[2023-02-06 20:54:21.995] [debug] - available GPU memory 41GB
[2023-02-06 20:54:26.620] [debug] - selected batchsize 4096
[2023-02-06 20:55:31.865] [info] > Reads basecalled: 40000
[2023-02-06 20:55:31.865] [info] > Samples/s: 5.685785e+07
[2023-02-06 20:55:31.865] [info] > Finished

# .. waiting 2 minutes, cursur up, return:
€ dorado basecaller $(pwd)/dna_r9.4.1_e8_fast@v3.4 $DATA/ --modified-bases-models $(pwd)/dna_r9.4.1_e8_fast@v3.4_5mCG@v0 --verbose > /dev/null
[2023-02-06 20:56:47.408] [info] > Creating basecall pipeline
[2023-02-06 20:56:50.778] [debug] - available GPU memory 41GB
[2023-02-06 20:56:55.241] [debug] - selected batchsize 4096
> Reads processed: 1500Speicherzugriffsfehler (core dumped)

Further basecalling runs with the same dataset ("cursor up, return") are coredumping as well with different numbers of reads being processed. Waiting a few minutes makes two dorado jobs finish, the third and the fourth ones fail (core dump).

Without modified base calling it is similar:

# no modified bases, small dataset
€ dorado basecaller $(pwd)/dna_r9.4.1_e8_fast@v3.4 $DATA/ --verbose > /dev/null
[2023-02-06 21:33:57.693] [info] > Creating basecall pipeline
[2023-02-06 21:34:00.808] [debug] - available GPU memory 41GB
[2023-02-06 21:34:05.331] [debug] - selected batchsize 4080
> Reads processed: 28800Speicherzugriffsfehler (core dumped)

# .. waiting 2 minutes, cursur up, return:
€ dorado basecaller $(pwd)/dna_r9.4.1_e8_fast@v3.4 $DATA/ --verbose > /dev/null
[2023-02-06 21:36:01.017] [info] > Creating basecall pipeline
[2023-02-06 21:36:04.160] [debug] - available GPU memory 41GB
[2023-02-06 21:36:08.785] [debug] - selected batchsize 4096
[2023-02-06 21:36:41.701] [info] > Reads basecalled: 40000
[2023-02-06 21:36:41.701] [info] > Samples/s: 1.154801e+08
[2023-02-06 21:36:41.701] [info] > Finished

Creating one POD5 per one fast5 and providing a directory with 10 such small POD5 files coredumps ..

Can you share this data here https://nanoporetech.ent.box.com/f/7c4375e2b71b48258ebed29f198b89ab?

Yes, the small 10-files-dataset has been uploaded .. thank you

sklages commented 1 year ago

@iiSeymour - Have you been able to reproduce the behavior of dorado with this dataset?

iiSeymour commented 1 year ago

Hey @sklages I had a look yesterday and unfortunately I haven't been able to reproduce any issues. I tried on A100 and called your set a dozen or so times - I can try some different nodes/GPUs but so far the reads look okay.

How much system RAM does your node have?

sklages commented 1 year ago

@iiSeymour - the small A10 around 256G, the A100 384G. Memory is not an issue, AFAICS. Some smaller GeForce machines have less memory.

sklages commented 1 year ago

@iiSeymour -

A colleague of mine provided me with some info on the core dump of a hac dorado run. Seems that the segmentation fault occurs at/after dorado::RemoraEncoder::get_context (#11).

Core was generated by `build/bin/dorado basecaller models/dna_r9.4.1_e8_hac@v3.3 DATA --num_runners 2'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232
232     ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
[Current thread is 1 (Thread 0x7ff8acfff000 (LWP 17219))]
(gdb) bt
#0  __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232
#1  0x000000000054242c in std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int> (__result=<optimized out>, __last=0x7fcb3cc31004, __first=0x7fcb3cc30fcc) at /usr/include/c++/10.4.0/bits/stl_algobase.h:560
#2  std::__copy_move_a2<false, int const*, int*> (__result=<optimized out>, __last=0x7fcb3cc31004, __first=0x7fcb3cc30fcc) at /usr/include/c++/10.4.0/bits/stl_algobase.h:472
#3  std::__copy_move_a1<false, int const*, int*> (__result=<optimized out>, __last=0x7fcb3cc31004, __first=0x7fcb3cc30fcc) at /usr/include/c++/10.4.0/bits/stl_algobase.h:506
#4  std::__copy_move_a<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*> (__result=<optimized out>, __last=..., __first=...) at /usr/include/c++/10.4.0/bits/stl_algobase.h:513
#5  std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*> (__result=<optimized out>, __last=..., __first=...) at /usr/include/c++/10.4.0/bits/stl_algobase.h:569
#6 std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*> (__result=<optimized out>, __last=..., __first=...) at /usr/include/c++/10.4.0/bits/stl_uninitialized.h:109
#7  std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*> (__result=<optimized out>, __last=..., __first=...) at /usr/include/c++/10.4.0/bits/stl_uninitialized.h:150
#8  std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int> (__result=<optimized out>, __last=..., __first=...) at /usr/include/c++/10.4.0/bits/stl_uninitialized.h:325
#9  std::vector<int, std::allocator<int>  >::_M_range_initialize<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > > (__last=..., __first=..., this=0x7ff8acfe3600) at /usr/include/c++/10.4.0/bits/stl_vector.h:1585
#10 std::vector<int, std::allocator<int>  >::vector<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, void> (__a=..., __last=..., __first=..., this=0x7ff8acfe3600) at /usr/include/c++/10.4.0/bits/stl_vector.h:657
#11 dorado::RemoraEncoder::get_context (this=this@entry=0x7ff8acfe37f0, seq_pos=seq_pos@entry=1018) at /scratch/local2/xxxx/dorado_chk/v0.1.1.b1f85dc/dorado/modbase/remora_encoder.cpp:97
#12 0x0000000000522e16 in dorado::ModBaseCallerNode::runner_worker_thread (this=0x7ff8b5ad0400, runner_id=<optimized out>) at /scratch/local2/xxxx/dorado_chk/v0.1.1.b1f85dc/dorado/read_pipeline/ModBaseCallerNode.cpp:193
#13 0x00007ffa07f01c10 in std::execute_native_thread_routine (__p=0x7ff879007660) at /dev/shm/xxxx/gcc/gcc-10.4.0-0/source/libstdc++-v3/src/c++11/thread.cc:80
#14 0x00007ff9cfea6fca in start_thread (arg=<optimized out>) at pthread_create.c:442
#15 0x00007ff9cff263dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Maybe that may help?

sklages commented 1 year ago

Compiled new v0.2.1, considering https://github.com/nanoporetech/dorado/issues/93#issuecomment-1440367038 still raises a core dump ..

€ dorado --version
0.2.1+7bf2be4

€ dorado basecaller \
  $(pwd)/dna_r9.4.1_e8_hac@v3.3 \
  $DATA \
  --modified-bases-models $(pwd)/dna_r9.4.1_e8_hac@v3.3_5mCG@v0 \
  --verbose \
  > /dev/null

[2023-02-27 16:27:37.032] [info] > Creating basecall pipeline
[2023-02-27 16:27:44.750] [debug] - selected batchsize 2304
> Reads processed: 13300Speicherzugriffsfehler (core dumped)

Any ideas here where to start looking for the underlying issue/problem?

sklages commented 1 year ago

@iiSeymour -- dorado has been built with cuda-11.7.1

modified base model, A100, increasing batchsize

€ dorado basecaller $MODELS/dna_r9.4.1_e8_hac@v3.3 $DATA --modified-bases 5mCG --verbose --batchsize 2000 > /dev/null
[2023-03-06 10:46:10.258] [debug] - matching modification model found: dna_r9.4.1_e8_hac@v3.3_5mCG@v0
[2023-03-06 10:46:10.258] [info] > Creating basecall pipeline
[2023-03-06 10:46:12.289] [debug] - selected batchsize 2000
[=========>                    ] 33% [00m:36s<01m:13s] Segmentation fault (core dumped)

€ dorado basecaller $MODELS/dna_r9.4.1_e8_hac@v3.3 $DATA --modified-bases 5mCG --verbose --batchsize 50 > /dev/null
[2023-03-06 11:18:47.116] [debug] - matching modification model found: dna_r9.4.1_e8_hac@v3.3_5mCG@v0
[2023-03-06 11:18:47.116] [info] > Creating basecall pipeline
[2023-03-06 11:18:49.149] [debug] - selected batchsize 50
[================>             ] 56% [10m:04s<07m:54s] Segmentation fault (core dumped)

standard model, A100, increasing batchsize

€ dorado basecaller $MODELS/dna_r9.4.1_e8_hac@v3.3 $DATA --verbose --batchsize 50 > /dev/null
[2023-03-06 11:33:50.700] [info] > Creating basecall pipeline
[2023-03-06 11:33:52.731] [debug] - selected batchsize 50
[2023-03-06 11:52:42.274] [info] > Reads basecalled: 40000
[2023-03-06 11:52:42.274] [info] > Samples/s: 3.203185e+06
[2023-03-06 11:52:42.274] [info] > Finished

€ dorado basecaller $MODELS/dna_r9.4.1_e8_hac@v3.3 $DATA --verbose --batchsize 500 > /dev/null
[2023-03-06 13:19:20.115] [info] > Creating basecall pipeline
[2023-03-06 13:19:22.286] [debug] - selected batchsize 500
[2023-03-06 13:22:43.443] [info] > Reads basecalled: 40000
[2023-03-06 13:22:43.443] [info] > Samples/s: 2.234279e+07
[2023-03-06 13:22:43.443] [info] > Finished

€ dorado basecaller $MODELS/dna_r9.4.1_e8_hac@v3.3 $DATA --verbose --batchsize 1000 > /dev/null
[2023-03-06 15:16:46.582] [info] > Creating basecall pipeline
[2023-03-06 15:16:48.705] [debug] - selected batchsize 1000
[2023-03-06 15:19:11.937] [info] > Reads basecalled: 40000
[2023-03-06 15:19:11.937] [info] > Samples/s: 3.069715e+07
[2023-03-06 15:19:11.937] [info] > Finished

€ dorado basecaller $MODELS/dna_r9.4.1_e8_hac@v3.3 $DATA --verbose --batchsize 2000 > /dev/null
[2023-03-06 15:40:09.114] [info] > Creating basecall pipeline
[2023-03-06 15:40:11.189] [debug] - selected batchsize 2000
[=====>                        ] 19% [00m:18s<01m:18s] Segmentation fault (core dumped)

System

GPU 0: NVIDIA A100-PCIE-40GB (UUID: GPU-<...>)

€ /path/to/dorado --version
0.2.1+b304395

Storage: 15TB free

Dell PowerEdge R7525
32x AMD EPYC 7F32
RAM 386.3 GB
glibc  2.36
kernel 5.15.77

€ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 1545214
max locked memory           (kbytes, -l) 64
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 1545214
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

Single NVIDIA GeForce RTX 2080 Ti (11G) runs fine .. even with modified base calling.

Maybe this helps to get an idea what happens on our A100s?

sklages commented 1 year ago

The binary version in "Community Downloads" of the ONT website results in the same error (it still downloads the wrong modified base model 'dna_r9.4.1_e8_hac@v3.4_5mCG@v0').

€ dorado --version
0.2.1+c70423e

€ dorado basecaller $(pwd)/dna_r9.4.1_e8_hac@v3.3 $DATA --modified-bases-models $(pwd)/dna_r9.4.1_e8_hac@v3.3_5mCG@v0 --verbose --batchsize 2000 > /dev/null
[2023-03-06 16:33:52.520] [info] > Creating basecall pipeline
[2023-03-06 16:33:54.576] [debug] - selected batchsize 2000
> Reads processed: 9000Segmentation fault (core dumped)

So it is probably not a problem building dorado from source :-(

sklages commented 1 year ago

@iiSeymour - is there anything you'd suggest in order to find the problem with dorado in combination with our NVIDIA A100-PCIE-40GB?

Some key versions:

Found CUDAToolkit: /pkg/cuda-11.7.1-0/include (found version "11.7.99")
The CUDA compiler identification is NVIDIA 11.7.99
Downloading libkoi-0.2.2-Linux-x86_64-cuda-11.7
Found Python3: /usr/bin/python3 (found version "3.10.8")
TORCH_BUILD_VERSION: 1.13.1+cu117

Nvidia driver version on A100 systems is: 510.60.02 ..

Any hints are appreciated :-)

sklages commented 1 year ago

@iiSeymour - well, a final wild guess .. this is possibly not a GPU issue, as I run into this issue with --device cpu as well.

dataset: subset of 20K sequences. Takes ~1m to finish on a GPU, ~1h45m in "cpu-only mode" (fast models) with modified-bases calling.

AMD EPYC 7F32/AMD EPYC 7343 + NVIDIA A100-PCIE-40GB

--device cpu: 3 of 8 jobs failed, "Segmentation fault (core dumped)"
--device cuda: 2-4 of 12 jobs failed, "Segmentation fault (core dumped)"
without modified-bases the tiny dataset runs fine (12/12 finished), the small 40K dataset fails 5 times (of 12), both on GPU

Intel Xeon Gold 6242 + NVIDIA GeForce RTX 2080 Ti

--device cpu: successfully finished
--device cuda: successfully finished

CPU load is roughly between 300-500% (3-5 cores at a time) on all systems, even when using --device cpu. Segfaults seem to occur randomly ..

I am not an expert on that, but I could imagine the problem might be related with:

multi-threading (on AMD CPUs)?
some special (Intel?) CPU extension used by dorado?

.. there is not much more I could test IMHO ..

addendum (workstations/server without GPU):

no issues (cpu-only mode)

Intel i7-9700
randomly coredumping (cpu-only mode)
AMD EPYC 7601
AMD EPYC 7551

iiSeymour commented 1 year ago

@sklages interesting, thanks for the extra information on AMD vs Intel - I will see if I can reproduce it on an AMD system.

If I still can't reproduce this then I can send you instructions on how to create a debug build for running under the debugger so we can catch exactly where the segfault occurs.

sklages commented 1 year ago

@iiSeymour - Have you been able to reproduce the (randomly occurring) segfaults on an AMD (EPYC) system?

sklages commented 1 year ago

@iiSeymour - I have lowered the optimisation level for building by setting CMAKE_BUILD_TYPE=Debug. The results are the same, in a loop of 20 dorado calls, 7 were segfaulting on an A100 system with AMD EPYC 7F32..

iiSeymour commented 1 year ago

@sklages could you run under gdb and get a backtrace of where the segfault occurs https://stackoverflow.com/a/2876374/1066031

sklages commented 1 year ago

@iiSeymour - this is how the backtrace looks like. Again, run on an A100 system with AMD CPU:

<...>
    sv:Z:quantile   RG:Z:a277d2ff-ab15-4e36-a790-88e8396951c8_dna_r9.4.1_e8_fast@v3.4

Thread 52 "dorado" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ff543bea000 (LWP 4705)]
svb16::decode_sse<short, true, true> (out=<optimized out>, keys=keys@entry=0x7ff4e0126800 "\001", data=0x7ff4e0129fbc "\003\a\f\003\r\030", data@entry=0x7ff4e0126e39 "\312\003\003", count=12739, prev=prev@entry=0) at /builds/ENzUzips/0/minknow/pod5-file-format/c++/pod5_format/svb16/decode_x64.hpp:186
186 /builds/ENzUzips/0/minknow/pod5-file-format/c++/pod5_format/svb16/decode_x64.hpp: No such file or directory.
(gdb)

(gdb) backtrace
#0  svb16::decode_sse<short, true, true> (out=<optimized out>, keys=keys@entry=0x7ff4e0126800 "\001", data=0x7ff4e0129fbc "\003\a\f\003\r\030", data@entry=0x7ff4e0126e39 "\312\003\003", count=12739, prev=prev@entry=0) at /builds/ENzUzips/0/minknow/pod5-file-format/c++/pod5_format/svb16/decode_x64.hpp:186
#1  0x00000000007beb6b in svb16::decode<short, true, true> (prev=0, count=<optimized out>, in=0x7ff4e0126800 "\001", out=<optimized out>) at /builds/ENzUzips/0/minknow/pod5-file-format/c++/pod5_format/svb16/decode.hpp:20
#2  pod5::decompress_signal (compressed_bytes=..., pool=<optimized out>, destination=...) at /builds/ENzUzips/0/minknow/pod5-file-format/c++/pod5_format/signal_compression.cpp:93
#3  0x00000000007c22cf in pod5::SignalTableRecordBatch::extract_signal_row (this=0x7ff543bd6c70, row_index=274, samples=...) at /builds/ENzUzips/0/minknow/pod5-file-format/c++/pod5_format/signal_table_reader.cpp:94
#4  0x00000000007c6452 in pod5::SignalTableReader::extract_samples (this=0x7fff53325418, row_indices=..., output_samples=...) at /builds/ENzUzips/0/minknow/pod5-file-format/c++/pod5_format/signal_table_reader.cpp:242
#5  0x000000000076c151 in pod5::FileReaderImpl::extract_samples (this=<optimized out>, row_indices=..., output_samples=...) at /builds/ENzUzips/0/minknow/pod5-file-format/c++/pod5_format/file_reader.cpp:107
#6  0x0000000000746575 in pod5_get_read_complete_signal (reader=0x7ffe84600050, batch=0x7ff543bd6d60, batch_row=<optimized out>, sample_count=12739, signal=0x7ff543bd6d38) at /builds/ENzUzips/0/minknow/pod5-file-format/c++/pod5_format/c_api.cpp:844 
#7  0x0000000000649120 in (anonymous namespace)::process_pod5_read (row=1379, batch=0x7ffe4bf67da0, file=0x7ffe84600050, path=..., device=...) at /scratch/local2/klages/dorado/dorado/dorado/data_loader/DataLoader.cpp:103
#8  0x000000000065e8e6 in std::__invoke_impl<std::shared_ptr<dorado::Read>, std::shared_ptr<dorado::Read> (*&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (__f=@0x7ffe4d73bbb8: 0x648e9a <(anonymous namespace)::process_pod5_read(size_t, Pod5ReadRecordBatch*, Pod5FileReader*, std::string, std::string)>) at /usr/include/c++/10.4.0/bits/invoke.h:60
#9  0x000000000065e4f1 in std::__invoke<std::shared_ptr<dorado::Read> (*&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (__fn=@0x7ffe4d73bbb8: 0x648e9a <(anonymous namespace)::process_pod5_read(size_t, Pod5ReadRecordBatch*, Pod5FileReader*, std::string, std::string)>) at /usr/include/c++/10.4.0/bits/invoke.h:96
#10 0x000000000065e10a in std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>::__call<std::shared_ptr<dorado::Read>, , 0ul, 1ul, 2ul, 3ul, 4ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul, 2ul, 3ul, 4ul>) (this=0x7ffe4d73bbb8, __args=...) at /usr/include/c++/10.4.0/functional:418
#11 0x000000000065dc56 in std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>::operator()<, std::shared_ptr<dorado::Read> >() (this=0x7ffe4d73bbb8) at /usr/include/c++/10.4.0/functional:501
#12 0x000000000065da01 in std::__invoke_impl<std::shared_ptr<dorado::Read>, std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>&>(std::__invoke_other, std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>&) (__f=...) at /usr/include/c++/10.4.0/bits/invoke.h:60
#13 0x000000000065d778 in std::__invoke_r<std::shared_ptr<dorado::Read>, std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>&>(std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>&) (__fn=...) at /usr/include/c++/10.4.0/bits/invoke.h:115
#14 0x000000000065d074 in std::__future_base::_Task_state<std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>, std::allocator<int>, std::shared_ptr<dorado::Read> ()>::_M_run()::{lambda()#1}::operator()() const (this=0x7ffe4d73bb90) at /usr/include/c++/10.4.0/future:1457
#15 0x000000000065e58e in std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::shared_ptr<dorado::Read> >, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>, std::allocator<int>, std::shared_ptr<dorado::Read> ()>::_M_run()::{lambda()#1}, std::shared_ptr<dorado::Read> >::operator()() const (this=0x7ff543bd7620) at /usr/include/c++/10.4.0/future:1374
#16 0x000000000065e184 in std::__invoke_impl<std::unique_ptr<std::__future_base::_Result<std::shared_ptr<dorado::Read> >, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::shared_ptr<dorado::Read> >, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>, std::allocator<int>, std::shared_ptr<dorado::Read> ()>::_M_run()::{lambda()#1}, std::shared_ptr<dorado::Read> >&>(std::__invoke_other, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::shared_ptr<dorado::Read> >, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>, std::allocator<int>, std::shared_ptr<dorado::Read> ()>::_M_run()::{lambda()#1}, std::shared_ptr<dorado::Read> >&) (__f=...) at /usr/include/c++/10.4.0/bits/invoke.h:60
#17 0x000000000065dd05 in std::__invoke_r<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::shared_ptr<dorado::Read> >, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>, std::allocator<int>, std::shared_ptr<dorado::Read> ()>::_M_run()::{lambda()#1}, std::shared_ptr<dorado::Read> >&>(std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::shared_ptr<dorado::Read> >, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>, std::allocator<int>, std::shared_ptr<dorado::Read> ()>::_M_run()::{lambda()#1}, std::shared_ptr<dorado::Read> >&) (__fn=...) at /usr/include/c++/10.4.0/bits/invoke.h:115
#18 0x000000000065da83 in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::shared_ptr<dorado::Read> >, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>, std::allocator<int>, std::shared_ptr<dorado::Read> ()>::_M_run()::{lambda()#1}, std::shared_ptr<dorado::Read> > >::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/10.4.0/bits/std_function.h:292
#19 0x00000000005e02cb in std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const (this=0x7ff543bd7620) at /usr/include/c++/10.4.0/bits/std_function.h:622
#20 0x00000000005deabd in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) (this=0x7ffe4d73bb90, __f=0x7ff543bd7620, __did_set=0x7ff543bd7587) at /usr/include/c++/10.4.0/future:572
#21 0x00000000005e30eb in std::__invoke_impl<void, void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::__invoke_memfun_deref, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) (__f=@0x7ff543bd75a0: (void (std::__future_base::_State_baseV2::*)(std::__future_base::_State_baseV2 * const, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()> *, bool *)) 0x5dea96 <std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)>, __t=@0x7ff543bd7598: 0x7ffe4d73bb90) at /usr/include/c++/10.4.0/bits/invoke.h:73
#22 0x00000000005e1a5b in std::__invoke<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) (__fn=@0x7ff543bd75a0: (void (std::__future_base::_State_baseV2::*)(std::__future_base::_State_baseV2 * const, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()> *, bool *)) 0x5dea96 <std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)>) at /usr/include/c++/10.4.0/bits/invoke.h:95
#23 0x00000000005dff16 in std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#1}::operator()() const (this=0x7ff543bd7520) at /usr/include/c++/10.4.0/mutex:717
#24 0x00000000005dff41 in std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::operator()() const (this=0x0) at /usr/include/c++/10.4.0/mutex:722
#25 0x00000000005dff52 in std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::_FUN() () at /usr/include/c++/10.4.0/mutex:722
#26 0x00007fff81eabf10 in __pthread_once_slow (once_control=0x7ffe4d73bba8, init_routine=0x42a080 <__once_proxy@plt>) at pthread_once.c:116
#27 0x00000000005de325 in __gthread_once (__once=0x7ffe4d73bba8, __func=0x42a080 <__once_proxy@plt>) at /usr/include/c++/10.4.0/x86_64-pc-linux-gnu/bits/gthr-default.h:700
#28 0x00000000005dffde in std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) (__once=..., __f=@0x7ff543bd75a0: (void (std::__future_base::_State_baseV2::*)(std::__future_base::_State_baseV2 * const, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()> *, bool *)) 0x5dea96 <std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)>) at /usr/include/c++/10.4.0/mutex:729
#29 0x00000000005de713 in std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) (this=0x7ffe4d73bb90, __res=..., __ignore_failure=false) at /usr/include/c++/10.4.0/future:412
#30 0x000000000065d0d6 in std::__future_base::_Task_state<std::_Bind<std::shared_ptr<dorado::Read> (*(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>, std::allocator<int>, std::shared_ptr<dorado::Read> ()>::_M_run() (this=0x7ffe4d73bb90) at /usr/include/c++/10.4.0/future:1459
#31 0x00000000006569ef in std::packaged_task<std::shared_ptr<dorado::Read> ()>::operator()() (this=0x7ffe4d4d24b0) at /usr/include/c++/10.4.0/future:1592
#32 0x00000000006542f6 in cxxpool::thread_pool::push<std::shared_ptr<dorado::Read> (&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(unsigned long, std::shared_ptr<dorado::Read> (&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)::{lambda()#1}::operator()() const (this=0x7ffe84600000) at /scratch/local2/klages/dorado/dorado/dorado/3rdparty/cxxpool/src/cxxpool.h:308
#33 0x000000000065b9c0 in std::__invoke_impl<void, cxxpool::thread_pool::push<std::shared_ptr<dorado::Read> (&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(unsigned long, std::shared_ptr<dorado::Read> (&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)::{lambda()#1}&>(std::__invoke_other, cxxpool::thread_pool::push<std::shared_ptr<dorado::Read> (&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(unsigned long, std::shared_ptr<dorado::Read> (&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)::{lambda()#1}&) (__f=...) at /usr/include/c++/10.4.0/bits/invoke.h:60
#34 0x000000000065b744 in std::__invoke_r<void, cxxpool::thread_pool::push<std::shared_ptr<dorado::Read> (&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(unsigned long, std::shared_ptr<dorado::Read> (&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)::{lambda()#1}&>(cxxpool::thread_pool::push<std::shared_ptr<dorado::Read> (&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(unsigned long, std::shared_ptr<dorado::Read> (&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)::{lambda()#1}&) (__fn=...) at /usr/include/c++/10.4.0/bits/invoke.h:110
#35 0x000000000065b379 in std::_Function_handler<void (), cxxpool::thread_pool::push<std::shared_ptr<dorado::Read> (&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(unsigned long, std::shared_ptr<dorado::Read> (&)(unsigned long, Pod5ReadRecordBatch*, Pod5FileReader*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), unsigned long&, Pod5ReadRecordBatch*&, Pod5FileReader*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/10.4.0/bits/std_function.h:291
#36 0x00000000005df858 in std::function<void ()>::operator()() const (this=0x7ff543bd7750) at /usr/include/c++/10.4.0/bits/std_function.h:622
#37 0x00000000005deeaa in cxxpool::detail::priority_task::operator() (this=0x7ff543bd7750) at /scratch/local2/klages/dorado/dorado/dorado/3rdparty/cxxpool/src/cxxpool.h:173
#38 0x00000000005df488 in cxxpool::thread_pool::worker (this=0x7fffffffc910) at /scratch/local2/klages/dorado/dorado/dorado/3rdparty/cxxpool/src/cxxpool.h:353
#39 0x00000000005e7eaf in std::__invoke_impl<void, void (cxxpool::thread_pool::*)(), cxxpool::thread_pool*> (__f=@0x7ffe7a778bd0: (void (cxxpool::thread_pool::*)(cxxpool::thread_pool * const)) 0x5df3b0 <cxxpool::thread_pool::worker()>, __t=@0x7ffe7a778bc8: 0x7fffffffc910) at /usr/include/c++/10.4.0/bits/invoke.h:73
#40 0x00000000005e7d14 in std::__invoke<void (cxxpool::thread_pool::*)(), cxxpool::thread_pool*> (__fn=@0x7ffe7a778bd0: (void (cxxpool::thread_pool::*)(cxxpool::thread_pool * const)) 0x5df3b0 <cxxpool::thread_pool::worker()>) at /usr/include/c++/10.4.0/bits/invoke.h:95
#41 0x00000000005e7b4d in std::thread::_Invoker<std::tuple<void (cxxpool::thread_pool::*)(), cxxpool::thread_pool*> >::_M_invoke<0ul, 1ul> (this=0x7ffe7a778bc8) at /usr/include/c++/10.4.0/thread:264
#42 0x00000000005e7a80 in std::thread::_Invoker<std::tuple<void (cxxpool::thread_pool::*)(), cxxpool::thread_pool*> >::operator() (this=0x7ffe7a778bc8) at /usr/include/c++/10.4.0/thread:271
#43 0x00000000005e79a6 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (cxxpool::thread_pool::*)(), cxxpool::thread_pool*> > >::_M_run (this=0x7ffe7a778bc0) at /usr/include/c++/10.4.0/thread:215
#44 0x00007fff82701c10 in std::execute_native_thread_routine (__p=0x7ffe7a778bc0) at /dev/shm/bee-package/gcc/gcc-10.4.0-0/source/libstdc++-v3/src/c++11/thread.cc:80
#45 0x00007fff81ea6fca in start_thread (arg=<optimized out>) at pthread_create.c:442
#46 0x00007fff81f263dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

sklages commented 1 year ago

dorado ist still showing up on GPU, Type C, no processing activity .. it gets killed when leaving the debugger.

iiSeymour commented 1 year ago

Thanks @sklages that points to pod5 but I'm not sure I believe it.. @jorj1988 any ideas?

I just pushed https://github.com/nanoporetech/dorado/releases/tag/v0.2.2 which upgrades our pod5 dependency.

0x55555555 commented 1 year ago

My theory is its something to do with how threading is working differently on the AMD CPU, however I haven't managed to reproduce internally on our infrastructure.

If you can generate a core file: https://stackoverflow.com/questions/3789550/saving-core-file-in-gdb - similar to how you generated a backtrace before (or if ubuntu is writing one already?).

I can try to debug the core here and work out a more detailed view on where things are going wrong.

It would also be useful to know the exact version of dorado you are running?

Thanks,

George

sklages commented 1 year ago

@jorj1988 - I have uploaded a core file (5.6G) using the link provided by @iiSeymour (https://github.com/nanoporetech/dorado/issues/88#issuecomment-1419278264).

This core has been created by OS using dorado 0.2.1+bd62a19 built with CMAKE_BUILD_TYPE=Debug.

I ran the same binary on the same dataset under gdb and created a core dump using generate-core-file. But this core file is huge (46G) compared to the standard one. gdb shows some warnings:

(gdb) generate-core-file
warning: target file /proc/4525/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.

I haven't uploaded the huge core file.

0x55555555 commented 1 year ago

Right - ok I'm getting somewhere, if its a custom build of dorado, can you also drop the dorado executable itself in the dropbox above - my local build isnt linking against the core file.

sklages commented 1 year ago

I just uploaded the dorado binary. Thanks for having a look 👍

0x55555555 commented 1 year ago

Hi @sklages,

I have found an issue with the help of your core dump - I'm working on a new build now that should resolve your issue!

Thanks,

George

sklages commented 1 year ago

@jorj1988 - that is great! What is the (probable) cause that dorado fails on (most) of our systems?

0x55555555 commented 1 year ago

The issue is with an our of bounds access for certain decompressions with pod5.

I will look to get a pod5 release done today, and then get it into dorado asap.

George

0x55555555 commented 1 year ago

Hi @sklages,

master now has this change, and 0.2.3 will be tagged shortly with the fix.

Please let me know if it doesn't resolve your issue, and thank you for your help with the debugging!

George

iiSeymour commented 1 year ago

v0.2.3 is now available to test @sklages.

sklages commented 1 year ago

Hi @jorj1988 @iiSeymour - I have made some tests with both a custom built binary and the one you are providing. Both result in random segfaults. Pod5 is up-to-date ..

POD5

this is a dataset freshly created with pod5 in current version 0.1.16

€ pod5 --version
Pod5 version: 0.1.16

and

€ pip show pod5
Name: pod5
Version: 0.1.16
Summary: Oxford Nanopore Technologies Pod5 File Format Python API and Tools
<..>

creating POD5 input


€ pod5 convert \
fast5 $FAST5_DATA \
--output MySample.pod5

€ pod5 inspect summary \ MySample.pod5 \

MySample.pod5.summary.txt


.. but somehow the file is written and tested as version `0.1.15`:

€ cat MySample.pod5.summary.txt
File version in memory 0.1.15, read table version ReadTableVersion.V3.
File version on disk 0.1.15.
Batch 1, 10000 reads
Batch 2, 6000 reads
Found 2 batches, 16000 reads

dorado

I have built dorado on my own (4ed609d) and additionally downloaded the binary build from the github site (4ed609d). Both resulted in ramdom segfaults.

starting the process and failure is almost the same custom <-> ONT-binary
```
€ gdb /project/ontsoft/packages/dorado/v0.2.3-Debug/bin/dorado
```

(gdb) run basecaller /path/to/dorado/v0.2.3-Debug/models/dna_r9.4.1_e8_sup@v3.3 /path/to/data --modified-bases 5mCG --device cuda:all --verbose --batchsize 0 > /dev/null Starting program: /path/to/dorado-0.2.3-linux-x64/bin/dorado basecaller /path/to/dorado/v0.2.3-Debug/models/dna_r9.4.1_e8_sup@v3.3 /path/to/data --modified-bases 5mCG --device cuda:all --verbose --batchsize 0 > /dev/null [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/libthread_db.so.1". [2023-04-07 19:31:21.047] [debug] - matching modification model found: dna_r9.4.1_e8_sup@v3.3_5mCG@v0 [2023-04-07 19:31:21.047] [info] > Creating basecall pipeline [New Thread 0x7fff5ab8f000 (LWP 26071)] <..> [New Thread 0x7ff6e9283000 (LWP 26132)] [==> ] 8% [00m:10s<01m:57s] Thread 43 "dorado" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7ffe64fff000 (LWP 26124)] __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232 232 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory. (gdb)


- backtrace from dorado-0.2.3-linux-x64 (github binary)

(gdb) backtrace

0 __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232

1 0x0000000000553274 in dorado::RemoraEncoder::get_context(unsigned long) const ()

2 0x00000000005400db in dorado::ModBaseCallerNode::runner_worker_thread(unsigned long) ()

3 0x00007ffff7f9a5a0 in execute_native_thread_routine () from /path/to/dorado-0.2.3-linux-x64/bin/../lib/libtorch.so

4 0x00007fff892d1fca in start_thread (arg=) at pthread_create.c:442

5 0x00007fff893513dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

(gdb)


- backtrace from custom build

(gdb) backtrace

0 __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232

1 0x00000000004b6995 in std::copy_move<false, true, std::random_access_iterator_tag>::copy_m (first=0x7ff683804fc4, last=0x7ff683805004, __result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:426

2 0x00000000005f0cce in std::copy_move_a2<false, int const, int> (first=0x7ff683804fc4, last=0x7ff683805004, result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:472

3 0x00000000005f07c4 in std::copy_move_a1<false, int const, int> (first=0x7ff683804fc4, last=0x7ff683805004, result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:506

4 0x00000000005f1091 in std::copy_move_a<false, gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int> (first=..., last=..., __result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:513

5 0x00000000005f0f53 in std::copy<gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int> (first=..., last=..., result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:569

6 0x00000000005f0d6c in std::uninitialized_copy::__uninit_copy<gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int> (first=..., last=..., __result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_uninitialized.h:109

7 0x00000000005f098d in std::uninitialized_copy<gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int> (first=..., last=..., result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_uninitialized.h:150

8 0x00000000005f01e6 in std::uninitialized_copy_a<gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int, int> (first=..., last=..., __result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_uninitialized.h:325

9 0x00000000005ef8b0 in std::vector<int, std::allocator >::_M_range_initialize<**gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator > > > ( this=0x7ff7327e62b0,** first=..., __last=...) at /usr/include/c++/10.4.0/bits/stl_vector.h:1585

10 0x00000000005ef0a7 in std::vector<int, std::allocator >::vector<**gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator > >, void> (this=0x7ff7327e62b0, first=..., last=...,** a=...) at /usr/include/c++/10.4.0/bits/stl_vector.h:657

11 0x00000000005edea4 in dorado::RemoraEncoder::get_context (this=0x7ff7327e6480, seq_pos=1016) at /scratch/local2/build/dorado/dorado/dorado/modbase/remora_encoder.cpp:97 #12 0x00000000005c3f3a in dorado::ModBaseCallerNode::runner_worker_thread (this=0x7fff4d886180, runner_id=1) at /scratch/local2/build/dorado/dorado/dorado/read_pipeline/ModBaseCallerNode.cpp:191

13 0x00000000005ce2c5 in std::__invoke_impl<void, void (dorado::ModBaseCallerNode::)(unsigned long), dorado::ModBaseCallerNode, unsigned long> (**f= @0x7ffe4bab1408: (void (dorado::ModBaseCallerNode::)(dorado::ModBaseCallerNode const, unsigned long)) 0x5c3924 <dorado::ModBaseCallerNode::runner_worker_thread(unsigned long)>,** t=@0x7ffe4bab1400: 0x7fff4d886180) at /usr/include/c++/10.4.0/bits/invoke.h:73

14 0x00000000005ce133 in std::**invoke<void (dorado::ModBaseCallerNode::)(unsigned long), dorado::ModBaseCallerNode, unsigned long> (**fn=@0x7ffe4bab1408: (void (dorado::ModBaseCallerNode::)(dorado::ModBaseCallerNode const, unsigned long)) 0x5c3924 <dorado::ModBaseCallerNode::runner_worker_thread(unsigned long)>) at /usr/include/c++/10.4.0/bits/invoke.h:95

15 0x00000000005ce01b in std::thread::_Invoker<std::tuple<void (dorado::ModBaseCallerNode::)(unsigned long), dorado::ModBaseCallerNode, unsigned long> >::_M_invoke<0ul, 1ul, 2ul> (this=0x7ffe4bab13f8) at /usr/include/c++/10.4.0/thread:264

16 0x00000000005cdeb8 in std::thread::_Invoker<std::tuple<void (dorado::ModBaseCallerNode::)(unsigned long), dorado::ModBaseCallerNode, unsigned long> >::operator() (this=0x7ffe4bab13f8) at /usr/include/c++/10.4.0/thread:271

17 0x00000000005cddfc in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (dorado::ModBaseCallerNode::)(unsigned long), dorado::ModBaseCallerNode, unsigned long> > >::_M_run (this=0x7ffe4bab13f0) at /usr/include/++/10.4.0/thread:215

18 0x00007fff82701c10 in std::execute_native_thread_routine (__p=0x7ffe4bab13f0) at /dev/shm/bee-package/gcc/gcc-10.4.0-0/source/libstdc++-v3/src/c++11/thread.cc:80

19 0x00007fff81ea6fca in start_thread (arg=) at pthread_create.c:442

20 0x00007fff81f263dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81



Any thoughts? There is probably something wrong with how the the POD5 file is written here in my environment? I'd expect version `1.16.0` to be written ..

sklages commented 1 year ago

Well, just a side note .. I wanted to run pod5 convert on a larger dataset (PROM, 915G) on a server with 384G RAM and it fails with a "Out of memory" error? I wouldn't expect a conversion tool to be so memory demanding .. no other user or any other memory hungry process there ..

An unexpected error occurred: Out of memory: malloc of size 204800000 failed
Converting 1883 Fast5s:  52%|#########################   | 3932963/7527463 [34:21<2:42:53, 367.78Reads/s]
<..>
RuntimeError: Out of memory: malloc of size 204800000 failed

Maybe this issue is somehow related to the dorado issue?

€ pod5 --version
Pod5 version: 0.1.16

€ pip show pod5
Name: pod5
Version: 0.1.16
Summary: Oxford Nanopore Technologies Pod5 File Format Python API and Tools
Home-page:
Author:
Author-email: "Oxford Nanopore Technologies, Limited" <support@nanoporetech.com>
License:
Location: /path/to/lib/python3.10/site-packages
Requires: h5py, iso8601, jsonschema, lib-pod5, more-itertools, numpy, packaging, pandas, pyarrow, pytz, tqdm, vbz-h5py-plugin
Required-by:

0x55555555 commented 1 year ago

Hello,

Seems like you have a couple of issues here - one the crash in dorado::RemoraEncoder::get_context, the other is to do with pod5 - can you report that issue on the pod5 project and I can investigate there?

You're right - it shouldnt happen when converting, it should not use much memory at all.

There is probably something wrong with how the the POD5 file is written here in my environment? I'd expect version 1.16.0 to be written ..

I think this is a misdirect - its odd - and I'll investigate, but there isn't a difference in binary format between 0.1.15 and 0.1.16, so shouldn't cause an issue.

Thanks,

George

0x55555555 commented 1 year ago

Hi @sklages,

Are you able to provide the pod5 file youre using to test now? we can run it internally and see what we can do.

Thanks,

George

HalfPhoton commented 1 year ago

Hi @sklages,

Could you please report the lib_pod5 version in your environment? If the lib_pod5 version is not 0.1.17 it is updatable with pip install -U lib_pod5==0.1.17.

The reported file version 0.1.15 might be due to this. The data on disk should be no different as @jorj1988 said above.

Kind regards, Rich

sklages commented 1 year ago

Hi @sklages,

Are you able to provide the pod5 file youre using to test now? we can run it internally and see what we can do.

Hi George, I have just uploaded the POD5 file, 16K seqs, MySample.pod5

sklages commented 1 year ago

@HalfPhoton - indeed, this is version 0.1.15 here.

pip show lib_pod5
Name: lib-pod5
Version: 0.1.15
Summary: Python bindings for the POD5 file format
<..>

But I am just able to update to 0.1.16,

ERROR: Could not find a version that satisfies the requirement lib_pod5==0.1.17 (from versions: 0.0.43, 0.1.0, 0.1.4, 0.1.5, 0.1.10, 0.1.11, 0.1.12, 0.1.13, 0.1.15, 0.1.16)
ERROR: No matching distribution found for lib_pod5==0.1.17

.. but that is probably correct, as current pod5 release is 0.1.16?

HalfPhoton commented 1 year ago

Hi @sklages,

Ah yes, sorry just a typo on my part there.

That should resolve the reported file version issue you were seeing.

Thanks for checking this for us. We're adding checks on our end to ensure that this doesn't happen again.

Kind regards, Rich

iiSeymour commented 1 year ago

@sklages thanks for provided the backtraces @malton-ont fixed the out of bounds in the RemoraEncoder::get_context - can you try 92ef398874e9f4d09c7a57e5d979d4e704a12a74?

sklages commented 1 year ago

@iiSeymour @malton-ont - thanks for the fix, looks good. At least my small dataset finished successfully in 20 of 20 cases! So I am optimistic :-)

I will need to test on some larger datasets though .. when pod5 is fixed.

sklages commented 1 year ago

@iiSeymour @jorj1988 - I ran dorado on a 962G pod5 file (~17m reads) I had from some previous tests, and it finished successfully on one of our A100 (40G) (with AMD EPYC CPUs) after appr 37h. 👍

Just two things I am curious about:

the fix in RemoraEncoder::get_context doesn't seem to be CPU-specific, at least for me as a layman in C++. Any idea why we got so many segfaults on AMD systems but not on Intel systems?
with the above dataset (pod5: 963G, fast5: 4347 files) what do you think what is fastest or most efficient? What are your experiences? Running dorado on:
- one single (huge) POD5 - GPU-mode on a single card
- per file POD5 (1 fast5 -> 1 pod5), GPU-mode, given we have a handful of A100 cards
- per file POD5 (1 fast5 -> 1 pod5), CPU-mode, given we have a few thousand CPUs available

0x55555555 commented 1 year ago

HI @sklages,

I think itll be a hard thing to talk about without some benchmarking. I'd expect a smaller test set running in both environments to scale up to a larger set.

My bet would ber GPU mode though.

Thanks,

George

sklages commented 1 year ago

@jorj1988 @iiSeymour - thanks.

For now I think we can close this issue as the problem seems to be solved. In the case of coredumps occurring with the same error I'd reopen the issue..

nanoporetech / dorado

dorado basecaller -> Reads processed: 24900Segmentation fault (core dumped) #88

modified base model, A100, increasing batchsize

standard model, A100, increasing batchsize

System

AMD EPYC 7F32/AMD EPYC 7343 + NVIDIA A100-PCIE-40GB

Intel Xeon Gold 6242 + NVIDIA GeForce RTX 2080 Ti

addendum (workstations/server without GPU):

no issues (cpu-only mode)

randomly coredumping (cpu-only mode)

POD5

dorado

0 __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232

1 0x0000000000553274 in dorado::RemoraEncoder::get_context(unsigned long) const ()

2 0x00000000005400db in dorado::ModBaseCallerNode::runner_worker_thread(unsigned long) ()

3 0x00007ffff7f9a5a0 in execute_native_thread_routine () from /path/to/dorado-0.2.3-linux-x64/bin/../lib/libtorch.so

4 0x00007fff892d1fca in start_thread (arg=) at pthread_create.c:442

5 0x00007fff893513dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

0 __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232

1 0x00000000004b6995 in std::copy_move<false, true, std::random_access_iterator_tag>::copy_m (first=0x7ff683804fc4, last=0x7ff683805004, __result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:426

2 0x00000000005f0cce in std::copy_move_a2<false, int const, int> (first=0x7ff683804fc4, last=0x7ff683805004, result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:472

3 0x00000000005f07c4 in std::copy_move_a1<false, int const, int> (first=0x7ff683804fc4, last=0x7ff683805004, result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:506

4 0x00000000005f1091 in std::copy_move_a<false, gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int> (first=..., last=..., __result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:513

5 0x00000000005f0f53 in std::copy<gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int> (first=..., last=..., result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:569

6 0x00000000005f0d6c in std::uninitialized_copy::__uninit_copy<gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int> (first=..., last=..., __result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_uninitialized.h:109

7 0x00000000005f098d in std::uninitialized_copy<gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int> (first=..., last=..., result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_uninitialized.h:150

8 0x00000000005f01e6 in std::uninitialized_copy_a<gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int, int> (first=..., last=..., __result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_uninitialized.h:325

9 0x00000000005ef8b0 in std::vector<int, std::allocator >::_M_range_initialize<**gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator > > > ( this=0x7ff7327e62b0,** first=..., __last=...) at /usr/include/c++/10.4.0/bits/stl_vector.h:1585

10 0x00000000005ef0a7 in std::vector<int, std::allocator >::vector<**gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator > >, void> (this=0x7ff7327e62b0, first=..., last=...,** a=...) at /usr/include/c++/10.4.0/bits/stl_vector.h:657

15 0x00000000005ce01b in std::thread::_Invoker<std::tuple<void (dorado::ModBaseCallerNode::)(unsigned long), dorado::ModBaseCallerNode, unsigned long> >::_M_invoke<0ul, 1ul, 2ul> (this=0x7ffe4bab13f8) at /usr/include/c++/10.4.0/thread:264

16 0x00000000005cdeb8 in std::thread::_Invoker<std::tuple<void (dorado::ModBaseCallerNode::)(unsigned long), dorado::ModBaseCallerNode, unsigned long> >::operator() (this=0x7ffe4bab13f8) at /usr/include/c++/10.4.0/thread:271

17 0x00000000005cddfc in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (dorado::ModBaseCallerNode::)(unsigned long), dorado::ModBaseCallerNode, unsigned long> > >::_M_run (this=0x7ffe4bab13f0) at /usr/include/++/10.4.0/thread:215

18 0x00007fff82701c10 in std::execute_native_thread_routine (__p=0x7ffe4bab13f0) at /dev/shm/bee-package/gcc/gcc-10.4.0-0/source/libstdc++-v3/src/c++11/thread.cc:80

19 0x00007fff81ea6fca in start_thread (arg=) at pthread_create.c:442

20 0x00007fff81f263dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

nanoporetech / dorado

dorado basecaller -> Reads processed: 24900Segmentation fault (core dumped) #88

modified base model, A100, increasing batchsize

standard model, A100, increasing batchsize

System

AMD EPYC 7F32/AMD EPYC 7343 + NVIDIA A100-PCIE-40GB

Intel Xeon Gold 6242 + NVIDIA GeForce RTX 2080 Ti

addendum (workstations/server without GPU):

no issues (cpu-only mode)

randomly coredumping (cpu-only mode)

POD5

dorado

0 __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232

1 0x0000000000553274 in dorado::RemoraEncoder::get_context(unsigned long) const ()

2 0x00000000005400db in dorado::ModBaseCallerNode::runner_worker_thread(unsigned long) ()

3 0x00007ffff7f9a5a0 in execute_native_thread_routine () from /path/to/dorado-0.2.3-linux-x64/bin/../lib/libtorch.so

4 0x00007fff892d1fca in start_thread (arg=) at pthread_create.c:442

5 0x00007fff893513dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

0 __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232

1 0x00000000004b6995 in std::copy_move<false, true, std::random_access_iterator_tag>::copy_m (first=0x7ff683804fc4, last=0x7ff683805004, __result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:426

2 0x00000000005f0cce in std::copy_move_a2<false, int const, int> (first=0x7ff683804fc4, last=0x7ff683805004, result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:472

3 0x00000000005f07c4 in std::copy_move_a1<false, int const, int> (first=0x7ff683804fc4, last=0x7ff683805004, result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:506

4 0x00000000005f1091 in std::copy_move_a<false, gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int> (first=..., last=..., __result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:513

5 0x00000000005f0f53 in std::copy<gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int> (first=..., last=..., result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_algobase.h:569

6 0x00000000005f0d6c in std::uninitialized_copy::__uninit_copy<gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int> (first=..., last=..., __result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_uninitialized.h:109

7 0x00000000005f098d in std::uninitialized_copy<gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int> (first=..., last=..., result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_uninitialized.h:150

8 0x00000000005f01e6 in std::uninitialized_copy_a<gnu_cxx::__normal_iterator<int const, std::vector<int, std::allocator > >, int, int> (first=..., last=..., __result=0x7ff6d8838340) at /usr/include/c++/10.4.0/bits/stl_uninitialized.h:325

9 0x00000000005ef8b0 in std::vector<int, std::allocator >::_M_range_initialize<gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator > > > ( this=0x7ff7327e62b0, first=..., __last=...) at /usr/include/c++/10.4.0/bits/stl_vector.h:1585

10 0x00000000005ef0a7 in std::vector<int, std::allocator >::vector<gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator > >, void> (this=0x7ff7327e62b0, first=..., last=..., a=...) at /usr/include/c++/10.4.0/bits/stl_vector.h:657

15 0x00000000005ce01b in std::thread::_Invoker<std::tuple<void (dorado::ModBaseCallerNode::)(unsigned long), dorado::ModBaseCallerNode, unsigned long> >::_M_invoke<0ul, 1ul, 2ul> (this=0x7ffe4bab13f8) at /usr/include/c++/10.4.0/thread:264

16 0x00000000005cdeb8 in std::thread::_Invoker<std::tuple<void (dorado::ModBaseCallerNode::)(unsigned long), dorado::ModBaseCallerNode, unsigned long> >::operator() (this=0x7ffe4bab13f8) at /usr/include/c++/10.4.0/thread:271

17 0x00000000005cddfc in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (dorado::ModBaseCallerNode::)(unsigned long), dorado::ModBaseCallerNode, unsigned long> > >::_M_run (this=0x7ffe4bab13f0) at /usr/include/++/10.4.0/thread:215

18 0x00007fff82701c10 in std::execute_native_thread_routine (__p=0x7ffe4bab13f0) at /dev/shm/bee-package/gcc/gcc-10.4.0-0/source/libstdc++-v3/src/c++11/thread.cc:80

19 0x00007fff81ea6fca in start_thread (arg=) at pthread_create.c:442

20 0x00007fff81f263dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

9 0x00000000005ef8b0 in std::vector<int, std::allocator >::_M_range_initialize<**gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator > > > ( this=0x7ff7327e62b0,** first=..., __last=...) at /usr/include/c++/10.4.0/bits/stl_vector.h:1585

10 0x00000000005ef0a7 in std::vector<int, std::allocator >::vector<**gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator > >, void> (this=0x7ff7327e62b0, first=..., last=...,** a=...) at /usr/include/c++/10.4.0/bits/stl_vector.h:657