nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Dorado Tests Failed After Successful Compile #763

Closed pbpearman closed 4 months ago

pbpearman commented 5 months ago

Issue Report

Please describe the issue:

I compiled Dorado from source and the build finished without errors. I then proceeded to the tests, and they both fail.

sampajanna:~/Downloads/dorado$ ctest --test-dir cmake-build
Internal ctest changing into directory: /home/peter/Downloads/dorado/cmake-build
Test project /home/peter/Downloads/dorado/cmake-build
Start 1: dorado_tests
1/2 Test #1: dorado_tests ..................... Passed 1.45 sec
Start 2: dorado_smoke_tests
2/2 Test #2: dorado_smoke_tests ...............***Failed 9.98 sec

50% tests passed, 1 tests failed out of 2

Dorado runs from the command line in its temporary directory and produces valid output for --version and --help.

In the LastTest.log file, it appears that only 7 of 8 tests in Test 1 actually passed. Of the assertions (Test 2?), 2 of 11322 failed. It would be great if you could give me some suggestions on what I should try next. Thanks in advance.
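For reference, the pass/fail counts quoted above come from the two summary lines that Catch prints at the end of a test binary's output. In the log posted further down, those lines follow the dorado_smoke_tests banner, so the 8 test cases and 11322 assertions appear to belong to Test #2 rather than Test #1. The following is a minimal Python sketch for pulling those counts out of a log; the function name `catch2_summary` is made up for illustration, and the pattern only covers the summary shape seen in this log (it does not match the "All tests passed" line that Catch prints when nothing fails).

```python
import re

# Matches Catch summary lines such as
#   "test cases:     8 |     7 passed | 1 failed"
SUMMARY_RE = re.compile(
    r"^(test cases|assertions):\s*(\d+)\s*\|\s*(\d+) passed\s*\|\s*(\d+) failed",
    re.MULTILINE,
)


def catch2_summary(log_text):
    """Return {kind: (total, passed, failed)} for each summary line found."""
    return {
        kind: (int(total), int(passed), int(failed))
        for kind, total, passed, failed in SUMMARY_RE.findall(log_text)
    }


# The two summary lines exactly as they appear in the posted log.
sample = """\
test cases:     8 |     7 passed | 1 failed
assertions: 11322 | 11320 passed | 2 failed
"""

for kind, (total, passed, failed) in catch2_summary(sample).items():
    print(f"{kind}: {failed} of {total} failed")
```

To run it against the real file, read LastTest.log into a string and pass that to `catch2_summary` instead of `sample`.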

What I tried:

I tried killing the dorado-basecall server from previous runs; I also rebooted the machine.

Repeatability

It fails the same way every time.

Run environment:

Logs

Here is the stdout produced by ctest --test-dir cmake-build --output-on-failure:

Internal ctest changing into directory: /home/peter/Downloads/dorado/cmake-build
Test project /home/peter/Downloads/dorado/cmake-build
Start 1: dorado_tests
1/2 Test #1: dorado_tests ..................... Passed 1.39 sec
Start 2: dorado_smoke_tests
2/2 Test #2: dorado_smoke_tests ...............***Failed 9.68 sec
[2024-04-21 18:55:30.814] [info] - downloading dna_r10.4.1_e8.2_400bps_fast@v4.2.0 with httplib
[2024-04-21 18:55:31.163] [info] cuda:0 using chunk size 9996, batch size 128
[2024-04-21 18:55:31.217] [info] cuda:0 using chunk size 4998, batch size 128
[2024-04-21 18:55:31.282] [info] - downloading rna004_130bps_fast@v3.0.1 with httplib
[2024-04-21 18:55:31.498] [info] cuda:0 using chunk size 10000, batch size 128
[2024-04-21 18:55:31.501] [info] cuda:0 using chunk size 5000, batch size 128
[2024-04-21 18:55:31.579] [info] - downloading dna_r10.4.1_e8.2_400bps_fast@v4.2.0 with httplib
[2024-04-21 18:55:31.813] [info] cuda:0 using chunk size 9996, batch size 128
[2024-04-21 18:55:31.818] [info] cuda:0 using chunk size 4998, batch size 128
[2024-04-21 18:55:31.886] [info] - downloading rna004_130bps_fast@v3.0.1 with httplib
[2024-04-21 18:55:32.110] [info] cuda:0 using chunk size 10000, batch size 128
[2024-04-21 18:55:32.113] [info] cuda:0 using chunk size 5000, batch size 128
[2024-04-21 18:55:32.189] [info] - downloading dna_r10.4.1_e8.2_400bps_fast@v4.2.0 with httplib
[2024-04-21 18:55:32.657] [info] - downloading rna004_130bps_fast@v3.0.1 with httplib
[2024-04-21 18:55:33.130] [info] - downloading dna_r10.4.1_e8.2_400bps_fast@v4.2.0 with httplib
[2024-04-21 18:55:33.561] [info] - downloading rna004_130bps_fast@v3.0.1 with httplib
[2024-04-21 18:55:34.039] [info] - downloading dna_r10.4.1_e8.2_400bps_fast@v4.2.0_5mCG_5hmCG@v2 with httplib
[2024-04-21 18:55:34.273] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v4.2.0_6mA@v3 with httplib
[2024-04-21 18:55:34.804] [info] - downloading dna_r10.4.1_e8.2_400bps_fast@v4.2.0 with httplib


dorado_smoke_tests is a Catch v2.13.8 host application.
Run with -? for options

-------------------------------------------------------------------------------
SmokeTest: ModBaseCallerNode
-------------------------------------------------------------------------------
/home/peter/Downloads/dorado/tests/NodeSmokeTest.cpp:233
...............................................................................

/home/peter/Downloads/dorado/tests/NodeSmokeTest.cpp:236: FAILED:
  {Unknown expression after the reported line}
due to unexpected exception with messages:
  gpu := true
  pipeline_restart := false
  cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
  Exception raised from _cudnn_rnn at /pytorch/pynew/aten/src/ATen/native/
  cudnn/RNN.cpp:1102 (most recent call first):
  frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string
  <char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f89f0d2029b
  in /home/peter/Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #1: at::native::_cudnn_rnn(at::Tensor const&, c10::ArrayRef<at::Tensor>
  , long, c10::optional<at::Tensor> const&, at::Tensor const&, c10::optional
  <at::Tensor> const&, long, long, long, long, bool, double, bool, bool, c10::
  ArrayRef<long>, c10::optional<at::Tensor> const&) + 0x1581 (0x7f89eeddd2e1 in
  /home/peter/Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #2: <unknown function> + 0x9f7f6b5 (0x7f89f09086b5 in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #3: <unknown function> + 0x9f9e907 (0x7f89f0927907 in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #4: <unknown function> + 0x4bf8d2a (0x7f89eb581d2a in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #5: at::_ops::_cudnn_rnn::call(at::Tensor const&, c10::ArrayRef<at::
  Tensor>, long, c10::optional<at::Tensor> const&, at::Tensor const&, c10::
  optional<at::Tensor> const&, long, c10::SymInt, c10::SymInt, long, bool,
  double, bool, bool, c10::ArrayRef<c10::SymInt>, c10::optional<at::Tensor>
  const&) + 0x3ca (0x7f89eb4ec86a in /home/peter/Downloads/dorado/cmake-build/
  libdorado_torch_lib.so)
  frame #6: <unknown function> + 0x845962f (0x7f89eede262f in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #7: <unknown function> + 0x84522c5 (0x7f89eeddb2c5 in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #8: <unknown function> + 0x8452c60 (0x7f89eeddbc60 in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #9: at::native::lstm(at::Tensor const&, c10::ArrayRef<at::Tensor>, c10:
  :ArrayRef<at::Tensor>, bool, long, double, bool, bool, bool) + 0x333
  (0x7f89eaf00073 in /home/peter/Downloads/dorado/cmake-build/
  libdorado_torch_lib.so)
  frame #10: <unknown function> + 0x554662c (0x7f89ebecf62c in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #11: at::_ops::lstm_input::call(at::Tensor const&, c10::ArrayRef<at::
  Tensor>, c10::ArrayRef<at::Tensor>, bool, long, double, bool, bool, bool) +
  0x276 (0x7f89eb6c1676 in /home/peter/Downloads/dorado/cmake-build/
  libdorado_torch_lib.so)
  frame #12: torch::nn::LSTMImpl::forward_helper(at::Tensor const&, at::Tensor
  const&, at::Tensor const&, long, c10::optional<std::tuple<at::Tensor, at::
  Tensor> >) + 0x6a3 (0x7f89ee558ef3 in /home/peter/Downloads/dorado/cmake-
  build/libdorado_torch_lib.so)
  frame #13: torch::nn::LSTMImpl::forward(at::Tensor const&, c10::optional<std:
  :tuple<at::Tensor, at::Tensor> >) + 0xc6 (0x7f89ee5590e6 in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #14: <unknown function> + 0x10207c6 (0x562157c687c6 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #15: <unknown function> + 0x1026786 (0x562157c6e786 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #16: <unknown function> + 0x101144a (0x562157c5944a in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #17: <unknown function> + 0x10115ef (0x562157c595ef in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #18: <unknown function> + 0x100f2a7 (0x562157c572a7 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #19: <unknown function> + 0x100f975 (0x562157c57975 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #20: <unknown function> + 0xf84690 (0x562157bcc690 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #21: <unknown function> + 0xeb2675 (0x562157afa675 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #22: <unknown function> + 0xe38512 (0x562157a80512 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #23: <unknown function> + 0xe30a61 (0x562157a78a61 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #24: <unknown function> + 0xe4ee4a (0x562157a96e4a in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #25: <unknown function> + 0xe6cd08 (0x562157ab4d08 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #26: <unknown function> + 0xe7346c (0x562157abb46c in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #27: <unknown function> + 0xe79a1a (0x562157ac1a1a in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #28: <unknown function> + 0xe7a04f (0x562157ac204f in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #29: <unknown function> + 0x9146ef (0x56215755c6ef in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #30: __libc_start_main + 0xf3 (0x7f89e5f36083 in /lib/x86_64-linux-gnu/
  libc.so.6)
  frame #31: <unknown function> + 0xe303ee (0x562157a783ee in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)

[2024-04-21 18:55:35.078] [info]  - downloading dna_r10.4.1_e8.2_400bps_fast@v4.2.0_5mCG_5hmCG@v2 with httplib
[2024-04-21 18:55:35.308] [info]  - downloading dna_r10.4.1_e8.2_400bps_sup@v4.2.0_6mA@v3 with httplib
[2024-04-21 18:55:35.861] [info]  - downloading dna_r10.4.1_e8.2_400bps_fast@v4.2.0 with httplib
-------------------------------------------------------------------------------
SmokeTest: ModBaseCallerNode
-------------------------------------------------------------------------------
/home/peter/Downloads/dorado/tests/NodeSmokeTest.cpp:233
...............................................................................

/home/peter/Downloads/dorado/tests/NodeSmokeTest.cpp:236: FAILED:
  {Unknown expression after the reported line}
due to unexpected exception with messages:
  gpu := true
  pipeline_restart := true
  cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
  Exception raised from _cudnn_rnn at /pytorch/pynew/aten/src/ATen/native/
  cudnn/RNN.cpp:1102 (most recent call first):
  frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string
  <char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f89f0d2029b
  in /home/peter/Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #1: at::native::_cudnn_rnn(at::Tensor const&, c10::ArrayRef<at::Tensor>
  , long, c10::optional<at::Tensor> const&, at::Tensor const&, c10::optional
  <at::Tensor> const&, long, long, long, long, bool, double, bool, bool, c10::
  ArrayRef<long>, c10::optional<at::Tensor> const&) + 0x1581 (0x7f89eeddd2e1 in
  /home/peter/Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #2: <unknown function> + 0x9f7f6b5 (0x7f89f09086b5 in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #3: <unknown function> + 0x9f9e907 (0x7f89f0927907 in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #4: <unknown function> + 0x4bf8d2a (0x7f89eb581d2a in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #5: at::_ops::_cudnn_rnn::call(at::Tensor const&, c10::ArrayRef<at::
  Tensor>, long, c10::optional<at::Tensor> const&, at::Tensor const&, c10::
  optional<at::Tensor> const&, long, c10::SymInt, c10::SymInt, long, bool,
  double, bool, bool, c10::ArrayRef<c10::SymInt>, c10::optional<at::Tensor>
  const&) + 0x3ca (0x7f89eb4ec86a in /home/peter/Downloads/dorado/cmake-build/
  libdorado_torch_lib.so)
  frame #6: <unknown function> + 0x845962f (0x7f89eede262f in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #7: <unknown function> + 0x84522c5 (0x7f89eeddb2c5 in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #8: <unknown function> + 0x8452c60 (0x7f89eeddbc60 in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #9: at::native::lstm(at::Tensor const&, c10::ArrayRef<at::Tensor>, c10:
  :ArrayRef<at::Tensor>, bool, long, double, bool, bool, bool) + 0x333
  (0x7f89eaf00073 in /home/peter/Downloads/dorado/cmake-build/
  libdorado_torch_lib.so)
  frame #10: <unknown function> + 0x554662c (0x7f89ebecf62c in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #11: at::_ops::lstm_input::call(at::Tensor const&, c10::ArrayRef<at::
  Tensor>, c10::ArrayRef<at::Tensor>, bool, long, double, bool, bool, bool) +
  0x276 (0x7f89eb6c1676 in /home/peter/Downloads/dorado/cmake-build/
  libdorado_torch_lib.so)
  frame #12: torch::nn::LSTMImpl::forward_helper(at::Tensor const&, at::Tensor
  const&, at::Tensor const&, long, c10::optional<std::tuple<at::Tensor, at::
  Tensor> >) + 0x6a3 (0x7f89ee558ef3 in /home/peter/Downloads/dorado/cmake-
  build/libdorado_torch_lib.so)
  frame #13: torch::nn::LSTMImpl::forward(at::Tensor const&, c10::optional<std:
  :tuple<at::Tensor, at::Tensor> >) + 0xc6 (0x7f89ee5590e6 in /home/peter/
  Downloads/dorado/cmake-build/libdorado_torch_lib.so)
  frame #14: <unknown function> + 0x10207c6 (0x562157c687c6 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #15: <unknown function> + 0x1026786 (0x562157c6e786 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #16: <unknown function> + 0x101144a (0x562157c5944a in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #17: <unknown function> + 0x10115ef (0x562157c595ef in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #18: <unknown function> + 0x100f2a7 (0x562157c572a7 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #19: <unknown function> + 0x100f975 (0x562157c57975 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #20: <unknown function> + 0xf84690 (0x562157bcc690 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #21: <unknown function> + 0xeb2675 (0x562157afa675 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #22: <unknown function> + 0xe38512 (0x562157a80512 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #23: <unknown function> + 0xe30a61 (0x562157a78a61 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #24: <unknown function> + 0xe4ee4a (0x562157a96e4a in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #25: <unknown function> + 0xe6cd08 (0x562157ab4d08 in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #26: <unknown function> + 0xe7346c (0x562157abb46c in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #27: <unknown function> + 0xe79a1a (0x562157ac1a1a in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #28: <unknown function> + 0xe7a04f (0x562157ac204f in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #29: <unknown function> + 0x9146ef (0x56215755c6ef in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)
  frame #30: __libc_start_main + 0xf3 (0x7f89e5f36083 in /lib/x86_64-linux-gnu/
  libc.so.6)
  frame #31: <unknown function> + 0xe303ee (0x562157a783ee in /home/peter/
  Downloads/dorado/cmake-build/tests/dorado_smoke_tests)

[2024-04-21 18:55:36.095] [info]  - downloading dna_r10.4.1_e8.2_400bps_fast@v4.2.0_5mCG_5hmCG@v2 with httplib
[2024-04-21 18:55:36.333] [info]  - downloading dna_r10.4.1_e8.2_400bps_sup@v4.2.0_6mA@v3 with httplib
[2024-04-21 18:55:36.869] [info]  - downloading dna_r10.4.1_e8.2_400bps_fast@v4.2.0 with httplib
[2024-04-21 18:55:38.132] [info]  - downloading dna_r10.4.1_e8.2_400bps_fast@v4.2.0_5mCG_5hmCG@v2 with httplib
[2024-04-21 18:55:38.373] [info]  - downloading dna_r10.4.1_e8.2_400bps_sup@v4.2.0_6mA@v3 with httplib
[2024-04-21 18:55:38.903] [info]  - downloading dna_r10.4.1_e8.2_400bps_fast@v4.2.0 with httplib
===============================================================================
test cases:     8 |     7 passed | 1 failed
assertions: 11322 | 11320 passed | 2 failed

[LastTest.log](https://github.com/nanoporetech/dorado/files/15052761/LastTest.log)

tijyojwad commented 5 months ago

Hi @pbpearman - are you able to use one of the prebuilt binaries? That would be the recommended way.

Can you also share which CUDA toolkit and NVIDIA driver versions you have on your system?

pbpearman commented 5 months ago

Sure. Here you go:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

And the first line from nvidia-smi:
NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2
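The two outputs above report different CUDA versions: nvcc reports the installed toolkit release (12.4), while the "CUDA Version" field of nvidia-smi is the highest CUDA version the installed driver supports (12.2). A minimal Python sketch for extracting both numbers so they can be compared is shown below. The helper names are made up for illustration, and whether this difference matters for Dorado is not established anywhere in this thread.

```python
import re


def toolkit_version(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' field of `nvcc --version`."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def driver_cuda_version(smi_header):
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_header)
    return (int(m.group(1)), int(m.group(2))) if m else None


# Lines copied from the outputs posted above.
nvcc_out = "Cuda compilation tools, release 12.4, V12.4.131"
smi_out = "NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2"

toolkit = toolkit_version(nvcc_out)
driver = driver_cuda_version(smi_out)
print("toolkit:", toolkit, "driver supports up to:", driver)

# Tuples compare element-wise, so (12, 4) > (12, 2).
if toolkit and driver and toolkit > driver:
    print("toolkit is newer than the CUDA version reported by the driver")
```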

Could you please direct me to the location of the binaries for AMD64 Linux? Thanks in advance.

BTW, while I was waiting around, I decided to give the compiled version a test drive with the duplex option and sup model, as in dorado duplex sup ./pod5 > ./duplex.bam. It's chugging right along, producing output.

pbpearman commented 5 months ago

@tijyojwad If there are pre-built binaries, it is a pretty well-kept secret. I see no mention of them in the existing documentation and my Googling turns up nothing. The only way I see to install the command line version is from source. If there is a set of binaries, please direct me to them :-)

The run of the version I compiled eventually crashed after writing 2 GB of output to the .bam file.

tijyojwad commented 5 months ago

Hi @pbpearman you can find prebuilt binaries here - https://github.com/nanoporetech/dorado?tab=readme-ov-file#installation

pbpearman commented 5 months ago

Hi. I ran the pre-built binary. Interestingly, it crashed too after a while. I then copied that binary into the bin/ where the version I compiled was. I ran ctest --test-dir cmake-build with the pre-built binary. The results are similar to those of the test for the binary I compiled:

~/Downloads/dorado$ ctest --test-dir cmake-build
Internal ctest changing into directory: /home/peter/Downloads/dorado/cmake-build
Test project /home/peter/Downloads/dorado/cmake-build
Start 1: dorado_tests
1/2 Test #1: dorado_tests ..................... Passed 2.75 sec
Start 2: dorado_smoke_tests
2/2 Test #2: dorado_smoke_tests ...............***Failed 10.71 sec

50% tests passed, 1 tests failed out of 2

Total Test time (real) = 13.47 sec

The following tests FAILED:
2 - dorado_smoke_tests (Failed)
Errors while running CTest
Output from these tests are in: /home/peter/Downloads/dorado/cmake-build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

So I attach the file with the output from the tests: LastTest.log

The first reported exception mentions pytorch. I'm not a Python developer, or any other kind of developer, so this is foreign ground for me. But if you have some suggestions, I would be happy to try them. I need the command line Dorado version for its ability to call duplex reads, which I can't do with the P2Solo M flow cell and basecalling in Minknow.

Peter

tijyojwad commented 5 months ago

> I then copied that binary into the bin/ where the version I compiled was. I ran ctest --test-dir cmake-build with the pre-built binary

This won't use any dependencies from the pre-built binary since dorado is statically linked. So you're still running the old test.

> Interestingly, it crashed too after a while.

Can you post the following details?

Steps to reproduce the issue:

Please list any steps to reproduce the issue.

Run environment:

Logs

pbpearman commented 5 months ago

@tijyojwad OK, can do all that. But first, I wondered if the crashes might be specific to the duplex option. I ran the same command without duplex and it ran successfully overnight, until I killed it. That is much longer than it had been running with duplex. Then I wondered if the crashes depend on which .pod5 file is provided. I ran dorado with a couple of smaller .pod5 files and it did not crash. That suggests that the crashes might be specific to a particular .pod5 file. Do you know the order in which dorado reads .pod5 files from a directory? I can probably find the file it crashes on because it is probably the first file.

tijyojwad commented 5 months ago

Filling in the template I posted will help answer your questions :)

E.g. how large is your dataset for duplex? Were you running with or without alignment (i.e. what's your command line)? What was the log output from your run (i.e. when it crashed, what did it say: was it Killed, a CUDA error, or a segmentation fault)?

There's no fixed order of traversal of POD5s.

pbpearman commented 5 months ago

Using the pre-compiled version of Dorado, I tested it on a small set of .pod5 files. Before starting, I killed the instance of the dorado_basecall_server that I found with the command nvidia-smi. I issued this command a couple of times within 20 seconds, and on the last time, I found that there was again a process labeled dorado_basecall_server. So, I set up the call to the pre-compiled dorado version, then started it quickly after killing the server process. The pre-compiled dorado version finished without error, successfully processing the .pod5 files I had provided. Immediate problem solved. I have now successfully run the pre-compiled dorado on many .pod5 files over the last two days and each time it completed successfully.
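The check described above, looking in nvidia-smi for a dorado_basecall_server process before starting a run, can be scripted. The following is a minimal Python sketch that uses nvidia-smi's CSV query mode (`--query-compute-apps=pid,process_name --format=csv,noheader`). The function names and the sample rows are hypothetical and only illustrate the parsing; the real query function is defined but not called here, because it needs a machine with nvidia-smi installed.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name' CSV rows into a list of (pid, name) tuples."""
    apps = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split on the first comma only, in case the process path contains one.
        pid, name = (field.strip() for field in line.split(",", 1))
        apps.append((int(pid), name))
    return apps


def basecall_servers(apps, needle="dorado_basecall_server"):
    """Return the entries whose process name contains `needle`."""
    return [(pid, name) for pid, name in apps if needle in name]


def query_gpu_processes():
    """Ask nvidia-smi for the processes currently using the GPU for compute."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)


# Hypothetical sample rows (PIDs and paths are invented for illustration).
sample = "1234, /hypothetical/path/dorado_basecall_server\n5678, dorado\n"
print(basecall_servers(parse_compute_apps(sample)))
```

On a real system, `basecall_servers(query_gpu_processes())` returning a non-empty list would indicate that a basecall server is still holding the GPU.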

While the current issue seems to be resolved, it appears that the version of dorado_basecall_server (that is being re-started) is not compatible with the pre-compiled command-line version of Dorado (and perhaps not with the version I compiled myself, although I didn't test it). Minknow is also installed on this machine. How can I find out what is re-starting dorado_basecall_server, and disable it, hopefully without damaging the minknow installation? Killing the running server and quickly starting dorado is not a very elegant solution.

malton-ont commented 5 months ago

@pbpearman,

Minknow ships with a system daemon that maintains an active basecall server. You can stop this service using:

sudo systemctl stop doradod

(and likewise start to restart it). Note that live basecalling in Minknow will not be available while the server is stopped.
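The stop, run, restart sequence can be wrapped so that the service is started again even if the dorado run fails. The following is a minimal Python sketch under the assumption that the service is named doradod as stated above; the function name `run_with_service_stopped` and the injectable `runner` parameter are made up for illustration. The demonstration uses a stand-in runner that only records the commands, so nothing is actually stopped or started.

```python
import subprocess


def run_with_service_stopped(command, service="doradod", runner=subprocess.run):
    """Stop `service`, run `command`, then start `service` again.

    The `finally` block restarts the service even when `command` fails.
    """
    runner(["sudo", "systemctl", "stop", service], check=True)
    try:
        return runner(command, check=True)
    finally:
        runner(["sudo", "systemctl", "start", service], check=True)


# Dry run: record the commands instead of executing them.
calls = []


def fake_runner(cmd, check=True):
    calls.append(cmd)


run_with_service_stopped(["dorado", "--version"], runner=fake_runner)
for cmd in calls:
    print(" ".join(cmd))
```

With the default runner, the same call would execute the three commands for real. A basecalling run that redirects its output to a .bam file would additionally need the output file passed to the runner, which this sketch leaves out.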