nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

dorado_basecall_server --version hangs indefinitely and MinKNOW installation fails #765

Closed linsalrob closed 1 month ago

linsalrob commented 2 months ago

Issue Report

Please describe the issue:

During installation of MinKNOW, or after restarting the computer, /opt/ont/dorado/bin/dorado_basecall_server --version hangs indefinitely and the minknow service does not start.

Steps to reproduce the issue:

System information:

Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal

During a clean install of minknow-gpu-release, the installation hangs awaiting dorado --version. For example:

apt install ont-standalone-minknow-gpu-release

Run environment:

$ /opt/ont/dorado/bin/dorado_basecall_server --version

The command hangs indefinitely. Version is 7.3.9.

$ /opt/ont/dorado/bin/dorado_basecall_server --version
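
When reproducing a hang like this, it can help to bound the call so the shell returns instead of blocking forever. The following is a minimal sketch, assuming GNU coreutils `timeout` is available. The helper name `run_with_limit` and the 10-second limit in the example are illustrative and are not part of dorado or MinKNOW.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dorado or MinKNOW):
# run a command, giving up after a time limit.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
run_with_limit() {
    local limit="$1"
    shift
    # timeout(1) sends SIGTERM when the limit is reached and exits with 124.
    # -k 5 escalates to SIGKILL if the process is still alive 5 seconds later,
    # in which case the exit status is 137 instead.
    timeout -k 5 "$limit" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example (path taken from the report above):
# run_with_limit 10 /opt/ont/dorado/bin/dorado_basecall_server --version
```

An exit status of 124 (or 137 after escalation) would confirm the hang without tying up the terminal. Any other status comes from the command itself.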

Logs

Selecting previously unselected package ont-doradod-for-minion.
Preparing to unpack .../11-ont-doradod-for-minion_7.3.9-1~focal_all.deb ...
Unpacking ont-doradod-for-minion (7.3.9-1~focal) ...
Selecting previously unselected package ont-kingfisher-ui-minion.
Preparing to unpack .../12-ont-kingfisher-ui-minion_5.9.17-1~focal_all.deb ...
Unpacking ont-kingfisher-ui-minion (5.9.17-1~focal) ...
Selecting previously unselected package ont-run-report.
Preparing to unpack .../13-ont-run-report_5.9.6_amd64.deb ...
Unpacking ont-run-report (5.9.6) ...
Selecting previously unselected package ont-vbz-hdf-plugin.
Preparing to unpack .../14-ont-vbz-hdf-plugin_1.0.8-1~focal_amd64.deb ...
Unpacking ont-vbz-hdf-plugin (1.0.8-1~focal) ...
Selecting previously unselected package ont-standalone-minknow-gpu-release.
Preparing to unpack .../15-ont-standalone-minknow-gpu-release_24.02.10~focal_amd64.deb ...
Unpacking ont-standalone-minknow-gpu-release (24.02.10~focal) ...
Setting up libnorm1:amd64 (1.5.8+dfsg2-2build1) ...
Setting up ont-python (3.10.13-0) ...
Setting up ont-run-report (5.9.6) ...
Setting up ont-dorado-models-for-minion (7.3.9-1) ...
Setting up libdbusmenu-gtk4:amd64 (16.04.1+18.10.20180917-0ubuntu6) ...
Setting up ont-vbz-hdf-plugin (1.0.8-1~focal) ...
Setting up libpgm-5.2-0:amd64 (5.2.122~dfsg-3ubuntu1) ...
Setting up libzmq5:amd64 (4.3.2-2ubuntu1) ...
Setting up minknow-core-minion-nc (5.9.7) ...
<<< Hangs here until timeout received >>>
Job for minknow.service failed because a timeout was exceeded.
See "systemctl status minknow.service" and "journalctl -xe" for details.
Setting up ont-dorado-server-for-minion (7.3.9-1~focal) ...
Setting up libappindicator1 (12.10.1+20.04.20200408.1-0ubuntu1) ...
Setting up ont-doradod-for-minion (7.3.9-1~focal) ...
Created symlink /etc/systemd/system/doradod.service → /lib/systemd/system/doradod.service.
Created symlink /etc/systemd/system/multi-user.target.wants/doradod.service → /lib/systemd/system/doradod.service.
Setting up ont-kingfisher-ui-minion (5.9.17-1~focal) ...
Setting up ont-bream4-minion (7.9.4-1~focal) ...
Setting up ont-configuration-customer-minion (5.9.12-1~focal) ...
Setting up ont-standalone-minknow-gpu-release (24.02.10~focal) ...
Job for minknow.service failed because a timeout was exceeded.
See "systemctl status minknow.service" and "journalctl -xe" for details.
Processing triggers for libc-bin (2.31-0ubuntu9.15) ...
/sbin/ldconfig.real: /opt/ont/dorado/lib/libnvToolsExt.so.1 is not a symbolic link
Processing triggers for desktop-file-utils (0.24-1ubuntu3) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Processing triggers for hicolor-icon-theme (0.17-2) ...
Processing triggers for gnome-menus (3.36.0-1ubuntu1) ...

Output from systemctl status minknow.service

$ systemctl status minknow.service
● minknow.service - MinKNOW Instrument Software for MinION (daemon)
     Loaded: loaded (/lib/systemd/system/minknow.service; enabled; vendor preset: enabled)
     Active: failed (Result: timeout) since Tue 2024-04-23 16:05:51 ACST; 14min ago
    Process: 4285 ExecStartPre=/bin/sleep 15 (code=exited, status=0/SUCCESS)
    Process: 4289 ExecStart=/opt/ont/minknow/bin/mk_manager_svc (code=killed, signal=TERM)
   Main PID: 4289 (code=killed, signal=TERM)
      Tasks: 0 (limit: 153937)
     Memory: 4.0K
     CGroup: /system.slice/minknow.service

Apr 23 16:04:06 7pzlxr3-l2 systemd[1]: Starting MinKNOW Instrument Software for MinION (daemon)...
Apr 23 16:05:51 7pzlxr3-l2 systemd[1]: minknow.service: start operation timed out. Terminating.
Apr 23 16:05:51 7pzlxr3-l2 systemd[1]: minknow.service: Killing process 4292 (dorado_basecall) with signal SIGKILL.
Apr 23 16:05:51 7pzlxr3-l2 systemd[1]: minknow.service: Failed with result 'timeout'.
Apr 23 16:05:51 7pzlxr3-l2 systemd[1]: Failed to start MinKNOW Instrument Software for MinION (daemon).
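
For scripted checks, the failure reason can be pulled out of the status text shown above. This is a sketch only: the helper name `failure_result` is hypothetical, and it assumes the "Active: failed (Result: ...)" line format seen in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl status` text on stdin and print the
# failure reason from a line such as "Active: failed (Result: timeout) ...".
# Prints nothing if no such line is present.
failure_result() {
    sed -n 's/.*Active: failed (Result: \([^)]*\)).*/\1/p'
}

# Example:
# systemctl status minknow.service | failure_result
# journalctl -u minknow.service -b --no-pager   # unit log for the current boot
```

The `journalctl` line in the example collects the full unit log for the current boot, which is usually more informative than the few lines that `systemctl status` prints.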

Please also note that Issue #390 is not resolved, per the lines in bold.

MarkBicknellONT commented 2 months ago

Hi @linsalrob,

Thanks for the bug report - this is an odd one that we've not seen before. Can I ask you to try a couple of diagnostic things so we can try to narrow down what's going on:

Thanks, Mark

HalfPhoton commented 1 month ago

Closing as stale - please re-open if needed