nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Installation Issue in Nvidia Jetson AGX Orin Developer kit #786

Closed jetson24 closed 3 weeks ago

jetson24 commented 2 months ago

Issue Report

Please describe the issue:

I am trying to use the pre-compiled version of Dorado (downloaded from https://cdn.oxfordnanoportal.com/software/analysis/dorado-0.6.1-linux-arm64.tar.gz) for basecalling.

Errors in pre-compiled version

However, when I untar the file and try to run dorado from ./dorado-0.6.1-linux-arm64/bin/, it shows the following error:

jetson@jetson-desktop:~/dorado-0.6.1-linux-arm64/bin$ ./dorado -h
./dorado: error while loading shared libraries: libcupti.so.11.4: cannot open shared object file: No such file or directory

I also tried the solution given on GitHub of updating LD_LIBRARY_PATH to /dorado-0.6.1-linux-arm64/lib, but the error persisted.

I then looked at other online forums, where people attributed the issue to a CUDA update and suggested compiling the tool from source. So I started compiling dorado following the instructions on GitHub:

$ apt-get update && apt-get install -y --no-install-recommends \
        curl \
        git \
        ca-certificates \
        build-essential \
        nvidia-cuda-toolkit \
        libhdf5-dev \
        libssl-dev \
        libzstd-dev \
        cmake \
        autoconf \
        automake
$ git clone https://github.com/nanoporetech/dorado.git dorado
$ cd dorado
$ cmake -S . -B cmake-build
$ cmake --build cmake-build --config Release -j

The build also hit some errors along the way, but these were resolved after installing the missing dependencies, and I was finally able to compile it with gcc-9.

Errors during compilation:

The errors were:

  1. gmake[2]: *** No rule to make target '/usr/lib64/libcufft_static_nocallback.a', needed by 'tests/dorado_smoke_tests'.  Stop.
    gmake[1]: *** [CMakeFiles/Makefile2:1073: tests/CMakeFiles/dorado_smoke_tests.dir/all] Error 2
    gmake[1]: *** Waiting for unfinished jobs....
    gmake[2]: *** No rule to make target '/usr/lib64/libcufft_static_nocallback.a', needed by 'bin/dorado'.  Stop.
    gmake[1]: *** [CMakeFiles/Makefile2:507: CMakeFiles/dorado.dir/all] Error 2

    Solved by adding a link to libcufft_static_nocallback.a in /usr/lib64.

  2. gmake[2]: *** No rule to make target '/usr/lib64/liblapack_static.a', needed by 'tests/dorado_tests'.  Stop.
    gmake[2]: *** Waiting for unfinished jobs....

    Solved by installing liblapack-dev.

  3. /usr/bin/ld: cannot find -lnuma: No such file or directory
    collect2: error: ld returned 1 exit status
    gmake[2]: *** [CMakeFiles/dorado.dir/build.make:296: bin/dorado] Error 1
    gmake[1]: *** [CMakeFiles/Makefile2:507: CMakeFiles/dorado.dir/all] Error 2
    gmake: *** [Makefile:166: all] Error 2

    Solved by installing libnuma-dev and running ldconfig.

  4. The last one was with libstdc++.so.6:
    /usr/bin/ld: ../dorado/models/libdorado_models_lib.a(model_downloader.cpp.o): undefined reference to symbol '_ZNSt15basic_streambufIcSt11char_traitsIcEE8overflowEi@@GLIBCXX_3.4'
    /usr/bin/ld: /lib/aarch64-linux-gnu/libstdc++.so.6: error adding symbols: DSO missing from command line
    collect2: error: ld returned 1 exit status
    gmake[2]: *** [tests/CMakeFiles/dorado_smoke_tests.dir/build.make:156: tests/dorado_smoke_tests] Error 1
    gmake[1]: *** [CMakeFiles/Makefile2:1073: tests/CMakeFiles/dorado_smoke_tests.dir/all] Error 2
    gmake[1]: *** Waiting for unfinished jobs....
    /usr/bin/ld: ../dorado/models/libdorado_models_lib.a(model_downloader.cpp.o): undefined reference to symbol '_ZNSt15basic_streambufIcSt11char_traitsIcEE8overflowEi@@GLIBCXX_3.4'
    /usr/bin/ld: /lib/aarch64-linux-gnu/libstdc++.so.6: error adding symbols: DSO missing from command line
    collect2: error: ld returned 1 exit status
    gmake[2]: *** [tests/CMakeFiles/dorado_tests.dir/build.make:1020: tests/dorado_tests] Error 1
    gmake[1]: *** [CMakeFiles/Makefile2:1031: tests/CMakeFiles/dorado_tests.dir/all] Error 2
    /usr/bin/ld: dorado/models/libdorado_models_lib.a(model_downloader.cpp.o): undefined reference to symbol '_ZNSt15basic_streambufIcSt11char_traitsIcEE8overflowEi@@GLIBCXX_3.4'
    /usr/bin/ld: /lib/aarch64-linux-gnu/libstdc++.so.6: error adding symbols: DSO missing from command line
    collect2: error: ld returned 1 exit status
    gmake[2]: *** [CMakeFiles/dorado.dir/build.make:296: bin/dorado] Error 1
    gmake[1]: *** [CMakeFiles/Makefile2:507: CMakeFiles/dorado.dir/all] Error 2
    gmake: *** [Makefile:166: all] Error 2

    Solved by adding -lstdc++ in the CMakeLists.txt file:

    target_link_libraries(dorado_models_lib
    PRIVATE
        dorado_utils
        elzip
        spdlog::spdlog
        -lstdc++
    )

Problem persists

However, even after compiling, the tests fail when I run ctest --test-dir cmake-build. The error log is attached below; it contains multiple errors, and I do not understand how to address them.

LastLog.txt

Help with this, or with the pre-compiled version's error, would be very much appreciated.

Thanking all in advance!

malton-ont commented 2 months ago

Hi @jetson24,

Apologies for the issues you are seeing. We are currently discussing the level of support for Jetson devices - our main priority is TX2 due to MinKNOW's Mk1C variant.

This missing libcupti appears to be a packaging error in this distribution. You can install the cuda-cupti-11-4 apt package yourself to create this file - you may need to update your LD_LIBRARY_PATH to include /usr/local/cuda-11.4/targets/aarch64-linux/lib.
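
As a rough sketch of that suggestion (not an official procedure), the snippet below adds the directory named above to LD_LIBRARY_PATH and reports whether the library from the error message is present. The install command is left as a comment because it needs root; the exact file name inside the directory is an assumption based on the error message.

```shell
# Assumption: run once as root to install the package named above.
# sudo apt-get install -y cuda-cupti-11-4

# Directory suggested in the reply above.
CUPTI_DIR=/usr/local/cuda-11.4/targets/aarch64-linux/lib

# Prepend CUPTI_DIR to LD_LIBRARY_PATH unless it is already listed.
case ":${LD_LIBRARY_PATH:-}:" in
  *":${CUPTI_DIR}:"*) ;;
  *) export LD_LIBRARY_PATH="${CUPTI_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
esac

# Report whether the library named in the error message is present there.
if [ -e "${CUPTI_DIR}/libcupti.so.11.4" ]; then
  echo "libcupti.so.11.4 found in ${CUPTI_DIR}"
else
  echo "libcupti.so.11.4 not found in ${CUPTI_DIR}" >&2
fi
```

The export only affects the current shell session; it would need to go into a shell profile to persist.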

Regarding the compile errors:

  1. libcufft_static_nocallback - please use CMake >= 3.24.
  2. liblapack_static.a - this should be provided by the libcusolver-dev-11-4 apt package, which should come as part of the CUDA toolkit.
  3. -lnuma - I'm not aware that we require the numa libs - possibly this is due to one of your other dependency versions?
  4. -lstdc++ - this should be automatically detected, possibly a CMake version issue as well?
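
Since two of these points come down to the CMake version, a quick check before configuring may save a rebuild. The following is a minimal sketch, assuming GNU sort (for its -V version-sort option) is available; the helper name version_ge is made up for this example.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

REQUIRED=3.24   # minimum version suggested in the reply above

if command -v cmake >/dev/null 2>&1; then
  # The first line of output looks like "cmake version 3.22.1".
  CURRENT=$(cmake --version | head -n1 | awk '{print $3}')
  if version_ge "$CURRENT" "$REQUIRED"; then
    echo "cmake $CURRENT is new enough"
  else
    echo "cmake $CURRENT is older than $REQUIRED" >&2
  fi
else
  echo "cmake not found on PATH" >&2
fi
```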

Regarding the unit test errors:

/home/jay/dorado/tests/gpu_monitor_test.cpp:359: FAILED:
  REQUIRE( info->current_throttling_reason.has_value() )
with expansion:
  false
with message:
  info->current_throttling_reason_error := "Not Supported"

This appears to be an oversight in our test setup - we don't run these specific tests on the Jetson devices, and Jetsons don't support these queries. These errors can be safely ignored.

/home/jay/dorado/tests/IndexFileAccessTest.cpp:256: FAILED:
  REQUIRE( header == EXPECTED_REF_FILE_HEADER )
with expansion:
  "@SQ  SN:read_0   LN:1,898"
  ==
  "@SQ  SN:read_0   LN:1898"

This appears to be a locale issue. Please try setting export LC_ALL=C. We will look at enforcing this in future versions (I'm not clear how e.g. samtools handles thousand-separators in sam files?)
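
A minimal sketch of that workaround, assuming the build directory is cmake-build as in the original post: set the C locale for the session, then re-run the tests. The guard simply skips the run when the build directory or ctest is absent.

```shell
# Force the C locale so numbers are formatted without thousands separators.
export LC_ALL=C

# Re-run the tests only if the build directory from the original post exists.
if [ -d cmake-build ] && command -v ctest >/dev/null 2>&1; then
  ctest --test-dir cmake-build
else
  echo "cmake-build or ctest not found; skipping test run" >&2
fi
```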

[2024-05-01 11:07:31.508] [error] CuBLAS error 15

This is the concerning one for me. This indicates an internal failure in cuBLAS - possibly an out-of-memory condition, or some misconfiguration with the drivers. Note that we build this variant internally using CUDA 11.4. Unless you also see this during basecalling, I would not worry about it for the time being. If it also occurs during basecalling, please try reducing the selected batch size using the -b parameter.
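
For illustration only, a basecalling run with an explicit batch size might look like the sketch below. The model name, input directory, output file, and the value 64 are placeholders chosen for this example, not recommendations from this thread.

```shell
# Hypothetical inputs; override via the environment or edit to match your data.
MODEL="${MODEL:-hac}"            # placeholder model selection
POD5_DIR="${POD5_DIR:-pod5s}"    # placeholder directory of input reads
BATCH_SIZE=64                    # arbitrary starting value; lower it if the error persists

# Run only when dorado and the input directory are actually available.
if command -v dorado >/dev/null 2>&1 && [ -d "$POD5_DIR" ]; then
  dorado basecaller "$MODEL" "$POD5_DIR" -b "$BATCH_SIZE" > calls.bam
else
  echo "dorado or $POD5_DIR not found; skipping basecalling" >&2
fi
```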

jetson24 commented 2 months ago

Alright! Thank you very much! I will try the suggestions with both the pre-built and the compiled versions and keep you posted!

Thanks a ton, again!

JMencius commented 1 month ago

Hi @malton-ont, I wonder if it is possible to run dorado on a Jetson TX2 with SUP-mode basecalling?

malton-ont commented 1 month ago

Hi @JMencius,

No, sup basecalling isn't supported on TX2.

jetson24 commented 3 weeks ago

Hi @malton-ont, I tried your suggestions, and the latest versions are working well on my Jetson AGX Orin.

Thanks again!