nanoporetech / bonito

A PyTorch Basecaller for Oxford Nanopore Reads
https://nanoporetech.com/

install CUDA 10.2 ? #60

Closed: pmenzel closed this issue 2 years ago

pmenzel commented 4 years ago

Hi, is there an easy way to install the CUDA 10.2 packages for bonito 0.3.0? It seems that Ubuntu only has 10.1 in the package manager.

I went the "venv3" route for installing bonito, in case it matters.

thanks, Peter

iiSeymour commented 4 years ago

Hey @pmenzel

If you want to use your preinstalled 10.1 version of CUDA then you'll need to build seqdist against cupy-cuda101 yourself: clone seqdist, change the cupy requirement in settings.ini, and pip install the local copy.

Sorry, it's a bit of a pain - I hope to figure out a better way to not depend on a specific version of CUDA soon.

HTH

Chris.

pmenzel commented 4 years ago

Thanks, that worked! For posterity, the commands are:

git clone https://github.com/davidcpage/seqdist.git
# change seqdist/settings.ini
pip install ./seqdist/ 
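
The settings.ini change is just switching the pinned cupy wheel to match the local CUDA toolkit. A minimal sketch, assuming the requirements line in seqdist/settings.ini names cupy-cuda102 and you want the 10.1 wheel instead (edit the file by hand if you prefer):

# illustrative one-liner; cupy-cuda101 is an assumption based on CUDA 10.1 being installed
sed -i 's/cupy-cuda102/cupy-cuda101/' seqdist/settings.ini
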
pmenzel commented 4 years ago

I am getting ~ 20 reads / sec on 16S amplicon reads, using a GeForce RTX 2060 Super.

pmenzel commented 4 years ago

Hey Chris, I just noticed that the output fastq file does not contain quality scores and the summary.tsv has a 0.0 in the mean_qscore_template column. Are quality scores not yet supported?

iiSeymour commented 4 years ago

That's right - we can probably get mean qscore values into the summary file easily enough, but per-base qscores will be tricky.

pmenzel commented 4 years ago

Alright, mean qscores are already useful for filtering, so that would be nice to have!
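
For instance, once real values land in the summary, a rough filter could be a simple awk pass over summary.tsv; a sketch below, where the mean_qscore_template column name comes from the summary mentioned above and the cutoff of 7 is only an illustrative threshold:

# keep the header plus reads whose mean_qscore_template is at least 7 (illustrative cutoff)
awk -F'\t' 'NR==1 {for (i=1; i<=NF; i++) if ($i == "mean_qscore_template") c = i; print; next} $c >= 7' summary.tsv > summary.filtered.tsv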

gaworj commented 4 years ago

Hi,

I have encountered a similar problem with the cupy CUDA version.

I followed the instructions in this thread, but got:

Traceback (most recent call last):
  File "/home/jang/biosoft/bonito/venv3/lib/python3.5/site-packages/cupy/__init__.py", line 20, in <module>
    from cupy import core  # NOQA
  File "/home/jang/biosoft/bonito/venv3/lib/python3.5/site-packages/cupy/core/__init__.py", line 1, in <module>
    from cupy.core import core  # NOQA
  File "cupy/core/core.pyx", line 1, in init cupy.core.core
  File "/home/jang/biosoft/bonito/venv3/lib/python3.5/site-packages/cupy/cuda/__init__.py", line 5, in <module>
    from cupy.cuda import compiler  # NOQA
  File "/home/jang/biosoft/bonito/venv3/lib/python3.5/site-packages/cupy/cuda/compiler.py", line 10, in <module>
    from cupy.cuda import device
  File "cupy/cuda/device.pyx", line 10, in init cupy.cuda.device
ImportError: /home/jang/biosoft/bonito/venv3/lib/python3.5/site-packages/cupy_backends/cuda/libs/cusparse.cpython-35m-x86_64-linux-gnu.so: symbol cusparseConstrainedGeMM_bufferSize, version libcusparse.so.10 not defined in file libcusparse.so.10 with link time reference

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jang/biosoft/bonito/venv3/bin/bonito", line 9, in <module>
    load_entry_point('ont-bonito', 'console_scripts', 'bonito')()
  File "/home/jang/biosoft/bonito/bonito/__init__.py", line 39, in main
    args.func(args)
  File "/home/jang/biosoft/bonito/bonito/cli/basecaller.py", line 26, in main
    model = load_model(args.model_directory, args.device, weights=int(args.weights))
  File "/home/jang/biosoft/bonito/bonito/util.py", line 283, in load_model
    Model = load_symbol(config, "Model")
  File "/home/jang/biosoft/bonito/bonito/util.py", line 250, in load_symbol
    imported = import_module(config['model']['package'])
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/jang/biosoft/bonito/bonito/crf/__init__.py", line 1, in <module>
    from .model import Model
  File "/home/jang/biosoft/bonito/bonito/crf/model.py", line 10, in <module>
    import seqdist.sparse
  File "/home/jang/biosoft/bonito/venv3/lib/python3.5/site-packages/seqdist/sparse.py", line 9, in <module>
    import cupy as cp
  File "/home/jang/biosoft/bonito/venv3/lib/python3.5/site-packages/cupy/__init__.py", line 41, in <module>
    raise ImportError(_msg) from e
ImportError: CuPy is not correctly installed.

If you are using wheel distribution (cupy-cudaXX), make sure that the version of CuPy you installed matches with the version of CUDA on your host. Also, confirm that only one CuPy package is installed: $ pip freeze

If you are building CuPy from source, please check your environment, uninstall CuPy and reinstall it with: $ pip install cupy --no-cache-dir -vvvv

Any suggestions?

Thanks in advance! Jan

iiSeymour commented 4 years ago

Hey @gaworj

Do you know what CUDA version you have on your machine? If not, check the output of nvidia-smi or look in /usr/local/cuda*. cupy has prebuilt packages that match your CUDA version, so if you have CUDA 10.0 you will want cupy-cuda100, for 10.1 cupy-cuda101, etc.

cupy (8.0.0)              - CuPy: NumPy-like API accelerated with CUDA
cupy-cuda90 (8.0.0)       - CuPy: NumPy-like API accelerated with CUDA
cupy-cuda92 (8.0.0)       - CuPy: NumPy-like API accelerated with CUDA
cupy-cuda100 (8.0.0)      - CuPy: NumPy-like API accelerated with CUDA
cupy-cuda101 (8.0.0)      - CuPy: NumPy-like API accelerated with CUDA
cupy-cuda102 (8.0.0)      - CuPy: NumPy-like API accelerated with CUDA
cupy-cuda110 (8.0.0)      - CuPy: NumPy-like API accelerated with CUDA

There is also a source distribution, cupy, which builds at install time, but it's quite slow to build so I would recommend going with a prebuilt wheel.
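
In other words, roughly (the wheel name has to match whatever release your toolkit reports):

/usr/local/cuda/bin/nvcc --version   # e.g. "Cuda compilation tools, release 10.1"
pip install cupy-cuda101             # 10.0 -> cupy-cuda100, 10.2 -> cupy-cuda102, etc.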

gaworj commented 4 years ago

Hi,

I have installed the proper version of cupy, but the problem persists:

a) pip freeze returns:

alembic==1.4.3
certifi==2020.6.20
chardet==3.0.4
cliff==3.1.0
cmd2==0.8.9
colorlog==4.4.0
crf-beam==0.0.1a0
cupy-cuda101==8.0.0
Cython==0.29.21
fast-ctc-decode==0.2.5
fastrlock==0.5
future==0.18.2
h5py==2.10.0
idna==2.8
joblib==0.14.1
Mako==1.1.3
mappy==2.17
MarkupSafe==1.1.1
numpy==1.18.5
-e git+https://github.com/nanoporetech/bonito.git@07f885ee9a1c0fef66e8177f00615c12128f453d#egg=ont_bonito
ont-fast5-api==3.1.6
optuna==1.1.0
packaging==20.4
parasail==1.2
pbr==5.5.1
pkg-resources==0.0.0
prettytable==0.7.2
progressbar33==2.4
pyparsing==2.4.7
pyperclip==1.8.1
python-dateutil==2.8.1
python-editor==1.0.4
PyYAML==5.3.1
requests==2.22.0
scipy==1.4.1
seqdist @ file:///home/jang/biosoft/seqdist
six==1.15.0
SQLAlchemy==1.3.20
stevedore==1.32.0
toml==0.10.0
torch==1.5.0
tqdm==4.31.1
urllib3==1.25.11
wcwidth==0.2.5

b) nvidia-smi shows:
Thu Oct 29 10:26:24 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   31C    P0    N/A / 120W |   1240MiB /  4038MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

I have already installed cupy-cuda101 via the seqdist settings.ini modification.

Bests, Jan

iiSeymour commented 4 years ago

The error you are seeing from cupy is:

symbol cusparseConstrainedGeMM_bufferSize, version libcusparse.so.10 not defined in file libcusparse.so.10 with link time reference

Do you have more than one CUDA SDK installed? Can you check for me that /usr/local/cuda is a symlink to CUDA 10.1 on your system?

On my system I see /usr/local/cuda points at my 10.2 installation, I have a working nvcc compiler, and cusparseConstrainedGeMM_bufferSize is defined in libcusparse.so.10.

$ ll -d /usr/local/cuda*
lrwxrwxrwx  1 root root    9 Feb 22  2020 /usr/local/cuda -> cuda-10.2/
drwxr-xr-x 16 root root 4096 Feb 22  2020 /usr/local/cuda-10.2/
$
$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
$
$ ls -l /usr/local/cuda/lib64/libcusparse.so.10
lrwxrwxrwx 1 root root 24 Nov 13  2019 /usr/local/cuda/lib64/libcusparse.so.10 -> libcusparse.so.10.3.1.89
$
$ nm -gD /usr/local/cuda/lib64/libcusparse.so.10 | grep cusparseConstrainedGeMM_bufferSize
0000000000080a00 T cusparseConstrainedGeMM_bufferSize

If you see the same thing on your system then you could try using cupy instead of the prebuilt cupy-cuda101 but I think we need to raise this over on the cupy repo.

gaworj commented 4 years ago

Here are the results:

(venv3) jang@jang-MS-7B18:~/biosoft/bonito$ ll -d /usr/local/cuda*
lrwxrwxrwx  1 root root    9 Jul  3  2019 /usr/local/cuda -> cuda-10.1/
drwxr-xr-x 18 root root 4096 Jul  3  2019 /usr/local/cuda-10.1/

(venv3) jang@jang-MS-7B18:~/biosoft/bonito$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168

(venv3) jang@jang-MS-7B18:~/biosoft/bonito$ ls -l /usr/local/cuda/lib64/libcusparse.so.10
lrwxrwxrwx 1 root root 23 May  7  2019 /usr/local/cuda/lib64/libcusparse.so.10 -> libcusparse.so.10.1.168

(venv3) jang@jang-MS-7B18:~/biosoft/bonito$ nm -gD /usr/local/cuda/lib64/libcusparse.so.10 | grep cusparseConstrainedGeMM_bufferSize

The last command returns nothing.

iiSeymour commented 4 years ago

The cupy message about the missing symbol is accurate then - sorry @gaworj, but I don't think I can provide any more insight.

Can you raise it over on the cupy repo and/or update CUDA?

iiSeymour commented 4 years ago

@vellamike @anaruse do you have further advice on the issue @gaworj is seeing?

vellamike commented 4 years ago

Agree this looks like a possible Cupy issue. We are looking into it.

anaruse commented 4 years ago

I think you are using CUDA 10.1 Update 1, correct? If so, would it be possible to try upgrading to CUDA 10.1 Update 2 or CUDA 10.2? https://github.com/cupy/cupy/issues/3918

gaworj commented 3 years ago

Hi,

@anaruse thanks for the suggestion. After upgrading CUDA to 10.2 everything works fine until:

RuntimeError: CUDA out of memory. Tried to allocate 500.00 MiB (GPU 0; 3.94 GiB total capacity; 1.70 GiB already allocated; 485.56 MiB free; 2.47 GiB reserved in total by PyTorch)

I've tried to run bonito on a GTX 1050 Ti. The Guppy basecaller works on it, so I thought there would be no problems in this case.

Bests, Jan

iiSeymour commented 3 years ago

Okay, that one is a bonito issue - the default configuration will allocate more than 4GB. If you turn down batchsize and split_read_length in bonito/crf/basecall.py it should run:

--- a/bonito/crf/basecall.py
+++ b/bonito/crf/basecall.py
@@ -80,11 +80,11 @@ def decode_int8(scores, seqdist, scale=127/5, beamsize=40, beamcut=100.0):
         return ""

-def basecall(model, reads, aligner=None, beamsize=40, chunksize=4000, overlap=500, batchsize=32, qscores=False):
+def basecall(model, reads, aligner=None, beamsize=40, chunksize=4000, overlap=500, batchsize=16, qscores=False):
     """
     Basecalls a set of reads.
     """
-    split_read_length=400000
+    split_read_length=100000
     _stitch = partial(
         stitch,
         start=overlap // 2 // model.stride,

gaworj commented 3 years ago

I've followed your suggestions and now it works! Thanks a lot!

adbeggs commented 3 years ago

Just to feed back: I have set up Bonito 0.3.0 using CUDA 10.1 on an old GPU (GeForce GTX 1060) and it seems to be working perfectly, basecalling merrily at 16-20 reads/sec on amplicon data - so thanks for the advice and the cool basecaller!

dgiguer commented 3 years ago

I had this error too (the same CuPy import failure @gaworj posted above), and I found the easiest way around it was to install cudatoolkit through conda:

conda install -c anaconda cudatoolkit=10.2

crysclitheroe commented 3 years ago

Dear @iiSeymour

I messaged you earlier on Twitter, thanks for getting back to me. I also have this issue. It seems I do not have permission to install or upgrade to CUDA 10.2 on our local GPU cluster, and as it's the long weekend I will have to wait 3 painful days before my request gets looked at by HPC support.

In the meantime I'm thinking of trying to install an earlier version of bonito (0.3.0?) like @adbeggs above. Is there a chance this will be compatible with cuda/10.0.130?

iiSeymour commented 3 years ago

If you can get PyTorch working with CUDA 10 then bonito will be fine; you'll just have to follow the steps to build seqdist with cupy-cuda100 as described at the top of this thread. Alternatively, as @dgiguer suggests, you could use conda to install cudatoolkit=10.2.

crysclitheroe commented 3 years ago

So I tried the seqdist route but still got an explicit error about my NVIDIA driver being too old. I used conda to create an environment and install CUDA toolkit 10.2 - it's there in conda list, along with the conda-forge nvcc_linux-64 package - but when I try

ll -d /usr/local/cuda*

I do not have the correct link:

lrwxrwxrwx  1 root root    9 Jan 24  2019 /usr/local/cuda -> cuda-10.0
drwxr-xr-x 16 root root 4096 Jan 24  2019 /usr/local/cuda-10.0

Dear @dgiguer, what is the best way to set up a conda env so that bonito can run inside it with the correct driver?

iiSeymour commented 3 years ago

Conda won't be installing the CUDA toolkit under /usr/local (you'd need root to do that).

I don't use conda myself but I suspect if you have successfully installed cudatoolkit=10.2 then you should be able to install ont-bonito into the same environment and start basecalling.

crysclitheroe commented 3 years ago

Apologies in advance for the messy code blocks, I'm not used to the formatting here. Still not sure where I'm going wrong with this. @dgiguer could you show me how your conda env is set up successfully?

For example, on an interactive GPU session, I've set up a conda env called bonito, conda installed cudatoolkit=10.2, pytorch, python=3.7.3, and within that pip installed bonito. In the bonito env, conda list returns:

(bonito) bash-4.2$ conda list
# packages in environment at /home/c/crystal/miniconda3/envs/bonito:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
blas                      1.0                         mkl  
ca-certificates           2020.10.14                    0  
certifi                   2020.6.20          pyhd3eb1b0_3  
cudatoolkit               10.2.89              hfd86e86_1  
cupy-cuda102              8.1.0                    pypi_0    pypi
freetype                  2.10.4               h5ab3b9f_0  
intel-openmp              2020.2                      254  
jpeg                      9b                   h024ee3a_2  
lcms2                     2.11                 h396b838_0  
libedit                   3.1.20191231         h14c3975_1  
libffi                    3.2.1             hf484d3e_1007  
libgcc-ng                 9.1.0                hdf63c60_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.1.0                h2733197_1  
libuv                     1.40.0               h7b6447c_0  
lz4-c                     1.9.2                heb0550a_3  
mkl                       2020.2                      256  
mkl-service               2.3.0            py37he904b0f_0  
mkl_fft                   1.2.0            py37h23d657b_0  
mkl_random                1.1.1            py37h0573a6f_0  
ncurses                   6.2                  he6710b0_1  
ninja                     1.10.1           py37hfd86e86_0  
numpy                     1.19.4                   pypi_0    pypi
numpy-base                1.19.2           py37hfa32c7d_0  
olefile                   0.46                     py37_0  
openssl                   1.1.1h               h7b6447c_0  
pillow                    8.0.1            py37he98fc37_0  
pip                       20.2.4           py37h06a4308_0  
python                    3.7.3                h0371630_0  
pytorch                   1.7.0           py3.7_cuda10.2.89_cudnn7.6.5_0    pytorch
readline                  7.0                  h7b6447c_5  
setuptools                50.3.1           py37h06a4308_1  
six                       1.15.0           py37h06a4308_0  
sqlite                    3.33.0               h62c20be_0  
tk                        8.6.10               hbc83047_0  
torchvision               0.8.1                py37_cu102    pytorch
typing_extensions         3.7.4.3                    py_0  
wheel                     0.35.1             pyhd3eb1b0_0  
xz                        5.2.5                h7b6447c_0  
zlib                      1.2.11               h7b6447c_3  
zstd                      1.4.5                h9ceee32_0  

but the nvidia-smi command returns

(bonito) bash-4.2$ nvidia-smi
Mon Nov 23 17:15:18 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-SXM2...  On   | 00000000:04:00.0 Off |                    0 |
| N/A   31C    P0    32W / 300W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

When I run bonito basecaller on my raw data I get:

File "/apps/unit/BourguignonU/bonito/0.3.1/lib/python3.7/site-packages/torch/cuda/__init__.py", line 63, in _check_driver
    of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
AssertionError: 
The NVIDIA driver on your system is too old (found version 10000).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

Thanks in advance! Crystal

PS: I also tried uninstalling everything and then conda installing pytorch and torchvision with cudatoolkit=10.0, i.e. matching our driver. It meant rebuilding seqdist as above (which I could only do as user, possibly creating problems when using conda), with a matching cupy-cuda100. Then I reinstalled bonito, but upon running the basecaller I still get exactly the same error as above.

dgiguer commented 3 years ago

Hi @Tipplynne ,

I arbitrarily chose python 3.6 to start when I tried this last week (which seems to have been lucky). I tried with 3.7 and it looks like I get the memory error mentioned above. The fast work-around for now may be just to create a python 3.6 environment.

conda create -n test python=3.6
conda activate test
conda install -c anaconda cudatoolkit=10.2

pip install ont-bonito

It may be that your conda environment is not the first place searched in your PATH; it seems that you are calling bonito from a different location than your conda env:

# your bonito path
"/apps/unit/BourguignonU/bonito/0.3.1/lib/python3.7/site-packages/torch/cuda/__init__.py"

# your conda env
/home/c/crystal/miniconda3/envs/bonito

You may need to delete the versions of bonito that are outside your conda env, or modify your PATH so the conda env is searched first. I believe when you call bonito it should be resolved from the conda environment you just created:

(test) dgiguer@gru:~$ which bonito
/opt/miniconda3/envs/test/bin/bonito

Hope this helps!

crysclitheroe commented 3 years ago

Hi @dgiguer

Thank you, that was helpful, but in the end it turns out our primary problem is our hardware's NVIDIA driver being incompatible with anything newer than CUDA 10.0 :( - and we will probably only see a system upgrade next April.

For now I'm giving up on conda, because using pip inside conda here was creating problems with dependencies installed across/outside conda envs and generally breaking things, so I had to rm -rf it all and start again.

Dear @iiSeymour, so now I'm going back to trying to get everything to run on previous versions (this time I tried both versions of Python: first 3.6, then uninstall, clean up, and reinstall with 3.7.3).

I first try to force my environment to use a version of torch that I know is compatible with our system (thanks to one of our admins):

pip3 install --user torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html

The problem is I cannot then use pip3 to install the modified seqdist build due to permissions:

    running install_lib
    creating /usr/lib/python3.6/site-packages/seqdist
    error: could not create '/usr/lib/python3.6/site-packages/seqdist': Permission denied
    ----------------------------------------
    Command "/usr/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/scratch/pip-5c1cup0t- 
    /setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, 
    __file__, 'exec'))" install --record /scratch/pip-ltsj2mlp-record/install-record.txt --single-version-externally-managed -- 
    compile" failed with error code 1 in /scratch/pip-5c1cup0t-build/

Using pip to install seqdist normally (--user) shows that it leaves the right version of torch in place, but reinstalls cupy-cuda102.

Is there another way to install the modified seqdist build without using pip? Or perhaps a way to make seqdist use cupy-cuda100 after installing with pip, so that bonito will accept it and not try to reinstall it?

Sorry for all the questions!

iiSeymour commented 3 years ago

@Tipplynne bonito will be fine with torch==1.4.0 and either py3.6 or py3.7. You don't want to install seqdist with pip from PyPI; you need to check out a copy of seqdist and change the requirement (https://github.com/nanoporetech/bonito/issues/60#issuecomment-717995692), then install it locally inside your conda/virtual environment by running python setup.py install in the seqdist directory.
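
Roughly, from inside the activated environment (the settings.ini edit is the same cupy swap described earlier in the thread, here pointing at the 10.0 wheel):

git clone https://github.com/davidcpage/seqdist.git
cd seqdist
# change the cupy requirement in settings.ini to cupy-cuda100, then:
python setup.py install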

crysclitheroe commented 3 years ago

So I tried conda again, a bit more carefully this time. I was able to successfully install bonito (and the modified seqdist) into a Python 3.7.3 env, but I'm running into a new error now.

Recap of the env-installation setup:

conda create -n bonito python=3.7.3
conda activate bonito
pip3 install --user torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html
git clone https://github.com/davidcpage/seqdist.git
# change seqdist/settings.ini
pip3 install ./seqdist/ 
pip3 install ont-bonito # with a thousand warnings
# some additions after every time bonito -h returns errors:
pip3 install scipy
pip3 install ont_fast5_api
pip3 install crf-beam
# bonito -h runs and can download models

Now bonito basecaller complains:

> loading model
Traceback (most recent call last):
  File "/home/c/crystal/miniconda3/envs/bonito/bin/bonito", line 33, in <module>
    sys.exit(load_entry_point('ont-bonito==0.3.1', 'console_scripts', 'bonito')())
  File "/home/c/crystal/miniconda3/envs/bonito/lib/python3.7/site-packages/bonito/__init__.py", line 39, in main
    args.func(args)
  File "/home/c/crystal/miniconda3/envs/bonito/lib/python3.7/site-packages/bonito/cli/basecaller.py", line 26, in main
    model = load_model(args.model_directory, args.device, weights=int(args.weights))
  File "/home/c/crystal/miniconda3/envs/bonito/lib/python3.7/site-packages/bonito/util.py", line 283, in load_model
    Model = load_symbol(config, "Model")
  File "/home/c/crystal/miniconda3/envs/bonito/lib/python3.7/site-packages/bonito/util.py", line 250, in load_symbol
    imported = import_module(config['model']['package'])
  File "/home/c/crystal/miniconda3/envs/bonito/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/c/crystal/miniconda3/envs/bonito/lib/python3.7/site-packages/bonito/crf/__init__.py", line 1, in <module>
    from .model import Model
  File "/home/c/crystal/miniconda3/envs/bonito/lib/python3.7/site-packages/bonito/crf/model.py", line 10, in <module>
    import seqdist.sparse
  File "/home/c/crystal/miniconda3/envs/bonito/lib/python3.7/site-packages/seqdist/sparse.py", line 14, in <module>
    from .ctc import interleave_blanks, generate_sample_inputs, loss_pytorch, benchmark_fwd_bwd, report, compare_fwd_bwd
  File "/home/c/crystal/miniconda3/envs/bonito/lib/python3.7/site-packages/seqdist/ctc.py", line 120, in <module>
    (torch.float32, Log): load_cupy_func('cuda/ctc.cu', 'fwd_bwd_logspace', FLOAT='float',  SUM='logsumexp3', MUL='add', ZERO='{:E}'.format(Log.zero)),
  File "/home/c/crystal/miniconda3/envs/bonito/lib/python3.7/site-packages/seqdist/utils.py", line 71, in load_cupy_func
    return add_checks(cp.RawKernel(code, name))
AttributeError: module 'cupy' has no attribute 'RawKernel'

I also tried to fix this with conda install -c conda-forge cupy cudatoolkit=10.0 and rebuilt bonito, but I'm still getting the same error.

iiSeymour commented 3 years ago

Okay, you are close now - can you confirm the version of cupy installed with pip freeze | grep cupy?

crysclitheroe commented 3 years ago

Thanks! pip freeze | grep cupy returns cupy-cuda100==8.1.0

and conda list returns

cudatoolkit               10.0.130                      0    anaconda
cudnn                     7.6.5                cuda10.0_0  

crysclitheroe commented 3 years ago

Oh My Greatness! Basecalling at ~5 reads per second on very old, very finicky hardware :)

What is wrong with our system is that it is really old: we essentially had to build both seqdist and cupy from older versions directly on the GPU nodes to get something compatible. We use CentOS 7.6, which has a much older 3.10 kernel and a system libstdc++ that only supports GLIBCXX up to 3.4.19, while cupy and bonito were looking for /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21'.
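
(For anyone hitting the same wall, a quick diagnostic - not a fix - is to list which GLIBCXX versions the system libstdc++ actually provides:)

strings /lib64/libstdc++.so.6 | grep GLIBCXX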

Thanks so much for your help and looking forward to playing with bonito and hopefully building my own models soon!

iiSeymour commented 3 years ago

Well done @Tipplynne that's great!

zbz-cool commented 3 years ago

That's right - we can probably get mean qscores values in the summary file easily enough but per base qscores will be tricky.

Hi Chris @iiSeymour, I just noticed that the mean qscore is always 0.0 in the summary file. How can I get the mean qscores? Thanks! My branch is on tag v0.3.1.

DABAKER165 commented 3 years ago

Up until today I could only get 0.3.0 and earlier to work. Using @Tipplynne's comment I finally got it to work with 0.4.0. After the install of the requirements and setup:

$ git clone https://github.com/nanoporetech/bonito.git  # or fork first and clone that
$ cd bonito
$ python3 -m venv venv3
$ source venv3/bin/activate
(venv3) $ pip install --upgrade pip
(venv3) $ pip install -r requirements.txt
(venv3) $ python setup.py develop

I ran the following to uninstall and reinstall seqdist:

pip uninstall seqdist -y
pip install seqdist

I noticed that seqdist installed the cupy-cuda102 package at version 9, but we need version 8.6.0. I think the pip install -r requirements.txt step ungracefully uninstalled cupy and reinstalled the correct version, but didn't fix seqdist.
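
As a sanity check after that dance, it may help to confirm what actually ended up in the environment and, if the wheel drifted again, pin it explicitly; the 8.6.0 pin below is just the version noted above, not an official requirement:

pip freeze | grep -Ei 'cupy|seqdist'
pip install 'cupy-cuda102==8.6.0' --force-reinstall   # illustrative pin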

iiSeymour commented 2 years ago

Cupy is no longer a requirement as of v0.5.1 🎉