cuDNN Runtime Error with RTX 3070, Pytorch 1.8.0a0, CUDA11.1, cuDNN 8.0.5

nanoporetech / bonito

A PyTorch Basecaller for Oxford Nanopore Reads

https://nanoporetech.com/

cuDNN Runtime Error with RTX 3070, Pytorch 1.8.0a0, CUDA11.1, cuDNN 8.0.5 #77

Closed patbohn closed 1 year ago

patbohn commented 3 years ago

Hi Seymour, thank you for the great work on bonito; it seems to be quickly surpassing the other basecallers. I am now trying to basecall some PCR amplicon data with it on an RTX 3070, but am struggling with version compatibility (the RTX 30X0 series requires CUDA 11, which requires PyTorch 1.7+, and the Linux driver 455.45.01 needs CUDA 11.1 specifically). I have been able to compile PyTorch with CUDA 11.1 and cuDNN 8.0.5, and it is running now (I also edited seqdist to use cupy-cuda111 before installing it locally). However, I have now stumbled across another problem when trying to run

bonito basecaller dna_r9.4.1 sample_fast5_dir > sample_fasta_out

> loading model
> calling: 0 reads [00:00, ? reads/s]Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/patrick/anaconda3/envs/bonito/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/patrick/tools/bonito/bonito/multiprocessing.py", line 181, in run
    for (k, v) in self.iterator:
  File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 105, in <genexpr>
    stitched = ((read, _stitch(x)) for (read, x) in unbatchify(batches))
  File "/home/patrick/tools/bonito/bonito/util.py", line 207, in <genexpr>
    return (
  File "/home/patrick/tools/bonito/bonito/util.py", line 202, in <genexpr>
    batches = (
  File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 102, in <genexpr>
    (read, quantise_int8(compute_scores(model, batch)))
  File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 37, in compute_scores
    scores = model.encoder(batch.to(dtype).to(device))
  File "/home/patrick/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/patrick/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/patrick/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/patrick/tools/bonito/bonito/nn.py", line 101, in forward
    y, h = self.rnn(x)
  File "/home/patrick/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/patrick/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 591, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

Do you know what this error could be? Some people fixed it by reducing batch sizes, but I did not find that option in the basecaller.py file.

Thank you!

iiSeymour commented 3 years ago

Hey @patbohn

I need to add support for per model default configs that can be overridden from the command line - until I get round to that you can find the defaults here. I think 8GB is a little on the small side for the default values so maybe try setting the batchsize to 24 and halve split_read_length to 200,000.

patbohn commented 3 years ago

I have tried reducing the values to 24 and 200,000 respectively, but alas that still gave the same error. I then decreased batchsize further down to 1 and split_read_length to 50,000 to confirm the error stays, so it does not seem directly related to batch size in my case, and it does not appear to be something that someone else has experienced yet.

iiSeymour commented 3 years ago

🤔 maybe @vellamike or @EpiSlim can shed some light here?

I've successfully run on Ampere with the following - so maybe try PyTorch 1.7?

$ nvidia-smi | head -n 4
Thu Nov 26 14:54:58 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
$ python
Python 3.8.6 (default, Nov 12 2020, 18:34:50) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.7.0+cu110'
>>> torch.backends.cudnn.version()
8004
>>>
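The integer returned by torch.backends.cudnn.version() is easier to compare across setups once it is decoded. A minimal sketch, assuming the cuDNN 8.x encoding (major * 1000 + minor * 100 + patch); the helper name decode_cudnn_version is illustrative and not part of PyTorch or bonito:

```python
def decode_cudnn_version(code: int) -> str:
    """Decode the integer reported by torch.backends.cudnn.version().

    Assumes the cuDNN 8.x scheme: major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


# 8004 is the value shown in the session above.
print(decode_cudnn_version(8004))
```

With this decoding, 8004 corresponds to cuDNN 8.0.4.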

vellamike commented 3 years ago

Hi @patbohn

> linux driver 455.45.01 needs Cuda 11.1 specifically

This is not correct. From the CUDA documentation:

Drivers have always been backwards compatible with CUDA. This means that a CUDA 11.0 application will be compatible with R450 (11.0), R455 (11.1) and beyond. CUDA applications typically statically include all the libraries (for example cudart, CUDA math libraries such as cuBLAS, cuFFT) they need, so they should work on new drivers or CUDA Toolkit installations

As you are compiling an alpha version of PyTorch against a CUDA version that has not been tested with PyTorch, there could be any number of reasons why you are seeing this error.

Could you try CUDA 11.0 and PyTorch 1.7? This should work without needing to compile anything. You can install CUDA 11.0 alongside 11.1.
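The backward-compatibility rule quoted above can be sketched as a plain version comparison: an application built against a given CUDA toolkit should run as long as the driver's reported "CUDA Version" (the value in the nvidia-smi header) is at least that toolkit version. This is a simplified illustration only; the function names are made up for the example and it ignores finer points such as CUDA 11 minor-version compatibility.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '11.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def driver_supports_toolkit(driver_cuda: str, toolkit_cuda: str) -> bool:
    """Return True if the driver's maximum CUDA version covers the toolkit.

    driver_cuda:  the "CUDA Version" printed in the nvidia-smi header.
    toolkit_cuda: the CUDA version the application was built against.
    """
    return parse_version(toolkit_cuda) <= parse_version(driver_cuda)


# Driver 455.45.01 reports CUDA 11.1, so a CUDA 11.0 build of PyTorch is fine.
print(driver_supports_toolkit("11.1", "11.0"))
# A driver that only reports CUDA 10.1 cannot run a CUDA 11.0 build.
print(driver_supports_toolkit("10.1", "11.0"))
```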

patbohn commented 3 years ago

Hi @vellamike, thank you for clearing that up for me, my mistake.

I have now tried this:

1) Installed the cuda-11-0 package via apt
2) Installed the local cuda-toolkit 11.0
3) Installed the local cuDNN 8.0.4 package
4) Created a new conda environment (python 3.8.6)
5) Installed the bonito requirements (after changing the torch requirement to <1.8)
6) Installed the seqdist package (after changing cupy-cuda101 to cupy-cuda110)
7) Installed torch==1.7.0+cu110 from pip according to https://pytorch.org/get-started/locally/
8) Installed bonito using python setup.py develop

I now get this different runtime error, from nvrtc:

$ bonito basecaller dna_r9.4.1 sample_fast5_folder > sample.fasta
> loading model
> calling: 0 reads [00:00, ? reads/s]Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/patrick/tools/bonito/bonito/multiprocessing.py", line 181, in run
    for (k, v) in self.iterator:
  File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 106, in <genexpr>
    stitched = ((read, _stitch(x)) for (read, x) in unbatchify(batches))
  File "/home/patrick/tools/bonito/bonito/util.py", line 207, in <genexpr>
    return (
  File "/home/patrick/tools/bonito/bonito/util.py", line 202, in <genexpr>
    batches = (
  File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 103, in <genexpr>
    (read, quantise_int8(compute_scores(model, batch)))
  File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 37, in compute_scores
    scores = model.encoder(batch.to(dtype).to(device))
  File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/patrick/tools/bonito/bonito/nn.py", line 71, in forward
    return SwishAutoFn.apply(x)
  File "/home/patrick/tools/bonito/bonito/nn.py", line 56, in forward
    return swish_jit_fwd(x)
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed: 

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

#define __HALF_TO_US(var) *(reinterpret_cast<unsigned short *>(&(var)))
#define __HALF_TO_CUS(var) *(reinterpret_cast<const unsigned short *>(&(var)))
#if defined(__cplusplus)
  struct __align__(2) __half {
    __host__ __device__ __half() { }

  protected:
    unsigned short __x;
  };

  /* All intrinsic functions are only available to nvcc compilers */
  #if defined(__CUDACC__)
    /* Definitions of intrinsics */
    __device__ __half __float2half(const float f) {
      __half val;
      asm("{  cvt.rn.f16.f32 %0, %1;}\n" : "=h"(__HALF_TO_US(val)) : "f"(f));
      return val;
    }

    __device__ float __half2float(const __half h) {
      float val;
      asm("{  cvt.f32.f16 %0, %1;}\n" : "=f"(val) : "h"(__HALF_TO_CUS(h)));
      return val;
    }

  #endif /* defined(__CUDACC__) */
#endif /* defined(__cplusplus) */
#undef __HALF_TO_US
#undef __HALF_TO_CUS

typedef __half half;

extern "C" __global__
void func_1(half* t0, half* aten_mul_flat) {
{
  float t0_ = __half2float(t0[512 * blockIdx.x + threadIdx.x]);
  aten_mul_flat[512 * blockIdx.x + threadIdx.x] = __float2half(t0_ * (1.f / (1.f + (expf(0.f - t0_)))));
}
}

My setup and installed library versions are:

Ubuntu 18.04.5 LTS 64 bit RTX 3070

$ nvidia-smi | head -n 4

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+

$ python
Python 3.8.6 | packaged by conda-forge | (default, Oct  7 2020, 19:08:05) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.7.0+cu110'
>>> torch.backends.cudnn.version()
8004

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

$ pip freeze | grep "cupy"
cupy-cuda110==8.2.0

$ sudo apt list --installed | grep "cuda" | cut -d "[" -f1

packages.txt

Am I still missing a package or something else?

vellamike commented 3 years ago

OK. I think what's going on is that torch.jit is being used, which compiles code on the fly. CUDA 11.0 cannot compile for the RTX 30xx series, so this is failing.
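To make the architecture mismatch concrete, here is a small sketch of the check. The table of highest supported compute capabilities per toolkit is an assumption written from memory of the CUDA release notes (8.0 for CUDA 11.0, 8.6 added in CUDA 11.1), so verify it against your toolkit. The RTX 3070 is compute capability 8.6 and the A100 is 8.0; at runtime, torch.cuda.get_device_capability() reports the value for your card.

```python
# Highest compute capability each toolkit can generate code for.
# Assumed values; check the release notes of your CUDA toolkit.
MAX_COMPUTE_CAPABILITY = {
    (10, 1): (7, 5),
    (10, 2): (7, 5),
    (11, 0): (8, 0),
    (11, 1): (8, 6),
}


def nvrtc_can_target(toolkit: tuple, capability: tuple) -> bool:
    """Return True if the given toolkit can compile for the given GPU."""
    return capability <= MAX_COMPUTE_CAPABILITY[toolkit]


RTX_3070 = (8, 6)
A100 = (8, 0)

print(nvrtc_can_target((11, 0), A100))      # A100 with CUDA 11.0
print(nvrtc_can_target((11, 0), RTX_3070))  # RTX 3070 with CUDA 11.0
print(nvrtc_can_target((11, 1), RTX_3070))  # RTX 3070 with CUDA 11.1
```

Under these assumptions, only the RTX 3070 with CUDA 11.0 combination is rejected, which matches the "invalid value for --gpu-architecture" message.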

Can you try removing the @script line from here and here and seeing if it works? You did setup.py develop so it should just work. It might be a bit slow without the JIT but this is just to identify the problem.
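For reference, the generated kernel in the nvrtc dump above computes an elementwise swish, x * sigmoid(x). A pure-Python sketch of the same arithmetic, which may be handy for spot-checking that the non-JIT path returns the same values; the function here is a scalar illustration, not bonito's implementation:

```python
import math


def swish(x: float) -> float:
    """Scalar swish, mirroring t0_ * (1.f / (1.f + expf(0.f - t0_)))."""
    # Reference only: math.exp overflows for very large negative inputs.
    return x * (1.0 / (1.0 + math.exp(-x)))


for value in (-1.0, 0.0, 1.0):
    print(value, swish(value))
```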

FYI @ptrblck

patbohn commented 3 years ago

Okay, the second error was indeed due to these lines; after I removed them, the nvrtc error no longer appears.

However, now I am seeing the previous error again:

$ bonito basecaller dna_r9.4.1 sample_fast5_folder > sample.fasta
> loading model
> calling: 0 reads [00:00, ? reads/s]Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/patrick/tools/bonito/bonito/multiprocessing.py", line 181, in run
    for (k, v) in self.iterator:
  File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 106, in <genexpr>
    stitched = ((read, _stitch(x)) for (read, x) in unbatchify(batches))
  File "/home/patrick/tools/bonito/bonito/util.py", line 207, in <genexpr>
    return (
  File "/home/patrick/tools/bonito/bonito/util.py", line 202, in <genexpr>
    batches = (
  File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 103, in <genexpr>
    (read, quantise_int8(compute_scores(model, batch)))
  File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 37, in compute_scores
    scores = model.encoder(batch.to(dtype).to(device))
  File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/patrick/tools/bonito/bonito/nn.py", line 100, in forward
    y, h = self.rnn(x)
  File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 581, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

vellamike commented 3 years ago

Can you install pytorch with CUDA using the following command:

conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch

AFAIK this ships with CUDA and cuDNN, so there is no need to install CUDA/cuDNN with apt.

The reason I'd like to do this is to understand if this is a cudnn problem or some issue with the way your system is configured.

patbohn commented 3 years ago

Hi, sorry for the late reply, it took a lot of time to install pytorch via conda (seems like their servers to Germany are very slow).

In brief, creating a new environment and installing with conda as you said did result in the same error.

I then started from a fresh Ubuntu 18.04.5 install, installed the NVIDIA driver and anaconda3, and then installed PyTorch into a fresh python 3.8 conda environment using the command you posted. Then I followed the steps to install seqdist and bonito with torch 1.7 and CUDA 11.0, and removed the "@script" decorators from the two jit functions. (documentation of all steps)

Installed software versions are now:

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+

$ python
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.7.0'
>>> torch.backends.cudnn.version()
8003

However, it still generates the same error:

$ bonito basecaller dna_r9.4.1 fast5_pass/barcode01/ > bonito_fasta/barcode01.fasta
> loading model
> calling: 0 reads [00:00, ? reads/s]Exception in thread Thread-2:
Traceback (most recent call last):
  File "/home/patrick/anaconda3/envs/ont-bonito-conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/patrick/tools/bonito/bonito/multiprocessing.py", line 202, in run
    for (k, v) in self.iterator:
  File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 105, in <genexpr>
    stitched = ((read, _stitch(x)) for (read, x) in unbatchify(batches))
  File "/home/patrick/tools/bonito/bonito/util.py", line 207, in <genexpr>
    return (
  File "/home/patrick/tools/bonito/bonito/util.py", line 202, in <genexpr>
    batches = (
  File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 102, in <genexpr>
    (read, quantise_int8(compute_scores(model, batch)))
  File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 37, in compute_scores
    scores = model.encoder(batch.to(dtype).to(device))
  File "/home/patrick/anaconda3/envs/ont-bonito-conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/patrick/anaconda3/envs/ont-bonito-conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/patrick/anaconda3/envs/ont-bonito-conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/patrick/tools/bonito/bonito/nn.py", line 99, in forward
    y, h = self.rnn(x)
  File "/home/patrick/anaconda3/envs/ont-bonito-conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/patrick/anaconda3/envs/ont-bonito-conda/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 581, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

vellamike commented 3 years ago

Hi @patbohn. Based on the documentation of all steps you have provided, and the fact that @iiSeymour was able to run with this configuration on an A100, I would say you seem to have discovered a bug with _VF.lstm when running on non-A100 Ampere GPUs. Can you try running with some smaller batch sizes? (I know you have tried this before, but that was on the 11.1 system.)

Could you look into this @ptrblck @csarofeen ?

patbohn commented 3 years ago

Hi @vellamike, I did change the basecall.py settings to reduce batchsize and split_read_length (setting batchsize down to 1); however, the error persists.

(Notable, but possibly unrelated:) as I tried to get my samples basecalled, I also went to a compute cluster with a DGX-1 (with a driver supporting CUDA <= 10.1 and no plans to update in the near future). After installing seqdist with CUDA 10.1 and trying to basecall on one GPU, I received a "CUDA out of memory" error, which I could not fix by reducing the batch size as @iiSeymour mentioned above.

I am now wondering whether a large amount of memory is being allocated on the GPU somewhere else, with both errors possibly being the same but reported differently due to CUDA or driver specifics. If so, is there a way to evaluate how the memory is being allocated?

Edit to include the out-of-memory error on the DGX1 machine with cuda10.1:

> loading model
Traceback (most recent call last):
  File "/home/pbohn/miniconda3/envs/bonito/bin/bonito", line 33, in <module>
    sys.exit(load_entry_point('ont-bonito', 'console_scripts', 'bonito')())
  File "/home/pbohn/tools/bonito/bonito/__init__.py", line 39, in main
    args.func(args)
  File "/home/pbohn/tools/bonito/bonito/cli/basecaller.py", line 26, in main
    model = load_model(args.model_directory, args.device, weights=int(args.weights))
  File "/home/pbohn/tools/bonito/bonito/util.py", line 286, in load_model
    state_dict = torch.load(weights, map_location=device)
  File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/serialization.py", line 774, in _legacy_load
    result = unpickler.load()
  File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/serialization.py", line 730, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/serialization.py", line 814, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/serialization.py", line 155, in _cuda_deserialize
    return storage_type(obj.size())
  File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/cuda/__init__.py", line 462, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory

Is the 16 GB of the V100 not enough to load the model?

vellamike commented 3 years ago

@patbohn something strange is going on then. FYI it seems (using this thread I believe) Ola Wallerman was able to get Bonito to run on a 3090 GPU.

@iiSeymour any thoughts on what could be causing the OOM issue on DGX-1 ? I suspect it's key to figuring out the 3070 issue.

patbohn commented 3 years ago

@vellamike Thanks for the link. I wonder whether it has something to do with my fast5 files, which contain a much larger number of reads (~400,000). Will try with a smaller input file soon.

vellamike commented 3 years ago

Could you weigh in here @iiSeymour? Could a file with a large number of reads cause CUDA OOM?

iiSeymour commented 3 years ago

No, the number of reads in a fast5 file is not related to how much GPU memory is used.

@patbohn I pretty much exclusively develop on 16GB V100s. Can you check the status of the GPUs with nvidia-smi and confirm you are running on a free GPU by setting CUDA_VISIBLE_DEVICES?
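One way to do that check from a script is to ask nvidia-smi for per-GPU memory use and export CUDA_VISIBLE_DEVICES for the least-used card. This is a sketch under a few assumptions: nvidia-smi is on the PATH, its CSV query output contains plain integers (MiB), and "least memory in use" is a good enough stand-in for "free". The function names are illustrative.

```python
import os
import subprocess

# Query index, used memory and total memory (MiB) for every GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list:
    """Parse nvidia-smi CSV output into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def least_used_gpu(rows: list) -> int:
    """Return the index of the GPU with the least memory in use."""
    return min(rows, key=lambda row: row[1])[0]


def select_free_gpu() -> int:
    """Pick the least-used GPU and expose only that one to CUDA.

    Must run before CUDA is initialised (for example before the first
    torch.cuda call), otherwise the setting has no effect.
    """
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    index = least_used_gpu(parse_gpu_memory(result.stdout))
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return index
```

On a shared machine such as a DGX-1, calling select_free_gpu() at the top of a wrapper script would avoid landing on a GPU that another job has already filled.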

ptrblck commented 3 years ago

@vellamike The PTX JIT issue should be solved once https://github.com/pytorch/pytorch/pull/48455 has landed. Let me know if you suspect another unrelated bug for the OOM issue and ping me to take a look at it. Unfortunately, an OOM can manifest as a cuBLAS or cuDNN error, e.g. if the handles cannot be created due to insufficient available memory.
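Since an OOM can surface as a cuDNN error, it can help to print what PyTorch itself has allocated just before the failing call. A minimal sketch using torch.cuda.memory_allocated, torch.cuda.max_memory_allocated and torch.cuda.get_device_properties; the helper name is illustrative, and the numbers only cover this process's PyTorch allocations, so memory held by other processes still has to be checked with nvidia-smi.

```python
def gpu_memory_report(device: int = 0) -> str:
    """Summarise PyTorch's own GPU memory use for one device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "CUDA is not available to torch"

    mib = 2 ** 20
    allocated = torch.cuda.memory_allocated(device) / mib
    peak = torch.cuda.max_memory_allocated(device) / mib
    total = torch.cuda.get_device_properties(device).total_memory / mib
    return (
        f"cuda:{device}: {allocated:.0f} MiB allocated, "
        f"{peak:.0f} MiB peak, {total:.0f} MiB total"
    )


print(gpu_memory_report())
```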

tnn111 commented 3 years ago

I would like to know too. I’m still having issues with running bonito. I followed the recipe

conda create -n bonito python pip pytorch=1.5.0 torchvision cudatoolkit 'numpy<=1.18.5' -c pytorch -c conda-forge

to create the environment and then I did

conda activate bonito
pip install ont-bonito

But when I ran it, I got the following error message:

> loading model
> calling: 0 reads [00:00, ? reads/s]Exception in thread Thread-2:
Traceback (most recent call last):
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/multiprocessing.py", line 194, in run
    for i, (k, v) in enumerate(self.iterator):
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/crf/basecall.py", line 107, in <genexpr>
    stitched = ((read, _stitch(x)) for (read, x) in unbatchify(batches))
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/util.py", line 207, in <genexpr>
    return (
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/util.py", line 202, in <genexpr>
    batches = (
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/crf/basecall.py", line 104, in <genexpr>
    (read, quantise_int8(compute_scores(model, batch)))
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/crf/basecall.py", line 37, in compute_scores
    scores = model.encoder(batch.to(dtype).to(device))
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/nn.py", line 101, in forward
    y, h = self.rnn(x)
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 569, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

I checked on what version of pytorch I had and it was

pytorch 1.7.1 cuda92py39hde86683_1 conda-forge

Should I force 1.5.0? Apparently people got it to work with 1.7 and that’s what conda put in. Not sure what gives.

On Mar 8, 2021, at 01:04, bkbx notifications@github.com wrote:

I noticed that "bonito needs torch<=1.5,>=1.1.0". How do I install torch=1.7?

The file named "requirements.txt" says: "torch>=1.1.0,<=1.5". Excuse me, how do I install torch >1.5?
