Closed patbohn closed 1 year ago
Hey @patbohn
I need to add support for per-model default configs that can be overridden from the command line - until I get round to that you can find the defaults here. I think 8GB is a little on the small side for the default values, so maybe try setting the batchsize to 24 and halving split_read_length to 200,000.
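To illustrate why a cap like split_read_length bounds how much signal is handled at once, here is a minimal sketch. It assumes a read is just a 1-D sequence of samples; the helper name `split_signal` is hypothetical and this is not bonito's own implementation.

```python
def split_signal(signal, split_read_length=200_000):
    """Cut a long signal into consecutive pieces of at most split_read_length samples."""
    if split_read_length <= 0:
        raise ValueError("split_read_length must be positive")
    return [
        signal[start:start + split_read_length]
        for start in range(0, len(signal), split_read_length)
    ]


# A 450,000-sample read becomes two full pieces and one remainder.
pieces = split_signal(list(range(450_000)), split_read_length=200_000)
print([len(p) for p in pieces])  # [200000, 200000, 50000]
```

Halving the cap doubles the number of pieces but halves the size of the largest one, which is why it is a reasonable first thing to try on a smaller card.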
I have tried reducing the values to 24 and 200,000 respectively, but alas that still gave the same error. I then decreased batchsize further down to 1 and split_read_length to 50,000 to confirm the error stays, so it does not seem directly related to batch size in my case, and it does not appear to be something that someone else has experienced yet.
🤔 maybe @vellamike or @EpiSlim can shed some light here?
I've successfully run on Ampere with the following - so maybe try PyTorch 1.7?
$ nvidia-smi | head -n 4
Thu Nov 26 14:54:58 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
$ python
Python 3.8.6 (default, Nov 12 2020, 18:34:50)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.7.0+cu110'
>>> torch.backends.cudnn.version()
8004
>>>
Hi @patbohn
linux driver 455.45.01 needs Cuda 11.1 specifically
This is not correct, from CUDA documentation:
Drivers have always been backwards compatible with CUDA. This means that a CUDA 11.0 application will be compatible with R450 (11.0), R455 (11.1) and beyond. CUDA applications typically statically include all the libraries (for example cudart, CUDA math libraries such as cuBLAS, cuFFT) they need, so they should work on new drivers or CUDA Toolkit installations
As you are compiling an alpha version of pytorch against a cuda version not tested to work with Pytorch there could be any number of reasons why you are seeing this error.
Could you try CUDA 11.0, and pytorch 1.7? This should work without needing to compile anything. You can install CUDA 11.0 alongside 11.1
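The compatibility rule quoted above can be written down as a small check. This is only a sketch: it assumes both versions are given as "major.minor" strings, with the driver side being the "CUDA Version" shown by nvidia-smi, and the function names are made up for illustration.

```python
def parse_version(text):
    """Turn a 'major.minor' string such as '11.1' into a comparable tuple."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def runtime_supported(driver_cuda, runtime_cuda):
    """True when the CUDA version reported by the driver is at least the runtime's."""
    return parse_version(driver_cuda) >= parse_version(runtime_cuda)


print(runtime_supported("11.1", "11.0"))  # True: an 11.0 build runs on an 11.1 driver
print(runtime_supported("10.1", "11.0"))  # False: the driver is too old for an 11.0 build
```

In other words, a driver reporting 11.1 does not force an 11.1 toolkit; it only sets the upper limit.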
Hi @vellamike , Thank you for clearing that up for me, my mistake.
I have now tried this:
1) Install cuda-11-0 package via apt
2) Install local cuda-toolkit 11.0
3) install local cuDNN8.0.4 package
4) create new conda environment (python 3.8.6)
5) install bonito requirements (after changing torch requirement to <1.8)
6) installing the seqdist package (after changing cupy-cuda101 to cupy-cuda110)
7) installing torch==1.7.0+cu110 from pip according to https://pytorch.org/get-started/locally/
8) Installing bonito using python setup.py develop
I now get this different runtime error, from nvrtc:
$ bonito basecaller dna_r9.4.1 sample_fast5_folder > sample.fasta
> loading model
> calling: 0 reads [00:00, ? reads/s]Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/patrick/tools/bonito/bonito/multiprocessing.py", line 181, in run
for (k, v) in self.iterator:
File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 106, in <genexpr>
stitched = ((read, _stitch(x)) for (read, x) in unbatchify(batches))
File "/home/patrick/tools/bonito/bonito/util.py", line 207, in <genexpr>
return (
File "/home/patrick/tools/bonito/bonito/util.py", line 202, in <genexpr>
batches = (
File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 103, in <genexpr>
(read, quantise_int8(compute_scores(model, batch)))
File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 37, in compute_scores
scores = model.encoder(batch.to(dtype).to(device))
File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/patrick/tools/bonito/bonito/nn.py", line 71, in forward
return SwishAutoFn.apply(x)
File "/home/patrick/tools/bonito/bonito/nn.py", line 56, in forward
return swish_jit_fwd(x)
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)
nvrtc compilation failed:
#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)
template<typename T>
__device__ T maximum(T a, T b) {
return isnan(a) ? a : (a > b ? a : b);
}
template<typename T>
__device__ T minimum(T a, T b) {
return isnan(a) ? a : (a < b ? a : b);
}
#define __HALF_TO_US(var) *(reinterpret_cast<unsigned short *>(&(var)))
#define __HALF_TO_CUS(var) *(reinterpret_cast<const unsigned short *>(&(var)))
#if defined(__cplusplus)
struct __align__(2) __half {
__host__ __device__ __half() { }
protected:
unsigned short __x;
};
/* All intrinsic functions are only available to nvcc compilers */
#if defined(__CUDACC__)
/* Definitions of intrinsics */
__device__ __half __float2half(const float f) {
__half val;
asm("{ cvt.rn.f16.f32 %0, %1;}\n" : "=h"(__HALF_TO_US(val)) : "f"(f));
return val;
}
__device__ float __half2float(const __half h) {
float val;
asm("{ cvt.f32.f16 %0, %1;}\n" : "=f"(val) : "h"(__HALF_TO_CUS(h)));
return val;
}
#endif /* defined(__CUDACC__) */
#endif /* defined(__cplusplus) */
#undef __HALF_TO_US
#undef __HALF_TO_CUS
typedef __half half;
extern "C" __global__
void func_1(half* t0, half* aten_mul_flat) {
{
float t0_ = __half2float(t0[512 * blockIdx.x + threadIdx.x]);
aten_mul_flat[512 * blockIdx.x + threadIdx.x] = __float2half(t0_ * (1.f / (1.f + (expf(0.f - t0_)))));
}
}
My setup and installed library versions are:
Ubuntu 18.04.5 LTS 64 bit RTX 3070
$ nvidia-smi | head -n 4
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01 Driver Version: 455.45.01 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
$ python
Python 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.7.0+cu110'
>>> torch.backends.cudnn.version()
8004
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
$ pip freeze | grep "cupy"
cupy-cuda110==8.2.0
$ sudo apt list --installed | grep "cuda" | cut -d "[" -f1
Am I still missing a package or something else?
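For anyone reading the nvrtc dump: the generated kernel `func_1` computes x * sigmoid(x) (the swish activation) element-wise, converting from half to float and back. A plain-Python sketch of the same arithmetic, only for reference and not bonito's code:

```python
import math


def swish(x):
    """x * sigmoid(x), written the same way as the generated kernel."""
    return x * (1.0 / (1.0 + math.exp(0.0 - x)))


print(swish(0.0))  # 0.0
```

So the source being compiled is trivial; the failure is in the compile step itself (the `--gpu-architecture` value), not in the kernel.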
OK. I think what's going on is that torch.jit is being used, which compiles code on the fly. CUDA 11.0 cannot compile for the RTX 30xx series, so this is failing.
Can you try removing the @script line from here and here and seeing if it works? You did setup.py develop, so it should just work. It might be a bit slow without the JIT but this is just to identify the problem.
FYI @ptrblck
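A rough way to picture the JIT failure, as a sketch: each toolkit release can only target compute capabilities up to some maximum. The table below is an assumption taken from NVIDIA's release notes (sm_86 support arrived with CUDA 11.1), and the names are hypothetical. RTX 30xx cards are capability 8.6, the A100 is 8.0.

```python
# Highest compute capability each toolkit release can compile for
# (assumed from NVIDIA's release notes, not queried from the toolkit).
MAX_CAPABILITY = {
    (10, 1): (7, 5),
    (11, 0): (8, 0),
    (11, 1): (8, 6),
}


def can_compile_for(toolkit, capability):
    """True when the given toolkit release knows the target architecture."""
    return capability <= MAX_CAPABILITY[toolkit]


print(can_compile_for((11, 0), (8, 6)))  # False: RTX 30xx with the CUDA 11.0 runtime compiler
print(can_compile_for((11, 0), (8, 0)))  # True: A100 with CUDA 11.0
print(can_compile_for((11, 1), (8, 6)))  # True: RTX 30xx with CUDA 11.1
```

This would also be consistent with the same software stack running on an A100 while the runtime compile step fails on a 3070.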
Okay, the second error was indeed due to these lines, I removed them and the nvrtc error is not appearing anymore.
However, now I am seeing the previous error again:
$ bonito basecaller dna_r9.4.1 sample_fast5_folder > sample.fasta
> loading model
> calling: 0 reads [00:00, ? reads/s]Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/patrick/tools/bonito/bonito/multiprocessing.py", line 181, in run
for (k, v) in self.iterator:
File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 106, in <genexpr>
stitched = ((read, _stitch(x)) for (read, x) in unbatchify(batches))
File "/home/patrick/tools/bonito/bonito/util.py", line 207, in <genexpr>
return (
File "/home/patrick/tools/bonito/bonito/util.py", line 202, in <genexpr>
batches = (
File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 103, in <genexpr>
(read, quantise_int8(compute_scores(model, batch)))
File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 37, in compute_scores
scores = model.encoder(batch.to(dtype).to(device))
File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/patrick/tools/bonito/bonito/nn.py", line 100, in forward
y, h = self.rnn(x)
File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/patrick/anaconda3/envs/ont-bonito-cuda11/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 581, in forward
result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
Can you install pytorch with CUDA using the following command:
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
AFAIK this ships with CUDA and CUDNN so there is no need to install cuda/cudnn with apt.
The reason I'd like to do this is to understand if this is a cudnn problem or some issue with the way your system is configured.
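When comparing the pip and conda installs, it helps to print the same set of facts from inside each environment. A minimal sketch (the function name is made up; it returns early if torch is not importable):

```python
def describe_torch_environment():
    """Collect the versions that matter when comparing two installs."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    info = {
        "torch": torch.__version__,
        "cuda_runtime": torch.version.cuda,       # CUDA version torch was built with
        "cudnn": torch.backends.cudnn.version(),  # e.g. 8004 for cuDNN 8.0.4
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


for key, value in describe_torch_environment().items():
    print(f"{key}: {value}")
```

If both environments report the same torch, CUDA runtime and cuDNN versions and still fail the same way, a system configuration difference becomes less likely.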
Hi, sorry for the late reply, it took a lot of time to install pytorch via conda (seems like their servers to Germany are very slow).
In brief, creating a new environment and installing with conda as you said did result in the same error.
I then started from a fresh Ubuntu 18.04.5 install, installed the nvidia driver and anaconda3, then installed pytorch into a fresh python 3.8 conda environment as per the command you posted. Then I followed the steps to install seqdist and bonito with torch 1.7 and cuda 11.0 and removed the "@script" decorators from the two jit functions. (documentation of all steps)
Installed software versions are now:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01 Driver Version: 455.45.01 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
$ python
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.7.0'
>>> torch.backends.cudnn.version()
8003
However, it still generates the same error:
$ bonito basecaller dna_r9.4.1 fast5_pass/barcode01/ > bonito_fasta/barcode01.fasta
> loading model
> calling: 0 reads [00:00, ? reads/s]Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/patrick/anaconda3/envs/ont-bonito-conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/patrick/tools/bonito/bonito/multiprocessing.py", line 202, in run
for (k, v) in self.iterator:
File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 105, in <genexpr>
stitched = ((read, _stitch(x)) for (read, x) in unbatchify(batches))
File "/home/patrick/tools/bonito/bonito/util.py", line 207, in <genexpr>
return (
File "/home/patrick/tools/bonito/bonito/util.py", line 202, in <genexpr>
batches = (
File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 102, in <genexpr>
(read, quantise_int8(compute_scores(model, batch)))
File "/home/patrick/tools/bonito/bonito/crf/basecall.py", line 37, in compute_scores
scores = model.encoder(batch.to(dtype).to(device))
File "/home/patrick/anaconda3/envs/ont-bonito-conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/patrick/anaconda3/envs/ont-bonito-conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/patrick/anaconda3/envs/ont-bonito-conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/patrick/tools/bonito/bonito/nn.py", line 99, in forward
y, h = self.rnn(x)
File "/home/patrick/anaconda3/envs/ont-bonito-conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/patrick/anaconda3/envs/ont-bonito-conda/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 581, in forward
result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
Hi @patbohn, based on the documentation of all steps you have provided, and the fact that @iiSeymour was able to run with this configuration on A100, I would say you seem to have discovered a bug with _VF.lstm when running on non-A100 Ampere GPUs. Can you try running with some smaller batch sizes? (I know you have tried this before but that was on the 11.1 system)
Could you look into this @ptrblck @csarofeen ?
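To separate the LSTM call from the rest of bonito, a standalone smoke test along these lines may help. This is a sketch only: the layer sizes are arbitrary, half precision is an assumption based on the kernel dump earlier in the thread, and it does not load the bonito model.

```python
def lstm_smoke_test(steps=100, batch=4, features=64):
    """Run one half-precision LSTM forward pass on the GPU and report what happened."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    # Arbitrary sizes; a contiguous (steps, batch, features) input.
    rnn = torch.nn.LSTM(input_size=features, hidden_size=features).to("cuda").half()
    x = torch.randn(steps, batch, features, device="cuda", dtype=torch.half)
    try:
        with torch.no_grad():
            y, _ = rnn(x)
        torch.cuda.synchronize()  # surface asynchronous errors here
    except RuntimeError as err:
        return f"failed: {err}"
    return f"ok: output shape {tuple(y.shape)}"


print(lstm_smoke_test())
```

If this fails with the same cuDNN error on the 3070, the problem can be reported without bonito in the picture at all.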
Hi @vellamike , I did change the basecall.py settings to reduce the batchsize and split_read_length (with setting batchsize down to 1), however the error persists.
(Notably, but possibly unrelated) as I tried to get my samples basecalled, I also went to a compute cluster with a DGX1 (and a driver supporting <= Cuda 10.1, with no intentions to update in the near future). After installing seqdist with cuda10.1 and trying to basecall on one GPU, I received a "CUDA out of memory" error, which I could not fix by reducing the batch size as @iiSeymour mentioned above.
I am now wondering whether a large amount of memory is being allocated on the GPU somewhere else, with both errors possibly being the same but reported differently due to CUDA or driver specifics. If so, is there a way to evaluate how the memory is getting allocated?
Edit to include the out-of-memory error on the DGX1 machine with cuda10.1:
> loading model
Traceback (most recent call last):
File "/home/pbohn/miniconda3/envs/bonito/bin/bonito", line 33, in <module>
sys.exit(load_entry_point('ont-bonito', 'console_scripts', 'bonito')())
File "/home/pbohn/tools/bonito/bonito/__init__.py", line 39, in main
args.func(args)
File "/home/pbohn/tools/bonito/bonito/cli/basecaller.py", line 26, in main
model = load_model(args.model_directory, args.device, weights=int(args.weights))
File "/home/pbohn/tools/bonito/bonito/util.py", line 286, in load_model
state_dict = torch.load(weights, map_location=device)
File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/serialization.py", line 774, in _legacy_load
result = unpickler.load()
File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/serialization.py", line 730, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/serialization.py", line 814, in restore_location
return default_restore_location(storage, str(map_location))
File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/serialization.py", line 175, in default_restore_location
result = fn(storage, location)
File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/serialization.py", line 155, in _cuda_deserialize
return storage_type(obj.size())
File "/home/pbohn/miniconda3/envs/bonito/lib/python3.8/site-packages/torch/cuda/__init__.py", line 462, in _lazy_new
return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
Is the 16 GB of the V100 not enough to load the model?
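On the question of how to see where memory goes: PyTorch exposes per-process counters that can be printed at any point, for example right after loading the model. A minimal sketch with a made-up helper name follows. Note these counters only cover the current process; memory held by other jobs on the same GPU shows up in nvidia-smi instead.

```python
def gpu_memory_report(device=0):
    """Return PyTorch's view of GPU memory in MiB, or None without torch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    mib = 1024 ** 2
    return {
        "allocated_mib": torch.cuda.memory_allocated(device) / mib,  # live tensors
        "reserved_mib": torch.cuda.memory_reserved(device) / mib,    # held by the caching allocator
        "total_mib": torch.cuda.get_device_properties(device).total_memory / mib,
    }


print(gpu_memory_report())
```

An out-of-memory error while the allocated figure is still near zero would point at the GPU already being occupied by something else rather than at the model being too large.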
@patbohn something strange is going on then. FYI it seems (using this thread I believe) Ola Wallerman was able to get Bonito to run on a 3090 GPU.
@iiSeymour any thoughts on what could be causing the OOM issue on DGX-1 ? I suspect it's key to figuring out the 3070 issue.
@vellamike Thanks for the link. I wonder whether it has something to do with my fast5 files, which contain a much larger number of reads (~400,000). Will try with a smaller input file soon.
Could you weigh in here @iiSeymour ? could a file with a large number of reads cause CUDA OOM?
No, the number of reads in a fast5 file is not related to how much GPU memory is used.
@patbohn I pretty much exclusively develop on 16GB V100s. Can you check the status of the GPUs with nvidia-smi and confirm you are running on a free GPU by setting CUDA_VISIBLE_DEVICES?
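On a shared multi-GPU box this check can be scripted. The sketch below parses nvidia-smi's CSV query output and picks the GPU with the least memory in use; the helper names are hypothetical, and it assumes the memory fields come back as plain integers.

```python
import os
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def least_used_gpu(csv_text):
    """Return the index of the GPU with the least memory in use, or None."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, _total = (int(field) for field in line.split(","))
        rows.append((used, index))
    return min(rows)[1] if rows else None


def select_free_gpu():
    """Set CUDA_VISIBLE_DEVICES to the least used GPU, if nvidia-smi is usable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None
    choice = least_used_gpu(result.stdout)
    if choice is not None:
        # Must happen before CUDA is initialised in this process.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(choice)
    return choice


# Example with captured output: GPU 0 is busy, GPU 1 is nearly empty.
print(least_used_gpu("0, 15000, 16160\n1, 3, 16160\n"))  # 1
```

Calling `select_free_gpu()` before importing torch, or simply exporting CUDA_VISIBLE_DEVICES in the shell before running bonito, has the same effect.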
@vellamike The PTX JIT issue should be solved once https://github.com/pytorch/pytorch/pull/48455 is landed. Let me know, if you suspect another unrelated bug for the OOM issue and ping me to take a look at it. Unfortunately, an OOM can manifest as a cublas or cudnn error e.g. if the handles cannot be created due to insufficient available memory.
I would like to know too. I’m still having issues with running bonito. I followed the recipe
conda create -n bonito python pip pytorch=1.5.0 torchvision cudatoolkit 'numpy<=1.18.5' -c pytorch -c conda-forge
to create the environment and then I did
conda activate bonito
pip install ont-bonito
But when I ran it, I got the following error message:
> loading model
> calling: 0 reads [00:00, ? reads/s]Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/multiprocessing.py", line 194, in run
for i, (k, v) in enumerate(self.iterator):
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/crf/basecall.py", line 107, in <genexpr>
stitched = ((read, _stitch(x)) for (read, x) in unbatchify(batches))
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/util.py", line 207, in <genexpr>
return (
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/util.py", line 202, in <genexpr>
batches = (
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/crf/basecall.py", line 104, in <genexpr>
(read, quantise_int8(compute_scores(model, batch)))
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/crf/basecall.py", line 37, in compute_scores
scores = model.encoder(batch.to(dtype).to(device))
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/bonito/nn.py", line 101, in forward
y, h = self.rnn(x)
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/torben/opt/anaconda3/envs/bonito/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 569, in forward
result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
I checked on what version of pytorch I had and it was
pytorch 1.7.1 cuda92py39hde86683_1 conda-forge
Should I force 1.5.0? Apparently people got it to work with 1.7 and that’s what conda put in. Not sure what gives.
On Mar 8, 2021, at 01:04, bkbx notifications@github.com wrote:
I noticed that "bonito needs torch<=1.5,>=1.1.0". How do I install torch 1.7?
The file named "requirements.txt" says "torch>=1.1.0,<=1.5". Excuse me, how do I install torch > 1.5?
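Earlier in the thread this was handled by editing the torch requirement to <1.8 in a source checkout before installing. As a sketch of that edit (the helper name is hypothetical, and it only rewrites the text of a requirements file):

```python
import re


def relax_torch_pin(requirements_text, upper="<1.8"):
    """Replace the upper bound on the torch requirement, leaving other lines alone."""
    lines = []
    for line in requirements_text.splitlines():
        # Match "torch" itself, not "torchvision" or "torch-something".
        if re.match(r"^torch\b(?!-)", line.strip()):
            lower = re.search(r">=\s*[\d.]+", line)
            parts = [lower.group(0)] if lower else []
            line = "torch" + ",".join(parts + [upper])
        lines.append(line)
    return "\n".join(lines)


print(relax_torch_pin("torch>=1.1.0,<=1.5\nnumpy<=1.18.5"))
# torch>=1.1.0,<1.8
# numpy<=1.18.5
```

After that, torch 1.7 can be installed following https://pytorch.org/get-started/locally/ and bonito installed from the checkout with python setup.py develop, as described in the steps above.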
Hi Seymour, Thank you for the great work on bonito, it seems to be quickly surpassing the other basecallers. I am now trying to basecall some PCR amplicon data with it using an RTX 3070, but am struggling with version compatibility (RTX 30X0 series requires Cuda 11, which requires Pytorch 1.7+, and the linux driver 455.45.01 needs Cuda 11.1 specifically). I have now been able to successfully compile pytorch with cuda 11.1 and cuDNN 8.0.5 and it is running now (+ edited seqdist for cupy-cuda111 before local installation). However, I have now stumbled across another problem trying to run
bonito basecaller dna_r9.4.1 sample_fast5_dir > sample_fasta_out
Do you know what this error could be? Some people fixed it by reducing batch sizes, but I did not find that option in the basecaller.py file.
Thank you!