Closed caldodge closed 8 months ago
This seems to be the same issue as https://github.com/hzi-bifo/RiboDetector/issues/34.
Could you check whether any CUDA device is available with echo $CUDA_VISIBLE_DEVICES, then check which version of CUDA you are using with nvcc --version?
You can also set the --chunk_size 256 and -m 8 parameters to avoid out-of-memory issues. The values can be adjusted according to your system memory and GPU memory.
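The two checks above can also be collected in one small script. This is only a sketch using the Python standard library; the function name cuda_env_report is made up for illustration and is not part of RiboDetector.

```python
import os
import shutil
import subprocess


def cuda_env_report():
    """Gather the two checks suggested above into one dictionary.

    Hypothetical helper for illustration, not part of RiboDetector.
    """
    report = {
        # None means the variable is unset, in which case CUDA does not
        # restrict which devices a process can see.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "nvcc_path": shutil.which("nvcc"),
        "nvcc_version": None,
    }
    if report["nvcc_path"] is not None:
        try:
            result = subprocess.run(
                [report["nvcc_path"], "--version"],
                capture_output=True,
                text=True,
                timeout=30,
                check=False,
            )
            report["nvcc_version"] = result.stdout.strip() or None
        except (OSError, subprocess.TimeoutExpired):
            # Leave nvcc_version as None if nvcc cannot be run.
            pass
    return report


if __name__ == "__main__":
    for key, value in cuda_env_report().items():
        print(f"{key}: {value}")
```

A missing nvcc does not by itself mean that CUDA is unusable from PyTorch, so an empty nvcc_path is a hint rather than a diagnosis.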
Hi @dawnmy, I seem to be encountering the same issue.
I'm running python=3.9, pytorch=2.0.1 installed through conda. When I run python and check CUDA status through torch.cuda, it shows the current device and that it is available.
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "/home/user/miniconda3/envs/epicall-nextflow/lib/python3.9/site-packages/ribodetector/model/model.py", line 118, in last_items
indices = sorted_last_indices(pack=pack)
if unsort and pack.unsorted_indices is not None:
indices = indices[pack.unsorted_indices]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
return pack.data[indices]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
❯ python
Python 3.9.17 | packaged by conda-forge | (main, Aug 10 2023, 07:02:31)
[GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.current_device()
0
>>>
After some Googling, I realized this might be related to the compatibility between the installed PyTorch version and the CUDA version. In particular, CUDA 11.7 might cause the problem.
I'm not sure where the incompatibility lies, but I tried earlier PyTorch versions and got it working by installing the following dependencies before installing ribodetector:
mamba install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
cudatoolkit-dev (which installs nvcc) is not required.
The issue showed up when installing the latest version of Pytorch as per the Pytorch website, both with CUDA 11.7 and CUDA 11.8.
I do not think it is a direct result of incompatibility between the Pytorch version and CUDA version, since the issues that I had showed up when installing via the official Pytorch installation instructions. Perhaps either the newer versions of Pytorch or CUDA are causing these issues.
Edit: On further testing, it looks like it may be an issue caused by the newer versions of Pytorch. I've found that Pytorch 1.12.1 works, while Pytorch 1.13.1 does not. Both are running CUDA 11.6. Here are some findings from my tests:
WORKS:
mamba install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
DOES NOT WORK:
# Latest Pytorch/CUDA installation as per standard Pytorch installation instructions (Pytorch 2.0.1, CUDA 11.8/11.7)
mamba install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# CUDA 11.6
mamba install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
# CUDA 11.7
mamba install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
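To compare an installed PyTorch version against the findings above, something like the following could be used. It is a sketch only: the table encodes just the three versions tested in this thread, and the names TESTED_IN_THREAD and reported_status are made up for illustration.

```python
import re

# Outcomes reported in this thread only; every other version is untested here.
TESTED_IN_THREAD = {
    (1, 12, 1): "works",
    (1, 13, 1): "fails",
    (2, 0, 1): "fails",
}


def parse_version(version):
    """Return (major, minor, patch) from strings such as '2.0.1+cu118'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def reported_status(version):
    """Look up what this thread reported for a given PyTorch version."""
    return TESTED_IN_THREAD.get(parse_version(version), "untested")
```

With PyTorch installed, reported_status(torch.__version__) gives the outcome reported here for that version, or "untested" if nobody in this thread tried it.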
@gohweixun Thank you for taking the time to conduct such a thorough test. Based on your observations, it appears there might be an issue with RiboDetector when using a newer version of PyTorch. I'll be looking into this on my end and will subsequently update the installation guide as well as the conda requirements to solve this issue.
Thanks again for bringing this to our attention!
Hi @dawnmy,
For this issue, I added a line to model.py that moves indices to the same device as pack.data, which might resolve it:
@jit.script
def last_items(pack: PackedSequence, unsort: bool) -> Tensor:
    indices = sorted_last_indices(pack=pack)
    if unsort and pack.unsorted_indices is not None:
        # Move indices to the same device as pack.data
        indices = indices.to(pack.data.device)
        indices = indices[pack.unsorted_indices]
    return pack.data[indices]
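The same idea can be tried outside TorchScript with made-up tensors. This is a minimal sketch, not RiboDetector's code: select_last_items and demo are hypothetical names, the values are arbitrary, and the torch import is guarded only so the file can be loaded where PyTorch is absent.

```python
try:
    import torch
except ImportError:  # allows importing this file without PyTorch installed
    torch = None


def select_last_items(data, indices, unsorted_indices):
    """Mirror the patched logic: align devices, unsort, then index."""
    # Put the index tensor on the same device as the data before using it.
    indices = indices.to(data.device)
    indices = indices[unsorted_indices]
    return data[indices]


def demo():
    """Run the sketch on a GPU if one is available, otherwise on the CPU."""
    if torch is None:
        return None
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    data = torch.arange(10, device=device) * 10
    # Created on the CPU, like the indices tensor in the failing traceback.
    indices = torch.tensor([2, 5, 7])
    unsorted_indices = torch.tensor([2, 0, 1], device=device)
    return select_last_items(data, indices, unsorted_indices).tolist()


if __name__ == "__main__":
    print(demo())
```

On a CPU-only machine all tensors already share a device, so the .to() call is a no-op there; the mismatch reported in this thread only appears when the data lives on a GPU.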
Thank you @jarffery for the suggestion. Has this fix been tested with both older (1.6-1.9) and newer PyTorch versions? I will test it and update later.
The fix needs to be tested
This is on a Red Hat 8.6 system with an Nvidia A30. The Python version is 3.9. The Torch version is 2.0.1 (installed with pip). The ribodetector version is 0.2.7 (installed with pip).
We run the following command on some sample data: ribodetector -d 0 -l 100 -i SRR14098566.fastq.gz -o test.fastq.gz
It fails. Here's the complete program output:
2023-08-16 17:19:43 : INFO Using high MCC model file: /apps/ribodetector/0.2.7/ribodetector/data/ribodetector_600k_variable_len70_101_epoch47.pth
2023-08-16 17:19:44 : INFO Model using cuda for read length 100 loaded
2023-08-16 17:19:45 : INFO Choose batch size: 32768 based on the given GPU RAM size 32GB and max read length 100
2023-08-16 17:20:00 : INFO 5933995 sequences loaded!
2023-08-16 17:20:00 : INFO Writing output non-rRNA sequences into file: test.fastq.gz
0%| | 0/182 [00:05<?, ?it/s]
Traceback (most recent call last):
File "/apps/ribodetector/0.2.7/bin/ribodetector", line 8, in
sys.exit(main())
File "/apps/ribodetector/0.2.7/ribodetector/detect.py", line 726, in main
seq_pred.detect()
File "/apps/ribodetector/0.2.7/ribodetector/detect.py", line 501, in detect
self.run()
File "/apps/ribodetector/0.2.7/ribodetector/detect.py", line 260, in run
output = self.model(
File "/apps/torch/2.0.1/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/apps/ribodetector/0.2.7/ribodetector/model/model.py", line 34, in forward1
last_out = last_items(pack=r_out, unsort=True)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "/apps/ribodetector/0.2.7/ribodetector/model/model.py", line 118, in last_items
indices = sorted_last_indices(pack=pack)
if unsort and pack.unsorted_indices is not None:
indices = indices[pack.unsorted_indices]