CUDA errors in inference.ipynb

I keep encountering the following error when I run the last cell of inference.ipynb:

RuntimeError                              Traceback (most recent call last)
Cell In[14], line 1
----> 1 output = model.generate(graph, input_tokens)
      2 print(output)

File ~/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/3D-MoLM/model/blip2_llama_inference.py:85, in Blip2Llama.generate(self, graph_batch, text_batch, do_sample, num_beams, max_length, min_length, max_new_tokens, min_new_tokens, repetition_penalty, length_penalty, num_captions)
     70 @torch.no_grad()
     71 def generate(
     72     self,
   (...)
     83     num_captions=1,
     84 ):
---> 85     graph_embeds, graph_masks = self.graph_encoder(*graph_batch)
     86     graph_embeds = self.ln_graph(graph_embeds)
     87     query_tokens = self.query_tokens.expand(graph_embeds.shape[0], -1, -1)

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
...
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
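Because the error message warns that the stack trace may point at the wrong call, a first debugging step is to follow its own hint and rerun with `CUDA_LAUNCH_BLOCKING=1`. This is a generic PyTorch sketch, not something specific to 3D-MoLM; I am assuming the variable is set in the first cell of a freshly restarted kernel, before PyTorch touches CUDA. The build/device printout is just extra context that is useful to attach to a report like this.

```python
import os

# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so an asynchronous
# fault is reported at the op that actually caused it. It must be in place
# before CUDA is initialized; the safest spot is before `import torch`
# (in a notebook: restart the kernel and run this cell first).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

try:
    import torch
except ImportError:  # lets the sketch run on a machine without PyTorch
    torch = None

if torch is not None:
    # Build info: torch.version.cuda is the CUDA version PyTorch was built with,
    # which can differ from the system CUDA reported by nvidia-smi / nvcc.
    print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print("GPU:", torch.cuda.get_device_name(0), f"(compute capability {major}.{minor})")
```

With synchronous launches the traceback should stop at the kernel that faults, which helps tell whether the problem is inside `self.graph_encoder` or somewhere earlier.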

The CUDA version is 12.2, and the installed packages are:

Package                    Version
absl-py                    2.1.0
accelerate                 0.32.1
aiohttp                    3.9.5
aiosignal                  1.3.1
altair                     5.3.0
annotated-types            0.7.0
antlr4-python3-runtime     4.9.3
asttokens                  2.4.1
async-timeout              4.0.3
attrs                      23.2.0
bleach                     6.1.0
blinker                    1.8.2
blis                       0.7.11
braceexpand                0.1.7
Brotli                     1.1.0
cachetools                 5.3.3
catalogue                  2.0.10
certifi                    2024.7.4
cffi                       1.16.0
cfgv                       3.4.0
charset-normalizer         3.3.2
click                      8.1.7
cloudpathlib               0.18.1
cmake                      3.30.0
comm                       0.2.2
confection                 0.1.5
contextlib2                21.6.0
contexttimer               0.3.3
contourpy                  1.2.1
cycler                     0.12.1
cymem                      2.0.8
debugpy                    1.8.2
decorator                  5.1.1
decord                     0.6.0
deepspeed                  0.12.2
distlib                    0.3.8
docker-pycreds             0.4.0
einops                     0.8.0
exceptiongroup             1.2.0
executing                  2.0.1
fairscale                  0.4.4
filelock                   3.15.4
fonttools                  4.53.1
frozenlist                 1.4.1
fsspec                     2024.6.1
ftfy                       6.2.0
gitdb                      4.0.11
GitPython                  3.1.43
gmpy2                      2.1.5
h2                         4.1.0
hjson                      3.1.0
hpack                      4.0.0
huggingface-hub            0.23.4
hyperframe                 6.0.1
identify                   2.6.0
idna                       3.7
imageio                    2.34.2
importlib_metadata         8.0.0
iopath                     0.1.10
ipykernel                  6.29.5
ipython                    8.26.0
jedi                       0.19.1
Jinja2                     3.1.4
joblib                     1.4.2
jsonschema                 4.23.0
jsonschema-specifications  2023.12.1
jupyter_client             8.6.2
jupyter_core               5.7.2
kaggle                     1.6.14
kiwisolver                 1.4.5
langcodes                  3.4.0
language_data              1.2.0
lazy_loader                0.4
lightning-utilities        0.11.3.post0
lit                        18.1.8
lmdb                       1.5.1
marisa-trie                1.2.0
markdown-it-py             3.0.0
MarkupSafe                 2.1.5
matplotlib                 3.9.1
matplotlib-inline          0.1.7
mdurl                      0.1.2
ml_collections             0.1.1
mpmath                     1.3.0
multidict                  6.0.5
murmurhash                 1.0.10
nest_asyncio               1.6.0
networkx                   3.3
ninja                      1.11.1.1
nodeenv                    1.9.1
numpy                      1.26.4
nvidia-cublas-cu11         11.10.3.66
nvidia-cublas-cu12         12.1.3.1
nvidia-cuda-cupti-cu11     11.7.101
nvidia-cuda-cupti-cu12     12.1.105
nvidia-cuda-nvrtc-cu11     11.7.99
nvidia-cuda-nvrtc-cu12     12.1.105
nvidia-cuda-runtime-cu11   11.7.99
nvidia-cuda-runtime-cu12   12.1.105
nvidia-cudnn-cu11          8.5.0.96
nvidia-cudnn-cu12          8.9.2.26
nvidia-cufft-cu11          10.9.0.58
nvidia-cufft-cu12          11.0.2.54
nvidia-curand-cu11         10.2.10.91
nvidia-curand-cu12         10.3.2.106
nvidia-cusolver-cu11       11.4.0.1
nvidia-cusolver-cu12       11.4.5.107
nvidia-cusparse-cu11       11.7.4.91
nvidia-cusparse-cu12       12.1.0.106
nvidia-ml-py               12.555.43
nvidia-nccl-cu11           2.14.3
nvidia-nccl-cu12           2.20.5
nvidia-nvjitlink-cu12      12.5.82
nvidia-nvtx-cu11           11.7.91
nvidia-nvtx-cu12           12.1.105
omegaconf                  2.3.0
opencv-python-headless     4.5.5.64
opendatasets               0.1.22
packaging                  24.1
pandas                     2.2.2
parso                      0.8.4
peft                       0.11.1
pexpect                    4.9.0
pickleshare                0.7.5
pillow                     10.4.0
pip                        24.1.2
platformdirs               4.2.2
plotly                     5.22.0
portalocker                2.10.0
pre-commit                 3.7.1
preshed                    3.0.9
prompt_toolkit             3.0.47
protobuf                   5.28.0rc1
psutil                     6.0.0
ptyprocess                 0.7.0
pure-eval                  0.2.2
py-cpuinfo                 9.0.0
pyarrow                    16.1.0
pycocoevalcap              1.2
pycocotools                2.0.8
pycparser                  2.22
pydantic                   2.8.2
pydantic_core              2.20.1
pydeck                     0.9.1
Pygments                   2.18.0
pynvml                     11.5.3
pyparsing                  3.1.2
PySocks                    1.7.1
python-dateutil            2.9.0
python-magic               0.4.27
python-slugify             8.0.4
pytorch-lightning          2.0.7
pytz                       2024.1
PyYAML                     6.0.1
pyzmq                      26.0.3
rdkit                      2024.3.3
referencing                0.35.1
regex                      2024.5.15
requests                   2.32.3
rich                       13.7.1
rpds-py                    0.19.0
safetensors                0.4.3
salesforce-lavis           1.0.2
scikit-image               0.24.0
scikit-learn               1.5.1
scipy                      1.14.0
sentencepiece              0.2.0
sentry-sdk                 2.9.0
setproctitle               1.3.3
setuptools                 69.5.1
shellingham                1.5.4
six                        1.16.0
smart-open                 7.0.4
smmap                      5.0.1
spacy                      3.7.5
spacy-legacy               3.0.12
spacy-loggers              1.0.5
srsly                      2.4.8
stack-data                 0.6.2
streamlit                  1.36.0
sympy                      1.13.0
tenacity                   8.5.0
tensorboardX               2.6.2.2
text-unidecode             1.3
thinc                      8.2.5
threadpoolctl              3.5.0
tifffile                   2024.7.2
timm                       0.4.12
tokenizers                 0.19.1
toml                       0.10.2
toolz                      0.12.1
torch                      2.3.1
torch_geometric            2.5.3
torchaudio                 2.3.0
torchmetrics               1.4.0.post0
torchvision                0.18.0
tornado                    6.4.1
tqdm                       4.66.4
traitlets                  5.14.3
transformers               4.42.4
triton                     2.3.1
typer                      0.12.3
typing_extensions          4.12.2
tzdata                     2024.1
unicore                    0.0.1
urllib3                    2.2.2
virtualenv                 20.26.3
wandb                      0.17.4
wasabi                     1.1.3
watchdog                   4.0.1
wcwidth                    0.2.13
weasel                     0.4.1
webdataset                 0.2.86
webencodings               0.5.1
wheel                      0.43.0
wrapt                      1.16.0
yarl                       1.9.4
zipp                       3.19.2
zstandard                  0.23.0
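The list above contains both `-cu11` and `-cu12` builds of the NVIDIA runtime libraries (for example `nvidia-cublas-cu11` next to `nvidia-cublas-cu12`). Whether that mix has anything to do with the crash is an assumption, but it is easy to check from Python. A minimal standard-library sketch; `cuda_wheel_majors` and `installed_package_names` are hypothetical helper names:

```python
import re
from collections import defaultdict
from importlib import metadata

# NVIDIA runtime wheels are named like "nvidia-cublas-cu12"; the "-cuNN"
# suffix is the CUDA major version the wheel targets. Names without that
# suffix (e.g. "nvidia-ml-py") are ignored.
_CUDA_SUFFIX = re.compile(r"^(nvidia-[a-z0-9-]+?)-cu(\d+)$", re.IGNORECASE)


def cuda_wheel_majors(package_names):
    """Map each CUDA major version ("11", "12", ...) to the nvidia-* libraries built for it."""
    majors = defaultdict(list)
    for name in package_names:
        match = _CUDA_SUFFIX.match(name.strip())
        if match:
            majors[match.group(2)].append(match.group(1))
    return {major: sorted(libs) for major, libs in majors.items()}


def installed_package_names():
    """Names of all distributions visible to the current interpreter."""
    names = (dist.metadata.get("Name") for dist in metadata.distributions())
    return [name for name in names if name]


if __name__ == "__main__":
    majors = cuda_wheel_majors(installed_package_names())
    if len(majors) > 1:
        print("nvidia-* wheels for more than one CUDA major are installed:")
        for major, libs in sorted(majors.items()):
            print(f"  cu{major}: {', '.join(libs)}")
    else:
        print("CUDA majors found in nvidia-* wheels:", sorted(majors) or "none")
```

If both majors show up, recreating the environment so that only one set of `nvidia-*` wheels (the one matching the installed `torch` build) is present would be a reasonable experiment; this is untested against this particular error.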

I also tried switching to PyTorch 2.3.0, which gave a different error:

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
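The call that fails here is a bfloat16 (`CUDA_R_16BF`) strided-batched GEMM. One way to check whether bf16 batched matmuls work at all on this GPU, independently of 3D-MoLM, is a tiny `torch.bmm` in bfloat16 compared against a float32 reference. This is a sketch: `bf16_bmm_smoke_test` is a hypothetical helper, and I am assuming that `torch.bmm` on CUDA bf16 tensors goes through the same cuBLAS batched-GEMM path as the failing call.

```python
try:
    import torch
except ImportError:  # keeps the sketch importable without PyTorch
    torch = None


def bf16_bmm_smoke_test(device: str = "cuda"):
    """Run one small bfloat16 batched matmul on `device` and compare it to float32.

    Returns None when PyTorch or the requested device is unavailable, otherwise
    whether the bf16 result is close to the float32 reference. A cuBLAS failure
    is not caught: it surfaces as a RuntimeError, like the one in the report.
    """
    if torch is None:
        return None
    if device.startswith("cuda") and not torch.cuda.is_available():
        return None
    a = torch.randn(4, 64, 32, device=device)
    b = torch.randn(4, 32, 16, device=device)
    reference = torch.bmm(a, b)  # float32 reference, shape (4, 64, 16)
    result = torch.bmm(a.bfloat16(), b.bfloat16()).float()
    if device.startswith("cuda"):
        torch.cuda.synchronize()  # make asynchronous kernel errors show up here
    # bfloat16 keeps only ~3 significant decimal digits, hence the loose tolerance.
    return torch.allclose(result, reference, atol=0.5, rtol=0.05)


if __name__ == "__main__":
    print("bf16 bmm on CUDA:", bf16_bmm_smoke_test("cuda"))
```

If this small test already raises `CUBLAS_STATUS_EXECUTION_FAILED`, the problem is more likely in the driver/CUDA/PyTorch combination than in the model code; if it passes, running the model in float32 to see whether the error goes away would be a reasonable next experiment.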

Inference works if I run it on the CPU instead of the GPU, but generation on the CPU is too slow.

Could you provide some advice on how to resolve this error? Thanks.

BTW: I also tried to install the environment from requirements.txt, but pip failed.
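Since pip failed on requirements.txt, a quick comparison between the pinned versions and what is actually installed can show which packages drifted from the intended setup. A minimal sketch, assuming the file uses plain `name==version` pins (other specifiers and pip options are skipped) and is run from the repository root; `parse_pins` and `compare_with_installed` are hypothetical helpers, and versions are compared as plain strings.

```python
from importlib import metadata
from pathlib import Path


def parse_pins(requirements_text: str) -> dict:
    """Collect exact `name==version` pins; everything else is skipped in this sketch."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()  # drop comments and env markers
        if not line or line.startswith("-") or "==" not in line:
            continue  # blank lines, pip options (-r, -e, ...) and non-exact specifiers
        name, version = (part.strip() for part in line.split("==", 1))
        name = name.split("[", 1)[0]  # drop extras such as "pkg[extra]"
        pins[name.lower().replace("_", "-")] = version
    return pins


def compare_with_installed(pins: dict) -> list:
    """Return (name, pinned, installed) for every pin that does not match exactly.

    Plain string comparison, so "2.0" vs "2.0.0" is also reported; `installed`
    is None when the package is missing.
    """
    mismatches = []
    for name, pinned in sorted(pins.items()):
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches


if __name__ == "__main__":
    path = Path("requirements.txt")  # assumed location: the repository root
    if path.exists():
        pins = parse_pins(path.read_text(encoding="utf-8"))
        for name, pinned, installed in compare_with_installed(pins):
            print(f"{name}: pinned {pinned}, installed {installed or 'missing'}")
```

The output is only a starting point: packages such as `torch` are often pinned to a specific CUDA build, so a matching version string does not guarantee a matching CUDA variant.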

lsh0520 / 3D-MoLM, issue #13