Open 1106301825 opened 1 year ago
I have the same issue while handling #417. My error looks like this:
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [232,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
'''
2023-09-30 13:06:31 | ERROR | stderr | Traceback (most recent call last):
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
2023-09-30 13:06:31 | ERROR | stderr | self.run()
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/threading.py", line 953, in run
2023-09-30 13:06:31 | ERROR | stderr | self._target(*self._args, **self._kwargs)
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2023-09-30 13:06:31 | ERROR | stderr | return func(*args, **kwargs)
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1588, in generate
2023-09-30 13:06:31 | ERROR | stderr | return self.sample(
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2642, in sample
2023-09-30 13:06:31 | ERROR | stderr | outputs = self(
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2023-09-30 13:06:31 | ERROR | stderr | return forward_call(*args, **kwargs)
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
2023-09-30 13:06:31 | ERROR | stderr | output = old_forward(*args, **kwargs)
2023-09-30 13:06:31 | ERROR | stderr | File "/home/vln_workspace/LLaVA/llava/model/language_model/llava_llama.py", line 78, in forward
2023-09-30 13:06:31 | ERROR | stderr | outputs = self.model(
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2023-09-30 13:06:31 | ERROR | stderr | return forward_call(*args, **kwargs)
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 646, in forward
2023-09-30 13:06:31 | ERROR | stderr | inputs_embeds = self.embed_tokens(input_ids)
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2023-09-30 13:06:31 | ERROR | stderr | return forward_call(*args, **kwargs)
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
2023-09-30 13:06:31 | ERROR | stderr | output = old_forward(*args, **kwargs)
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
2023-09-30 13:06:31 | ERROR | stderr | return F.embedding(
2023-09-30 13:06:31 | ERROR | stderr | File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
2023-09-30 13:06:31 | ERROR | stderr | return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
2023-09-30 13:06:31 | ERROR | stderr | RuntimeError: CUDA error: device-side assert triggered
2023-09-30 13:06:31 | ERROR | stderr | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
'''
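The traceback ends in `self.embed_tokens(input_ids)`, and the assertion `srcIndex < srcSelectDimSize` is an index-bounds check, so one plausible cause (not confirmed in this thread) is a token id that is greater than or equal to the number of rows in the embedding table, for example when the tokenizer knows extra tokens that the loaded checkpoint does not. A minimal sketch of that check on plain Python lists follows; `find_out_of_range_ids` and the sample ids are hypothetical, and 32000 is only an illustrative table size. With real tensors, the same comparison would be between `input_ids.max()` and `model.get_input_embeddings().num_embeddings`.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_ids(
    input_ids: Iterable[Iterable[int]], num_embeddings: int
) -> List[Tuple[int, int, int]]:
    """Return (row, column, token_id) for every id the embedding table cannot hold.

    Hypothetical helper: valid ids are 0 .. num_embeddings - 1.
    """
    bad = []
    for row, sequence in enumerate(input_ids):
        for col, token_id in enumerate(sequence):
            if token_id < 0 or token_id >= num_embeddings:
                bad.append((row, col, token_id))
    return bad


# Illustrative batch: the second sequence contains id 32000,
# which is one past the last valid row of a 32000-row table.
batch = [[1, 319, 13563], [1, 32000, 2]]
print(find_out_of_range_ids(batch, num_embeddings=32000))
# prints [(1, 1, 32000)]
```

If this reports any ids, the mismatch between tokenizer and checkpoint is worth looking at before anything else; if it reports none, the cause lies elsewhere.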
I have the same problem, but there are a few things worth analyzing. I first tried to run it on Google Colab, and the first Colab notebooks I tried all gave me that error. Then, with the Llava_7b_8bit_colab.ipynb version, I got it working last night while the models were downloading to my PC.

Locally I could not get it running at first, until I found another repo: https://github.com/natlamir/LLaVA-Windows. Following those steps I saw what has to be started in 3 separate PowerShell windows, and that way I managed to bring up the 3 servers. But, surprise, I got the same error that some of the Google Colabs gave me. (Another thing: what the other repo provides did not work on its own, but after adding some things mentioned there, in the haotian-liu GitHub, it worked partially. My guess about the remaining problem is that when I start the 3rd server, which fills the entire VRAM, it perhaps fails to connect to the other 2 servers for some reason?)

OMG, now it is working! While I was writing this I hit regenerate to see whether any error came out, and it started writing. Maybe you just have to give it enough time to start up. I am using a Ryzen 5 3600X with 48GB of RAM and an RTX 2060 with 12GB.
Oh yes, now it is working! I'm sharing my pip list so you can check your versions against it:
Package Version Editable project location
accelerate 0.21.0
aiofiles 23.2.1
aiohttp 3.8.6
aiosignal 1.3.1
altair 5.1.2
anyio 3.7.1
appdirs 1.4.4
async-timeout 4.0.3
attrs 23.1.0
bitsandbytes 0.37.5
bitsandbytes-cuda111 0.26.0.post2
Brotli 1.0.9
certifi 2023.7.22
cffi 1.15.1
chardet 5.2.0
charset-normalizer 2.0.4
click 8.1.7
colorama 0.4.6
contourpy 1.1.1
cryptography 41.0.3
cycler 0.12.1
docker-pycreds 0.4.0
einops 0.6.1
einops-exts 0.0.4
exceptiongroup 1.1.3
fastapi 0.104.0
ffmpy 0.3.1
filelock 3.12.4
fonttools 4.43.1
frozenlist 1.4.0
fsspec 2023.10.0
gitdb 4.0.11
GitPython 3.1.40
gradio 3.35.2
gradio_client 0.2.9
h11 0.14.0
httpcore 0.17.3
httpx 0.24.0
huggingface-hub 0.18.0
idna 3.4
Jinja2 3.1.2
joblib 1.3.2
jsonschema 4.19.1
jsonschema-specifications 2023.7.1
kiwisolver 1.4.5
linkify-it-py 2.0.2
llava 1.1.3 H:\ia\llava
markdown-it-py 2.2.0
markdown2 2.4.10
MarkupSafe 2.1.1
matplotlib 3.8.0
mdit-py-plugins 0.3.3
mdurl 0.1.2
mkl-fft 1.3.8
mkl-random 1.2.4
mkl-service 2.4.0
mpmath 1.3.0
multidict 6.0.4
networkx 3.1
ninja 1.11.1.1
numpy 1.26.0
orjson 3.9.10
packaging 23.2
pandas 2.1.2
pathtools 0.1.2
peft 0.4.0
Pillow 10.0.1
pip 23.3
protobuf 4.24.4
psutil 5.9.6
pycparser 2.21
pydantic 1.10.9
pydub 0.25.1
Pygments 2.16.1
pyOpenSSL 23.2.0
pyparsing 3.1.1
PySocks 1.7.1
python-dateutil 2.8.2
python-multipart 0.0.6
pytz 2023.3.post1
PyYAML 6.0.1
referencing 0.30.2
regex 2023.10.3
requests 2.31.0
rpds-py 0.10.6
safetensors 0.4.0
scikit-learn 1.2.2
scipy 1.11.3
semantic-version 2.10.0
sentencepiece 0.1.99
sentry-sdk 1.32.0
setproctitle 1.3.3
setuptools 68.0.0
shortuuid 1.0.11
six 1.16.0
smmap 5.0.1
sniffio 1.3.0
starlette 0.27.0
svgwrite 1.4.3
sympy 1.11.1
threadpoolctl 3.2.0
timm 0.6.13
tokenizers 0.13.3
toolz 0.12.0
torch 2.0.1
torchaudio 2.1.0
torchvision 0.15.2
tqdm 4.66.1
transformers 4.31.0
typing_extensions 4.8.0
tzdata 2023.3
uc-micro-py 1.0.2
urllib3 1.26.18
uvicorn 0.23.2
wandb 0.15.12
wavedrom 2.0.3.post3
websockets 12.0
wheel 0.41.2
win-inet-pton 1.1.0
yarl 1.9.2
youtube-dl 2021.12.17
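For anyone comparing their environment against a list like this, a short sketch using the standard library's `importlib.metadata` can print the installed versions of a few packages side by side. The `PACKAGES` selection and the `installed_versions` helper are my own assumptions about which entries matter most for this error, not something established in the thread.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical selection: the packages that appear in the traceback,
# plus a few closely related ones from the pip list.
PACKAGES = ["torch", "transformers", "accelerate", "bitsandbytes", "tokenizers", "peft"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name} {ver or 'not installed'}")
```

Run it inside the same environment that launches the servers, otherwise it reports the versions of a different interpreter.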
| ERROR | stderr | RuntimeError: CUDA error: device-side assert triggered
| ERROR | stderr | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
| ERROR | stderr | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
| ERROR | stderr | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
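The message suggests `CUDA_LAUNCH_BLOCKING=1` so that the stack trace points at the kernel that actually failed. A minimal sketch of setting it from Python, assuming you control the script that starts the model worker; the variable has to be in place before CUDA is initialized, so the simplest option is to set it before `torch` is imported at all.

```python
import os

# Set this before anything initializes CUDA. Placing it above
# `import torch` is the simplest way to guarantee that.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch and the model code only after this point

print(os.environ["CUDA_LAUNCH_BLOCKING"])
# prints 1
```

Setting the variable in the shell that launches the process has the same effect. Expect slower execution while it is on, since kernel launches become synchronous.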