haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Usage] Segmentation fault in Python when following installation steps #1053

Open InconsolableCellist opened 8 months ago

InconsolableCellist commented 8 months ago

Describe the issue

Issue:

I get a segmentation fault when trying to load the model_worker using the provided weights and installation steps

Environment:

nvidia drivers and GPUs
```
$ nvidia-smi
Thu Feb 1 15:48:45 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:17:00.0 Off |                  N/A |
|  0%   47C    P8              19W / 350W |      5MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off | 00000000:B3:00.0 Off |                  N/A |
|  0%   48C    P8              28W / 370W |      5MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```
OS and distro
```
$ cat /etc/os-release
NAME="Artix Linux"
PRETTY_NAME="Artix Linux"
ID=artix
BUILD_ID=rolling
ANSI_COLOR="0;36"
HOME_URL="https://www.artixlinux.org/"
DOCUMENTATION_URL="https://wiki.artixlinux.org/"
SUPPORT_URL="https://forum.artixlinux.org/"
BUG_REPORT_URL="https://bugs.artixlinux.org/"
PRIVACY_POLICY_URL="https://terms.artixlinux.org/docs/privacy-policy/"
LOGO=artixlinux-logo
```
pip/micromamba environment ``` $ pip list Package Version Editable project location ------------------------- ------------ ------------------------- accelerate 0.21.0 aiofiles 23.2.1 aiohttp 3.9.3 aiosignal 1.3.1 altair 5.2.0 anyio 4.2.0 async-timeout 4.0.3 attrs 23.2.0 bitsandbytes 0.41.0 certifi 2023.11.17 charset-normalizer 3.3.2 click 8.1.7 cmake 3.28.1 contourpy 1.2.0 cycler 0.12.1 einops 0.6.1 einops-exts 0.0.4 exceptiongroup 1.2.0 fastapi 0.109.0 ffmpy 0.3.1 filelock 3.13.1 fonttools 4.47.2 frozenlist 1.4.1 fsspec 2023.12.2 gradio 3.35.2 gradio_client 0.2.9 h11 0.14.0 httpcore 0.17.3 httpx 0.24.0 huggingface-hub 0.20.3 idna 3.6 Jinja2 3.1.3 joblib 1.3.2 jsonschema 4.21.1 jsonschema-specifications 2023.12.1 kiwisolver 1.4.5 linkify-it-py 2.0.2 lit 17.0.6 llava 1.2.0 /home/user/Programs/LLaVA markdown-it-py 2.2.0 markdown2 2.4.12 MarkupSafe 2.1.4 matplotlib 3.8.2 mdit-py-plugins 0.3. mdurl 0.1.2 mpmath 1.3.0 multidict 6.0.4 networkx 3.2.1 numpy 1.26.3 nvidia-cublas-cu11 11.10.3.66 nvidia-cuda-cupti-cu11 11.7.101 nvidia-cuda-nvrtc-cu11 11.7.99 nvidia-cuda-runtime-cu11 11.7.99 nvidia-cudnn-cu11 8.5.0.96 nvidia-cufft-cu11 10.9.0.58 nvidia-curand-cu11 10.2.10.91 nvidia-cusolver-cu11 11.4.0.1 nvidia-cusparse-cu11 11.7.4.91 nvidia-nccl-cu11 2.14.3 nvidia-nvtx-cu11 11.7.91 orjson 3.9.12 packaging 23.2 pandas 2.2.0 peft 0.4.0 pillow 10.2.0 pip 23.3.2 psutil 5.9.8 pydantic 1.10.14 pydub 0.25.1 Pygments 2.17.2 pyparsing 3.1.1 python-dateutil 2.8.2 python-multipart 0.0.6 pytz 2023.4 PyYAML 6.0.1 referencing 0.33.0 regex 2023.12.25 requests 2.31.0 rpds-py 0.17.1 safetensors 0.4.2 scikit-learn 1.2.2 scipy 1.12.0 semantic-version 2.10.0 sentencepiece 0.1.99 setuptools 69.0.3 shortuuid 1.0.11 six 1.16.0 sniffio 1.3.0 starlette 0.35.1 svgwrite 1.4.3 sympy 1.12 threadpoolctl 3.2.0 timm 0.6.13 tokenizers 0.15.0 toolz 0.12.1 torch 2.0.1 torchvision 0.15.2 tqdm 4.66.1 transformers 4.36.2 triton 2.0.0 typing_extensions 4.9.0 tzdata 2023.4 uc-micro-py 1.0.2 urllib3 2.2.0 uvicorn 0.27.0.post1 wavedrom 2.0.3.post3 websockets 12.0 wheel 0.42.0 yarl 1.9.4 ```
Python version
```
$ python --version
Python 3.10.13
```

Steps to reproduce:

I followed the installation steps:

  1. git clone https://github.com/haotian-liu/LLaVA.git
  2. cd LLaVA
  3. micromamba env create -n llava python=3.10 -y
  4. micromamba activate llava
  5. pip install --upgrade pip
  6. pip install -e .
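(For reference, a quick sanity check of the fresh environment before launching anything; these are standard torch introspection calls, not part of the official LLaVA instructions.)

```
# Confirm the environment sees CUDA and both GPUs
python -c "import torch; print(torch.__version__, torch.version.cuda)"
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```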

I then downloaded the weights (llava-v1.6-mistral-7b)
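(For completeness, one way to fetch the weights into the path used below; this is a sketch that assumes the liuhaotian/llava-v1.6-mistral-7b repository on Hugging Face and the huggingface-cli tool, so adjust to however you actually downloaded them.)

```
# Assumption: weights come from the liuhaotian/llava-v1.6-mistral-7b repo on Hugging Face
pip install -U "huggingface_hub[cli]"
huggingface-cli download liuhaotian/llava-v1.6-mistral-7b --local-dir ./models/llava-v1.6-mistral-7b
```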

And ran things in the following order:

  1. tmux (new session)
  2. python -m llava.serve.controller --host 0.0.0.0 --port 10000
  3. python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload
  4. python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./models/llava-v1.6-mistral-7b

I monitored the progress with nvtop and saw data being loaded onto the GPUs.
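(As a side note, a lighter way to exercise the model without the controller/worker/gradio stack is the repo's single-process CLI; a sketch, with a placeholder image path:)

```
# Query the model directly, bypassing the controller, web server, and worker
# (the image path is a placeholder)
python -m llava.serve.cli \
    --model-path ./models/llava-v1.6-mistral-7b \
    --image-file /path/to/some_image.jpg
```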

Observed Results

$ python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./models/llava-v1.6-mistral-7b              
2024-02-01 11:05:24 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:10000', model_path='./models/llava-v1.6-mistral-7b', model_base=None, model_name=None, device='cuda', multi_modal=False, limit_model_concurrency=5, stream_interval=1, no_register=False, load_8bit=False, load_4bit=False)
2024-02-01 11:05:24 | INFO | model_worker | Loading the model llava-v1.6-mistral-7b on worker 16a702 ...                                                                                                          
preprocessor_config.json:   0%|                                                                                      | 0.00/316 [00:00<?, ?B/s]                                                                   
preprocessor_config.json: 100%|███████████████████████████████████████████████████████████████████████████████| 316/316 [00:00<00:00, 1.04MB/s]                                                                   
2024-02-01 11:05:25 | ERROR | stderr |                                                                                                                                                                            
config.json:   0%|                                                                                                 | 0.00/4.76k [00:00<?, ?B/s]                                                                   
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████| 4.76k/4.76k [00:00<00:00, 14.7MB/s]                                                                   
2024-02-01 11:05:25 | ERROR | stderr |                                                                   
pytorch_model.bin:   0%|                                                                                           | 0.00/1.71G [00:00<?, ?B/s]                                                                   
pytorch_model.bin:   1%|▌                                                                                 | 10.5M/1.71G [00:00<02:07, 13.3MB/s]  
...
pytorch_model.bin: 100%|██████████████████████████████████████████████████████████████████████████████████| 1.71G/1.71G [00:48<00:00, 35.4MB/s]                                                                   
2024-02-01 11:06:14 | ERROR | stderr |                                                                   
Segmentation fault

Dmesg reports: [ 2319.436369] traps: python[5578] general protection fault ip:74a5de9987f7 sp:7ffe314352a0 error:0 in libtorch_cpu.so[74a5dd241000+12ef4000]
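(To get more context than the bare "Segmentation fault", these are the two generic things I'd reach for; a sketch, nothing LLaVA-specific.)

```
# Dump the Python stack when the native crash happens
PYTHONFAULTHANDLER=1 python -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 \
    --model-path ./models/llava-v1.6-mistral-7b

# Or capture the native backtrace from libtorch_cpu.so under gdb
gdb -ex run -ex bt --args python -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 \
    --model-path ./models/llava-v1.6-mistral-7b
```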

I tried upgrading to the latest 3.10 release of Python (python 3.10.13 h7a1cb2a_0), but with no change in behavior. I also tried Python 3.11 and 3.12, but they pulled in an incompatible version of a llama library.

I tried upgrading torch and torchvision with pip install --upgrade torch torchvision, going from the installed 2.0.1 and 0.15.2 to torch-2.2.0-cp310-cp310-manylinux1_x86_64.whl and torchvision-0.17.0-cp310-cp310-manylinux1_x86_64.whl respectively.
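(Spelled out, with the rollback for reference; a sketch, where the pinned versions are simply the ones pip install -e . had originally installed.)

```
# The upgrade I tried (torch 2.0.1 -> 2.2.0, torchvision 0.15.2 -> 0.17.0)
pip install --upgrade torch torchvision

# To roll back to the versions the editable install originally pulled in
pip install torch==2.0.1 torchvision==0.15.2
```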

But the crash still occurred: traps: python[5501] general protection fault ip:5e39bfa86cfc sp:78ce0effe4b0 error:0 in python3.10 (deleted)[5e39bf9b7000+206000]

It also somehow crashed tmux, which I don't understand.

Running nvidia-smi again resulted in:

$ nvidia-smi 
Segmentation fault

$ sudo dmesg
...
[20404.550385] nvidia-smi[9483]: segfault at 0 ip 00000000004481c0 sp 00007ffc6a45ab58 error 6 in nvidia-smi[400000+92000] likely on CPU 5 (core 1, socket 0)
[20404.550408] Code: 90 1e 40 a0 00 23 b4 02 c8 1d c8 20 30 1e a0 04 a0 1a f8 0c c0 06 f8 09 60 19 18 1f 68 1c 88 a0 60 96 b0 1e a2 81 38 27 00 9e <30> 1f d0 20 58 a1 60 a2 e0 91 88 1b 88 0f 28 1d 90 8c 78 18 40 08
siddy819 commented 8 months ago

The new repo seems to have some problems; going back to an earlier checkpoint from around Dec 23 solved most of my issues. The mistral-7b-based model runs for me, although with some warnings. I'm using the eval function, by the way, not the gradio implementation (see the sketch below for what I mean by pinning an earlier checkpoint).
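(Roughly like this; the date is only an example, pick whichever commit worked for you.)

```
# Pin the repo to a commit from around Dec 23 and reinstall (date is illustrative)
cd LLaVA
git checkout "$(git rev-list -n 1 --before='2023-12-23' main)"
pip install -e .
```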

InconsolableCellist commented 8 months ago

Using 2120e82 I was able to get the model to load, but it segfaulted again as soon as I tried an inference in the gradio UI, which showed "NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE." and produced this output:

Output
```
Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at ./models/llava-v1.6-mistral-7b and are newly initialized: ['model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq', 'model.layers.17.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2024-02-01 16:49:23 | INFO | model_worker | Register to controller
2024-02-01 16:49:23 | ERROR | stderr | INFO:     Started server process [4446]
2024-02-01 16:49:23 | ERROR | stderr | INFO:     Waiting for application startup.
2024-02-01 16:49:23 | ERROR | stderr | INFO:     Application startup complete.
2024-02-01 16:49:23 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:40000 (Press CTRL+C to quit)
2024-02-01 16:49:38 | INFO | model_worker | Send heart beat. Models: ['llava-v1.6-mistral-7b']. Semaphore: None. global_counter: 0
2024-02-01 16:49:53 | INFO | model_worker | Send heart beat. Models: ['llava-v1.6-mistral-7b']. Semaphore: None. global_counter: 0
2024-02-01 16:49:54 | INFO | stdout | INFO:     127.0.0.1:59894 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-02-01 16:50:00 | INFO | model_worker | Send heart beat. Models: ['llava-v1.6-mistral-7b']. Semaphore: Semaphore(value=4, locked=False). global_counter: 1
2024-02-01 16:50:00 | INFO | stdout | INFO:     127.0.0.1:60498 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-02-01 16:50:00 | ERROR | stderr | /home/user/micromamba/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py:1270: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
2024-02-01 16:50:00 | ERROR | stderr |   warnings.warn(
Segmentation fault
```

Here are the hashes of the model I downloaded:

SHA256sums
```
f3c77fe6e6b6457849e0f57c6cb6fa6222a4a64d914778f7234821d3588d40ff config.json
741acba7f5e235dac0e6865ecc212bbadb1ab1d6d853de7d759268cb62aaf2b4 generation_config.json
57f11463314a7b628842ba55008c323fbc8d2c6d48a90f02343d550d61321d8e model-00001-of-00004.safetensors
1e88a821d441aef6685311cb319eebbac47fa99d523c71519a3cfa59478da451 model-00002-of-00004.safetensors
225e6c059b92f1c7c234ab748ae718e29c26e558f3f645e19cbca502e8d94042 model-00003-of-00004.safetensors
493af3a613d7b4ad965d7ae2be1592776d0f792fc0b6d38d44e3dc5268c0cec0 model-00004-of-00004.safetensors
c1a32061b0ad3059ded2beeb22b2fb1cc885811c9baadb56a1ab6c6b92a9e4d3 model.safetensors.index.json
7d0d549f44bdffba479581e56d229d4cfa1d85c7112fa62c11817b791bb286de README.md
719833ff26ac897a3ec8ed946028a135de2a351470af59b4008744ab1f0ee9b7 special_tokens_map.json
b4b50144c2149fcd26c3068523ae847c460d3424b7fac2921af49d111cd34c30 tokenizer_config.json
fc4f0bd70b3709312d9d1d9e5ba674794b6bc5abc17429897a540f93882f25fc tokenizer.json
dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055 tokenizer.model
1f4f710e77837f113c3801bcba48f8c5662fd22d435fbd724b758e999d66ac93 trainer_state.json
c4afa32f230f004feea7301f637244194ecb01fe655fa6a3ad426091430dc565 training_args.bin
```
InconsolableCellist commented 8 months ago

I'm currently running the worker with CUDA_VISIBLE_DEVICES=0 python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./models/llava-v1.6-mistral-7b and the demo is working; no segfault so far.
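(If multi-GPU loading turns out to be the trigger, one workaround sketch is a single-GPU worker per card, each on its own port and both registering to the same controller; the second port below is an arbitrary example.)

```
# Worker pinned to GPU 0
CUDA_VISIBLE_DEVICES=0 python -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 \
    --model-path ./models/llava-v1.6-mistral-7b

# Second worker pinned to GPU 1 (note the different --port/--worker)
CUDA_VISIBLE_DEVICES=1 python -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40001 --worker http://localhost:40001 \
    --model-path ./models/llava-v1.6-mistral-7b
```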

haotian-liu commented 8 months ago

Does the latest commit work for you?

InconsolableCellist commented 8 months ago

I updated, ran pip install -e . again, and launched with CUDA_VISIBLE_DEVICES=0,1 python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./models/llava-v1.6-34b --load-8bit

And I get a different error now:

STDOUT and STDERR
```
2024-02-02 18:44:51 | ERROR | stderr |   warnings.warn(
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [118,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [118,0,0], thread: [97,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [118,0,0], thread: [98,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [118,0,0], thread: [99,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [118,0,0], thread: [100,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [118,0,0], thread: [101,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [118,0,0], thread: [102,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [118,0,0], thread: [103,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [107,0,0], thread: [123,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
...
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [107,0,0], thread: [124,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [107,0,0], thread: [125,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [107,0,0], thread: [126,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [107,0,0], thread: [127,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
2024-02-02 18:44:52 | ERROR | stderr | Exception in thread Thread-3 (generate):
2024-02-02 18:44:52 | ERROR | stderr | Traceback (most recent call last):
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
2024-02-02 18:44:52 | ERROR | stderr |     self.run()
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/threading.py", line 953, in run
2024-02-02 18:44:52 | ERROR | stderr |     self._target(*self._args, **self._kwargs)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-02-02 18:44:52 | ERROR | stderr |     return func(*args, **kwargs)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/Programs/LLaVA/llava/model/language_model/llava_llama.py", line 137, in generate
2024-02-02 18:44:52 | ERROR | stderr |     return super().generate(
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-02-02 18:44:52 | ERROR | stderr |     return func(*args, **kwargs)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1764, in generate
2024-02-02 18:44:52 | ERROR | stderr |     return self.sample(
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2861, in sample
2024-02-02 18:44:52 | ERROR | stderr |     outputs = self(
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2024-02-02 18:44:52 | ERROR | stderr |     return forward_call(*args, **kwargs)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
2024-02-02 18:44:52 | ERROR | stderr |     output = old_forward(*args, **kwargs)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/Programs/LLaVA/llava/model/language_model/llava_llama.py", line 91, in forward
2024-02-02 18:44:52 | ERROR | stderr |     return super().forward(
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1181, in forward
2024-02-02 18:44:52 | ERROR | stderr |     outputs = self.model(
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2024-02-02 18:44:52 | ERROR | stderr |     return forward_call(*args, **kwargs)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1068, in forward
2024-02-02 18:44:52 | ERROR | stderr |     layer_outputs = decoder_layer(
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2024-02-02 18:44:52 | ERROR | stderr |     return forward_call(*args, **kwargs)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
2024-02-02 18:44:52 | ERROR | stderr |     output = old_forward(*args, **kwargs)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 796, in forward
2024-02-02 18:44:52 | ERROR | stderr |     hidden_states, self_attn_weights, present_key_value = self.self_attn(
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2024-02-02 18:44:52 | ERROR | stderr |     return forward_call(*args, **kwargs)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
2024-02-02 18:44:52 | ERROR | stderr |     output = old_forward(*args, **kwargs)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 448, in forward
2024-02-02 18:44:52 | ERROR | stderr |     attn_output = self.o_proj(attn_output)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2024-02-02 18:44:52 | ERROR | stderr |     return forward_call(*args, **kwargs)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
2024-02-02 18:44:52 | ERROR | stderr |     output = old_forward(*args, **kwargs)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 441, in forward
2024-02-02 18:44:52 | ERROR | stderr |     out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 563, in matmul
2024-02-02 18:44:52 | ERROR | stderr |     return MatMul8bitLt.apply(A, B, out, bias, state)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
2024-02-02 18:44:52 | ERROR | stderr |     return super().apply(*args, **kwargs)  # type: ignore[misc]
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 327, in forward
2024-02-02 18:44:52 | ERROR | stderr |     CA, CAt, SCA, SCAt, coo_tensorA = F.double_quant(A.to(torch.float16), threshold=state.threshold)
2024-02-02 18:44:52 | ERROR | stderr |   File "/home/user/micromamba/envs/llava/lib/python3.10/site-packages/bitsandbytes/functional.py", line 2016, in double_quant
2024-02-02 18:44:52 | ERROR | stderr |     nnz = nnz_row_ptr[-1].item()
2024-02-02 18:44:52 | ERROR | stderr | RuntimeError: CUDA error: device-side assert triggered
2024-02-02 18:44:52 | ERROR | stderr | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2024-02-02 18:44:52 | ERROR | stderr | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2024-02-02 18:44:52 | ERROR | stderr | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-02-02 18:44:52 | ERROR | stderr |
```
dmesg output ``` [57310.268117] Fixing recursive fault but reboot is needed! [57310.268119] BUG: scheduling while atomic: cuda-EvtHandlr/13789/0x00000000 [57310.268122] Modules linked in: fuse rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs qrtr bnep bluetooth ecdh_generic sunrpc vfat fat intel_rapl_msr intel_rapl_common intel_uncore _frequency intel_uncore_frequency_common isst_if_common skx_edac nfit snd_soc_avs x86_pkg_temp_thermal intel_powerclamp snd_soc_hda_codec snd_hda_ext_core coretemp snd_soc_core kvm_intel snd_compress snd_hda_co dec_realtek ac97_bus rfkill snd_pcm_dmaengine snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel kvm snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec irqbypass snd_hda_core rapl snd_hwdep iTCO_wdt intel_cs tate mei_wdt snd_pcm intel_pmc_bxt dell_wmi crypto_user iTCO_vendor_support dell_smm_hwmon dell_smbios snd_timer ledtrig_audio mei_me dcdbas joydev snd intel_uncore sparse_keymap wmi_bmof dell_wmi_descriptor in tel_wmi_thunderbolt pcspkr ioatdma i2c_i801 e1000e soundcore mei i2c_smbus dca acpi_tad mac_hid ext4 uas usb_storage crc32c_generic crc16 mbcache jbd2 dm_crypt usbhid cbc encrypted_keys trusted [57310.268210] asn1_encoder tee dm_mod nvme nvme_core nvme_auth crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul serio_raw ghash_clmulni_intel sha512_ssse3 atkbd sha256_ssse3 libps2 sha1_ssse3 vivaldi_fmap aesni_intel crypto_simd cryptd vmd xhci_pci xhci_pci_renesas i8042 serio nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) video wmi nvidia(POE) [57310.268245] CPU: 15 PID: 13789 Comm: cuda-EvtHandlr Tainted: P D OE 6.7.1-artix1-1 #1 f007f59a7ff3e3e54eae55d54af75a4e06d3590f [57310.268250] Hardware name: Dell Inc. Precision 5820 Tower/0X30MX, BIOS 2.20.0 05/26/2022 [57310.268252] Call Trace: [57310.268255] [57310.268259] dump_stack_lvl+0x47/0x60 [57310.268265] __schedule_bug+0x56/0x70 [57310.268273] __schedule+0x103e/0x1410 [57310.268278] ? vprintk_emit+0x175/0x2b0 [57310.268284] ? _printk+0x64/0x80 [57310.268290] do_task_dead+0x43/0x50 [57310.268296] make_task_dead+0x151/0x170 [57310.268302] rewind_stack_and_make_dead+0x17/0x20 [57310.268306] RIP: 0033:0x7bf286c6cf7f [57310.268338] Code: Unable to access opcode bytes at 0x7bf286c6cf55. [57310.268340] RSP: 002b:00007bf154dffd50 EFLAGS: 00000293 ORIG_RAX: 0000000000000007 [57310.268343] RAX: fffffffffffffdfc RBX: 00000000ffffffff RCX: 00007bf286c6cf7f [57310.268345] RDX: 0000000000000064 RSI: 000000000000000a RDI: 00007bf0d4000c20 [57310.268347] RBP: 00007bf154dffe20 R08: 0000000000000000 R09: 0000000000000000 [57310.268349] R10: 00007bf154dffde0 R11: 0000000000000293 R12: 0000000000000000 [57310.268351] R13: 0000000000000064 R14: 00007bf0d4000c20 R15: 0000648d4edf1e90 [57310.268354] [57310.268356] ------------[ cut here ]------------ [57310.268357] Voluntary context switch within RCU read-side critical section! 
[57310.268364] WARNING: CPU: 15 PID: 13789 at kernel/rcu/tree_plugin.h:320 rcu_note_context_switch+0x5e0/0x660 [57310.268374] Modules linked in: fuse rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs qrtr bnep bluetooth ecdh_generic sunrpc vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit snd_soc_avs x86_pkg_temp_thermal intel_powerclamp snd_soc_hda_codec snd_hda_ext_core coretemp snd_soc_core kvm_intel snd_compress snd_hda_codec_realtek ac97_bus rfkill snd_pcm_dmaengine snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel kvm snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec irqbypass snd_hda_core rapl snd_hwdep iTCO_wdt intel_cstate mei_wdt snd_pcm intel_pmc_bxt dell_wmi crypto_user iTCO_vendor_support dell_smm_hwmon dell_smbios snd_timer ledtrig_audio mei_me dcdbas joydev snd intel_uncore sparse_keymap wmi_bmof dell_wmi_descriptor intel_wmi_thunderbolt pcspkr ioatdma i2c_i801 e1000e soundcore mei i2c_smbus dca acpi_tad mac_hid ext4 uas usb_storage crc32c_generic crc16 mbcache jbd2 dm_crypt usbhid cbc encrypted_keys trusted [57310.268431] asn1_encoder tee dm_mod nvme nvme_core nvme_auth crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul serio_raw ghash_clmulni_intel sha512_ssse3 atkbd sha256_ssse3 libps2 sha1_ssse3 vivaldi_fmap aesni_intel crypto_simd cryptd vmd xhci_pci xhci_pci_renesas i8042 serio nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) video wmi nvidia(POE) [57310.268456] CPU: 15 PID: 13789 Comm: cuda-EvtHandlr Tainted: P D W OE 6.7.1-artix1-1 #1 f007f59a7ff3e3e54eae55d54af75a4e06d3590f [57310.268459] Hardware name: Dell Inc. Precision 5820 Tower/0X30MX, BIOS 2.20.0 05/26/2022 [57310.268461] RIP: 0010:rcu_note_context_switch+0x5e0/0x660 [57310.268465] Code: 00 00 00 00 0f 85 07 fd ff ff 49 89 8c 24 a0 00 00 00 e9 fa fc ff ff 48 c7 c7 00 fa 07 a4 c6 05 29 5b e5 01 01 e8 60 ee f3 ff <0f> 0b e9 7b fa ff ff 49 83 bc 24 98 00 00 00 00 49 8b 84 24 a0 00 [57310.268468] RSP: 0018:ffff9ce2c0473e18 EFLAGS: 00010086 [57310.268470] RAX: 0000000000000000 RBX: ffff89d3dfff5100 RCX: 0000000000000027 [57310.268472] RDX: ffff89d3dffe16c8 RSI: 0000000000000001 RDI: ffff89d3dffe16c0 [57310.268474] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9ce2c0473ca0 [57310.268476] R10: 0000000000000003 R11: ffff89d45ff9b1e8 R12: ffff89d3dfff4200 [57310.268478] R13: ffff89b5a6932740 R14: 0000000000000000 R15: 0000000000000000 [57310.268479] FS: 0000000000000000(0000) GS:ffff89d3dffc0000(0000) knlGS:0000000000000000 [57310.268482] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [57310.268484] CR2: 00007bf0a40088c8 CR3: 0000001052c20001 CR4: 00000000003706f0 [57310.268486] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [57310.268488] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [57310.268490] Call Trace: [57310.268491] [57310.268492] ? rcu_note_context_switch+0x5e0/0x660 [57310.268496] ? __warn+0x81/0x130 [57310.268502] ? rcu_note_context_switch+0x5e0/0x660 [57310.268506] ? report_bug+0x171/0x1a0 [57310.268512] ? prb_read_valid+0x1b/0x30 [57310.268516] ? handle_bug+0x3c/0x80 [57310.268521] ? exc_invalid_op+0x17/0x70 [57310.268525] ? asm_exc_invalid_op+0x1a/0x20 [57310.268529] ? rcu_note_context_switch+0x5e0/0x660 [57310.268533] ? rcu_note_context_switch+0x5e0/0x660 [57310.268538] __schedule+0xc0/0x1410 [57310.268541] ? vprintk_emit+0x175/0x2b0 [57310.268544] ? 
_printk+0x64/0x80 [57310.268549] do_task_dead+0x43/0x50 [57310.268553] make_task_dead+0x151/0x170 [57310.268556] rewind_stack_and_make_dead+0x17/0x20 [57310.268559] RIP: 0033:0x7bf286c6cf7f [57310.268565] Code: Unable to access opcode bytes at 0x7bf286c6cf55. [57310.268567] RSP: 002b:00007bf154dffd50 EFLAGS: 00000293 ORIG_RAX: 0000000000000007 [57310.268570] RAX: fffffffffffffdfc RBX: 00000000ffffffff RCX: 00007bf286c6cf7f [57310.268571] RDX: 0000000000000064 RSI: 000000000000000a RDI: 00007bf0d4000c20 [57310.268573] RBP: 00007bf154dffe20 R08: 0000000000000000 R09: 0000000000000000 [57310.268575] R10: 00007bf154dffde0 R11: 0000000000000293 R12: 0000000000000000 [57310.268577] R13: 0000000000000064 R14: 00007bf0d4000c20 R15: 0000648d4edf1e90 [57310.268580] [57310.268581] ---[ end trace 0000000000000000 ]--- [57370.273679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [57370.273693] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P13789/1:b..l [57370.273706] rcu: (detected by 9, t=18002 jiffies, g=495481, q=2899 ncpus=20) ```

Additionally, I couldn't Ctrl-C or SIGKILL the worker process.
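(Following the hint in the traceback itself, the next run I'd try is with synchronous kernel launches so the device-side assert points at the actual failing op; a sketch of the same command:)

```
# CUDA_LAUNCH_BLOCKING=1 makes the device-side assert surface at the real call site
CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0,1 python -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 \
    --model-path ./models/llava-v1.6-34b --load-8bit
```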

haotian-liu commented 8 months ago

Does mistral work for you now? This seems to be a bnb error?

InconsolableCellist commented 8 months ago

Mistral 7B worked, probably because I was able to load it into just one GPU. It didn't do a very good job at anything though, nor did 4-bit 34B. I think I need the full FP16 to get performance as good as the demo, which was quite usable.

I think 8-bit said of the demo pic, "that's a man standing on an ironing board. It's unusual to stand on an ironing board in traffic." etc.

What is bnb?