InconsolableCellist opened 8 months ago
The new repo seems to have some problems; going back to an earlier checkpoint from around Dec 23 solved most of my issues. The Mistral-7B-based model runs for me, although with some warnings. I'm using the eval function, by the way, not the Gradio implementation.
Using 2120e82 I was able to get the model to load, but it segfaulted again as soon as I ran an inference in the Gradio UI, resulting in "NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE." in the UI and this output:
Here are the hashes of the model I downloaded:
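For anyone comparing against their own download, per-file checksums like these can be regenerated with something like the following (a sketch; the path is the one used elsewhere in this thread, so adjust it to your layout):

```shell
# Recompute SHA-256 checksums for every file under the weights directory,
# sorted by path so two runs can be diffed directly.
dir=./models/llava-v1.6-mistral-7b
if [ -d "$dir" ]; then
  find "$dir" -type f -exec sha256sum {} + | sort -k 2
else
  echo "weights directory $dir not found"
fi
```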
I'm currently running the worker with:
```
CUDA_VISIBLE_DEVICES=0 python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./models/llava-v1.6-mistral-7b
```
and the demo is working; no segfault so far.
does the latest commit work for you?
I updated, ran `pip install -e .`, and launched with:
```
CUDA_VISIBLE_DEVICES=0,1 python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./models/llava-v1.6-34b --load-8bit
```
And now I get a different error:
Additionally, I couldn't Ctrl-C or SIGKILL the worker process.
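(A process that survives SIGKILL is usually stuck in uninterruptible kernel sleep, state `D`, often inside a GPU driver call; the kernel won't deliver the signal until the blocking call returns. One way to check — a sketch, assuming the worker was launched as above:)

```shell
# List any matching worker processes with their state column.
# STAT 'D' = uninterruptible sleep, which even SIGKILL cannot interrupt.
for pid in $(pgrep -f llava.serve.model_worker); do
  ps -o pid=,stat=,cmd= -p "$pid"
done
```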
Does mistral work for you now? This seems to be a bnb error?
Mistral 7B worked, probably because I was able to load it onto a single GPU. It didn't do a very good job at anything, though; neither did 4-bit 34B. I think I need full FP16 to get performance as good as the demo, which was quite usable.
I think the 8-bit model said of the demo picture, "that's a man standing on an ironing board. It's unusual to stand on an ironing board in traffic." etc.
What is bnb?
Describe the issue
Issue:
I get a segmentation fault when trying to load the model_worker using the provided weights and installation steps.
Environment:
NVIDIA drivers and GPUs
```
$ nvidia-smi
Thu Feb  1 15:48:45 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090      Off   | 00000000:17:00.0 Off |                  N/A |
|  0%   47C    P8             19W / 350W  |      5MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090      Off   | 00000000:B3:00.0 Off |                  N/A |
|  0%   48C    P8             28W / 370W  |      5MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```
OS and distro
```
$ cat /etc/os-release
NAME="Artix Linux"
PRETTY_NAME="Artix Linux"
ID=artix
BUILD_ID=rolling
ANSI_COLOR="0;36"
HOME_URL="https://www.artixlinux.org/"
DOCUMENTATION_URL="https://wiki.artixlinux.org/"
SUPPORT_URL="https://forum.artixlinux.org/"
BUG_REPORT_URL="https://bugs.artixlinux.org/"
PRIVACY_POLICY_URL="https://terms.artixlinux.org/docs/privacy-policy/"
LOGO=artixlinux-logo
```
pip/micromamba environment
```
$ pip list
Package                   Version      Editable project location
------------------------- ------------ -------------------------
accelerate                0.21.0
aiofiles                  23.2.1
aiohttp                   3.9.3
aiosignal                 1.3.1
altair                    5.2.0
anyio                     4.2.0
async-timeout             4.0.3
attrs                     23.2.0
bitsandbytes              0.41.0
certifi                   2023.11.17
charset-normalizer        3.3.2
click                     8.1.7
cmake                     3.28.1
contourpy                 1.2.0
cycler                    0.12.1
einops                    0.6.1
einops-exts               0.0.4
exceptiongroup            1.2.0
fastapi                   0.109.0
ffmpy                     0.3.1
filelock                  3.13.1
fonttools                 4.47.2
frozenlist                1.4.1
fsspec                    2023.12.2
gradio                    3.35.2
gradio_client             0.2.9
h11                       0.14.0
httpcore                  0.17.3
httpx                     0.24.0
huggingface-hub           0.20.3
idna                      3.6
Jinja2                    3.1.3
joblib                    1.3.2
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
kiwisolver                1.4.5
linkify-it-py             2.0.2
lit                       17.0.6
llava                     1.2.0        /home/user/Programs/LLaVA
markdown-it-py            2.2.0
markdown2                 2.4.12
MarkupSafe                2.1.4
matplotlib                3.8.2
mdit-py-plugins           0.3.
mdurl                     0.1.2
mpmath                    1.3.0
multidict                 6.0.4
networkx                  3.2.1
numpy                     1.26.3
nvidia-cublas-cu11        11.10.3.66
nvidia-cuda-cupti-cu11    11.7.101
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11         8.5.0.96
nvidia-cufft-cu11         10.9.0.58
nvidia-curand-cu11        10.2.10.91
nvidia-cusolver-cu11      11.4.0.1
nvidia-cusparse-cu11      11.7.4.91
nvidia-nccl-cu11          2.14.3
nvidia-nvtx-cu11          11.7.91
orjson                    3.9.12
packaging                 23.2
pandas                    2.2.0
peft                      0.4.0
pillow                    10.2.0
pip                       23.3.2
psutil                    5.9.8
pydantic                  1.10.14
pydub                     0.25.1
Pygments                  2.17.2
pyparsing                 3.1.1
python-dateutil           2.8.2
python-multipart          0.0.6
pytz                      2023.4
PyYAML                    6.0.1
referencing               0.33.0
regex                     2023.12.25
requests                  2.31.0
rpds-py                   0.17.1
safetensors               0.4.2
scikit-learn              1.2.2
scipy                     1.12.0
semantic-version          2.10.0
sentencepiece             0.1.99
setuptools                69.0.3
shortuuid                 1.0.11
six                       1.16.0
sniffio                   1.3.0
starlette                 0.35.1
svgwrite                  1.4.3
sympy                     1.12
threadpoolctl             3.2.0
timm                      0.6.13
tokenizers                0.15.0
toolz                     0.12.1
torch                     2.0.1
torchvision               0.15.2
tqdm                      4.66.1
transformers              4.36.2
triton                    2.0.0
typing_extensions         4.9.0
tzdata                    2023.4
uc-micro-py               1.0.2
urllib3                   2.2.0
uvicorn                   0.27.0.post1
wavedrom                  2.0.3.post3
websockets                12.0
wheel                     0.42.0
yarl                      1.9.4
```
Python version
```
$ python --version
Python 3.10.13
```
Steps to reproduce:
I followed the installation steps:
I then downloaded the weights (llava-v1.6-mistral-7b)
And ran things in the following order:
tmux (a new session for each command):
```
python -m llava.serve.controller --host 0.0.0.0 --port 10000
python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./models/llava-v1.6-mistral-7b
```
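Since the web server and worker both register with the controller, a small helper that blocks until the controller's port accepts connections can make the launch order above less race-prone. This is a sketch, not something the repo provides; `wait_for_port` is a hypothetical name, and it leans on `python3` (already required by the project) for the TCP probe:

```shell
# Block until host:port accepts TCP connections, or give up after N tries
# (default 30, one second apart). Returns 0 on success, 1 on timeout.
wait_for_port() {
  host=$1; port=$2; tries=${3:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if python3 -c "import socket; socket.create_connection(('$host', $port), 1).close()" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

Usage would look like `wait_for_port localhost 10000 && python -m llava.serve.model_worker ...` after starting the controller in its own tmux window.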
I monitored the progress with `nvtop` and saw that data started loading onto the GPUs.

Observed Results
`dmesg` reports:
```
[ 2319.436369] traps: python[5578] general protection fault ip:74a5de9987f7 sp:7ffe314352a0 error:0 in libtorch_cpu.so[74a5dd241000+12ef4000]
```
I tried upgrading to the latest version of Python (python 3.10.13 h7a1cb2a_0) but with no change in behavior. I tried Python 3.11 and 3.12 but they had an incompatible version of a Llama library.
I tried upgrading `torch` and `torchvision` with `pip install --upgrade torch torchvision`, which moved them from the installed `2.0.1` and `0.15.2` to `torch-2.2.0-cp310-cp310-manylinux1_x86_64.whl` and `torchvision-0.17.0-cp310-cp310-manylinux1_x86_64.whl` respectively. But the crash still occurred:
```
traps: python[5501] general protection fault ip:5e39bfa86cfc sp:78ce0effe4b0 error:0 in python3.10 (deleted)[5e39bf9b7000+206000]
```
It also somehow crashed tmux, which I don't understand.
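(With torch and torchvision upgraded in place, it helps to record exactly which versions were active when each crash happened. A quick way to do that, assuming the `pip` being queried is the one that installed the packages, is a loop over the packages implicated here:)

```shell
# Print the installed version of each package relevant to this crash,
# or "not installed" if pip has no record of it.
pkg_version() {
  python3 -m pip show "$1" 2>/dev/null | sed -n 's/^Version: //p'
}

for pkg in torch torchvision transformers bitsandbytes; do
  v=$(pkg_version "$pkg")
  echo "$pkg ${v:-not installed}"
done
```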
Running `nvidia-smi` again resulted in: