Seems that the model is not properly downloaded? Can you check the size of each file under the llava-v1.5-13b directory, as well as the md5/sha1?
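For reference, a quick way to do that check (a sketch; it assumes the weights were cloned into `./llava-v1.5-13b`, so adjust the path and glob as needed):

```python
# List the size and SHA-1 checksum of each weight shard in the model directory.
import hashlib
from pathlib import Path

model_dir = Path("llava-v1.5-13b")  # hypothetical local path

for f in sorted(model_dir.glob("*.bin")):
    sha1 = hashlib.sha1()
    with f.open("rb") as fh:
        # Hash in 1 MiB chunks so multi-GB shards don't need to fit in RAM.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha1.update(chunk)
    print(f"{f.name}: {f.stat().st_size} bytes  sha1={sha1.hexdigest()}")
```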
I am seeing the same issue. I think it is a problem with the weights in the git repository. `git status` shows that all files are correct (it had to fix 00002). However, the sum of the `.bin` file sizes does not match the total size in `pytorch_model.bin.index.json`:
26094813519 : Total Size (on disk)
26094673920 : Total Size (index.json)
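The comparison above can be reproduced with a small script (a sketch, assuming the standard Hugging Face sharded-checkpoint layout, where the index stores a `metadata.total_size` field):

```python
# Compare the summed on-disk shard sizes against the total recorded in the index.
import json
from pathlib import Path

model_dir = Path("llava-v1.5-13b")  # hypothetical local path
index = json.loads((model_dir / "pytorch_model.bin.index.json").read_text())

on_disk = sum(f.stat().st_size for f in model_dir.glob("pytorch_model-*.bin"))
print("on disk :", on_disk)
print("index   :", index["metadata"]["total_size"])
```

Note that a small surplus on disk is not necessarily corruption: `total_size` counts tensor bytes, while the `.bin` files also carry some serialization overhead.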
Files were fetched via:
git clone https://huggingface.co/liuhaotian/llava-v1.5-13b
@zfy1041264242 I realized that this may be because you did not install git-lfs, so these files are just pointers to the actual files. Try `git lfs pull`.
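One way to check whether the shards are real weights or LFS stubs (a sketch; LFS pointer files are tiny text files that begin with a fixed header):

```python
# Flag any .bin files that are Git LFS pointer stubs rather than real weights.
from pathlib import Path

for f in Path("llava-v1.5-13b").glob("*.bin"):  # hypothetical local path
    with f.open("rb") as fh:
        head = fh.read(64)
    # LFS pointers start with "version https://git-lfs.github.com/spec/v1"
    # and are only ~130 bytes; real shards are gigabytes.
    if head.startswith(b"version https://git-lfs"):
        print(f"{f.name} is an LFS pointer ({f.stat().st_size} bytes); run `git lfs pull`")
```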
@andrewgross But this may not be your case. Is your machine connected to the internet? If so, can you try the following? It downloads the weights from HF automatically, and we have verified that it works.
python -m llava.serve.cli \
--model-path liuhaotian/llava-v1.5-13b \
--image-file "https://llava-vl.github.io/static/images/view.jpg" \
--load-4bit
Thanks for the fast reply. I ran your suggested code and got a different error.
After entering the text prompt, the machine froze and stopped responding to any SSH sessions. The only way to get the system responsive again was to manually reboot it. Unfortunately, nothing was written to syslog or kern.log.
Rerunning after a reboot induced the same behavior, and it does not seem to write any sort of core dump either. I also had issues with segfaults when trying to run the gradio server/controller/worker setup; the worker would segfault when it tried to load the model.
@andrewgross Can you share your system info? OS, CPU, RAM, GPU type and count?
Also, maybe try replacing 13b with 7b?
System Specs:
- CPU: AMD 7950X3D
- GPU: Dual RTX 4090
- RAM: 128 GB DDR5
- Motherboard: Asus X670E-E @ BIOS 1.0.0.7c

Kernel (uname -a):
Linux 6.2.0-34-generic #34~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 7 13:12:03 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

CUDA installed via the aptitude package manager (530.30.02-0ubuntu1).
>>> import torch
>>> torch.__version__
'2.0.1+cu117'
Also, try `CUDA_VISIBLE_DEVICES=0`, as the current code tries to use all GPUs. I may need to change this behavior.
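If it is easier to set this from Python, a minimal sketch (the key detail being that the variable must be set before torch initializes CUDA):

```python
# Hide all but the first GPU from PyTorch. This must run before
# `import torch` (or at least before any CUDA call) to take effect.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # should now report 1
```

Equivalently, prefix the launch command: `CUDA_VISIBLE_DEVICES=0 python -m llava.serve.cli ...`.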
And what is your PyTorch version? After receiving some reports of a compatibility issue with PyTorch 2.1, as it requires CUDA 12 by default, I have temporarily pinned the version to `torch==2.0.1`, and this seems to fix many issues. Not sure if you were using an older version that does not include this pin.
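For reference, a quick way to confirm which CUDA build your torch wheel ships with (the `+cu117` suffix in the REPL output above encodes this):

```python
# Sanity-check the installed torch build; nothing here is LLaVA-specific.
import torch

print(torch.__version__)          # e.g. '2.0.1+cu117'
print(torch.version.cuda)         # CUDA version the wheel was compiled against
print(torch.cuda.is_available())
print(torch.cuda.device_count())
```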
I'll give the single GPU thing a shot. I have had issues in the past using dual GPUs because torch reports that 4090s have an NVLink interconnect, which causes weird issues when moving tensors. See this issue and this partial fix from exllamav2 for more details. There was a follow-up commit to fix some edge cases when users did not want to use all CUDA GPUs.
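One quick way to probe this (a sketch using standard torch APIs, not anything from the LLaVA repo) is to check whether torch believes the two cards can do peer-to-peer transfers:

```python
# Probe P2P access between GPU 0 and GPU 1. RTX 4090s ship without an
# NVLink bridge, so a surprising result here can point to the interconnect
# misdetection described above.
import torch

if torch.cuda.device_count() >= 2:
    print("0 -> 1:", torch.cuda.can_device_access_peer(0, 1))
    print("1 -> 0:", torch.cuda.can_device_access_peer(1, 0))
```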
- `CUDA_VISIBLE_DEVICES=0 --model-path liuhaotian/llava-v1.5-7b`: Works :tada:
- `--model-path liuhaotian/llava-v1.5-7b`: Fails with `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)`
- `CUDA_VISIBLE_DEVICES=0 --model-path liuhaotian/llava-v1.5-13b`: Works :tada:
Definitely smells like some CUDA / multi GPU issues.
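For what it's worth, the error itself is easy to reproduce outside LLaVA (a minimal sketch, requiring two visible GPUs); presumably the loader is splitting layers across both cards and then concatenating activations from different devices:

```python
# Minimal repro of the failure mode: torch.cat refuses to combine tensors
# that live on different devices.
import torch

a = torch.randn(2, 2, device="cuda:0")
b = torch.randn(2, 2, device="cuda:1")
torch.cat([a, b])  # RuntimeError: Expected all tensors to be on the same device...
```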
Some more info on the NVIDIA forums about similar issues: https://forums.developer.nvidia.com/t/standard-nvidia-cuda-tests-fail-with-dual-rtx-4090-linux-box/233202/22?page=2
Thank you for all your help diagnosing this issue.
Great, closing this for now :)
Describe the issue
Issue: When I tried to use cli.py locally to perform inference with llava-v1.5-13b, I got an error saying the checkpoint could not be loaded, but in fact both files existed. Command:
Log:
Screenshots: