haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Usage] Problems that arise when using cli.py for inference #507

Closed: zfy1041264242 closed this issue 1 year ago

zfy1041264242 commented 1 year ago

Describe the issue

Issue: When I tried to use cli.py locally to run inference with llava-v1.5-13b, I got an error saying the checkpoint could not be loaded, even though the checkpoint files exist. Command:

python -m llava.serve.cli \
    --model-path ./llava-v1.5-13b \
    --image-file /train/1/1.jpg \
    --load-4bit

Log:

[2023-10-10 03:29:14,703] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
llava-v1.5-13b
Loading checkpoint shards:  33%|██████████████████████████                                                    | 1/3 [00:06<00:13,  6.79s/it]
Traceback (most recent call last):
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 460, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/serialization.py", line 993, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/serialization.py", line 447, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 464, in load_state_dict
    if f.read(7) == "version":
  File "/root/anaconda3/envs/llava/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/envs/llava/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/llava/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/remote-home/cs_acmis_zfy/MLLM/LLaVA/llava/serve/cli.py", line 120, in <module>
    main(args)
  File "/remote-home/cs_acmis_zfy/MLLM/LLaVA/llava/serve/cli.py", line 33, in main
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, args.load_8bit, args.load_4bit)
  File "/remote-home/cs_acmis_zfy/MLLM/LLaVA/llava/model/builder.py", line 103, in load_pretrained_model
    model = LlavaLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 476, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for './llava-v1.5-13b/pytorch_model-00002-of-00003.bin' at './llava-v1.5-13b/pytorch_model-00002-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

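For context, "failed finding central directory" means PyTorch could not parse the shard as a zip archive (which modern .bin checkpoints are), so the file on disk is most likely truncated or not the real weights. A minimal sketch of that check, assuming the model directory from the command above:

import zipfile
from pathlib import Path

# Valid PyTorch .bin shards are zip archives; a shard failing this check
# is truncated or is not real weight data (e.g. a git-lfs pointer stub).
for shard in sorted(Path("./llava-v1.5-13b").glob("pytorch_model-*.bin")):
    print(f"{shard.name}: {shard.stat().st_size} bytes, valid zip: {zipfile.is_zipfile(shard)}")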

haotian-liu commented 1 year ago

It seems the model was not downloaded properly. Can you check the size of each file under the llava-v1.5-13b directory, as well as its md5/sha1 checksum?
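
A quick sketch of that check (directory name assumed from the command above; hashing is chunked so multi-GB shards are not read into RAM at once):

import hashlib
from pathlib import Path

def sha1_of(path, chunk_size=1 << 20):
    # Hash the file in 1 MiB chunks to keep memory usage flat.
    h = hashlib.sha1()
    with open(path, "rb") as fh:
        while block := fh.read(chunk_size):
            h.update(block)
    return h.hexdigest()

for f in sorted(Path("./llava-v1.5-13b").iterdir()):
    if f.is_file():
        print(f"{f.name}: {f.stat().st_size} bytes, sha1 {sha1_of(f)}")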

andrewgross commented 1 year ago

I am seeing the same issue. I think it is a problem with the weights in the git repository: git status shows that all files are correct (it had to fix 00002), but the sum of the .bin file sizes does not match the total size in pytorch_model.bin.index.json.


Total size on disk: 26,094,813,519 bytes
Total size per index.json: 26,094,673,920 bytes
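
That comparison can be scripted. A sketch, assuming the metadata.total_size field that sharded Hugging Face checkpoints carry in their index file:

import json
from pathlib import Path

model_dir = Path("./llava-v1.5-13b")
index = json.loads((model_dir / "pytorch_model.bin.index.json").read_text())
expected = index["metadata"]["total_size"]  # raw tensor bytes, per index.json
on_disk = sum(f.stat().st_size for f in model_dir.glob("pytorch_model-*.bin"))
print(f"index.json: {expected}  on disk: {on_disk}  delta: {on_disk - expected}")

Note that a small positive delta is normal: total_size counts raw tensor bytes, while the .bin shards add zip-container overhead on top. A negative or wildly different number is what would point at a truncated shard.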

Files were fetched via: git clone https://huggingface.co/liuhaotian/llava-v1.5-13b

haotian-liu commented 1 year ago

@zfy1041264242 I realized that this may be because you did not install git-lfs, so these files are just pointer stubs linking to the actual files. Try git lfs pull.
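
For reference, an un-pulled LFS file is a small text stub rather than multi-GB weights; a quick hedged check, with paths assumed as above:

from pathlib import Path

for f in Path("./llava-v1.5-13b").glob("*.bin"):
    with open(f, "rb") as fh:
        head = fh.read(40)
    # Real shards are zip archives starting with b"PK"; an un-pulled
    # LFS pointer is a tiny text file starting with the line below.
    if head.startswith(b"version https://git-lfs"):
        print(f"{f.name} is an LFS pointer stub: run `git lfs pull`")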

@andrewgross But this may not be your case. Is your machine connected to the internet? If so, can you try the command below? It automatically downloads the weights from Hugging Face, and we have verified that this works.

python -m llava.serve.cli \
    --model-path liuhaotian/llava-v1.5-13b \
    --image-file "https://llava-vl.github.io/static/images/view.jpg" \
    --load-4bit
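
If cloning keeps misbehaving, the weights can also be pre-fetched with huggingface_hub (a sketch; snapshot_download resumes partial downloads and returns the local cache path, which can then be passed as --model-path):

from huggingface_hub import snapshot_download

# Downloads all repo files into the local HF cache (resuming if interrupted).
local_path = snapshot_download(repo_id="liuhaotian/llava-v1.5-13b")
print(local_path)
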
andrewgross commented 1 year ago

Thanks for the fast reply. I ran your suggested code and got a different error.

(error screenshot attached)

After entering the text prompt, the machine froze and stopped responding to all SSH sessions. The only way to get the system responsive again was a manual reboot. Unfortunately, nothing was written to syslog or kern.log.

Rerunning after the reboot produced the same behavior, and it doesn't seem to write any core dump either. I had segfault issues when trying to run the gradio server/controller/worker setup as well; the worker would segfault when it tried to load the model.

haotian-liu commented 1 year ago

@andrewgross Can you share your system info? OS, CPU, RAM, GPU type and count?

Also, maybe try replacing 13b with 7b?

andrewgross commented 1 year ago

System Specs:

AMD 7950X3D
Dual RTX 4090
128 GB DDR5 RAM
Asus X670E-E @ 1.0.0.7c

Linux 6.2.0-34-generic #34~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 7 13:12:03 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

CUDA installed via the aptitude package manager (driver package 530.30.02-0ubuntu1).

>>> import torch
>>> torch.__version__
'2.0.1+cu117'

haotian-liu commented 1 year ago

Also, try CUDA_VISIBLE_DEVICES=0, as the current code tries to use all GPUs. I may need to change this behavior.
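
For anyone following along: the variable has to be set before torch is first imported, either as a prefix on the command (CUDA_VISIBLE_DEVICES=0 python -m llava.serve.cli ...) or in Python. A minimal sketch:

import os

# Must happen before the first `import torch`, otherwise torch has
# already enumerated every GPU on the machine.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # expected: 1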

And what is your PyTorch version? After receiving some reports of a compatibility issue with PyTorch 2.1, which requires CUDA 11 by default, I have temporarily pinned the version to torch==2.0.1, and this seems to fix many issues. Not sure if you were using an older version of the repo that does not include this pin.
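
A quick sanity check of the installed build (a sketch):

import torch

print(torch.__version__)    # e.g. '2.0.1+cu117'
print(torch.version.cuda)   # CUDA version the wheel was compiled against
print(torch.cuda.is_available(), torch.cuda.device_count())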

andrewgross commented 1 year ago

I'll give the single-GPU approach a shot. I have had issues in the past using dual GPUs because torch reports that 4090s have an NVLink interconnect, which causes weird issues when moving tensors. See this issue and this partial fix from exllamav2 for more details. There was a follow-up commit to fix some edge cases when users did not want to use all CUDA GPUs.
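
One way to probe what torch believes about the interconnect between the two cards (a sketch; device indices 0 and 1 assumed):

import torch

# True means torch will happily issue direct GPU-to-GPU copies; on dual
# 4090s this report has been claimed to be wrong, leading to hangs.
print(torch.cuda.can_device_access_peer(0, 1))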

andrewgross commented 1 year ago

Works :tada: (with CUDA_VISIBLE_DEVICES=0)

Fails with:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)

Works :tada:

Definitely smells like a CUDA / multi-GPU issue.
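
For reference, that RuntimeError is exactly what torch.cat raises when model shards land on different GPUs; a minimal repro and the usual fix (a sketch for a two-GPU box):

import torch

a = torch.ones(2, device="cuda:0")
b = torch.ones(2, device="cuda:1")

# torch.cat requires all inputs on one device; uncommenting this line
# reproduces the "Expected all tensors to be on the same device" error.
# torch.cat([a, b])

# The usual fix: move everything onto a single device first.
print(torch.cat([a, b.to("cuda:0")]))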

andrewgross commented 1 year ago

Some more info on the NVIDIA forums about similar issues: https://forums.developer.nvidia.com/t/standard-nvidia-cuda-tests-fail-with-dual-rtx-4090-linux-box/233202/22?page=2

andrewgross commented 1 year ago

Thank you for all your help diagnosing this issue.

haotian-liu commented 1 year ago

Great, closing this for now :)