haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

Can't get LLaVA to work on M1 Mac #700

Closed: vashat closed this issue 10 months ago

vashat commented 10 months ago

Describe the issue

Issue: It seems LLaVA is not working on M1 with the MPS backend.

Command:

env PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python3 -m llava.serve.cli --model-path /Volumes/M1\ Macmini\ backup/scripts/llava-v1.5-13b --image-file /Users/admin/Downloads/Gustav\ Vasa-1.jpg --load-4bit --device=mps 

Log:

/Users/admin/scripts/miniconda3/envs/llava/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
Loading checkpoint shards: 100%|██████████| 3/3 [06:39<00:00, 133.12s/it]
USER: What is in the image
Traceback (most recent call last):
  File "/Users/admin/scripts/miniconda3/envs/llava/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/admin/scripts/miniconda3/envs/llava/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Volumes/M1 Macmini backup/scripts/LLaVA/llava/serve/cli.py", line 125, in <module>
    main(args)
  File "/Volumes/M1 Macmini backup/scripts/LLaVA/llava/serve/cli.py", line 87, in main
    input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).cuda()
  File "/Users/admin/scripts/miniconda3/envs/llava/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
ASSISTANT: %                                                                                              
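
The traceback points at the hardcoded .cuda() call on line 87 of llava/serve/cli.py, which can never succeed on a Torch build without CUDA. A minimal device-agnostic sketch of the kind of fix I would expect (hypothetical helper name, untested against the repo):

import torch

def resolve_device(requested: str = "auto") -> torch.device:
    # Hypothetical helper: pick the requested backend, falling back to CPU.
    if requested == "auto":
        if torch.cuda.is_available():
            return torch.device("cuda")
        if torch.backends.mps.is_available():
            return torch.device("mps")
        return torch.device("cpu")
    return torch.device(requested)

device = resolve_device("mps")
# cli.py would then move the tensor with .to(device) instead of .cuda():
# input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX,
#                                   return_tensors='pt').unsqueeze(0).to(device)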

I also tried the Gradio UI; the model worker crashes when an image is submitted through the UI:

Command:

env PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python -m llava.serve.model_worker --model-path /Volumes/M1\ Macmini\ backup/scripts/llava-v1.5-13b --load-4bit --device=mps

Log:

/Users/admin/scripts/miniconda3/envs/llava/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
2023-10-29 19:12:54 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:10000', model_path='/Volumes/M1 Macmini backup/scripts/llava-v1.5-13b', model_base=None, model_name=None, device='mps', multi_modal=False, limit_model_concurrency=5, stream_interval=1, no_register=False, load_8bit=False, load_4bit=True)
2023-10-29 19:12:54 | INFO | model_worker | Loading the model llava-v1.5-13b on worker af2c5e ...
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:  33%|███▎      | 1/3 [02:40<05:20, 160.23s/it]
Loading checkpoint shards:  67%|██████▋   | 2/3 [05:20<02:40, 160.41s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [06:36<00:00, 132.09s/it]
2023-10-29 19:19:30 | ERROR | stderr | 
2023-10-29 19:19:44 | INFO | model_worker | Register to controller
2023-10-29 19:19:44 | ERROR | stderr | INFO:     Started server process [82066]
2023-10-29 19:19:44 | ERROR | stderr | INFO:     Waiting for application startup.
2023-10-29 19:19:44 | ERROR | stderr | INFO:     Application startup complete.
2023-10-29 19:19:44 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:40000 (Press CTRL+C to quit)
2023-10-29 19:19:59 | INFO | model_worker | Send heart beat. Models: ['llava-v1.5-13b']. Semaphore: None. global_counter: 0
...
2023-10-29 19:22:45 | INFO | model_worker | Send heart beat. Models: ['llava-v1.5-13b']. Semaphore: None. global_counter: 0
2023-10-29 19:22:50 | INFO | stdout | INFO:     127.0.0.1:63202 - "POST /worker_get_status HTTP/1.1" 200 OK
2023-10-29 19:22:54 | INFO | model_worker | Send heart beat. Models: ['llava-v1.5-13b']. Semaphore: Semaphore(value=4, locked=False). global_counter: 1
2023-10-29 19:22:54 | INFO | stdout | INFO:     127.0.0.1:63208 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2023-10-29 19:22:54 | ERROR | stderr | /Users/admin/scripts/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py:725: UserWarning: MPS: no support for int64 repeats mask, casting it to int32 (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Repeat.mm:236.)
2023-10-29 19:22:54 | ERROR | stderr |   input_ids = input_ids.repeat_interleave(expand_size, dim=0)
loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/c2cb9645-dafc-11ed-aa26-6ec1e3b3f7b3/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x577x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort      env PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python -m llava.serve.model_worker  
(llava) admin@Minisomistrator LLaVA % /Users/admin/scripts/miniconda3/envs/llava/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

(llava) admin@Minisomistrator LLaVA %                                                                                      
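
The MPSGraph failure ('tensor<1x577x1xf16>' vs 'tensor<1xf32>' not broadcast compatible) looks like a fp16/fp32 mismatch inside a LayerNorm on the MPS backend: the variance epsilon stays fp32 while the activations are fp16, and 577 matches the CLIP vision tower's token count. A minimal sketch of the dtype rule at play, assuming that diagnosis:

import torch

# Sketch of the suspected mismatch: fp16 activations normalized by fp32
# parameters on MPS. Casting everything to fp16 keeps the dtypes consistent.
if torch.backends.mps.is_available():
    x = torch.randn(1, 577, 1024, device="mps", dtype=torch.float16)
    ln = torch.nn.LayerNorm(1024).to(device="mps", dtype=torch.float16)
    print(ln(x).dtype)  # torch.float16; dtypes agree end to end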
alexmead commented 10 months ago

@vashat, I too am getting this error. Any progress? I'm diving into it now, will post any resolution.

haotian-liu commented 10 months ago

macOS support was just updated, with quantization support coming later. Please pull the latest codebase and install/run following the instructions here. You may also try llama.cpp.
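
The update flow is roughly:

git pull
pip install -e .

but defer to the linked instructions for the authoritative steps.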

vashat commented 10 months ago

Hi! Unfortunately I'm still getting the same error after pulling the latest code and removing the quantization parameters @haotian-liu :

env PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python3 -m llava.serve.cli --model-path /Volumes/M1\ Macmini\ backup/scripts/llava-v1.5-13b --image-file /Users/admin/Downloads/Gustav\ Vasa-1.jpg --device=mps 
Loading checkpoint shards: 100%|██████████| 3/3 [07:19<00:00, 146.38s/it]
USER: What is in the image?
/Users/admin/scripts/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py:725: UserWarning: MPS: no support for int64 repeats mask, casting it to int32 (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Repeat.mm:236.)
  input_ids = input_ids.repeat_interleave(expand_size, dim=0)
loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/c2cb9645-dafc-11ed-aa26-6ec1e3b3f7b3/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x577x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort      env PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python3 -m llava.serve.cli     
(llava) admin@Minisomistrator LLaVA % /Users/admin/scripts/miniconda3/envs/llava/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

haotian-liu commented 10 months ago

Have you reinstalled PyTorch? pip install torch==2.1.0 torchvision==0.16.0
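
You can confirm the environment afterwards with something like:

import torch

print(torch.__version__)                  # expect 2.1.0
print(torch.backends.mps.is_built())      # True if this build includes MPS support
print(torch.backends.mps.is_available())  # True on Apple silicon (macOS 12.3+)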

vashat commented 10 months ago

Yes, it works after reinstalling those versions. Thank you for the assistance!

amarflybot commented 10 months ago

Thanks a lot, it's working after running pip install torch==2.1.0 torchvision==0.16.0.