bioinfomagic opened 10 months ago
macOS support was just updated, with quantization coming later. Please pull the latest code base and install/run following the instructions here. You may also try https://github.com/ggerganov/llama.cpp/pull/3436.
Thanks very much. I ran the following to update to the latest git repo:
git pull
pip install -e .
and got the output: Successfully installed llava-1.1.3
Then I started the process again, but the same issue persists: text interaction works, but once I load a picture, the error pops up: NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.
@bioinfomagic what about trying with the CLI?
python -m llava.serve.cli \
--model-path liuhaotian/llava-v1.5-7b \
--image-file "https://llava-vl.github.io/static/images/view.jpg" \
--device mps
Oh, thanks very much Haotian, it works on my Mac now.
@bioinfomagic
Btw, how is the speed on M1 Max and how much RAM do you have?
The web UI works perfectly; I can now run LLaVA on my Mac with output similar to the LLaVA online demo, just at a slower speed. It used 85 GB to 125 GB of RAM (depending on run time) and generated around 2-3 words per second. According to Activity Monitor, CPU usage was about 105% and GPU usage around 35%. Regarding speed: in other LLM apps, when I enable Metal the CPU load drops to about 30%, the GPU load goes up to 100% or 200%, and generation is much faster, around 10 words per second, same as online ChatGPT.
The CLI still seems to have issues (not sure if it's on my end), but the web UI works very well.
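The high-CPU/low-GPU pattern described above usually comes down to whether PyTorch is actually dispatching work to the Metal backend. A quick sanity check (a minimal sketch; assumes a PyTorch build with MPS support, and falls back to CPU otherwise):

```python
import torch

# Pick the Apple GPU if the MPS (Metal) backend is usable, else fall back to CPU.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

# Run a tiny matmul on whichever device we got.
x = torch.randn(4, 4, device=device)
y = x @ x  # executes on the Apple GPU when device is "mps"

# is_built() distinguishes "MPS not compiled in" from "no supported device".
print(f"built: {torch.backends.mps.is_built()}, "
      f"available: {torch.backends.mps.is_available()}, ran on: {y.device}")
```

If this prints `available: False`, the model is running entirely on CPU, which would match the high CPU and low GPU usage in Activity Monitor.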
python3 -m llava.serve.cli --model-path liuhaotian/llava-v1.5-7b --image-file "https://llava-vl.github.io/static/images/view.jpg" --device mps
[2023-10-31 20:28:04,996] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.82s/it]
USER: hello
Traceback (most recent call last):
File "
But if I add --load-4bit, it won't load:
python3 -m llava.serve.cli --model-path liuhaotian/llava-v1.5-7b --image-file "https://llava-vl.github.io/static/images/view.jpg" --load-4bit --device mps
[2023-10-31 20:26:52,172] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/metadata/__init__.py", line 563, in from_name
    return next(cls.discover(name=name))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "
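For context, that StopIteration is raised inside importlib.metadata when it cannot find an installed distribution; with --load-4bit the lookup is presumably for a quantization dependency (which exact package it is here is an assumption). A minimal stdlib sketch of the failure mode:

```python
from importlib.metadata import version, PackageNotFoundError
from typing import Optional

def dist_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        # Internally, metadata lookup iterates matching distributions; an empty
        # iterator raises StopIteration, surfaced to callers as PackageNotFoundError.
        return version(name)
    except PackageNotFoundError:
        return None

print(dist_version("surely-not-installed-anywhere"))  # → None
```

In other words, the traceback means a required package simply isn't present in the environment, which reinstalling from the up-to-date code base can resolve.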
It seems that you haven't pulled the latest code base:
https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/cli.py#L87
Thank you very much, you were right: I somehow hadn't updated the code base correctly. After updating, it now works perfectly for both the CLI and the web UI.
python3 -m llava.serve.cli --model-path liuhaotian/llava-v1.5-7b --image-file "https://llava-vl.github.io/static/images/view.jpg" --device mps
[2023-10-31 20:50:05,977] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|████████████| 2/2 [00:03<00:00, 1.69s/it]
USER: hey
ASSISTANT: Hello! How can I help you today?
USER: summarize the pic
Hmmm. I previously thought it was because of my underpowered M2 with 16 GB, so it seems that MPS still needs some optimization.
I totally agree. LM Studio has much better Metal support; it runs 70B models much faster, and with Metal enabled the speed difference is clearly noticeable.
Describe the issue
Issue:
I have enabled the M1 chip using --device mps but still get the errors.