Open isaac-vidas opened 8 months ago
When trying to run the llava_demo.ipynb example, I run into an error. The behavior is slightly different depending on whether I also generate the quantized checkpoints as part of the notebook, and in some combinations I also get the warning and output below.
I've tried with a newer version of transformers as well as an older one (4.32.0). Any idea how to get the demo running?
Also, I was able to run the TheBloke/llava-v1.5-13B-AWQ version on both vLLM and SGLang. Is that version considered the v1 format?
Thanks in advance!
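For reference, loading the AWQ checkpoint in vLLM looks roughly like the following. This is a minimal sketch, assuming a vLLM build with AWQ support; the prompt and sampling settings are illustrative, and the image-input plumbing is omitted, so it only exercises the language model.

# Minimal sketch: loading an AWQ checkpoint with vLLM (assumes AWQ support).
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/llava-v1.5-13B-AWQ",  # AWQ checkpoint from the HF Hub
    quantization="awq",                   # weights are AWQ-packed
)

outputs = llm.generate(
    ["USER: What is in the picture? ASSISTANT:"],
    SamplingParams(temperature=0.2, max_tokens=128),
)
print(outputs[0].outputs[0].text)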
Update: I was able to run the AWQ version with the vlm_demo.py script.
I had to update llm-awq/tinychat/utils/prompt_templates.py in order to support LLaVA (a sketch of the kind of change is below the transcript), but aside from that it's working:
python vlm_demo.py \
--model_type llava \
--model-path ~/llava-v1.5-7b \
--quant-path ~/quant_cache/llava-v1.5-7b-w4-g128-awq-v2.pt \
--image-file https://llava.hliu.cc/file=/nobackup/haotian/tmp/gradio/ca10383cc943e99941ecffdc4d34c51afb2da472/extreme_ironing.jpg
/opt/conda/envs/quantize_llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
real weight quantization...(init only): 100%|██████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 83.69it/s]
Loading checkpoint: 100%|████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00, 3.57s/it]
==================================================
USER: what is in the picture?
--------------------------------------------------
ASSISTANT: The image features a man standing on the back of a yellow truck, holding a clothes iron. The truck is driving down a busy city street, with other vehicles such as a taxi and a car visible in the scene. The man appears to be ironing clothes while riding in the back of the truck.
==================================================
USER: where has this picture been taken?
--------------------------------------------------
ASSISTANT: This picture has been taken in a busy city street, with various vehicles and pedestrians present.
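The prompt_templates.py change boils down to adding a LLaVA-style conversation template, i.e. the Vicuna-style system prompt followed by "USER: ... ASSISTANT: ..." turns. Here's a minimal sketch of the idea; the class and method names are hypothetical and do not mirror tinychat's actual classes.

# Illustrative sketch of a LLaVA conversation template of the kind added to
# tinychat/utils/prompt_templates.py. Names are hypothetical.
class LlavaPrompter:
    # LLaVA v1.5 uses the Vicuna-style system prompt.
    system = (
        "A chat between a curious human and an artificial intelligence "
        "assistant. The assistant gives helpful, detailed, and polite "
        "answers to the human's questions."
    )

    def __init__(self):
        self.turns = []  # list of [user_msg, assistant_msg or None]

    def insert_prompt(self, user_msg: str):
        self.turns.append([user_msg, None])

    def update_template(self, assistant_msg: str):
        self.turns[-1][1] = assistant_msg

    def get_prompt(self) -> str:
        out = self.system
        for user_msg, assistant_msg in self.turns:
            out += f" USER: {user_msg} ASSISTANT:"
            if assistant_msg is not None:
                out += f" {assistant_msg}</s>"
        return out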
Hi @isaac-vidas,
We've changed the weight packing format in our latest PR, which significantly improves TinyChat's context-stage and decoding latency. As a result, weights generated with commits prior to that PR need to be re-packed; Shang has implemented a script for this. I believe the first error you saw is related to the weight packing format. Cc @ys-2020.
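For a rough idea of what re-packing involves: AWQ stores eight 4-bit weights in each int32, and the new kernels expect a different nibble order, so old checkpoints have to be unpacked and re-packed. The sketch below is a toy illustration; the two nibble orders are made up, so please use the official re-packing script for real checkpoints.

import torch

# Toy sketch of 4-bit weight re-packing. The two layouts below are
# illustrative only, not the actual old/new TinyChat layouts.
OLD_ORDER = [0, 1, 2, 3, 4, 5, 6, 7]  # hypothetical old nibble layout
NEW_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]  # hypothetical interleaved layout

def unpack_int4(packed, order):
    # Extract eight 4-bit values from each int32, in logical order.
    nibbles = [(packed >> (4 * pos)) & 0xF for pos in order]
    return torch.stack(nibbles, dim=-1).flatten(-2)

def pack_int4(values, order):
    # Pack groups of eight 4-bit values back into int32 via `order`.
    grouped = values.reshape(*values.shape[:-1], -1, 8)
    packed = torch.zeros(grouped.shape[:-1], dtype=torch.int32)
    for i, pos in enumerate(order):
        packed |= (grouped[..., i].to(torch.int32) & 0xF) << (4 * pos)
    return packed

# Re-pack an old-format quantized weight tensor into the new layout:
old_qweight = torch.randint(0, 2**31 - 1, (128, 16), dtype=torch.int32)
new_qweight = pack_int4(unpack_int4(old_qweight, OLD_ORDER), NEW_ORDER)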
Best, Haotian