For the example on this page: https://github.com/mit-han-lab/llm-awq/tree/main/tinychat#usage
You can easily run inference on images:
```bash
python vlm_demo_new.py \
    --model-path VILA1.5-13b-AWQ \
    --quant-path VILA1.5-13b-AWQ/llm \
    --precision W4A16 \
    --image-file /PATH/TO/INPUT/IMAGE \
    --vis-image  # optional
```
However, how do you run video QA inference? Can you provide an example?
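I could not find a video flag in the usage docs. Based on the image command above, I would guess the invocation looks something like the following, but the `--video-file` and `--num-video-frames` flags are purely my guesses, not options I found in tinychat:

```bash
# Hypothetical command -- --video-file and --num-video-frames are my
# guesses at what a video QA entry point might look like; please
# correct me if the actual flags differ.
python vlm_demo_new.py \
    --model-path VILA1.5-13b-AWQ \
    --quant-path VILA1.5-13b-AWQ/llm \
    --precision W4A16 \
    --video-file /PATH/TO/INPUT/VIDEO \
    --num-video-frames 8
```

If there is a supported script or flag for video QA, could you point me to it?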