I can successfully run vila-7b, but when I try to generate real quantized weights using "vila-7b-w4-g128-v2.pt" from https://huggingface.co/Efficient-Large-Model/VILA-7b-4bit-awq/tree/main, I get the error below. Is anyone else facing this, or has anyone managed to run inference with vila-7b-awq?

Note to developers: https://github.com/mit-han-lab/llm-awq/blob/main/scripts/vila_example.sh states that the AWQ search results are shared, but they do not appear to be under https://huggingface.co/datasets/mit-han-lab/awq-model-zoo/tree/main.

```
root@e9118846cb22:/llm-awq# python -m awq.entry --model_path vila-7b --w_bit 4 --q_group_size 128 --load_awq VILA-7b-4bit-awq/vila-7b-w4-g128-v2.pt --q_backend real --dump_quant quant_cache/llama-2-7b-chat-w4-g128-awq.pt
Quantization config: {'zero_point': True, 'q_group_size': 128}
Building model vila-7b
You are using a model of type llava_llama to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/llm-awq/awq/entry.py", line 299, in <module>
    main()
  File "/llm-awq/awq/entry.py", line 239, in main
    model, enc = build_model_and_enc(args.model_path)
  File "/llm-awq/awq/entry.py", line 93, in build_model_and_enc
    enc, model, image_processor, context_len = load_pretrained_model(
  File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/llava/model/builder.py", line 118, in load_pretrained_model
    model = LlavaLlamaForCausalLM.from_pretrained(model_path, config=config, low_cpu_mem_usage=True, **kwargs)
  File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3462, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: LlavaLlamaForCausalLM.__init__() got an unexpected keyword argument 'use_cache'
```
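For what it's worth, the traceback suggests that `from_pretrained` in this transformers version forwards leftover keyword arguments (here `use_cache`) to the model constructor, while this build of `LlavaLlamaForCausalLM.__init__` does not accept them. Below is a minimal, untested monkey-patch sketch that drops undeclared kwargs before they reach `__init__`; it assumes the usual LLaVA import path `llava.model.language_model.llava_llama` and would need to run (e.g. in `awq/entry.py`) before `load_pretrained_model` is called. Matching the exact transformers version the VILA/llm-awq repos pin is probably the real fix.

```python
import inspect

from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM

_orig_init = LlavaLlamaForCausalLM.__init__
# Parameter names the original __init__ actually declares (includes "self" and "config").
_accepted = set(inspect.signature(_orig_init).parameters)

def _patched_init(self, config, *args, **kwargs):
    # Drop anything from_pretrained forwards that __init__ does not declare,
    # e.g. the `use_cache` kwarg from the traceback above.
    filtered = {k: v for k, v in kwargs.items() if k in _accepted}
    _orig_init(self, config, *args, **filtered)

LlavaLlamaForCausalLM.__init__ = _patched_init
```

This only papers over the signature mismatch; if other kwargs are silently dropped that the model actually needs, behavior could differ from an environment with the correct transformers version.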