Error while generating real quantized weights for VILA

I can successfully run vila-7b but when i want to generate awq weights using the "vila-7b-w4-g128-v2.pt" from "https://huggingface.co/Efficient-Large-Model/VILA-7b-4bit-awq/tree/main" I got the error below. Anyone facing this? or anyone who managed to get inference with vila-7b-awq?

note to developers: In "https://github.com/mit-han-lab/llm-awq/blob/main/scripts/vila_example.sh", it is stated that search results are shared, but it does not seem under "https://huggingface.co/datasets/mit-han-lab/awq-model-zoo/tree/main".

` root@e9118846cb22:/llm-awq# python -m awq.entry --model_path vila-7b --w_bit 4 --q_group_size 128 --load_awq VILA-7b-4bit-awq/vila-7b-w4-g128-v2.pt --q_backend real --dump_quant quant_cache/llama-2-7b-chat-w4-g128-awq.pt Quantization config: {'zero_point': True, 'q_group_size': 128}

Building model vila-7b You are using a model of type llava_llama to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors. Traceback (most recent call last): File "/root/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/root/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/llm-awq/awq/entry.py", line 299, in main() File "/llm-awq/awq/entry.py", line 239, in main model, enc = build_model_and_enc(args.model_path) File "/llm-awq/awq/entry.py", line 93, in build_model_and_enc enc, model, image_processor, context_len = load_pretrained_model( File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/llava/model/builder.py", line 118, in load_pretrained_model model = LlavaLlamaForCausalLM.from_pretrained(model_path, config=config, low_cpu_mem_usage=True, *kwargs) File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3462, in from_pretrained model = cls(config, model_args, **model_kwargs) TypeError: LlavaLlamaForCausalLM.init() got an unexpected keyword argument 'use_cache' `

mit-han-lab / llm-awq

Error while generating real quantized weights for VILA #160