raj-ritu17 opened this issue 6 months ago
Hi @raj-ritu17, could you please try the latest ipex-llm (2.1.0b20240527) and merge the adapter into the original model as we discussed in https://github.com/intel-analytics/ipex-llm/issues/11135? Then you can use the merged model for inference, following https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral for example.
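For reference, a minimal merge sketch using the generic PEFT flow (not necessarily the exact steps from #11135), assuming the adapter in `./qlora-out/` is a standard PEFT adapter; the base model path below is a placeholder and should be replaced with the model actually used for fine-tuning:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "mistralai/Mistral-7B-v0.1"  # placeholder: use your actual base model
adapter_path = "./qlora-out/"                  # QLoRA adapter produced by fine-tuning
merged_path = "./merged-model"                 # where the merged weights will be saved

# Load the base model in fp16, attach the adapter, then fold it into the weights
base = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_path)
merged = model.merge_and_unload()

# Save merged weights plus tokenizer so the folder can be loaded like a normal model
merged.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_model_path).save_pretrained(merged_path)
```

Once the merged model is saved, it can be loaded like any other Hugging Face checkpoint, with no adapter directory needed at inference time.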
Scenario:
- accelerate launch -m inference lora.yml --lora_model_dir="./qlora-out/"
After submitting the command above, the issue below occurred. [It also says a certain quantization is not applicable on CPU, even though we are running on GPU and did the fine-tuning on GPU.]
logs:
inference file content:
adapter_config:
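Following the mistral GPU example linked above, inference with the merged model via ipex-llm on an Intel GPU (XPU) would look roughly like the sketch below. This is untested in this environment; `./merged-model` and the prompt are placeholders:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

merged_path = "./merged-model"  # output of the merge step above

# Load with 4-bit weight-only quantization and move the model to the Intel GPU
model = AutoModelForCausalLM.from_pretrained(merged_path,
                                             load_in_4bit=True,
                                             optimize_model=True,
                                             use_cache=True)
model = model.half().to("xpu")
tokenizer = AutoTokenizer.from_pretrained(merged_path)

prompt = "What is AI?"
inputs = tokenizer(prompt, return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```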