[Open] Edward-Lin opened this issue 1 month ago
Hi, I'd like to try TinyLlama chat on the NPU, but it always fails with an error about static shapes. Could you please show me where I can download a quantized model for it (both INT4 and INT8), or tell me how to convert one myself? Thanks a lot.

---

Hi @Edward-Lin,

You can find the documentation for running TinyLlama on the NPU, including how to convert and quantize the model to INT4, here: https://openvino-doc.iotg.sclab.intel.com/genai-npu-preview-master/learn-openvino/llm_inference_guide/genai-guide-npu.html
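For reference, the conversion flow in that guide is based on `optimum-cli` from `optimum-intel`. Below is a sketch of the export commands, assuming the TinyLlama-1.1B-Chat checkpoint from Hugging Face; the output directory names are illustrative, and the exact flags recommended for NPU may differ in your version of the guide, so please double-check against it:

```shell
# Install the export tooling (package names as used in the OpenVINO GenAI docs)
pip install "optimum-intel[openvino]" nncf

# INT4 weight compression. Symmetric, channel-wise quantization
# (--sym with --group-size -1) is the scheme typically recommended
# for static-shape NPU inference.
optimum-cli export openvino \
  --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --weight-format int4 --sym --ratio 1.0 --group-size -1 \
  TinyLlama-1.1B-Chat-int4

# INT8 variant of the same export
optimum-cli export openvino \
  --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --weight-format int8 \
  TinyLlama-1.1B-Chat-int8
```

The exported folder can then be loaded with OpenVINO GenAI, e.g. `openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-int4", "NPU")`; targeting the `"NPU"` device is what triggers the static-shape handling, so you should not hit the static-shape error when loading this way.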