openvinotoolkit / openvino.genai

Run Generative AI models using native OpenVINO C++ API
Apache License 2.0

where to download a converted model for Tiny Llama Chat for NPU? (int4, or int8) #665

Open Edward-Lin opened 1 month ago

Edward-Lin commented 1 month ago

Hi, I'd like to try Tiny Llama Chat on NPU, but it always fails with an error about static shapes. Could you please point me to where I can download a quantized model for it (both int4 and int8), or tell me how to convert one myself? Thanks a lot.

TolyaTalamanov commented 2 weeks ago

Hi @Edward-Lin,

You can find the documentation for running Tiny LLaMA on NPU, including how to convert and quantize the model to int4, here: https://openvino-doc.iotg.sclab.intel.com/genai-npu-preview-master/learn-openvino/llm_inference_guide/genai-guide-npu.html
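
For reference, the usual conversion flow uses `optimum-cli` from `optimum-intel`. The sketch below assumes the upstream `TinyLlama/TinyLlama-1.1B-Chat-v1.0` checkpoint and default weight-compression settings; the linked guide describes the exact options recommended for NPU (e.g. symmetric, channel-wise quantization), so treat this as a starting point rather than the definitive command:

```shell
# Install the export tooling (assumes a recent optimum-intel with OpenVINO support)
pip install "optimum[openvino]"

# Export TinyLlama to OpenVINO IR with int4 weight compression
optimum-cli export openvino \
  --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --weight-format int4 \
  TinyLlama-1.1B-Chat-ov-int4

# Or with int8 weights instead
optimum-cli export openvino \
  --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --weight-format int8 \
  TinyLlama-1.1B-Chat-ov-int8
```

The resulting output folder can then be passed to the GenAI pipeline with the `"NPU"` device string, e.g. `ov::genai::LLMPipeline pipe("TinyLlama-1.1B-Chat-ov-int4", "NPU");` in C++.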