Edward-Lin opened this issue 2 weeks ago
#576
What do you mean? I don't understand what you mean.
@Edward-Lin the referenced PR (not yet merged to the master branch) seems to contain code changes to support the LLM pipeline with static shapes out-of-the-box for the NPU plugin. Feel free to try the PR to run chat_sample, or wait until it is merged to master. Hope this helps.
Thanks. BTW, what is a PR, and how can I get it?
I think I've got it and will give it a try. Will update later. Thanks.
https://github.com/TolyaTalamanov/openvino.genai/tree/at/static-llm-pipeline-out-of-the-box I've checked out the code, but it only supports the C++ version, and I need to try to compile it. I'm not sure whether it can run on NPU or not, but judging from #576, it should not work yet.
Hi! @aoke79,
Once https://github.com/openvinotoolkit/openvino.genai/pull/576/ is merged, you will be able to run LLMPipeline
on NPU out-of-the-box by using the following code snippet:
ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
Unfortunately, it doesn't support chat mode yet (it will be introduced in https://github.com/openvinotoolkit/openvino.genai/pull/580), so chat_sample.cpp cannot be used for now.
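For reference, the snippet above can be wrapped into a complete program roughly like this. This is only a sketch: it assumes the openvino.genai headers and libraries are installed, and that the model directory (containing the converted OpenVINO model) is passed on the command line.

```cpp
// Minimal sketch of a full program around the LLMPipeline snippet above.
// Assumption: openvino.genai is installed and its headers are on the
// include path; the header name follows the openvino.genai repository layout.
#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <model_dir>\n";
        return 1;
    }
    std::string model_path = argv[1];

    // Target the NPU plugin; "CPU" or "GPU" can be substituted here.
    ov::genai::LLMPipeline pipe(model_path, "NPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;  // optional

    std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
    return 0;
}
```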
Thanks, Tolya. I updated the code and found that #576 was merged, but it doesn't seem to work for me. When I changed device = 'CPU' # GPU can be used as well to "GPU" or "NPU" in greedy_causal_lm.py or beam_search_causal_lm.py, neither of them worked. Please help check. Thanks.
Do you have the converted TinyLlama chat model that I could try? I suspect the model I converted on my side is not correct. Thanks.
Can anyone provide an update?
Hi @aoke79, @Edward-Lin, the following snippet should work with a regular OpenVINO model (same as for other plugins):
ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
Additionally, I'd also expect chat_sample to work with the latest master.
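Based on the public LLMPipeline API, a chat-style loop on the latest master would look roughly like the sketch below. This mirrors the repository's chat_sample; the start_chat()/finish_chat() calls bracket the conversation so history is carried between turns.

```cpp
// Sketch of a chat loop with openvino.genai's LLMPipeline.
// Assumption: openvino.genai is installed; header path follows the
// repository layout, and the model directory is passed as argv[1].
#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <model_dir>\n";
        return 1;
    }

    ov::genai::LLMPipeline pipe(argv[1], "NPU");
    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;

    // start_chat()/finish_chat() keep the conversation history between turns.
    pipe.start_chat();
    std::string prompt;
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        std::cout << pipe.generate(prompt, config)
                  << "\n----------\nquestion:\n";
    }
    pipe.finish_chat();
    return 0;
}
```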
As for greedy_causal_lm.py and beam_search_causal_lm.py -- they weren't considered during integration; perhaps they will be enabled in the future.
Where can I get the sample code to run an LLM on NPU? Thanks.