Edward-Lin opened this issue 2 weeks ago
#576
What do you mean? I don't understand what you mean.
@Edward-Lin the referenced PR (not yet merged to the master branch) seems to contain code changes to support the LLM pipeline with static shapes out-of-the-box for the NPU plugin. Feel free to try the PR to run chat_sample, or wait until it is merged to master. Hope this helps.
Thanks. BTW, what is a PR, and how can I get it?
I think I've got it and will give it a try. Will update later. Thanks.
https://github.com/TolyaTalamanov/openvino.genai/tree/at/static-llm-pipeline-out-of-the-box I've checked out the code, but it only supports the C++ version, and I need to try to compile it. I'm not sure whether it can run on NPU or not, but judging from #576, it should not work yet.
Hi! @aoke79,
Once https://github.com/openvinotoolkit/openvino.genai/pull/576/ is merged, you will be able to run LLMPipeline
on NPU out-of-the-box by using the following code snippet:
ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
Unfortunately, it doesn't support chat mode yet (it will be introduced in https://github.com/openvinotoolkit/openvino.genai/pull/580), so chat_sample.cpp cannot be used for now.
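For reference, the snippet above can be wrapped into a complete program roughly like this. This is only a sketch: it assumes the openvino.genai headers and libraries are installed, and that the model directory (containing the converted OpenVINO model) is passed on the command line.

```cpp
// Minimal sketch of a full program around the LLMPipeline snippet above.
// Assumption: openvino.genai is installed and its headers are on the
// include path; the header name follows the openvino.genai repository layout.
#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <model_dir>\n";
        return 1;
    }
    std::string model_path = argv[1];

    // Target the NPU plugin; "CPU" or "GPU" can be substituted here.
    ov::genai::LLMPipeline pipe(model_path, "NPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;  // optional

    std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
    return 0;
}
```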
Thanks, Tolya. I updated the code and found that #576 was merged, but it doesn't seem to work for me. When I changed device = 'CPU' # GPU can be used as well to "GPU" or "NPU" in greedy_causal_lm.py or beam_search_causal_lm.py, neither of them worked. Please help check. Thanks.
Do you have the converted TinyLlama chat model that I could try? I suspect the model I converted on my side is not correct. Thanks.
Can anyone provide an update?
Hi @aoke79, @Edward-Lin, the following snippet should work with a regular OpenVINO model (same as for other plugins):
ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
Additionally, I'd also expect chat_sample to work with the latest master.
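Based on the public LLMPipeline API, a chat-style loop on the latest master would look roughly like the sketch below. This mirrors the repository's chat_sample; the start_chat()/finish_chat() calls bracket the conversation so history is carried between turns.

```cpp
// Sketch of a chat loop with openvino.genai's LLMPipeline.
// Assumption: openvino.genai is installed; header path follows the
// repository layout, and the model directory is passed as argv[1].
#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <model_dir>\n";
        return 1;
    }

    ov::genai::LLMPipeline pipe(argv[1], "NPU");
    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;

    // start_chat()/finish_chat() keep the conversation history between turns.
    pipe.start_chat();
    std::string prompt;
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        std::cout << pipe.generate(prompt, config)
                  << "\n----------\nquestion:\n";
    }
    pipe.finish_chat();
    return 0;
}
```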
As for greedy_causal_lm.py and beam_search_causal_lm.py -- they weren't considered during integration; perhaps they will be enabled in the future.
Where can I get the sample code to run an LLM on NPU? Thanks.