Open junruizh2021 opened 3 days ago
@pereanub, could you please comment?
Hello! You are correct, the CompilerAdapter also uses the level-zero API and the level-zero graph extension API to interact with the driver.
As you also found, CompilerAdapter uses pfnCreate2 with ZE_GRAPH_FORMAT_NGRAPH_LITE when compiling a model, and pfnCreate2 with ZE_GRAPH_FORMAT_NATIVE when importing a precompiled model.
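To make the two cases concrete, here is a minimal sketch of the format choice described above. It is not the plugin's code: `GraphFormat` and `select_graph_format` are hypothetical names, and the enum values are placeholder strings rather than the real constants from the graph extension header.

```python
from enum import Enum


class GraphFormat(Enum):
    # Names mirror the constants quoted in this thread; the values are
    # placeholders, not the real ones from the graph extension header.
    NATIVE = "ZE_GRAPH_FORMAT_NATIVE"
    NGRAPH_LITE = "ZE_GRAPH_FORMAT_NGRAPH_LITE"


def select_graph_format(is_precompiled_blob: bool) -> GraphFormat:
    """Pick the input format passed to pfnCreate2 (hypothetical helper)."""
    if is_precompiled_blob:
        # Import path: the blob is already compiled for the device.
        return GraphFormat.NATIVE
    # Compile path: the model still has to go through the compiler.
    return GraphFormat.NGRAPH_LITE
```

In both cases the same entry point (pfnCreate2) is used; only the declared input format differs.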
The confusion in the diagram is caused by the name of our backend (LevelZero). This is the plugin component that binds an OpenVINO infer request to level-zero primitives like command queue and command lists and executes the model on the device using these primitives.
Historically, the NPU plugin supported multiple backends. Of these, the one capable of interacting with a level-zero driver was called "LevelZero". Since we currently support only level-zero drivers, we could simplify this naming in the future and update the diagram as well. We will try to avoid such confusion going forward. Thank you for your feedback!
@PatrikStepan Thanks so much for your reply. This means that if I run blob files directly with OpenVINO + NPU plugin, such as the blob file in Intel/sd-1.5-controlnet-scribble-quantized, they can run directly. If I'm using OpenVINO IR model files, then the NPU compiler needs to perform serialization and deserialization operations.
Is this interpretation correct?
Additionally, I have two questions to verify with you:
Yes, your interpretation is correct. When you use (import) a precompiled model (blob), it can be parsed and executed by the driver directly. When you use an IR, the flow is the following:
> Are the prebuilt ELF files in the NPU plugin open source? It seems to contain some non-linear operators.

https://github.com/openvinotoolkit/npu_plugin/tree/develop is a public snapshot of the NPU Compiler, not of the NPU plugin. Yes, the name is extremely confusing, but only because the same repository contained the real plugin source code as well. That repository will soon be renamed to npu_compiler. But those SW kernels are part of the compiler (and thus the driver), not the plugin.
The blob file generated by the NPU driver includes the prebuilt kernels used by that model only. The compiler library released inside the driver contains all prebuilt kernels.
@PatrikStepan Clear explanation. You mean the ELF file will definitely be included in the blob file generated by the compiler.
But as a SW_kernel file, the ELF file can only be pre-built into the compiler by the npu compiler developers, right?
If I, as a user or third-party developer, need to add a new SW_kernel to the npu compiler, is there a way to do this?
> But as a SW_kernel file, the ELF file can only be pre-built into the compiler by the npu compiler developers, right?

Correct. While we can create public snapshots of our compiler, we still need internal NPU tools to build SW kernels, and unfortunately we are not ready to publish those tools as open source. This is one of the reasons why those kernels were published as prebuilt binaries. There is currently no way for you to add a new SW kernel to the compiler, unfortunately.
Documentation link
https://github.com/openvinotoolkit/openvino/blob/master/src/plugins/intel_npu/README.md
Description
I have some questions about the high-level architecture diagram in the README.md that shows the OpenVINO NPU design:
I think it should use Level Zero interfaces to load pre-compiled models, similar to the execution part on the right side.
In reality, compilation and execution sometimes operate sequentially. OpenVINO NPU can load OpenVINO IR models, compile them and pass them to the NPU driver for execution, or it can directly load pre-compiled blob models. I noticed that Level Zero's ze_graph can load pre-compiled models - is this one of the messages that the architecture diagram is trying to convey?
Based on the code provided, we can see that ze_graph supports loading pre-compiled models through the ZE_GRAPH_FORMAT_NATIVE format:
And the graph descriptor allows loading both pre-compiled blobs and IR models:
This suggests that Level Zero provides interfaces for both compilation and execution phases, though the architecture diagram may be simplifying the relationship between these components.
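From the application side, the two paths discussed here (compiling an IR versus importing a pre-compiled blob) might look like the sketch below. It assumes the OpenVINO Python API's `Core.compile_model` and `Core.import_model`. The helper name `load_on_npu` and the `.blob` file extension check are illustrative assumptions, not something the plugin requires.

```python
from pathlib import Path


def load_on_npu(core, model_path, device="NPU"):
    """Return a compiled model from either an IR file or a pre-compiled blob.

    `core` is expected to behave like openvino.Core; treating a `.blob`
    suffix as "pre-compiled" is only a convention assumed for this sketch.
    """
    path = Path(model_path)
    if path.suffix == ".blob":
        # Pre-compiled model: hand the raw bytes to import_model,
        # so no compilation step is needed.
        return core.import_model(path.read_bytes(), device)
    # IR (or another supported model format): compile it for the device.
    return core.compile_model(str(path), device)
```

With OpenVINO installed, this would be called as `load_on_npu(openvino.Core(), "model.xml")` for an IR, or with a blob path to skip compilation.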