Zctoylm0927 opened this issue 1 month ago
@Zctoylm0927 thanks for reaching out. Do you observe the same behavior on the latest 2024.4 release or on a nightly release? If you can, please share a minimal sample reproducer and the IR model. Also, please provide the NPU driver version you are using.
Thanks for the reply. I have tried the 2024.4 release, and the error is still the same.
I only shared the .xml file before; now I have uploaded the .bin file together with it: cross.zip. I think my NPU driver version is v1.6.0, since it matches the release date.
```
$ ls -ll | grep libnpu_driver_compiler.so
-rw-r--r-- 1 root root 94700456 Aug 15 01:06 libnpu_driver_compiler.so
```
By the way, I don't know how to check the NPU driver version. How can I check it?
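One possible way to check on Debian/Ubuntu is to query the installed driver packages; the package name patterns below are an assumption based on how the Linux NPU driver is usually packaged, not official guidance:

```shell
# List installed NPU / Level Zero driver packages and their versions
# (package names are an assumption; adjust the pattern for your setup).
dpkg -l | grep -iE 'npu|level-zero' || echo "no matching packages found"

# If the intel_vpu kernel module is loaded, modinfo may also report version details.
modinfo intel_vpu 2>/dev/null | grep -i version || echo "intel_vpu module info not available"
```

Both commands are fall-back guarded, so the snippet runs cleanly even on machines without the driver installed.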
OpenVINO Version
2024.3
Operating System
Ubuntu 20.04 (LTS)
Device used for inference
NPU
Framework
PyTorch
Model used
torch.nn.MultiheadAttention
Issue description
I have hand-written a Transformer model that consists of three parts: self-attention, cross-attention, and an MLP. The full model runs on the NPU, but when I run only the cross-attention part, the following error occurs.
```
RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223:
Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21:
L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_UNKNOWN, code 0x7ffffffe - an action is required to complete the desired operation
```
Step-by-step reproduction
My cross-attention code is here:
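The attached snippet is not preserved in this thread; a hypothetical minimal cross-attention block built on `torch.nn.MultiheadAttention` (all shapes and hyperparameters below are illustrative assumptions, not the author's actual values) might look like:

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Hypothetical stand-in for the reporter's cross-attention module."""

    def __init__(self, embed_dim=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, query, context):
        # Cross-attention: queries come from one stream, keys/values from another.
        out, _ = self.attn(query, context, context, need_weights=False)
        return self.norm(query + out)

model = CrossAttentionBlock().eval()
query = torch.randn(1, 8, 64)     # (batch, target_len, embed_dim)
context = torch.randn(1, 16, 64)  # (batch, source_len, embed_dim)
with torch.no_grad():
    out = model(query, context)
print(out.shape)  # torch.Size([1, 8, 64])
```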
It is followed by my conversion code:
When I try to run inference with the OpenVINO cross block, the problem occurs:

```python
t = compiled_model(example_input)
```
But when I use the original PyTorch model, there is no such problem. And here is my cross block XML:
cross.xml.txt
Relevant log output
Issue submission checklist