[Open] Zctoylm0927 opened this issue 1 week ago
@Zctoylm0927 thanks for reaching out. Do you observe the same behavior on the latest 2024.4 release or on a nightly build? If you can, please share a minimal sample reproducer and the IR model. Also, please provide the NPU driver version you are using.
OpenVINO Version
2024.3
Operating System
Ubuntu 20.04 (LTS)
Device used for inference
NPU
Framework
PyTorch
Model used
torch.nn.MultiheadAttention
Issue description
I hand-wrote a Transformer model that consists of three parts: self-attention, cross-attention, and an MLP. The full model runs on the NPU, but when I run only the cross-attention part, the following error occurs.
RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223: Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21: L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_UNKNOWN, code 0x7ffffffe - an action is required to complete the desired operation
Step-by-step reproduction
My cross_attention code is here:
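(The original snippet was attached as an image and is not recoverable here. A minimal sketch of a cross-attention block built on torch.nn.MultiheadAttention, consistent with the description above — the class name, embedding size, and sequence lengths are my assumptions, not the reporter's actual code:)

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Hypothetical cross-attention block: queries come from x, keys/values from a context sequence."""
    def __init__(self, embed_dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x, context):
        # Cross-attention: query = x, key = value = context
        out, _ = self.attn(x, context, context, need_weights=False)
        return self.norm(x + out)

model = CrossAttentionBlock().eval()
x = torch.randn(1, 16, 256)        # (batch, target length, embed_dim) -- assumed shapes
context = torch.randn(1, 32, 256)  # (batch, source length, embed_dim)
with torch.no_grad():
    y = model(x, context)
print(y.shape)  # torch.Size([1, 16, 256])
```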
And here is my conversion code:
When I try to run inference with the converted OV cross block, the error occurs:
t = compiled_model(example_input)
But when I use the original PyTorch model, there is no such problem. And here is my cross block XML:
cross.xml.txt
Relevant log output
Issue submission checklist