openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug]: NPU compile: L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_UNKNOWN #27099

Open · Zctoylm0927 opened 1 month ago

Zctoylm0927 commented 1 month ago

OpenVINO Version

2024.3

Operating System

Ubuntu 20.04 (LTS)

Device used for inference

NPU

Framework

PyTorch

Model used

torch.nn.MultiheadAttention

Issue description

I have hand-written a Transformer model with three parts: self-attention, cross-attention, and an MLP. The full model runs on the NPU, but when I run only the cross-attention part, the following problem occurs.

RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223: Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21: L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_UNKNOWN, code 0x7ffffffe - an action is required to complete the desired operation

Step-by-step reproduction

My cross-attention code is here:

import torch
import torch.nn as nn


class cross_block(nn.Module):
    def __init__(self, hidden_size=1200, num_heads=16):
        super(cross_block, self).__init__()
        self.head_dim = hidden_size // num_heads
        self.dim = hidden_size
        self.d_model = hidden_size
        self.num_heads = num_heads

        # Default layout is (seq_len, batch, embed_dim), i.e. batch_first=False
        self.mha = nn.MultiheadAttention(embed_dim=self.d_model, num_heads=self.num_heads)

    def cross_attn(self, q, k, v):
        N, B, C = q.shape  # (seq_len, batch, embed_dim)
        x, output_weights = self.mha(q, k, v)
        x = x.view(2, N // 2, C)  # just for testing
        return x

    def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        return self.cross_attn(q, k, v)
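
For reference, the module runs fine in plain PyTorch with the same shapes used in the conversion below; a minimal sketch:

import torch

# Shapes match the input= argument of ov.convert_model below:
# q: (1920, 1, 1200), k/v: (300, 1, 1200) in (seq_len, batch, embed_dim) layout
q = torch.randn(1920, 1, 1200)
k = torch.randn(300, 1, 1200)
v = torch.randn(300, 1, 1200)

block = cross_block()
with torch.no_grad():
    out = block(q, k, v)
print(out.shape)  # torch.Size([2, 960, 1200]) after the test reshape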

And here is my conversion code:

import openvino as ov

# Shapes in (seq_len, batch, embed_dim) layout
q_shape = [1920, 1, 1200]
k_shape = [300, 1, 1200]
v_shape = [300, 1, 1200]
CROSS_OV_PATH = "cross.xml"  # matches the attached IR name

example_input = {
    "q": torch.randn(q_shape),
    "k": torch.randn(k_shape),
    "v": torch.randn(v_shape),
}

core = ov.Core()
model = cross_block()
print("--------after model-------")
model = ov.convert_model(model, input=[q_shape, k_shape, v_shape], example_input=example_input)
ov.save_model(model, CROSS_OV_PATH)
print("--------after convert-------")
compiled_model = core.compile_model(model, device_name="NPU")  # check
print("--------after compile-------")

When I try to run inference with the compiled cross block, the problem occurs:

t = compiled_model(example_input)

But when I run the original PyTorch model, there is no such problem. Here is my cross-block XML:

cross.xml.txt
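
To narrow down whether the failure is NPU-specific, here is a minimal sketch of a CPU cross-check (the "cross.xml" path is assumed from the attachment above):

import openvino as ov
import torch

core = ov.Core()
model = core.read_model("cross.xml")  # path assumed from the attached IR

example_input = {
    "q": torch.randn(1920, 1, 1200),
    "k": torch.randn(300, 1, 1200),
    "v": torch.randn(300, 1, 1200),
}

# Compile for CPU instead of NPU; if this succeeds, the failure is NPU-specific
cpu_compiled = core.compile_model(model, device_name="CPU")
out = cpu_compiled(example_input)[0]
print(out.shape)  # expected (2, 960, 1200)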

Relevant log output

Traceback (most recent call last):
  File "/home/mla/model.py", line 50, in <module>
    t = compiled_model(example_input)
  File "/home/xxx/anaconda3/envs/env1/lib/python3.10/site-packages/openvino/runtime/ie_api.py", line 388, in __call__
    return self._infer_request.infer(
  File "/home/xxx/anaconda3/envs/env1/lib/python3.10/site-packages/openvino/runtime/ie_api.py", line 132, in infer
    return OVDict(super().infer(_data_dispatch(
RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223:
Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21:
L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_UNKNOWN, code 0x7ffffffe - an action is required to complete the desired operation


avitial commented 1 month ago

@Zctoylm0927 thanks for reaching out. Do you observe the same behavior on the latest 2024.4 release or a nightly build? If you can, please share a minimal sample reproducer and the IR model. Also, please provide the NPU driver version you are using.

Zctoylm0927 commented 1 month ago

Thanks for the reply. I have tried the 2024.4 release:

[screenshot: installed OpenVINO version]

And I still get the same error:

[screenshot: error traceback]

I only shared the XML file before; now I have uploaded the BIN file together with it: cross.zip. I think my NPU driver version is v1.6.0, since it matches the release date.

> ls -ll | grep libnpu_driver_compiler.so
-rw-r--r-- 1 root root 94700456 Aug 15 01:06 libnpu_driver_compiler.so

By the way, I don't know how to check the NPU driver version. How can I check it?
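
Is something like this the right way to query it? (A minimal sketch; I am assuming the NPU plugin exposes a NPU_DRIVER_VERSION property.)

import openvino as ov

core = ov.Core()
# FULL_DEVICE_NAME is a standard read-only property available on all plugins
print(core.get_property("NPU", "FULL_DEVICE_NAME"))
# Assumption: the intel_npu plugin also reports the driver version as a property
print(core.get_property("NPU", "NPU_DRIVER_VERSION"))
# On Linux, the kernel module can also be inspected with: modinfo intel_vpu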