openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug]: Linear + Split + Einsum AC error on GPU #23809

Open cold-blue opened 7 months ago

cold-blue commented 7 months ago

OpenVINO Version

2024.0.0-14509-34caeefd078-releases/2024/0

Operating System

Ubuntu 20.04 (LTS)

Device used for inference

GPU

Framework

PyTorch

Model used

N/A

Issue description

When I change the "CPU" backend to "GPU" for OpenVINO model inference, I find that when a linear layer is followed by a split and an einsum operation, the OpenVINO results differ from those of Torch and ONNX Runtime. This only happens on GPU, so it may be caused by the GPU plugin. It can be worked around by replacing y = torch.einsum("bdn,bn->bd", self.step_ssm_state.to(dtype), C) with y = torch.sum(self.step_ssm_state.to(dtype) * C.unsqueeze(1), dim=2).
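For reference, the two formulations compute the same contraction over the last axis, which a quick standalone check (a sketch, not part of the original report) confirms with plain PyTorch:

import torch

state = torch.randn(1, 20, 16)  # (b, d, n), matching step_ssm_state below
C = torch.randn(1, 16)          # (b, n)

# einsum contraction over n: y[b, d] = sum_n state[b, d, n] * C[b, n]
y_einsum = torch.einsum("bdn,bn->bd", state, C)
# equivalent broadcast-multiply-and-sum workaround
y_sum = torch.sum(state * C.unsqueeze(1), dim=2)

assert torch.allclose(y_einsum, y_sum)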

Step-by-step reproduction

import numpy as np
import torch
import openvino as ov
import onnxruntime
import random
random_seed = 42
random.seed(random_seed)
# Set random seed for NumPy
np.random.seed(random_seed)
torch.manual_seed(random_seed)
class TestModel(torch.nn.Module):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        d_inner = 20
        self.dt_rank = 8
        self.d_state = 16
        self.step_ssm_state = torch.randn(size=(1, d_inner, self.d_state))
        self.x_proj = torch.nn.Linear(
            d_inner, self.dt_rank + self.d_state * 2, bias=False
        )
    def forward(self, x):
        dtype = torch.float32
        x_db = self.x_proj(x)  # (B, dt_rank + 2*d_state); in the real model x_proj is large (~1500x1500 or 750x750)
        dt, B, C = torch.split(x_db, 
                               [self.dt_rank, self.d_state, self.d_state], 
                               dim=-1)
        y = torch.einsum("bdn,bn->bd", self.step_ssm_state.to(dtype), C)
        # y = torch.sum(self.step_ssm_state.to(dtype) * C.unsqueeze(1), dim=2)
        return y, C
onnx_path = "scripts/einsum_gpu_ac_error/einsum_test.onnx"
x1 = torch.randn(1, 20).numpy()
input_data = x1
torch_model = TestModel()
torch.onnx.export(torch_model, torch.tensor(input_data), onnx_path, 
                  input_names=['x'], 
                  output_names=['output'], opset_version=12)
####################### Torch Prediction #####################
torch_output = torch_model(torch.tensor(input_data))[0].detach().numpy()
####################### ONNX Prediction ######################
ort_inputs = {
    'x': input_data
}
ort_session = onnxruntime.InferenceSession(onnx_path)
ort_outputs = ort_session.run(None, ort_inputs)
onnx_output = ort_outputs[0]
######################## OV Prediction #######################
ie = ov.Core()
ov_model_onnx = ie.read_model(model=onnx_path)
ov_compiled_model = ie.compile_model(model=ov_model_onnx, device_name="GPU")
ov_output = ov_compiled_model([input_data])[0]
######################## Compare #############################
# keep onnx_output from the ONNX Runtime session above instead of
# overwriting it with the torch output
torch_ov_diffs = np.abs(ov_output - torch_output)
torch_onnx_diffs = np.abs(onnx_output - torch_output)
print("torch out: ", torch_output)
print("onnx out: ", onnx_output)
print("ov out:    ", ov_output)
print("torch-onnx diffs: ", torch_onnx_diffs)
print("torch-ov diffs: ", torch_ov_diffs)

Relevant log output

torch out:  [[-3.2792063  -2.6780648  -3.5213442   3.4748304  -1.1317769  -0.34311914
  -1.0856547   0.4457152   0.8185761   3.624406    0.49652025  0.63117456
   3.807504    1.1279294   0.8031126   1.8535024   1.467316   -4.8243656
  -0.599851   -1.7608211 ]]
onnx out:  [[-3.2792063  -2.6780648  -3.5213442   3.4748304  -1.1317769  -0.34311914
  -1.0856547   0.4457152   0.8185761   3.624406    0.49652025  0.63117456
   3.807504    1.1279294   0.8031126   1.8535024   1.467316   -4.8243656
  -0.599851   -1.7608211 ]]
ov out:     [[ 1.3033031   2.7842085  -3.3617764   4.031091   -1.8404965   0.11584504
   1.8276799   5.034357   13.773817   -1.3714824   2.2115555   7.8758435
  -5.3502183  -3.208364   -5.2808113   0.5835259   2.0305717   2.396245
  -2.9602203  -1.4024773 ]]
torch-onnx diffs:  [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
torch-ov diffs:  [[ 4.5825095   5.4622736   0.15956783  0.5562608   0.7087196   0.45896417
   2.9133346   4.588642   12.955241    4.9958887   1.7150352   7.244669
   9.157722    4.336293    6.083924    1.2699765   0.56325567  7.2206106
   2.3603692   0.35834384]]

avitial commented 1 month ago

@cold-blue I think the issue you see on GPU might be due to the inference precision the model is executed with. By default the GPU plugin executes models in f16, whereas the CPU plugin uses f32. A possible solution is to modify your script so the model runs with f32 precision on GPU, which should give the same results as CPU.

Try adding the following lines to your script when running with the GPU plugin and see if that resolves it. Hope this helps.

import openvino.properties.hint as hints

ie.set_property('GPU', {hints.inference_precision: 'f32'})  # set before compile_model
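For completeness, a minimal sketch of how this could be wired into the reproduction script above (an assumption on my part; in recent OpenVINO releases the hint can also be passed as a config when compiling):

import openvino.properties.hint as hints

# Option 1: set the hint on the device before compiling
ie.set_property('GPU', {hints.inference_precision: 'f32'})
ov_compiled_model = ie.compile_model(model=ov_model_onnx, device_name="GPU")

# Option 2: pass the hint as a per-model config at compile time
ov_compiled_model = ie.compile_model(
    model=ov_model_onnx,
    device_name="GPU",
    config={hints.inference_precision: 'f32'},
)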