openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug]: Linear + Split + Einsum AC error on GPU #23809

Open cold-blue opened 7 months ago

cold-blue commented 7 months ago

OpenVINO Version

2024.0.0-14509-34caeefd078-releases/2024/0

Operating System

Ubuntu 20.04 (LTS)

Device used for inference

GPU

Framework

PyTorch

Model used

N/A

Issue description

When I change the "CPU" backend to "GPU" for OpenVINO model inference, I find that when a linear layer is followed by a split and an einsum operation, the OpenVINO results differ from those of Torch and ONNX Runtime. This only happens on GPU, so it may be caused by the GPU plugin. It can be worked around by replacing y = torch.einsum("bdn,bn->bd", self.step_ssm_state.to(dtype), C) with y = torch.sum(self.step_ssm_state.to(dtype) * C.unsqueeze(1), dim=2).
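For reference, the two formulations compute the same contraction over the last axis, which a quick standalone check (a sketch, not part of the original report) confirms with plain PyTorch:

import torch

state = torch.randn(1, 20, 16)  # (b, d, n), matching step_ssm_state below
C = torch.randn(1, 16)          # (b, n)

# einsum contraction over n: y[b, d] = sum_n state[b, d, n] * C[b, n]
y_einsum = torch.einsum("bdn,bn->bd", state, C)
# equivalent broadcast-multiply-and-sum workaround
y_sum = torch.sum(state * C.unsqueeze(1), dim=2)

assert torch.allclose(y_einsum, y_sum)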

Step-by-step reproduction

import numpy as np
import torch
import openvino as ov
import onnxruntime
import random
random_seed = 42
random.seed(random_seed)
# Set random seed for NumPy
np.random.seed(random_seed)
torch.manual_seed(random_seed)
class TestModel(torch.nn.Module):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        d_inner = 20
        self.dt_rank = 8
        self.d_state = 16
        self.step_ssm_state = torch.randn(size=(1, d_inner, self.d_state))
        self.x_proj = torch.nn.Linear(
            d_inner, self.dt_rank + self.d_state * 2, bias=False
        )
    def forward(self, x):
        dtype = torch.float32
        x_db = self.x_proj(x)  # (B, dt_rank + 2*d_state); in the real model x_proj is large (~1500x1500 or 750x750)
        dt, B, C = torch.split(x_db, 
                               [self.dt_rank, self.d_state, self.d_state], 
                               dim=-1)
        y = torch.einsum("bdn,bn->bd", self.step_ssm_state.to(dtype), C)
        # y = torch.sum(self.step_ssm_state.to(dtype) * C.unsqueeze(1), dim=2)
        return y, C
onnx_path = "scripts/einsum_gpu_ac_error/einsum_test.onnx"
x1 = torch.randn(1, 20).numpy()
input_data = x1
torch_model = TestModel()
torch.onnx.export(torch_model, torch.tensor(input_data), onnx_path, 
                  input_names=['x'], 
                  output_names=['output'], opset_version=12)
####################### Torch Prediction #####################
torch_output = torch_model(torch.tensor(input_data))[0].detach().numpy()
####################### ONNX Prediction ######################
ort_inputs = {
    'x': input_data
}
ort_session = onnxruntime.InferenceSession(onnx_path)
ort_outputs = ort_session.run(None, ort_inputs)
onnx_output = ort_outputs[0]
######################## OV Prediction #######################
ie = ov.Core()
ov_model_onnx = ie.read_model(model=onnx_path)
ov_compiled_model = ie.compile_model(model=ov_model_onnx, device_name="GPU")
ov_output = ov_compiled_model([input_data])[0]
######################## Compare #############################
# keep onnx_output from the ONNX Runtime session above instead of
# overwriting it with the torch output
torch_ov_diffs = np.abs(ov_output - torch_output)
torch_onnx_diffs = np.abs(onnx_output - torch_output)
print("torch out: ", torch_output)
print("onnx out: ", onnx_output)
print("ov out:    ", ov_output)
print("torch-onnx diffs: ", torch_onnx_diffs)
print("torch-ov diffs: ", torch_ov_diffs)

Relevant log output

torch out:  [[-3.2792063  -2.6780648  -3.5213442   3.4748304  -1.1317769  -0.34311914
  -1.0856547   0.4457152   0.8185761   3.624406    0.49652025  0.63117456
   3.807504    1.1279294   0.8031126   1.8535024   1.467316   -4.8243656
  -0.599851   -1.7608211 ]]
onnx out:  [[-3.2792063  -2.6780648  -3.5213442   3.4748304  -1.1317769  -0.34311914
  -1.0856547   0.4457152   0.8185761   3.624406    0.49652025  0.63117456
   3.807504    1.1279294   0.8031126   1.8535024   1.467316   -4.8243656
  -0.599851   -1.7608211 ]]
ov out:     [[ 1.3033031   2.7842085  -3.3617764   4.031091   -1.8404965   0.11584504
   1.8276799   5.034357   13.773817   -1.3714824   2.2115555   7.8758435
  -5.3502183  -3.208364   -5.2808113   0.5835259   2.0305717   2.396245
  -2.9602203  -1.4024773 ]]
torch-onnx diffs:  [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
torch-ov diffs:  [[ 4.5825095   5.4622736   0.15956783  0.5562608   0.7087196   0.45896417
   2.9133346   4.588642   12.955241    4.9958887   1.7150352   7.244669
   9.157722    4.336293    6.083924    1.2699765   0.56325567  7.2206106
   2.3603692   0.35834384]]

avitial commented 1 month ago

@cold-blue I think the issue you see on GPU might be due to the inference precision the model is executed with. By default the GPU plugin executes models in f16, whereas the CPU plugin uses f32. A possible solution is to modify your script so the model runs with f32 precision on GPU, which should give the same results as CPU.

Try adding the following lines to your script when running with the GPU plugin and see if that resolves it. Hope this helps.

import openvino.properties.hint as hints

ie.set_property('GPU', {hints.inference_precision: 'f32'})  # set before compile_model
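For completeness, a minimal sketch of how this could be wired into the reproduction script above (an assumption on my part; in recent OpenVINO releases the hint can also be passed as a config when compiling):

import openvino.properties.hint as hints

# Option 1: set the hint on the device before compiling
ie.set_property('GPU', {hints.inference_precision: 'f32'})
ov_compiled_model = ie.compile_model(model=ov_model_onnx, device_name="GPU")

# Option 2: pass the hint as a per-model config at compile time
ov_compiled_model = ie.compile_model(
    model=ov_model_onnx,
    device_name="GPU",
    config={hints.inference_precision: 'f32'},
)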