cold-blue opened this issue 7 months ago
@cold-blue I think the issue you see on GPU might be due to the inference precision the model is executed with. By default, the GPU plugin executes the model in f16, whereas the CPU plugin uses f32. A possible solution is to modify your script to execute the model with f32 precision on GPU so that you get the same results as on CPU.
Try adding the following lines to your script when running with the GPU plugin and see if that resolves it. Hope this helps.
import openvino.properties.hint as hints
# GPU defaults to f16 execution; request f32 so results match the CPU plugin
ov_core.set_property('GPU', {hints.inference_precision: 'f32'})
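For context, a minimal sketch of how this could look in a full script. This is illustrative, not taken from the reporter's code: `ov_model` is assumed to be an `ov.Model` obtained earlier (for example via `ov.convert_model` from the PyTorch module), and `compiled_model` is just a placeholder name.

```python
import openvino as ov
import openvino.properties.hint as hints

ov_core = ov.Core()
# force f32 inference on GPU (its default is f16) to match CPU results
ov_core.set_property('GPU', {hints.inference_precision: 'f32'})

# `ov_model` is assumed to be an ov.Model produced earlier, e.g. by ov.convert_model
compiled_model = ov_core.compile_model(ov_model, 'GPU')
```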
OpenVINO Version
2024.0.0-14509-34caeefd078-releases/2024/0
Operating System
Ubuntu 20.04 (LTS)
Device used for inference
GPU
Framework
PyTorch
Model used
N/A
Issue description
When I change the "CPU" backend to "GPU" for OpenVINO model inference, I find that when a linear layer is followed by an einsum operation, the OpenVINO results differ from Torch and ONNX. This only happens on GPU, so it may be caused by the GPU plugin. It can be worked around by replacing
y = torch.einsum("bdn,bn->bd", self.step_ssm_state.to(dtype), C)
with
y = torch.sum(self.step_ssm_state.to(dtype) * C.unsqueeze(1), dim=2)
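For reference, a minimal standalone check (with hypothetical tensor sizes, independent of the actual model) showing that the two formulations compute the same result in PyTorch:

```python
import torch

# hypothetical batch/state sizes, only to demonstrate equivalence of the two expressions
b, d, n = 2, 16, 8
ssm_state = torch.randn(b, d, n)
C = torch.randn(b, n)

y_einsum = torch.einsum("bdn,bn->bd", ssm_state, C)   # original formulation
y_sum = torch.sum(ssm_state * C.unsqueeze(1), dim=2)  # workaround formulation

assert torch.allclose(y_einsum, y_sum)
```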
Step-by-step reproduction
Relevant log output
Issue submission checklist