xiaowuhu opened this issue 1 year ago
Hi ORT team, is there any update? This blocks our users. Thanks.
Here is the model path: https://drive.google.com/file/d/1MbmTOLvr5U-RbZ08rJxf6E16Eg-4GUx_/view?usp=sharing

- To test, simply run `python fp16_convert.py`.
- To test with a different batch_size, change the inputs within `fp16_convert.py`. In the test file, I created 4 inputs with different batch sizes: batch_size = 1, 2, 8, and 32.
Describe the issue
The first-party model is in the internal email.
Running inference twice on both GPU and CPU, the output is:

```
GPU inference 0 = [array([[0.46005446, 0.53994554]], dtype=float32)]
GPU inference 1 = [array([[0.46167108, 0.53832895]], dtype=float32)]
CPU inference 0 = [array([[0.45498496, 0.545015  ]], dtype=float32)]
CPU inference 1 = [array([[0.45498496, 0.545015  ]], dtype=float32)]
```

Expected: running GPU inference twice on the same input should produce identical results, just like the CPU runs. Actual: the difference between the two GPU runs is large.
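To quantify the mismatch, here is a minimal numpy check using the values copied verbatim from the printed output above (the variable names are mine, for illustration):

```python
import numpy as np

# Outputs copied from the report above.
gpu_run_0 = np.array([[0.46005446, 0.53994554]], dtype=np.float32)
gpu_run_1 = np.array([[0.46167108, 0.53832895]], dtype=np.float32)
cpu_run_0 = np.array([[0.45498496, 0.545015]], dtype=np.float32)
cpu_run_1 = np.array([[0.45498496, 0.545015]], dtype=np.float32)

# The two CPU runs are bit-identical, while the two GPU runs
# differ by roughly 1.6e-3 per element, far above float32 noise
# (~1e-7 relative) for values of this magnitude.
print(np.max(np.abs(gpu_run_0 - gpu_run_1)))  # about 0.0016
print(np.array_equal(cpu_run_0, cpu_run_1))   # True
```

A run-to-run drift of this size usually points at nondeterministic kernel/reduction behavior in the CUDA execution provider rather than ordinary floating-point rounding.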
To reproduce
See the comment above for the model link and steps: run `python fp16_convert.py`.
Urgency
ASAP
Platform
Linux
OS Version
Ubuntu 20
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.12
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.4