microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Inconsistent behavior between CPU and GPU on ReLU operator when input is NaN #11010

Open maybeLee opened 2 years ago

maybeLee commented 2 years ago

Describe the bug

I was converting a Keras model to ONNX with the [tf2onnx](https://github.com/onnx/tensorflow-onnx) library and found that, given a NaN input, onnxruntime correctly outputs NaN in CPU mode but outputs a normal value in GPU mode. After some debugging, I found that the inconsistency appears when the activation of the Dense layer is set to relu. The simplest graph that triggers this bug is as follows:

image

To Reproduce

Expected behavior

As in CPU mode, onnxruntime should also output NaN when executing in GPU mode.

Screenshots

image

cloudhan commented 2 years ago

I think that is caused by `fmaxf` in CUDA, which handles NaN as follows:

If both arguments are NaN, returns NaN.
If one argument is NaN, returns the numeric argument.
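
To make the rule above concrete: if ReLU is computed as `fmaxf(x, 0)`, a NaN input counts as "one argument is NaN", so the numeric argument `0` is returned and the NaN is lost. The following is a minimal pure-Python sketch of that behavior, not the actual onnxruntime kernel code. The function names `fmax_like`, `relu_via_fmax`, and `relu_nan_propagating` are hypothetical and exist only for illustration.

```python
import math


def fmax_like(a: float, b: float) -> float:
    """Emulate the NaN rules quoted above for fmaxf (illustrative only).

    - both arguments NaN -> NaN
    - exactly one NaN    -> the numeric argument
    """
    if math.isnan(a):
        return b  # b is either the numeric argument or NaN (both-NaN case)
    if math.isnan(b):
        return a
    return a if a > b else b


def relu_via_fmax(x: float) -> float:
    """ReLU written as max(x, 0) with fmaxf-style NaN handling."""
    return fmax_like(x, 0.0)


def relu_nan_propagating(x: float) -> float:
    """ReLU that explicitly lets NaN through (assumed reference behavior)."""
    if math.isnan(x):
        return x
    return x if x > 0.0 else 0.0


if __name__ == "__main__":
    nan = float("nan")
    print(relu_via_fmax(nan))         # NaN is replaced by 0.0
    print(relu_nan_propagating(nan))  # NaN is preserved
```

With this sketch, `relu_via_fmax(float("nan"))` returns `0.0`, which matches the "normal value" seen in GPU mode, while `relu_nan_propagating` keeps the NaN, as reported for CPU mode. Both functions agree on every non-NaN input.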