microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Undefined behaviour in OneHot operator #20659

Open adityagoel4512 opened 6 months ago

adityagoel4512 commented 6 months ago

Describe the issue

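The OneHot operator's CPU execution provider (EP) implementation performs a division when computing the output shape (`suffix_dim_size = indices_shape.Size() / prefix_dim_size` in `PrepareOutputShape`, onehot.cc). `prefix_dim_size` is the product of the `indices` dimensions that precede the one-hot axis. When one of those dimensions is zero, the divisor is zero, and integer division by zero is undefined behaviour (UB). With gcc on Linux I get a runtime floating point exception, whereas with clang on macOS 0 is propagated through.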
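The failing arithmetic can be modelled in a few lines of Python. This is a sketch reconstructed from lines 105-108 of onehot.cc as they appear in the LLDB trace. The function names are hypothetical, and the output-shape construction follows the spec's "rank plus one" rule rather than code shown in this issue. Python raises `ZeroDivisionError` where the C++ division is undefined. The platform difference may come from the hardware rather than the compiler: x86-64 integer division traps on a zero divisor (surfacing as SIGFPE), while AArch64 division returns 0. If the macOS machine is Apple silicon, that would account for the silent 0, but the behaviour is UB either way.

```python
# Hypothetical model of the shape arithmetic in PrepareOutputShape (onehot.cc),
# reconstructed from the lines visible in the LLDB trace. Not ONNX Runtime code.
from math import prod


def handle_negative_axis(axis, rank):
    """Map a possibly negative axis into [0, rank)."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} out of range for rank {rank}")
    return axis + rank if axis < 0 else axis


def prepare_output_shape_current(indices_shape, depth, axis):
    # The output rank is one greater than the input rank (ONNX spec).
    output_rank = len(indices_shape) + 1
    true_axis = handle_negative_axis(axis, output_rank)
    output_shape = list(indices_shape)
    output_shape.insert(true_axis, depth)

    # Product of the indices dimensions before the one-hot axis.
    prefix_dim_size = 1
    for i in range(true_axis):
        prefix_dim_size *= indices_shape[i]

    # Mirrors `indices_shape.Size() / prefix_dim_size`: any zero leading
    # dimension makes this a division by zero (UB in C++).
    suffix_dim_size = prod(indices_shape) // prefix_dim_size
    return prefix_dim_size, suffix_dim_size, output_shape


if __name__ == "__main__":
    print(prepare_output_shape_current((3,), depth=2, axis=-1))  # (3, 1, [3, 2])
    try:
        prepare_output_shape_current((0,), depth=2, axis=-1)
    except ZeroDivisionError as exc:
        print("division by zero:", exc)
```

Only the dimensions *before* the axis matter. For example, shape (0, 3) with `axis=0` gives `prefix_dim_size = 1` and does not fail.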

The ONNX specification for OneHot states:

The rank of the output tensor will be one greater than the rank of the input tensor.

I believe that for `indices` of shape (0,) and `depth` with value k, the correct result is an output tensor of shape (0, k).
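To make the expected result concrete, here is a small NumPy reference for OneHot, written from the spec's description rather than from any runtime's code. `onehot_reference` is a hypothetical helper, not an ONNX Runtime API. Each index is compared against `0..depth-1` along a new axis, so an empty `indices` tensor broadcasts naturally to shape (0, depth) and no division is involved.

```python
# Hypothetical NumPy reference for ONNX OneHot semantics (sketch, not ORT code).
import numpy as np


def onehot_reference(indices, depth, values, axis=-1):
    indices = np.asarray(indices, dtype=np.int64)
    depth = int(np.asarray(depth).item())  # depth: scalar or 1-element tensor
    values = np.asarray(values)
    off_value, on_value = values
    # Negative indices count back from depth; out-of-range ones match nothing.
    idx = np.where(indices < 0, indices + depth, indices)
    # Compare every index against 0..depth-1 along a new trailing axis.
    hot = idx[..., np.newaxis] == np.arange(depth)
    out = np.where(hot, on_value, off_value).astype(values.dtype)
    # Move the new axis to position `axis` of the (rank + 1)-D output.
    return np.moveaxis(out, -1, axis)


empty = np.array([], dtype=np.int64)
print(onehot_reference(empty, 2, np.array([0, 1], dtype=np.int64)).shape)  # (0, 2)
```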

To reproduce

```python
import spox.opset.ai.onnx.v17 as op
from spox import argument, build, Tensor
import numpy as np
import onnxruntime as ort

if __name__ == "__main__":
    x = argument(Tensor(np.int64, ("N",)))
    cats = [1, 2]
    y = op.one_hot(x, op.const([len(cats)], dtype="int64"), op.const([0, 1], dtype="int64"))
    mp = build({"x": x}, {"y": y})

    s = ort.InferenceSession(mp.SerializeToString())
    out = s.run(None, {"x": np.array([], dtype="int64").reshape(0,)})
    print(out)  # macOS: [array([], shape=(0, 2), dtype=int64)]; Linux: "Floating point exception (core dumped)"
```

Urgency

No response

Platform

Linux

OS Version

4.18

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.17.3

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

LLDB step-through of a debug build on macOS:

```
(lldb) n
Process 22526 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x000000015558db5c onnxruntime_pybind11_state.so`onnxruntime::PrepareOutputShape(indices=0x0000600002990000, depth_val=2, axis=-1, prefix_dim_size=0x000000016bc43030, suffix_dim_size=0x000000016bc43028, output_shape=0x000000016bc430d0) at onehot.cc:108:21
   105    for (int64_t i = 0; i < true_axis; ++i) {
   106      prefix_dim_size *= indices_dims[onnxruntime::narrow<size_t>(i)];
   107    }
-> 108    suffix_dim_size = indices_shape.Size() / prefix_dim_size;
   109
   110    return Status::OK();
   111  }
(lldb) frame variable prefix_dim_size
(int64_t &) prefix_dim_size = 0x000000016bc43030 (&prefix_dim_size = 0)
(lldb) frame variable *prefix_dim_size
(int64_t) *prefix_dim_size = 0
(lldb) n
Process 22526 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x000000015558db7c onnxruntime_pybind11_state.so`onnxruntime::PrepareOutputShape(indices=0x0000600002990000, depth_val=2, axis=-1, prefix_dim_size=0x000000016bc43030, suffix_dim_size=0x000000016bc43028, output_shape=0x000000016bc430d0) at onehot.cc:110:10
   107    }
   108    suffix_dim_size = indices_shape.Size() / prefix_dim_size;
   109
-> 110    return Status::OK();
   111  }
```
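The trace shows line 108 executing with `prefix_dim_size = 0`. One division-free alternative is to compute `suffix_dim_size` directly as the product of the `indices` dimensions from the one-hot axis onward. Whenever `prefix_dim_size` is nonzero, this gives the same value as `Size() / prefix_dim_size`. The Python model below sketches that idea under the same assumptions as a reconstruction of the visible C++ lines. `prepare_output_shape_fixed` is a hypothetical name, and this is not the actual ONNX Runtime patch.

```python
# Hypothetical division-free variant of the PrepareOutputShape arithmetic.
from math import prod


def prepare_output_shape_fixed(indices_shape, depth, axis):
    output_rank = len(indices_shape) + 1
    true_axis = axis + output_rank if axis < 0 else axis
    if not 0 <= true_axis < output_rank:
        raise ValueError(f"axis {axis} out of range for output rank {output_rank}")
    output_shape = list(indices_shape)
    output_shape.insert(true_axis, depth)
    # Dimensions before the axis form the prefix, the rest form the suffix.
    prefix_dim_size = prod(indices_shape[:true_axis])
    suffix_dim_size = prod(indices_shape[true_axis:])  # no division: zeros are safe
    return prefix_dim_size, suffix_dim_size, output_shape


print(prepare_output_shape_fixed((0,), depth=2, axis=-1))  # (0, 1, [0, 2])
```

In C++ the same idea would replace the division on line 108 with a second loop over `indices_dims` starting at `true_axis`. The output shape for the issue's example then comes out as (0, 2), as the spec requires.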
github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.