onnx / onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

[BUG] Error in Shape inference of Constant Op #3014

Open amd-abhikulk opened 1 week ago

amd-abhikulk commented 1 week ago

I was trying to lower the Mistral-7B-v0.1.onnx model when I got the following error.

onnx-mlir: /home/amd/Workspace/Abhishek/onnx-mlir/src/Support/Diagnostic.hpp:40: onnx_mlir::Diagnostic::Range<T>::Range(T, T) [with T = long int]: Assertion `min <= max && "Illegal range"' failed.
Aborted (core dumped)

This error doesn't give much context about what went wrong.

I did some debugging and found that the Concat op was producing this error here:

https://github.com/onnx/onnx-mlir/blob/a5ae8baff62209567eec4cc1e8a621cd36292cce/src/Dialect/ONNX/ONNXOps/Tensor/Concat.cpp#L99C1-L103C1

This happens because Concat received tensors with different ranks, and the range computation fell apart. IMO the verify function should check whether all operands have the same rank; a rough sketch is shown below. (I will send a PR for this soon.)
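Roughly the kind of check I have in mind; this is only a sketch, and the surrounding code and accessor names are assumptions, not the actual patch:

```cpp
// Rough sketch only: the real ONNXConcatOp::verify() in onnx-mlir differs,
// and the exact accessor names here are assumptions.
mlir::LogicalResult ONNXConcatOp::verify() {
  int64_t commonRank = -1;
  for (mlir::Value operand : getOperands()) {
    auto type = mlir::dyn_cast<mlir::ShapedType>(operand.getType());
    if (!type || !type.hasRank())
      continue; // Unranked operands can't be checked until shapes are known.
    int64_t rank = type.getRank();
    if (commonRank == -1)
      commonRank = rank; // First ranked operand sets the expected rank.
    else if (rank != commonRank)
      return emitOpError("operands must all have the same rank");
  }
  return mlir::success();
}
```

With a check like this, the failure would surface as a readable verifier error instead of the `Diagnostic::Range` assertion.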

Now, as to how Concat ended up with tensors of different ranks: that is the actual cause of this error.

(screenshot of the model graph showing the Concat node and its constant [-1] input)

The Concat op has a constant [-1] tensor that is concatenated to the input tensor. But for some reason onnx-mlir lowers it to

%298 = onnx.Constant {onnx_node_name = "/model/constant_nodes/TensorProto.INT64/1D/-1", value = dense<-1> : tensor<1xi64>} : tensor<i64>

The Constant op strips the constant tensor of its dimensionality for some reason, and I can't understand why: the value attribute is a rank-1 tensor<1xi64>, but the result type is the scalar tensor<i64>.
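For what it's worth, here is a hypothetical sketch of the behavior I would expect from the Constant op's shape inference; `inferConstantResultType` and the accessors are placeholder names, not the actual onnx-mlir code:

```cpp
// Hypothetical illustration of the expected behavior: the result type of a
// Constant should mirror the type of its dense value attribute
// (tensor<1xi64> above), preserving the rank. Names are placeholders.
mlir::LogicalResult inferConstantResultType(ONNXConstantOp constOp) {
  auto dense =
      mlir::dyn_cast_or_null<mlir::DenseElementsAttr>(constOp.getValueAttr());
  if (!dense)
    return mlir::failure(); // Only handle the dense 'value' attribute here.
  // Propagate the attribute's shaped type, e.g. tensor<1xi64>, to the result.
  constOp.getResult().setType(dense.getType());
  return mlir::success();
}
```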