Numerics comparison generating wrong results

vivekkhandelwal1 commented 1 month ago

Hi, I was trying to run an operator test through the SHARK-Testsuite and I found that the test result generated through the test suite does not match the golden result. While the same test, when compiled through IREE, and executed manually gives the results matching the golden results. I think there's some issue with the way test results are generated. I'm sharing the test script and the steps to repro the issue.

Script to run the test:

# Copyright 2024 Advanced Micro Devices, Inc.
#
# Licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

from ..helper_classes import AzureDownloadableModel, BuildAModel
from onnx.helper import make_tensor, make_tensor_value_info
from onnx import TensorProto
from e2e_testing.registry import register_test

import numpy
from onnx import TensorProto
import onnx
from onnx.helper import make_tensor_value_info

# data = [
#     [1, 2, 3, 4],
#     [5, 6, 7, 8],
# ]
# starts = [0, 1]
# ends = [-1, 1000]
# result = [
#     [2, 3, 4],
# ]

class INModel(BuildAModel):

    def construct_initializers(self):
        # Input 1
        # self.initializers = [ make_tensor("i1", TensorProto.INT64, [2, 4], [1, 2, 3, 4, 5, 6, 7, 8])] #data
        # self.initializers += [ make_tensor("i2", TensorProto.INT64, [2], [1, 0])] #starts
        # self.initializers += [ make_tensor("i3", TensorProto.INT64, [2], [2, 3])] #ends
        # self.initializers += [ make_tensor("i4", TensorProto.INT64, [2], [0, 1])] #axes
        # self.initializers += [ make_tensor("i5", TensorProto.INT64, [2], [1, 2])] #steps

        # Input 2
        self.initializers = [ make_tensor("i1", TensorProto.INT64, [2, 4], [1, 2, 3, 4, 5, 6, 7, 8])] #data
        self.initializers += [ make_tensor("i2", TensorProto.INT64, [2], [0, 1])] #starts
        self.initializers += [ make_tensor("i3", TensorProto.INT64, [2], [-1, 1000])] #ends
        self.initializers += [ make_tensor("i4", TensorProto.INT64, [2], [0, 1])] #axes
        self.initializers += [ make_tensor("i5", TensorProto.INT64, [2], [1, 1])] #steps

    def construct_nodes(self):
        app_node = self.get_app_node()
        app_node("Slice", ["i1", "i2", "i3", "i4", "i5"], ["o1"]) # Order of args: data, starts, ends, axes, steps

    def construct_i_o_value_info(self):
        arg1 =  make_tensor_value_info("arg1", TensorProto.INT64,[2, 4])
        o1 =  make_tensor_value_info("o1", TensorProto.INT64,[1, 3])
        self.input_vi = [arg1]
        self.output_vi = [o1]

register_test(INModel, "slice_test")

Add this script in the /SHARK-TestSuite/alt_e2eshark/onnx_tests/operators/ directory with a name let's say slice.py and then add from .slice import * in /SHARK-TestSuite/alt_e2eshark/onnx_tests/operators/model.py. Since the test is registered now, you can run the test using python ./run.py -t slice_test -v, now it will generate the numerics failed issue with the values present in /SHARK-TestSuite/alt_e2eshark/test-run/slice_test/inference_comparison.log.

If you run this test manually, using the torch onnx IR present in the test-run/slice_test directory, you will find out that the results generated are same as the golden results.

To compile and run this manually, run the following: 1.) iree-compile --iree-hal-target-backends=llvm-cpu test-run/gather_test/model.torch_onnx.mlir -o slice_test.vmfb 2.) iree-run-module --module=slice_test.vmfb --device=local-task --input=1xi64=4

zjgarvey commented 1 month ago

As mentioned in the sync earlier, I think that the issue is because of the flag "iree-input-demote-i64-to-i32", which is turned on by default. Try running with iree compile args as -ica="" or with mode -m cl-onnx-iree and see if the issue persists.

We should figure out why the numerics are incorrect specifically when i64 inputs are used with this flag.

zjgarvey commented 1 month ago

Yeah, I just checked that this gives correct numerics without --iree-input-demote-i64-to-i32.

We can remove this flag from the default.

nod-ai / SHARK-TestSuite

Numerics comparison generating wrong results #353