onnx / onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
Apache License 2.0

PyCompileAndRuntime produces different results when running multiple times #2956

Status: Open. axeabc opened this issue 1 month ago.

axeabc commented 1 month ago

Hi, I want to use PyCompileAndRuntime to compile and run an ONNX model (e.g. model.onnx) and print the model's outputs. However, the script produces different results when I run it repeatedly. The onnx-mlir used in my experiment is the prebuilt version in onnxmlir/onnx-mlir-dev.

OnnxRuntime4Validate.py

import pickle
from PyCompileAndRuntime import OMCompileExecutionSession
import json
import numpy as np
import os

def run_lib():

    session = OMCompileExecutionSession('model.onnx', '-O3', use_default_entry_point=False)
    # Query entry points in the model.
    entry_points = session.entry_points()
    for entry_point in entry_points:
        session.set_entry_point(name=entry_point)
        print(f'Run at {entry_point}')
        # print("input signature in json", session.input_signature())
        # print("output signature in json", session.output_signature())
        inputs = json.loads(session.input_signature())
        input_names = [item['name'] for item in inputs]
        data = {}
        with open('oracle.pkl', 'rb') as file:
            data = pickle.load(file)
        inputs = [data['input'][input_name] for input_name in input_names]
        outputs1 = session.run(inputs)
        print(outputs1)

if __name__ == "__main__":
    run_lib()

Both model.onnx and oracle.pkl are in model.zip.

Run with: python3 OnnxRuntime4Validate.py

[Screenshot: output of the script]
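To tell whether the nondeterminism shows up within a single process or only across separate runs, one option is to call the session several times on the same inputs and check that every output is bit-for-bit identical each time. This is a minimal sketch that assumes only what the script above already relies on: session.run takes a list of NumPy arrays and returns a list of arrays. The helper names check_repeatability and outputs_match are hypothetical, not part of onnx-mlir.

```python
import numpy as np


def outputs_match(first, second):
    """Per output, return True if the two arrays are bit-for-bit identical
    (same dtype, same shape, same raw bytes), else False."""
    if len(first) != len(second):
        raise ValueError("runs returned a different number of outputs")
    result = []
    for a, b in zip(first, second):
        a, b = np.asarray(a), np.asarray(b)
        same = a.dtype == b.dtype and a.shape == b.shape and a.tobytes() == b.tobytes()
        result.append(bool(same))
    return result


def check_repeatability(run, inputs, trials=5):
    """Call run(inputs) `trials` times and report, per output, whether every
    call reproduced the first call's result exactly."""
    # Copy the first result in case the returned arrays share memory with
    # buffers a later call might reuse (an assumption, not a known behavior).
    reference = [np.array(o, copy=True) for o in run(inputs)]
    stable = [True] * len(reference)
    for _ in range(trials - 1):
        for i, ok in enumerate(outputs_match(reference, run(inputs))):
            stable[i] = stable[i] and ok
    return stable


# Hypothetical usage inside the loop of run_lib():
#     print(entry_point, check_repeatability(session.run, inputs))
```

If every entry comes back True within one process but separate invocations of the script still disagree, the difference is introduced between processes rather than between calls.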
AlexandreEichenberger commented 1 month ago

Would it make sense to also print the inputs, so that we are sure they are the same?
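When the inputs are too large to print, a compact alternative is to hash each array and compare the digests between runs. A minimal sketch, assuming the inputs and outputs are NumPy arrays as in the script above; fingerprint is a hypothetical helper name. It hashes dtype, shape, and the raw bytes with SHA-256 from Python's hashlib, so two arrays get the same digest only if all three match.

```python
import hashlib

import numpy as np


def fingerprint(arrays):
    """Return a short SHA-256 digest per array that covers its dtype, shape,
    and contents, so runs can be compared without printing whole tensors."""
    digests = []
    for arr in arrays:
        arr = np.asarray(arr)
        h = hashlib.sha256()
        h.update(str(arr.dtype).encode())
        h.update(str(arr.shape).encode())
        h.update(arr.tobytes())  # C-order bytes, independent of memory layout
        digests.append(h.hexdigest()[:16])
    return digests


# Hypothetical usage in the script above, before and after session.run:
#     print("inputs: ", fingerprint(inputs))
#     print("outputs:", fingerprint(outputs1))
```

Identical input digests with different output digests across two runs would confirm the report in a few short lines instead of full array dumps.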

axeabc commented 1 month ago

> Would it make sense to also print the inputs, so that we are sure they are the same?

I can't fit everything into one screenshot because the inputs are large, so I exported the printed inputs and outputs to two files, tmp1 and tmp2. The diff shows that the outputs differ even though the inputs are the same.

[Screenshot: diff of tmp1 and tmp2]

tmp1 and tmp2 are in tmp.zip; the specific inputs and the model are in model.zip.
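A text diff of printed arrays can understate the problem, because NumPy abbreviates arrays with more than 1000 elements (the default print threshold) and only the printed portions get compared. A sketch of a more precise check, assuming the outputs are NumPy arrays: save each run's outputs to an .npz file, then count the differing elements and the largest absolute difference per output. save_outputs and diff_saved_outputs are hypothetical helpers, not part of onnx-mlir.

```python
import numpy as np


def save_outputs(path, outputs):
    """Store the outputs under keys out0, out1, ... in a .npz file."""
    np.savez(path, **{f"out{i}": np.asarray(o) for i, o in enumerate(outputs)})


def diff_saved_outputs(path_a, path_b):
    """Compare two files written by save_outputs. Return, per key, a tuple
    (number of differing elements, largest absolute difference)."""
    report = {}
    with np.load(path_a) as a, np.load(path_b) as b:
        if a.files != b.files:
            raise ValueError("the two runs saved different outputs")
        for key in a.files:
            x, y = a[key], b[key]
            if x.shape != y.shape:
                raise ValueError(f"{key}: shape {x.shape} vs {y.shape}")
            xf, yf = x.astype(np.float64), y.astype(np.float64)
            # NaN != NaN, so treat positions that are NaN in both runs as equal.
            differ = (xf != yf) & ~(np.isnan(xf) & np.isnan(yf))
            count = int(np.count_nonzero(differ))
            worst = float(np.abs(xf - yf)[differ].max()) if count else 0.0
            report[key] = (count, worst)
    return report
```

In the script, print(outputs1) could be replaced by save_outputs(...) with a file name that includes the entry point and a run number, and the two resulting files compared afterwards. A handful of tiny differences points in a different direction than large or widespread ones.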

tungld commented 1 month ago

Have you tried to compile the model with -O0 to see if it works?

axeabc commented 1 month ago

> Have you tried to compile the model with -O0 to see if it works?

Yes. The behavior is the same at every level from -O0 to -O2: the results differ when the script is run multiple times.