microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Get wrong constant folding result when calling InferenceSession #8422

Open yjydfnhc opened 3 years ago

yjydfnhc commented 3 years ago

Describe the bug
I created an ONNX model and saved it with save_as_external_data=True, then loaded it back into memory with load_external_data=False and passed model.SerializeToString() to onnxruntime.InferenceSession() to do constant folding. The generated ONNX file of the folded model has wrong values for the folded part.

For example, in the model below, a Constant node produces X, node_cf computes Z1 = Gemm(w, X), and node_sum computes Z = Sum(Z1, V). Since w and X are both 3x3 all-ones matrices, folding node_cf should produce a tensor Z1 filled with 3.0. Instead, the folded model contains Z1 filled with 1.0, or with similar wrong values.
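For reference, the expected folded value can be checked directly with NumPy: a two-input Gemm with default attributes computes Z1 = w @ X, and both operands are 3x3 all-ones matrices, so every element of Z1 is 3.0.

import numpy as np

# Expected folded value of Z1: Gemm(w, X) with w = X = ones((3, 3))
expected_z1 = np.ones((3, 3), np.float32) @ np.ones((3, 3), np.float32)
print(expected_z1)  # every element is 3.0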

To Reproduce
Example code:

import numpy as np
import onnx
import onnxruntime as rt
from onnx import helper
from onnx import TensorProto

# ======== 1. Create an ONNX model

dtype = np.float32
x = np.ones((3, 3), dtype)

data_w = np.ones((3, 3), dtype)
w = helper.make_tensor(
    name='w',
    data_type=TensorProto.FLOAT,
    dims=data_w.shape,
    vals=data_w.flatten().astype(dtype).tobytes(),
    raw=True,
)

V = helper.make_tensor_value_info('V', TensorProto.FLOAT, [3, 3])
Z = helper.make_tensor_value_info('Z', TensorProto.FLOAT, [3, 3])

# Create node (NodeProto)
X = helper.make_node(
    'Constant',
    inputs=[],
    outputs=['X'],
    value=helper.make_tensor(
        name='const_tensor_x',
        data_type=TensorProto.FLOAT,
        dims=x.shape,
        vals=x.flatten().astype(np.float32),
    ),
)
# Gemm node to be constant-folded: both of its inputs are constants
node_cf = helper.make_node(
    'Gemm',
    ['w', 'X'],   # inputs
    ['Z1'],       # outputs
    name='tocf',
)
node_sum = helper.make_node(
    'Sum',
    ['Z1', 'V'],  # inputs
    ['Z'],        # outputs
)
# Create the graph (GraphProto)
graph_def = helper.make_graph(
    [X, node_cf, node_sum],  # nodes
    'test-model',            # name
    [V],                     # inputs
    [Z],                     # outputs
    initializer=[w],
)
model_def = helper.make_model(graph_def, producer_name='onnx-example')

# ======== 2. Save the ONNX model
# (0) Save the model with embedded data (gives the correct result).
path_full = 'model_full.onnx'
onnx.save_model(model_def, path_full, save_as_external_data=False)
# (1) Save the model with separate external data files (gives the wrong result).
path_sep = 'model_sep.onnx'
onnx.save_model(model_def, path_sep,
                save_as_external_data=True,
                all_tensors_to_one_file=False,
                size_threshold=0)

# ======== 3. Load and do constant folding
# (1) Wrong result: the generated file "model_sep_cf_no_data.onnx" has tensor Z1 with wrong values.
model = onnx.load(path_sep, load_external_data=False)  # without external data
sess_options = rt.SessionOptions()
sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_BASIC
sess_options.optimized_model_filepath = path_sep.replace(".onnx", "_cf_no_data.onnx")
session = rt.InferenceSession(model.SerializeToString(), sess_options)  # wrong: Z1 filled with 1.0

# (2) Expected result: the generated file "model_full_cf.onnx" has tensor Z1 with correct values.
model2 = onnx.load(path_full)
sess_options2 = rt.SessionOptions()
sess_options2.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_BASIC
sess_options2.optimized_model_filepath = path_full.replace(".onnx", "_cf.onnx")
session2 = rt.InferenceSession(model2.SerializeToString(), sess_options2)  # as expected: Z1 filled with 3.0
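To make the difference concrete, a small sketch (not part of the original report) can read the folded Z1 tensor back out of the two optimized files; it assumes the folded constant appears as a graph initializer named 'Z1' in the saved model:

import onnx
from onnx import numpy_helper

# Hedged check: print the folded Z1 tensor from each optimized file.
# load_external_data=False avoids touching any leftover external-data references.
for path in ['model_full_cf.onnx', 'model_sep_cf_no_data.onnx']:
    m = onnx.load(path, load_external_data=False)
    z1 = next((t for t in m.graph.initializer if t.name == 'Z1'), None)
    if z1 is not None:
        print(path, numpy_helper.to_array(z1))
    else:
        print(path, 'no initializer named Z1 found')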

System information

Expected behavior

  1. Get the correct constant folding result.
  2. For a model with large tensors, load tensor data on demand, i.e. only load a tensor when it is actually required for processing.
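Until this is fixed, one possible workaround (assuming it is acceptable to load the external data up front, which of course defeats the on-demand loading requested in item 2) is to reattach the tensor data with onnx's load_external_data_for_model helper before serializing the model for the session; base_dir must point at the directory holding the external data files:

import onnx
import onnxruntime as rt
from onnx.external_data_helper import load_external_data_for_model

# Workaround sketch: load the graph without data, then pull in the external
# tensors explicitly so onnxruntime folds with the real values.
model = onnx.load('model_sep.onnx', load_external_data=False)
load_external_data_for_model(model, base_dir='.')  # directory containing the data files
session = rt.InferenceSession(model.SerializeToString())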

Thank you

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.