onnx / tensorflow-onnx

Convert TensorFlow, Keras, Tensorflow.js and Tflite models to ONNX
Apache License 2.0

Constant in the Reshape operator on s390x architecture (big endian) is a large number 268632064 (expected 784 as on x86) #1902

Open aosadchyy opened 2 years ago

aosadchyy commented 2 years ago

Debugging advice: Converting a TF model to ONNX on s390x succeeds, but the resulting ONNX file includes a large number, 268632064, in the Reshape operator: `python3 -m tf2onnx.convert --opset 15 --fold_const --saved-model mnist_seqdnn --output mnist_seqdnn_s390x.onnx`

The workaround is to run tf2onnx.convert on x86. The resulting ONNX file then has the correct number 784 in the Reshape operator: `python3 -m tf2onnx.convert --opset 15 --fold_const --saved-model mnist_seqdnn --output mnist_seqdnn_x86.onnx`

Describe the bug With the same library versions and the same source TF model, converting to ONNX on an s390x system produces a Reshape constant of 268632064; on an x86 system the constant is 784, as expected. s390x is a big-endian architecture, unlike x86. Typically, endianness is handled by the primitive operations and operators of Python or C++, unless the code interprets raw words/dwords directly. The large number could be an indication of such a direct interpretation.
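For what it's worth, the observed value is consistent with a 4-byte endianness mix-up: 784 is 0x310, and reinterpreting its little-endian int32 bytes as big-endian yields 0x10030000 = 268632064. A minimal sketch of the symptom (the stored initializer is int64[2], but the observed value matches a 32-bit swap, so int32 is assumed here):

```python
import struct

# 784 == 0x00000310. Pack it as a little-endian 32-bit int, then
# reinterpret the same four bytes (10 03 00 00) as big-endian:
# they read back as 0x10030000 == 268632064.
raw = struct.pack("<i", 784)
misread = struct.unpack(">i", raw)[0]
print(misread)  # 268632064
```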

Urgency This blocks the application of tf2onnx on the s390x family of systems, and perhaps also powerpc64le, etc.

System information

To Reproduce Use the MNIST dataset to train and save a sequential DNN model:

```python
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
...  # compile and fit
model.save("mnist_seqdnn", overwrite=True, include_optimizer=False, save_format='tf')
```

On an s390x architecture machine: `python3 -m tf2onnx.convert --opset 15 --fold_const --saved-model mnist_seqdnn --output mnist_seqdnn_s390x.onnx`

On an x86 architecture machine: `python3 -m tf2onnx.convert --opset 15 --fold_const --saved-model mnist_seqdnn --output mnist_seqdnn_x86.onnx`

Compare mnist_seqdnn_s390x.onnx and mnist_seqdnn_x86.onnx:

- Reshape on x86: const_fold_opt7, kind: Initializer, type: int64[2], value [ -1, 784 ] - OK
- Reshape on s390x: const_fold_opt7, kind: Initializer, type: int64[2], value [ -1, 268632064 ] - NG


Additional context Console output on s390x system:

```
python -m tf2onnx.convert --opset 15  --fold_const --saved-model mnist_seqdnn --output mnist_seqdnn_s390x.onnx 
/usr/lib/python3.8/runpy.py:127: RuntimeWarning: 'tf2onnx.convert' found in sys.modules after import of package 'tf2onnx', but prior to execution of 'tf2onnx.convert'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
2022-03-31 17:31:10,754 - WARNING - ***IMPORTANT*** Installed protobuf is not cpp accelerated. Conversion will be extremely slow. See https://github.com/onnx/tensorflow-onnx/issues/1557
2022-03-31 17:31:10,756 - WARNING - '--tag' not specified for saved_model. Using --tag serve
2022-03-31 17:31:11,064 - INFO - Signatures found in model: [serving_default].
2022-03-31 17:31:11,064 - WARNING - '--signature_def' not specified, using first signature: serving_default
2022-03-31 17:31:11,064 - INFO - Output names: ['dense_1']
WARNING:tensorflow:From /home/jovyan/.local/lib/python3.8/site-packages/tf2onnx/tf_loader.py:706: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
2022-03-31 17:31:11,113 - WARNING - From /home/jovyan/.local/lib/python3.8/site-packages/tf2onnx/tf_loader.py:706: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
2022-03-31 17:31:11,314 - INFO - Using tensorflow=2.5.0, onnx=1.11.0, tf2onnx=1.9.3/1190aa
2022-03-31 17:31:11,314 - INFO - Using opset <onnx, 15>
2022-03-31 17:31:12,171 - INFO - Computed 0 values for constant folding
2022-03-31 17:31:12,846 - INFO - Optimizing ONNX model
2022-03-31 17:31:12,873 - INFO - After optimization: Cast -1 (1->0), Identity -6 (6->0)
2022-03-31 17:31:12,875 - INFO - 
2022-03-31 17:31:12,875 - INFO - Successfully converted TensorFlow model mnist_seqdnn to ONNX
2022-03-31 17:31:12,875 - INFO - Model inputs: ['flatten_input']
2022-03-31 17:31:12,875 - INFO - Model outputs: ['dense_1']
2022-03-31 17:31:12,875 - INFO - ONNX model is saved at mnist_seqdnn_s390x.onnx
```
hv0905 commented 9 months ago

It seems the problem still exists: I hit it when trying to convert a TensorFlow ResNet-152 model to ONNX on s390x (aka LinuxONE). Is there any possible solution or workaround for this problem?

tehbone commented 5 months ago

I have debugged this issue (I think). Invocations of Node.get_tensor_value() in tf2onnx result in calls to ONNX's numpy_helper.to_array(), which performs a byteswap on the data. If there were a means to have ONNX skip the byteswapping, the fix would be as simple as updating all of these invocations accordingly.
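As a quick illustration of the symptom (not the actual tf2onnx code path), an unconditional byteswap of the expected int32 values [-1, 784] reproduces exactly the values seen in the s390x model:

```python
import numpy as np

# Expected Reshape constant, as 32-bit ints.
expected = np.array([-1, 784], dtype=np.int32)

# An unconditional byteswap turns 784 (0x00000310) into
# 0x10030000 == 268632064, while -1 (all 0xFF bytes) is unchanged.
swapped = expected.byteswap()
print(swapped.tolist())  # [-1, 268632064]
```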

tehbone commented 5 months ago

This should now be fixed by this ONNX PR: onnx/onnx#5904.