Open gongsu832 opened 2 years ago
I quickly tried squeezenet1.1-7 on a Z machine, and the model passed:
$ VERBOSE=1 ONNX_MLIR_HOME=/home/tungld/dl/onnx-mlir/build/Debug python CheckONNXModelZoo.py -m squeezenet1.1-7 -compile_args="-O3 --mcpu=z14"
There are 155 models in the ONNX model zoo where 31 models are not checked because of old opsets or quantization.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Downloading https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.1-7.tar.gz
Extracting the .tag.gz to /tmp/tmpj53qm9hx
Checking the model squeezenet1.1-7 ...
[squeezenet1.1-7] Temporary directory has been created at /tmp/tmp73w9upo6
Reading inputs from /tmp/tmpj53qm9hx/squeezenet1.1/test_data_set_2 ...
- 1st input: [1x3x224x224xfloat32]
done.
Compiling the model ...
/home/tungld/dl/onnx-mlir/build/Debug/bin/onnx-mlir -O3 --mcpu=z14 /tmp/tmp73w9upo6/model.onnx
took 5.614736581221223 seconds.
Loading the compiled model ...
took 0.00034265127032995224 seconds.
Running inference ...
took 0.514042291790247 seconds.
Reading reference outputs from /tmp/tmpj53qm9hx/squeezenet1.1/test_data_set_2 ...
- 1st output: [1x1000xfloat32]
done.
Verifying value of squeezenet0_flatten0_reshape0:[1, 1000] using atol=0.01, rtol=0.05 ...
correct.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 8.8s finished
1 models tested: squeezenet1.1-7
1 models passed: squeezenet1.1-7
And this is my protobuf version:
protoc --version
libprotoc 3.17.1
It seems the error is related to the protobuf version.
Not quite for me. I actually have a build system with
# protoc --version
libprotoc 3.12.4
and I'm on commit 8e54f577d05ebd271f4f8b23ac9c3adfe682480e. This system gets only 21 failures. BTW, are you on z?
BTW, are you on z?
Yes, I am. I am using the latest commit.
Looking at this in your log:
[squeezenet1.1-7] Traceback (most recent call last):
File "RunONNXModel.py", line 404, in <module>
main()
File "RunONNXModel.py", line 240, in main
model = onnx.load(args.model_path)
It failed at a very early stage, when loading the .onnx file with onnx (not onnx-mlir), so it is probably related to the onnx or protobuf package.
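Because the failure happens inside `onnx.load`, before onnx-mlir is involved, the model file can be inspected without either package. The following is a minimal sketch that assumes only the protobuf wire format, where every field starts with a varint tag holding `(field_number << 3) | wire_type`. It decodes the first tag of a byte string; a field number of 0 is never valid, which is the condition the `DecodeError` in the traceback reports. The helper names `first_tag` and `looks_like_protobuf` are hypothetical and not part of RunONNXModel.py.

```python
def first_tag(data: bytes):
    """Decode the first protobuf tag and return (field_number, wire_type)."""
    value = 0
    shift = 0
    for byte in data[:10]:  # a varint is at most 10 bytes long
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # a clear high bit marks the last varint byte
            return value >> 3, value & 0x07
        shift += 7
    raise ValueError("truncated or empty input")


def looks_like_protobuf(path: str) -> bool:
    """Cheap pre-check: the first field number in the file must be non-zero."""
    with open(path, "rb") as f:
        head = f.read(10)
    try:
        field_number, _ = first_tag(head)
    except ValueError:
        return False
    return field_number != 0
```

A file whose first byte is `0x00` decodes to field number 0 and fails this check. Passing the check does not prove the file is a valid ONNX model; it only rules out this particular failure before the full parse.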
Can you try this in our docker dev image? It gets 24 failures.
docker pull onnxmlirczar/onnx-mlir-dev
docker run --rm -ti onnxmlirczar/onnx-mlir-dev
Inside the container,
apt-get update && apt-get install wget
pip3 install joblib
git clone https://github.com/onnx/models
cd models
ln -sf ../onnx-mlir/utils/RunONNXModel.py
ln -sf ../onnx-mlir/test/onnx-model-zoo/CheckONNXModelZoo.py
VERBOSE=2 ONNX_MLIR_HOME=/workdir/onnx-mlir/build/Debug python3 CheckONNXModelZoo.py -m squeezenet1.1-7
I just tried protobuf 3.12.4, 3.14.0, 3.17.1, and 3.20.1; all of them fail.
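When comparing environments like this, note that `protoc --version` reports the protobuf compiler, while the traceback comes from the Python `protobuf` package, and the two can differ. As a rough sketch, the installed Python package versions can be collected with the standard library's `importlib.metadata` (available from Python 3.8). The function name `package_versions` is made up for illustration.

```python
from importlib import metadata


def package_versions(names=("onnx", "protobuf")):
    """Return {package: version or None} for the given distribution names."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in package_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this on both the passing and the failing machine gives a like-for-like comparison of the Python side, independent of which `protoc` is on the PATH.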
@gongsu832 I did what you suggested using our docker dev image, and I got this:
root@95a4aa53b769:/workdir/models# VERBOSE=2 ONNX_MLIR_HOME=/workdir/onnx-mlir/build/Debug python3 CheckONNXModelZoo.py -m squeezenet1.1-7
find . -mindepth 2 -type f -name *.tar.gz
There are 155 models in the ONNX model zoo where 31 models are not checked because of old opsets or quantization.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Downloading https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.1-7.tar.gz
wget --no-check-certificate --timestamping https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.1-7.tar.gz
Extracting the .tag.gz to /tmp/tmp5sjkx8tt
tar -xzvf ./squeezenet1.1-7.tar.gz -C /tmp/tmp5sjkx8tt
find /tmp/tmp5sjkx8tt -type f -name *.onnx
find /tmp/tmp5sjkx8tt -type d -name test_data_set*
Checking the model squeezenet1.1-7 ...
python RunONNXModel.py /tmp/tmp5sjkx8tt/squeezenet1.1/squeezenet1.1.onnx --compile_args=-O3 --verify=ref --data_folder=/tmp/tmp5sjkx8tt/squeezenet1.1/test_data_set_2
[squeezenet1.1-7] Temporary directory has been created at /tmp/tmpc0de48vj
Reading inputs from /tmp/tmp5sjkx8tt/squeezenet1.1/test_data_set_2 ...
- 1st input: [1x3x224x224xfloat32]
done.
Compiling the model ...
/workdir/onnx-mlir/build/Debug/bin/onnx-mlir -O3 /tmp/tmpc0de48vj/model.onnx
took 3.346841878257692 seconds.
Loading the compiled model ...
took 0.00033509451895952225 seconds.
Running inference ...
took 0.7248765854164958 seconds.
Reading reference outputs from /tmp/tmp5sjkx8tt/squeezenet1.1/test_data_set_2 ...
- 1st output: [1x1000xfloat32]
done.
Verifying value of squeezenet0_flatten0_reshape0:[1, 1000] using atol=0.01, rtol=0.05 ...
correct.
rm ./squeezenet1.1-7.tar.gz
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 7.1s finished
1 models tested: squeezenet1.1-7
1 models passed: squeezenet1.1-7
root@95a4aa53b769:/workdir/models# protoc --version
libprotoc 3.14.0
root@95a4aa53b769:/workdir/models# pip3 show protobuf
Name: protobuf
Version: 3.14.0
Summary: Protocol Buffers
Home-page: https://developers.google.com/protocol-buffers/
Author: None
Author-email: None
License: 3-Clause BSD License
Location: /usr/local/lib/python3.8/dist-packages/protobuf-3.14.0-py3.8.egg
Requires: six
Required-by: onnx
I didn't see any error.
Now this is very weird.
root@bda611eb4b38:/workdir/models# VERBOSE=2 ONNX_MLIR_HOME=/workdir/onnx-mlir/build/Debug python3 CheckONNXModelZoo.py -m squeezenet1.1-7
find . -mindepth 2 -type f -name *.tar.gz
There are 155 models in the ONNX model zoo where 31 models are not checked because of old opsets or quantization.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Downloading https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.1-7.tar.gz
wget --no-check-certificate --timestamping https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.1-7.tar.gz
Extracting the .tag.gz to /tmp/tmpld_si8ea
tar -xzvf ./squeezenet1.1-7.tar.gz -C /tmp/tmpld_si8ea
find /tmp/tmpld_si8ea -type f -name *.onnx
find /tmp/tmpld_si8ea -type d -name test_data_set*
Checking the model squeezenet1.1-7 ...
python RunONNXModel.py /tmp/tmpld_si8ea/squeezenet1.1/._squeezenet1.1.onnx --compile_args=-O3 --verify=ref --data_folder=/tmp/tmpld_si8ea/squeezenet1.1/test_data_set_0
[squeezenet1.1-7] Traceback (most recent call last):
File "RunONNXModel.py", line 404, in <module>
main()
File "RunONNXModel.py", line 240, in main
model = onnx.load(args.model_path)
File "/usr/local/lib/python3.8/dist-packages/onnx-1.11.0-py3.8-linux-s390x.egg/onnx/__init__.py", line 121, in load_model
model = load_model_from_string(s, format=format)
File "/usr/local/lib/python3.8/dist-packages/onnx-1.11.0-py3.8-linux-s390x.egg/onnx/__init__.py", line 158, in load_model_from_string
return _deserialize(s, ModelProto())
File "/usr/local/lib/python3.8/dist-packages/onnx-1.11.0-py3.8-linux-s390x.egg/onnx/__init__.py", line 99, in _deserialize
decoded = cast(Optional[int], proto.ParseFromString(s))
File "/usr/local/lib/python3.8/dist-packages/protobuf-3.14.0-py3.8.egg/google/protobuf/message.py", line 199, in ParseFromString
return self.MergeFromString(serialized)
File "/usr/local/lib/python3.8/dist-packages/protobuf-3.14.0-py3.8.egg/google/protobuf/internal/python_message.py", line 1145, in MergeFromString
if self._InternalParse(serialized, 0, length) != length:
File "/usr/local/lib/python3.8/dist-packages/protobuf-3.14.0-py3.8.egg/google/protobuf/internal/python_message.py", line 1195, in InternalParse
raise message_mod.DecodeError('Field number 0 is illegal.')
google.protobuf.message.DecodeError: Field number 0 is illegal.
rm ./squeezenet1.1-7.tar.gz
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.9s finished
1 models tested: squeezenet1.1-7
0 models passed:
1 models failed: squeezenet1.1-7
root@bda611eb4b38:/workdir/models# protoc --version
libprotoc 3.14.0
root@bda611eb4b38:/workdir/models# pip3 show protobuf
Name: protobuf
Version: 3.14.0
Summary: Protocol Buffers
Home-page: https://developers.google.com/protocol-buffers/
Author: None
Author-email: None
License: 3-Clause BSD License
Location: /usr/local/lib/python3.8/dist-packages/protobuf-3.14.0-py3.8.egg
Requires: six
Required-by: onnx
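One difference between the two docker runs is visible in the `python RunONNXModel.py` line: the passing run is given `squeezenet1.1.onnx`, while the failing run is given `._squeezenet1.1.onnx`. Names starting with `._` are typically AppleDouble metadata files that macOS `tar` adds to an archive, and such files begin with a zero byte, which would be consistent with `Field number 0 is illegal`. The thread does not confirm this as the cause, so the following is only a sketch under that assumption: a model lookup that skips `._` files. `find_onnx_models` is a hypothetical helper, not the code in CheckONNXModelZoo.py.

```python
from pathlib import Path


def find_onnx_models(root):
    """Return .onnx files under root, skipping '._' metadata sidecar files."""
    return sorted(
        p for p in Path(root).rglob("*.onnx")
        # '._name' entries are assumed to be AppleDouble sidecars, not models
        if p.is_file() and not p.name.startswith("._")
    )
```

Sorting the result also makes the choice deterministic, whereas the order of entries returned by a plain `find` can vary between file systems.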
@tungld I was testing my model zoo build and noticed that the latest commit has 24 failed tests, 2 more than what you reported on https://github.com/onnx/onnx-mlir/issues/128#issuecomment-1128755672. The 2 tests are squeezenet1.1-7 and vgg16-bn-7. They both failed with the same error, Field number 0 is illegal (only showing squeezenet1.1-7):
I tried both protobuf 3.14.0 and 3.20.1 and the results are the same.