Open lcskrishna opened 7 years ago
Hi @lcskrishna. Did you install using the binaries (conda install -c ezyang onnx) or from source?
I have installed using source (pip install onnx)
Please pip uninstall onnx and then try the binary install.
@lcskrishna Did you do pip inside a conda virtual environment?
@bddppq No, I don't have a conda virtual environment set up; I'm installing directly with pip install. Also, I installed protobuf and protoc from the GitHub source, and numpy using pip.
@lcskrishna I see. Which OS are you using? Could you run
ldd /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so
readelf -d /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so
echo $LD_LIBRARY_PATH
and paste the output here?
@bddppq
I am using Ubuntu 16.04
Here are the outputs you asked for:
%ldd /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so
linux-vdso.so.1 => (0x00007ffd2ad53000)
libprotobuf.so.9 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.9 (0x00007f0f3edc6000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0f3ebb0000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0f3e992000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0f3e5c8000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f0f3e3ae000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0f3e02b000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0f3dd22000)
/lib64/ld-linux-x86-64.so.2 (0x00005563d2b0c000)
%readelf -d /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so
Dynamic section at offset 0x67bd0 contains 27 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libprotobuf.so.9]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x13a48
0x000000000000000d (FINI) 0x4feb4
0x0000000000000019 (INIT_ARRAY) 0x2671f0
0x000000000000001b (INIT_ARRAYSZ) 96 (bytes)
0x000000000000001a (FINI_ARRAY) 0x267250
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x1f0
0x0000000000000005 (STRTAB) 0x5be0
0x0000000000000006 (SYMTAB) 0x18d8
0x000000000000000a (STRSZ) 41221 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000003 (PLTGOT) 0x268000
0x0000000000000002 (PLTRELSZ) 4896 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x12728
0x0000000000000007 (RELA) 0x102e0
0x0000000000000008 (RELASZ) 9288 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffe (VERNEED) 0x10280
0x000000006fffffff (VERNEEDNUM) 2
0x000000006ffffff0 (VERSYM) 0xfce6
0x000000006ffffff9 (RELACOUNT) 186
0x0000000000000000 (NULL) 0x0
% echo $LD_LIBRARY_PATH
/usr/local/lib
@lcskrishna Hmm...everything looks normal to me. Could you also run nm -C /usr/lib/x86_64-linux-gnu/libprotobuf.so.9 | grep SpaceUsedLong?
I am getting the following output :
nm: /usr/lib/x86_64-linux-gnu/libprotobuf.so.9: no symbols
@lcskrishna I'm not sure whether your protobuf installation is broken. Adding the "-D" flag to the nm command might help with debugging. In the meantime, since you are using Ubuntu, could you use "sudo apt-get install libprotobuf-dev protobuf-compiler" to install protobuf?
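The ldd and nm checks requested above can also be scripted. The following is a minimal sketch, not part of onnx: the helper names (resolved_libraries, find_protobuf, exports_symbol) are hypothetical, and it assumes ldd output in the "name => path (address)" layout pasted earlier in this thread.

```python
import re
import subprocess

# Matches ldd lines such as
#   "libprotobuf.so.9 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.9 (0x...)"
# Lines without a resolved path (linux-vdso, the loader itself) do not match.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def resolved_libraries(ldd_output):
    """Map each library name in ldd output to the path it resolved to."""
    libs = {}
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def find_protobuf(ldd_output):
    """Return (name, path) of the linked libprotobuf, or None if absent."""
    for name, path in sorted(resolved_libraries(ldd_output).items()):
        if name.startswith("libprotobuf.so"):
            return name, path
    return None


def exports_symbol(library_path, needle="SpaceUsedLong"):
    """Run `nm -D -C` on a shared library and report whether any
    dynamic symbol mentions `needle`. Requires binutils' nm on PATH."""
    out = subprocess.check_output(["nm", "-D", "-C", library_path])
    return needle in out.decode("utf-8", "replace")
```

Feeding the ldd output of onnx_cpp2py_export.so to find_protobuf shows which libprotobuf the loader actually picks; passing that path to exports_symbol then shows whether that library provides SpaceUsedLong at all.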
@lcskrishna Have you been able to resolve the issue?
I tried a fresh installation of caffe2, protobuf, onnx and onnx-caffe2. I also used the conda installation for onnx. Now the above error doesn't show up; however, I am getting the following error while running the conversion:
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named caffe2_pybind11_state_gpu
Traceback (most recent call last):
File "../../caffe2-conv/conversion.py", line 9, in <module>
c2_net.ParseFromString(f.read())
google.protobuf.message.DecodeError: Error parsing message
Here is my script:
import onnx_caffe2.frontend as c2_onnx
from caffe2.proto import caffe2_pb2
import os
c2_net = caffe2_pb2.NetDef()
model_path = '/home/chaitu/work/caffe2_models/model/'
c2_model_file = os.path.join(model_path, 'resnet101_init_net.pb')
with open(c2_model_file, 'rb') as f:
    c2_net.ParseFromString(f.read())
onnx_graph = c2_onnx.caffe2_net_to_onnx_graph(c2_net)
@lcskrishna What's the size of your pb file? I suspect it's hitting the 64 MB limit. Could you try 'export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python' in your terminal and then run your code snippet again?
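Both suggestions can be tried from inside the script itself. This is a minimal sketch under stated assumptions: the 64 MB figure is taken from the comment above rather than verified against any particular protobuf build, and the helper names are hypothetical.

```python
import os

# Assumption: 64 MB is the message size limit mentioned in the thread.
SIZE_LIMIT_BYTES = 64 * 1024 * 1024


def exceeds_limit(path, limit=SIZE_LIMIT_BYTES):
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit


def use_pure_python_protobuf():
    """Select the pure-Python protobuf implementation.

    This only has an effect if it runs before google.protobuf is
    imported for the first time in the process.
    """
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
```

Calling use_pure_python_protobuf() at the very top of conversion.py, before the caffe2 and onnx_caffe2 imports, has the same effect as the export command for that one process.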
@bddppq The file is around 4 MB, and I forgot to do the export. However, I have also tried with a simple CIFAR10 network.
I took a CIFAR10 caffemodel, translated it into a caffe2 model using the conversion tool in caffe2, and tried the conversion as described above, but I still get the following error. I'm not sure what the issue is:
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named caffe2_pybind11_state_gpu
Unrecognized attribute: legacy_pad
Traceback (most recent call last):
File "conversion.py", line 10, in <module>
onnx_graph = c2_onnx.caffe2_net_to_onnx_graph(c2_net)
File "/home/chaitu/.local/lib/python2.7/site-packages/onnx_caffe2/frontend.py", line 254, in caffe2_net_to_onnx_graph
caffe2_op_to_node_def(op, name_map) for op in net_def.op)
File "/home/chaitu/.local/lib/python2.7/site-packages/onnx_caffe2/frontend.py", line 254, in <genexpr>
caffe2_op_to_node_def(op, name_map) for op in net_def.op)
File "/home/chaitu/.local/lib/python2.7/site-packages/onnx_caffe2/frontend.py", line 205, in caffe2_op_to_node_def
checker.check_node(node_def)
File "/home/chaitu/.local/lib/python2.7/site-packages/onnx/checker.py", line 38, in check_node
'NodeProto of type {} did not pass defs schema check.'.format(str(node.op_type)))
ValueError: NodeProto of type MaxPool did not pass defs schema check.
@lcskrishna Please add option --remove_legacy_pad when you do the translation from caffe model to caffe2 model.
@jerryzh168 I get the following error while I try to translate using --remove_legacy_pad
Traceback (most recent call last):
File "caffe_translator.py", line 853, in
@lcskrishna could you post your caffe1 model? I'll try to modify caffe_translator to make sure it works with your model.
@jerryzh168 Please find the trained caffemodel
Can you post the deploy.prototxt as well? Thanks
Here is my prototxt file used.
@lcskrishna Did you update your build to the most recent caffe2? I can actually translate your model, since a more recent update doesn't remove legacy pad by default.
As a side note, there was a problem in _GetLegacyDims (it should use feed rather than feed_blob), and it will be fixed after my new diff lands.
@jerryzh168 I tried it again and I'm still getting the same issue. Can you post the command you used to run the translator?
Thanks.
I see. Since you need to use remove_legacy_pad, that code will be called. Please wait until my diff has landed. Also, you should probably provide the input dimensions by adding the "--input_dims" option once that diff has landed.
@lcskrishna The diff has landed, please update your c2 and try again.
@jerryzh168 I tried the following after updating caffe2 and I get the following error.
Command:
python -m caffe2.python.caffe_translator ../caffe_models/cifar.prototxt ../caffe_models/cifar10_quick_iter_4000.caffemodel --remove_legacy_pad --input_dims 1 3 32 32
Error
W1003 22:11:36.411902 3167 workspace.cc:157] Blob label not in the workspace.
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/caffe2/python/caffe_translator.py", line 928, in <module>
input_dims=args.input_dims
File "/usr/local/caffe2/python/caffe_translator.py", line 299, in TranslateModel
return TranslatorRegistry.TranslateModel(*args, **kwargs)
File "/usr/local/caffe2/python/caffe_translator.py", line 294, in TranslateModel
net = _RemoveLegacyPad(net, net_params, input_dims)
File "/usr/local/caffe2/python/caffe_translator.py", line 139, in _RemoveLegacyPad
dim_map = _GetLegacyDims(net, net_params, dummy_input, legacy_pad_ops)
File "/usr/local/caffe2/python/caffe_translator.py", line 77, in _GetLegacyDims
ws._run_operator(op_def.SerializeToString())
RuntimeError: [enforce fail at operator.cc:52] blob != nullptr. op Accuracy: Encountered a non-existing input blob: label
@lcskrishna is this a train net? Probably you should use "deploy_net" instead.
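One rough way to tell whether a prototxt describes a train/test net rather than a deploy net is to scan it for layers that expect a label blob. This is a heuristic sketch, not something caffe_translator provides: the set of layer types is an assumption, the function name is hypothetical, and it only recognises the quoted type: "..." syntax.

```python
import re

# Assumption: these layer types normally appear only in train/test nets,
# because they read labels or load data. The list is not exhaustive.
TRAIN_ONLY_TYPES = {"Accuracy", "SoftmaxWithLoss", "Data"}


def train_only_layer_types(prototxt_text):
    """Return the sorted train/test-only layer types found in the text."""
    # Captures every quoted `type: "..."` value, including filler types
    # such as "gaussian"; those are dropped by the set intersection.
    found = set(re.findall(r'type:\s*"([^"]+)"', prototxt_text))
    return sorted(found & TRAIN_ONLY_TYPES)
```

A non-empty result for cifar.prototxt would be consistent with the "op Accuracy: Encountered a non-existing input blob: label" failure above.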
Hi, I encountered a similar problem when I try to import onnx using python on Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-135-generic x86_64). I installed protocol buffer from https://github.com/google/protobuf/releases/download/v3.5.1/protobuf-all-3.5.1.zip (version 3.5.1) and also installed onnx using pip (not conda). Below is the error message and some output of my debugging.
$ python -c 'import onnx'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/onnx/__init__.py", line 10, in <module>
import onnx.helper # noqa
File "/usr/local/lib/python2.7/dist-packages/onnx/helper.py", line 15, in <module>
import onnx.defs as defs
File "/usr/local/lib/python2.7/dist-packages/onnx/defs/__init__.py", line 6, in <module>
import onnx.onnx_cpp2py_export.defs as C
ImportError: /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so: undefined symbol: _ZNK6google8protobuf7Message13SpaceUsedLongEv
$ ldd /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so
linux-vdso.so.1 => (0x00007ffc60dc2000)
libprotobuf.so.8 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.8 (0x00007fc1fba23000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc1fb71f000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc1fb419000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc1fb203000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc1fae3a000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc1fac1c000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc1faa03000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc1fbff8000)
$ readelf -d /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so
Dynamic section at offset 0xd0a30 contains 28 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libprotobuf.so.8]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x25948
0x000000000000000d (FINI) 0xa2538
0x0000000000000019 (INIT_ARRAY) 0x2cf500
0x000000000000001b (INIT_ARRAYSZ) 144 (bytes)
0x000000000000001a (FINI_ARRAY) 0x2cf590
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x1f0
0x0000000000000005 (STRTAB) 0xaf80
0x0000000000000006 (SYMTAB) 0x27a8
0x000000000000000a (STRSZ) 82207 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000003 (PLTGOT) 0x2d1000
0x0000000000000002 (PLTRELSZ) 5904 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x24238
0x0000000000000007 (RELA) 0x1fd08
0x0000000000000008 (RELASZ) 17712 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffe (VERNEED) 0x1fbf8
0x000000006fffffff (VERNEEDNUM) 4
0x000000006ffffff0 (VERSYM) 0x1f0a0
0x000000006ffffff9 (RELACOUNT) 370
0x0000000000000000 (NULL) 0x0
$ echo $LD_LIBRARY_PATH
/usr/local/lib
$ nm -C /usr/lib/x86_64-linux-gnu/libprotobuf.so.8 | grep SpaceUsedLong
nm: /usr/lib/x86_64-linux-gnu/libprotobuf.so.8: no symbols
$ nm -C -D /usr/lib/x86_64-linux-gnu/libprotobuf.so.8 | grep SpaceUsedLong
I ran "sudo apt-get install libprotobuf-dev protobuf-compiler" but found that the protoc version is too low (2.5.0-9ubuntu1) for onnx (as suggested by this issue), so I manually installed a newer version of protoc (3.5.1). I would appreciate it if anyone could give some hints on what is wrong.
This is still an issue. I just built caffe (which is now a part of pytorch) and I am getting the same error:
usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so: undefined symbol: _ZNK6google8protobuf7Message13SpaceUsedLongEv
One thing I found is that before (when everything was still working), protoc --version would give me libprotoc 2.6.1. Now, after compiling pytorch, protoc --version gives me libprotoc 3.5.0. Could it be that there is an issue with conflicting protobuf libraries?
Yeah, the protobuf versions need to match. I think we are using 2.6. cc @bddppq
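A quick way to compare the versions reported in this thread is to parse the protoc --version output. This is a minimal sketch with hypothetical helper names; treating a differing major.minor pair as a mismatch is an assumption for illustration, not a documented compatibility rule.

```python
import re
import subprocess


def parse_protoc_version(text):
    """Turn 'libprotoc 3.5.0' into (3, 5, 0); return None if unrecognised."""
    match = re.search(r"libprotoc\s+(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def versions_mismatch(first, second):
    """Assumption: versions conflict when major.minor differ."""
    return first[:2] != second[:2]


def installed_protoc_version():
    """Ask the protoc found on PATH for its version."""
    out = subprocess.check_output(["protoc", "--version"])
    return parse_protoc_version(out.decode("utf-8", "replace"))
```

For the Python side, google.protobuf.__version__ gives the runtime version as a string, which can be split on "." and compared in the same way.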
How can this be solved, then, if caffe is using a different version? I tried building caffe with BUILD_CUSTOM_PROTOBUF=OFF, forcing it to use the 2.6.1 protobuf that was already installed, but that causes it to fail at runtime when running inference with an ONNX model.
I created a Docker image to reproduce the error. You can either build it yourself with this file, or download the image via docker pull mschwier/debug-caffe2-onnx. Then just run it with docker run <image name>. It will execute a small Python script that simply contains
import onnx
import caffe2.python.onnx.backend
which will throw the above mentioned ImportError.
Hope this helps to understand/fix the problem.
@bddppq @houseroad could you take a look?
@michaelschwier could you try adding import onnx.backend before import caffe2.python.onnx.backend? I think this should solve the issue.
@houseroad Unfortunately it didn't. Actually no matter in which order I run the three imports
import onnx
import onnx.backend
import caffe2.python.onnx.backend
I will always get the same error. I think this has something to do with Caffe2 building its own version of protobuf, which is not compatible with ONNX!?
I installed onnx with conda and get the same error. Both of the following lead to it:
conda install -c conda-forge onnx
conda install -c ezyang onnx
I am trying to use this tool for converting a caffe2 model to onnx model using the example given #3
I am trying to convert resnet-101 model.
Below is my error log:
Can someone help me out with the above issue?