onnx / onnx-caffe2

Caffe2 implementation of Open Neural Network Exchange (ONNX)

undefined symbol: _ZNK6google8protobuf7Message13SpaceUsedLongEv #19

Open lcskrishna opened 7 years ago

lcskrishna commented 7 years ago

I am trying to use this tool to convert a Caffe2 model to an ONNX model, following the example given in #3.

I am trying to convert resnet-101 model.

Below is my error log:

Traceback (most recent call last):
  File "conversion.py", line 1, in <module>
    import onnx_caffe2.frontend as c2_onnx
  File "/home/chaitanya/.local/lib/python2.7/site-packages/onnx_caffe2/frontend.py", line 8, in <module>
    from onnx import onnx_pb2, checker
  File "/home/chaitanya/.local/lib/python2.7/site-packages/onnx/__init__.py", line 7, in <module>
    from . import checker, helper
  File "/home/chaitanya/.local/lib/python2.7/site-packages/onnx/checker.py", line 14, in <module>
    from onnx import defs
  File "/home/chaitanya/.local/lib/python2.7/site-packages/onnx/defs/__init__.py", line 6, in <module>
    import onnx.onnx_cpp2py_export as C
ImportError: /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so: undefined symbol: _ZNK6google8protobuf7Message13SpaceUsedLongEv

Can someone help me with the above issue?
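
The symbol in the ImportError is an Itanium C++ ABI mangled name, and decoding it shows which protobuf method the extension expects. The following is a minimal sketch only: the helper name demangle_nested is made up for this example, and it handles just this simple shape (a nested name, an optional const qualifier, and an empty parameter list).

```python
def demangle_nested(symbol):
    """Decode a simple mangled name of the form _ZN[K]<len><id>...Ev.

    Hypothetical helper for illustration; anything outside this narrow
    shape raises ValueError instead of being guessed at.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("expected a nested name starting with _ZN")
    pos = 3
    # A 'K' right after 'N' marks a const member function.
    is_const = symbol.startswith("K", pos)
    if is_const:
        pos += 1
    parts = []
    # Each name component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos] != "E":
        digits = ""
        while pos < len(symbol) and symbol[pos].isdigit():
            digits += symbol[pos]
            pos += 1
        if not digits:
            raise ValueError("expected a length-prefixed identifier")
        length = int(digits)
        parts.append(symbol[pos:pos + length])
        pos += length
    # 'E' closes the nested name; 'v' is an empty (void) parameter list.
    if symbol[pos:] != "Ev":
        raise ValueError("only an empty parameter list is supported")
    return "::".join(parts) + "()" + (" const" if is_const else "")


if __name__ == "__main__":
    print(demangle_nested("_ZNK6google8protobuf7Message13SpaceUsedLongEv"))
```

The name decodes to google::protobuf::Message::SpaceUsedLong() const, so the extension module references that method while the libprotobuf found at import time does not provide it. Running the symbol through c++filt should give the same result.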

ezyang commented 7 years ago

Hi @lcskrishna. Did you install using the binaries (conda install -c ezyang onnx) or from source?

lcskrishna commented 7 years ago

I have installed using source (pip install onnx)

ezyang commented 7 years ago

Please pip uninstall onnx and then try the binary install.

bddppq commented 7 years ago

@lcskrishna Did you do pip inside a conda virtual environment?

lcskrishna commented 7 years ago

@bddppq No, I don't have a conda virtual environment set up; I'm installing directly with pip install. Also, I have installed protobuf and protoc from the GitHub source, and I have installed numpy using pip.

bddppq commented 7 years ago

@lcskrishna I see. Which OS are you using? Could you do ldd /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so and readelf -d /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so and paste the output here? Also echo $LD_LIBRARY_PATH.

lcskrishna commented 7 years ago

@bddppq

I am using Ubuntu 16.04

Here are the outputs you asked for:

%ldd /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so

 linux-vdso.so.1 =>  (0x00007ffd2ad53000)
        libprotobuf.so.9 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.9 (0x00007f0f3edc6000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0f3ebb0000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0f3e992000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0f3e5c8000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f0f3e3ae000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0f3e02b000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0f3dd22000)
        /lib64/ld-linux-x86-64.so.2 (0x00005563d2b0c000)
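
The ldd output above can also be checked with a short script. This is a sketch under the assumption that ldd prints lines in the "name => path (address)" form shown here; the function names parse_ldd_output and find_protobuf are made up for this example, and the sample text is a subset of the output above.

```python
import re

# Matches lines such as:
#   libprotobuf.so.9 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.9 (0x00007f0f3edc6000)
# Lines without a resolved path (linux-vdso, the dynamic loader) are skipped.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd_output(text):
    """Map each shared library name to the path the loader resolved it to."""
    resolved = {}
    for line in text.splitlines():
        match = LDD_LINE.match(line)
        if match:
            resolved[match.group(1)] = match.group(2)
    return resolved


def find_protobuf(resolved):
    """Keep only the libprotobuf entries."""
    return {
        name: path
        for name, path in resolved.items()
        if name.startswith("libprotobuf.so")
    }


SAMPLE = """\
 linux-vdso.so.1 =>  (0x00007ffd2ad53000)
        libprotobuf.so.9 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.9 (0x00007f0f3edc6000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0f3e5c8000)
        /lib64/ld-linux-x86-64.so.2 (0x00005563d2b0c000)
"""

if __name__ == "__main__":
    for name, path in find_protobuf(parse_ldd_output(SAMPLE)).items():
        print(name, "->", path)
```

Here libprotobuf.so.9 resolves to a path under /usr/lib, which suggests the loader is picking up the distribution's protobuf package rather than the copy built from the GitHub source.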

%readelf -d /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so

Dynamic section at offset 0x67bd0 contains 27 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libprotobuf.so.9]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x13a48
 0x000000000000000d (FINI)               0x4feb4
 0x0000000000000019 (INIT_ARRAY)         0x2671f0
 0x000000000000001b (INIT_ARRAYSZ)       96 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x267250
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x1f0
 0x0000000000000005 (STRTAB)             0x5be0
 0x0000000000000006 (SYMTAB)             0x18d8
 0x000000000000000a (STRSZ)              41221 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x268000
 0x0000000000000002 (PLTRELSZ)           4896 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x12728
 0x0000000000000007 (RELA)               0x102e0
 0x0000000000000008 (RELASZ)             9288 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x10280
 0x000000006fffffff (VERNEEDNUM)         2
 0x000000006ffffff0 (VERSYM)             0xfce6
 0x000000006ffffff9 (RELACOUNT)          186
 0x0000000000000000 (NULL)               0x0

% echo $LD_LIBRARY_PATH


/usr/local/lib

bddppq commented 7 years ago

@lcskrishna Hmm...everything looks normal to me. Could you also do nm -C /usr/lib/x86_64-linux-gnu/libprotobuf.so.9 | grep SpaceUsedLong?

lcskrishna commented 7 years ago

I am getting the following output:

nm: /usr/lib/x86_64-linux-gnu/libprotobuf.so.9: no symbols

bddppq commented 7 years ago

@lcskrishna I'm not sure whether your protobuf installation is broken. Adding the "-D" flag to the nm command might help with debugging. In the meantime, since you are using Ubuntu, could you use "sudo apt-get install libprotobuf-dev protobuf-compiler" to install protobuf?
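
Without -D, nm reads the regular symbol table, which is usually stripped from packaged libraries, hence "no symbols". The -D flag reads the dynamic symbol table, which is what the loader consults at import time. A similar check can be sketched from Python with ctypes, assuming a Linux system; the helper name exports_symbol is made up for this example.

```python
import ctypes


def exports_symbol(library_path, symbol):
    """Return True if the dynamic loader can resolve `symbol` via the library.

    Pass None as library_path to search the libraries already loaded
    into the current process. The lookup goes through dlsym, so it also
    sees symbols provided by the library's own dependencies.
    """
    library = ctypes.CDLL(library_path)
    # ctypes raises AttributeError when dlsym cannot find the name.
    return hasattr(library, symbol)


if __name__ == "__main__":
    # Path and symbol are the ones from this thread; adjust for your system.
    path = "/usr/lib/x86_64-linux-gnu/libprotobuf.so.9"
    name = "_ZNK6google8protobuf7Message13SpaceUsedLongEv"
    try:
        print(exports_symbol(path, name))
    except OSError as error:
        print("could not load library:", error)
```

If this prints False for the libprotobuf that ldd reports, that library cannot satisfy the symbol the onnx extension needs, regardless of what other protobuf versions are installed elsewhere.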

bddppq commented 7 years ago

@lcskrishna Have you been able to resolve the issue?

lcskrishna commented 7 years ago

I tried a fresh installation of caffe2, protobuf, onnx and onnx-caffe2. I also used the conda installation for onnx. The above error no longer shows up, but I am getting the following error while running the conversion:

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named caffe2_pybind11_state_gpu
Traceback (most recent call last):
  File "../../caffe2-conv/conversion.py", line 9, in <module>
    c2_net.ParseFromString(f.read())
google.protobuf.message.DecodeError: Error parsing message

Here is my script:

import onnx_caffe2.frontend as c2_onnx
from caffe2.proto import caffe2_pb2
import os

c2_net = caffe2_pb2.NetDef()
model_path = '/home/chaitu/work/caffe2_models/model/'
c2_model_file = os.path.join(model_path, 'resnet101_init_net.pb')
with open(c2_model_file, 'rb') as f:
    c2_net.ParseFromString(f.read())
onnx_graph = c2_onnx.caffe2_net_to_onnx_graph(c2_net)

bddppq commented 7 years ago

@lcskrishna What's the size of your pb file? I suspect it's hitting the 64 MB limit. Could you try 'export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python' in your terminal and then run your code snippet again?
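
Both checks can be done from the script itself. The sketch below assumes the 64 MB figure mentioned above as the threshold; the helper names exceeds_limit and use_pure_python_protobuf are made up for this example.

```python
import os

# 64 MB, the message size limit mentioned above.
DEFAULT_LIMIT_BYTES = 64 * 1024 * 1024


def exceeds_limit(path, limit=DEFAULT_LIMIT_BYTES):
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit


def use_pure_python_protobuf():
    """Select the pure-Python protobuf backend for this process.

    This has to run before the first import of google.protobuf, or of
    anything that imports it (caffe2, onnx), to have any effect.
    """
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"


if __name__ == "__main__":
    use_pure_python_protobuf()
    # Path taken from the script earlier in this thread.
    model = "/home/chaitu/work/caffe2_models/model/resnet101_init_net.pb"
    if os.path.exists(model):
        print("over the limit:", exceeds_limit(model))
```

Setting the variable inside the script is equivalent to the export only if it happens before the protobuf import, which is why it is called first here.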

lcskrishna commented 7 years ago

@bddppq The file is around 4 MB, and I forgot to do the export. However, I have also tried a simple CIFAR-10 network.

I took a CIFAR-10 caffemodel, translated it into a Caffe2 model using the conversion tool in Caffe2, and tried to perform the conversion as described above. I still get the following error, and I'm not sure what the issue is:

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named caffe2_pybind11_state_gpu
Unrecognized attribute: legacy_pad
Traceback (most recent call last):
  File "conversion.py", line 10, in <module>
    onnx_graph = c2_onnx.caffe2_net_to_onnx_graph(c2_net)
  File "/home/chaitu/.local/lib/python2.7/site-packages/onnx_caffe2/frontend.py", line 254, in caffe2_net_to_onnx_graph
    caffe2_op_to_node_def(op, name_map) for op in net_def.op)
  File "/home/chaitu/.local/lib/python2.7/site-packages/onnx_caffe2/frontend.py", line 254, in <genexpr>
    caffe2_op_to_node_def(op, name_map) for op in net_def.op)
  File "/home/chaitu/.local/lib/python2.7/site-packages/onnx_caffe2/frontend.py", line 205, in caffe2_op_to_node_def
    checker.check_node(node_def)
  File "/home/chaitu/.local/lib/python2.7/site-packages/onnx/checker.py", line 38, in check_node
    'NodeProto of type {} did not pass defs schema check.'.format(str(node.op_type)))
ValueError: NodeProto of type MaxPool did not pass defs schema check.

jerryzh168 commented 7 years ago

@lcskrishna Please add option --remove_legacy_pad when you do the translation from caffe model to caffe2 model.
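
To see which operators in a translated net still carry the legacy_pad argument that the checker rejects, the net can be scanned before conversion. This is a sketch: the helper name ops_with_argument is made up, and the example uses stand-in objects with the same field names as a Caffe2 NetDef (op, type, arg, name) so that it runs without Caffe2 installed.

```python
from types import SimpleNamespace


def ops_with_argument(net, arg_name="legacy_pad"):
    """Return (index, op type) for each operator that has the named argument."""
    return [
        (index, op.type)
        for index, op in enumerate(net.op)
        if any(arg.name == arg_name for arg in op.arg)
    ]


def _fake_op(op_type, *arg_names):
    # Stand-in for an OperatorDef; only the fields used above are present.
    return SimpleNamespace(
        type=op_type,
        arg=[SimpleNamespace(name=name) for name in arg_names],
    )


# Hypothetical net for illustration, not the model from this thread.
example_net = SimpleNamespace(op=[
    _fake_op("Conv", "kernel", "pad"),
    _fake_op("MaxPool", "kernel", "stride", "legacy_pad"),
    _fake_op("Relu"),
])

if __name__ == "__main__":
    print(ops_with_argument(example_net))
```

With a real model, the same function can be called on the NetDef loaded with ParseFromString, as in the script earlier in this thread. An empty result means no operator carries the argument.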

lcskrishna commented 7 years ago

@jerryzh168 I get the following error while I try to translate using --remove_legacy_pad

Traceback (most recent call last):
  File "caffe_translator.py", line 853, in <module>
    input_dims=args.input_dims
  File "caffe_translator.py", line 259, in TranslateModel
    return TranslatorRegistry.TranslateModel(*args, **kwargs)
  File "caffe_translator.py", line 254, in TranslateModel
    net = _RemoveLegacyPad(net, net_params, input_dims)
  File "caffe_translator.py", line 124, in _RemoveLegacyPad
    dim_map = _GetLegacyDims(net, net_params, dummy_input, legacy_pad_ops)
  File "caffe_translator.py", line 55, in _GetLegacyDims
    ws.create_blob(param.name) \
AttributeError: 'caffe2.python.caffe2_pybind11_state.Blob' object has no attribute 'feed_blob'

jerryzh168 commented 7 years ago

@lcskrishna could you post your caffe1 model? I'll try to modify caffe_translator to make sure it works with your model.

lcskrishna commented 7 years ago

@jerryzh168 Please find the trained caffemodel

jerryzh168 commented 7 years ago

Can you post the deploy.prototxt as well? Thanks

lcskrishna commented 7 years ago

Here is my prototxt file used.

jerryzh168 commented 7 years ago

@lcskrishna Did you update your build to the most recent Caffe2? I can actually translate your model, since a more recent update doesn't remove legacy pad by default.

As a side note, there was a problem in _GetLegacyDims (it should use feed rather than feed_blob), and it will be fixed after my new diff lands.

lcskrishna commented 7 years ago

@jerryzh168 I tried it again and I'm still getting the same issue. Can you post the command you used to run the translator?

Thanks.

jerryzh168 commented 7 years ago

I see. Since you need to use remove_legacy_pad, that code will be called. Please wait until my diff has landed. You should probably also provide the input dimensions by adding the "--input_dims" option once that diff has landed.

jerryzh168 commented 7 years ago

@lcskrishna The diff has landed. Please update your c2 and try again.

lcskrishna commented 7 years ago

@jerryzh168 I tried the following after updating caffe2 and I get the following error.

Command:

python -m caffe2.python.caffe_translator ../caffe_models/cifar.prototxt ../caffe_models/cifar10_quick_iter_4000.caffemodel --remove_legacy_pad --input_dims 1 3 32 32

Error

W1003 22:11:36.411902  3167 workspace.cc:157] Blob label not in the workspace.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/caffe2/python/caffe_translator.py", line 928, in <module>
    input_dims=args.input_dims
  File "/usr/local/caffe2/python/caffe_translator.py", line 299, in TranslateModel
    return TranslatorRegistry.TranslateModel(*args, **kwargs)
  File "/usr/local/caffe2/python/caffe_translator.py", line 294, in TranslateModel
    net = _RemoveLegacyPad(net, net_params, input_dims)
  File "/usr/local/caffe2/python/caffe_translator.py", line 139, in _RemoveLegacyPad
    dim_map = _GetLegacyDims(net, net_params, dummy_input, legacy_pad_ops)
  File "/usr/local/caffe2/python/caffe_translator.py", line 77, in _GetLegacyDims
    ws._run_operator(op_def.SerializeToString())
RuntimeError: [enforce fail at operator.cc:52] blob != nullptr. op Accuracy: Encountered a non-existing input blob: label

jerryzh168 commented 7 years ago

@lcskrishna is this a train net? Probably you should use "deploy_net" instead.
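
A train net typically ends in operators such as Accuracy that read a label blob, which a deploy net does not have. One way to spot this ahead of time is to walk the operators and report inputs that nothing provides. This is a sketch: the helper name find_unfed_inputs is made up, and the example net is a hypothetical stand-in using the same field names as a Caffe2 NetDef (op, type, input, output).

```python
from types import SimpleNamespace


def find_unfed_inputs(net, available_blobs):
    """Return (op type, blob) pairs for inputs that no earlier step provides.

    `available_blobs` lists the blobs fed from outside the net, such as
    the input data and the trained parameters.
    """
    available = set(available_blobs)
    missing = []
    for op in net.op:
        for blob in op.input:
            if blob not in available:
                missing.append((op.type, blob))
        # Outputs of this operator are available to the ones after it.
        available.update(op.output)
    return missing


def _fake_op(op_type, inputs, outputs):
    # Stand-in for an OperatorDef; only the fields used above are present.
    return SimpleNamespace(type=op_type, input=inputs, output=outputs)


# Hypothetical train-style net for illustration.
example_net = SimpleNamespace(op=[
    _fake_op("Conv", ["data", "conv1_w", "conv1_b"], ["conv1"]),
    _fake_op("Softmax", ["conv1"], ["prob"]),
    _fake_op("Accuracy", ["prob", "label"], ["accuracy"]),
])

if __name__ == "__main__":
    print(find_unfed_inputs(example_net, ["data", "conv1_w", "conv1_b"]))
```

In this example the only unfed input is label on the Accuracy operator, which mirrors the enforce failure above.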

hongkedavid commented 6 years ago

Hi, I encountered a similar problem when I try to import onnx using python on Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-135-generic x86_64). I installed protocol buffer from https://github.com/google/protobuf/releases/download/v3.5.1/protobuf-all-3.5.1.zip (version 3.5.1) and also installed onnx using pip (not conda). Below is the error message and some output of my debugging.

$ python -c 'import onnx'

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/onnx/__init__.py", line 10, in <module>
    import onnx.helper  # noqa
  File "/usr/local/lib/python2.7/dist-packages/onnx/helper.py", line 15, in <module>
    import onnx.defs as defs
  File "/usr/local/lib/python2.7/dist-packages/onnx/defs/__init__.py", line 6, in <module>
    import onnx.onnx_cpp2py_export.defs as C
ImportError: /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so: undefined symbol: _ZNK6google8protobuf7Message13SpaceUsedLongEv

$ ldd /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so

    linux-vdso.so.1 =>  (0x00007ffc60dc2000)
    libprotobuf.so.8 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.8 (0x00007fc1fba23000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc1fb71f000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc1fb419000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc1fb203000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc1fae3a000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc1fac1c000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc1faa03000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fc1fbff8000)

$ readelf -d /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so

Dynamic section at offset 0xd0a30 contains 28 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libprotobuf.so.8]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x25948
 0x000000000000000d (FINI)               0xa2538
 0x0000000000000019 (INIT_ARRAY)         0x2cf500
 0x000000000000001b (INIT_ARRAYSZ)       144 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x2cf590
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x1f0
 0x0000000000000005 (STRTAB)             0xaf80
 0x0000000000000006 (SYMTAB)             0x27a8
 0x000000000000000a (STRSZ)              82207 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x2d1000
 0x0000000000000002 (PLTRELSZ)           5904 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x24238
 0x0000000000000007 (RELA)               0x1fd08
 0x0000000000000008 (RELASZ)             17712 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x1fbf8
 0x000000006fffffff (VERNEEDNUM)         4
 0x000000006ffffff0 (VERSYM)             0x1f0a0
 0x000000006ffffff9 (RELACOUNT)          370
 0x0000000000000000 (NULL)               0x0

$ echo $LD_LIBRARY_PATH

/usr/local/lib

$ nm -C /usr/lib/x86_64-linux-gnu/libprotobuf.so.8 | grep SpaceUsedLong

nm: /usr/lib/x86_64-linux-gnu/libprotobuf.so.8: no symbols

$ nm -C -D /usr/lib/x86_64-linux-gnu/libprotobuf.so.8 | grep SpaceUsedLong

I did "sudo apt-get install libprotobuf-dev protobuf-compiler" but found the protoc version is too low (2.5.0-9ubuntu1) for onnx (as suggested by this issue). So I manually installed a newer version of protoc (version 3.5.1). I would appreciate it if anyone could give some hints on what is wrong.

michaelschwier commented 6 years ago

This is still an issue. I just built Caffe2 (which is now a part of PyTorch) and I am getting the same error: usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so: undefined symbol: _ZNK6google8protobuf7Message13SpaceUsedLongEv

One thing I found is that before (when everything was still working), protoc --version would give me libprotoc 2.6.1. Now after compiling pytorch protoc --version gives me libprotoc 3.5.0. Could it be that there is an issue with conflicting protobuf libraries?

jerryzh168 commented 6 years ago

Yeah, the protobuf versions need to match. I think we are using 2.6. cc @bddppq
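
A version mismatch like this can be detected by comparing what protoc --version reports with the other protobuf versions in play. This is a minimal sketch: the helper names are made up for this example, and the version strings are the two quoted above.

```python
import re


def parse_protoc_version(text):
    """Parse output such as 'libprotoc 3.5.0' into a tuple of integers."""
    match = re.search(r"libprotoc\s+(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognized protoc version output: %r" % text)
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def same_major_version(first, second):
    """Return True if two protoc version strings share a major version."""
    return parse_protoc_version(first)[0] == parse_protoc_version(second)[0]


if __name__ == "__main__":
    before = "libprotoc 2.6.1"  # reported before compiling pytorch
    after = "libprotoc 3.5.0"   # reported afterwards
    print(parse_protoc_version(before), parse_protoc_version(after))
    print("compatible major version:", same_major_version(before, after))
```

On a live system the input string can come from running protoc --version through subprocess, and the Python runtime's version is available as google.protobuf.__version__. Neither tells you which libprotobuf the onnx extension loads at import time, so ldd on the .so file remains the more direct check.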

michaelschwier commented 6 years ago

How can this be solved, then, if Caffe2 is using a different version? I tried building Caffe2 with BUILD_CUSTOM_PROTOBUF=OFF, forcing it to use the 2.6.1 protobuf that was already installed, but that causes it to fail at runtime when running inference with an ONNX model.

michaelschwier commented 6 years ago

I created a Docker image to reproduce the error. You can either build it yourself with this file, or download the image via docker pull mschwier/debug-caffe2-onnx

Then just run it with docker run <image name>. It will execute a small Python script that simply contains

import onnx
import caffe2.python.onnx.backend

which will throw the above mentioned ImportError.

Hope this helps to understand/fix the problem.

jerryzh168 commented 6 years ago

@bddppq @houseroad could you take a look?

houseroad commented 6 years ago

@michaelschwier could you try to add import onnx.backend before import caffe2.python.onnx.backend? I think this should solve the issue.

michaelschwier commented 6 years ago

@houseroad Unfortunately it didn't. Actually no matter in which order I run the three imports

import onnx
import onnx.backend
import caffe2.python.onnx.backend

I will always get the same error. I think this has something to do with Caffe2 building its own version of protobuf, which is not compatible with ONNX!?
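
One way to look for conflicting protobuf copies is to list the shared objects mapped into the Python process after whichever imports succeed. The sketch below assumes Linux and the /proc/self/maps line format; the helper name mapped_libraries is made up, and the sample text is hypothetical rather than taken from a real run.

```python
def mapped_libraries(maps_text, needle="protobuf"):
    """Return the sorted, unique mapped file paths that contain `needle`.

    Each line of /proc/<pid>/maps has five whitespace-separated fields
    (address range, permissions, offset, device, inode), optionally
    followed by a path.
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


# Hypothetical sample for illustration only.
SAMPLE_MAPS = """\
7f0f3edc6000-7f0f3ee5f000 r-xp 00000000 08:01 1234567 /usr/lib/x86_64-linux-gnu/libprotobuf.so.9
7f0f3ee5f000-7f0f3f05e000 ---p 00099000 08:01 1234567 /usr/lib/x86_64-linux-gnu/libprotobuf.so.9
7f0f3e5c8000-7f0f3e788000 r-xp 00000000 08:01 7654321 /lib/x86_64-linux-gnu/libc.so.6
7ffd2ad53000-7ffd2ad55000 r-xp 00000000 00:00 0 [vdso]
"""

if __name__ == "__main__":
    print(mapped_libraries(SAMPLE_MAPS))
    # On a live process, after the imports of interest:
    #     with open("/proc/self/maps") as handle:
    #         print(mapped_libraries(handle.read()))
```

More than one distinct libprotobuf path in the result would point to two copies being loaded side by side. A copy linked statically into an extension module will not appear as a separate file, so an empty or single-entry result does not rule out a conflict.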

ascenoputing commented 6 years ago

I installed onnx with conda and get the same error.

ascenoputing commented 6 years ago

conda install -c conda-forge onnx

conda install -c ezyang onnx

Both give the same error.