smallcorgi / Faster-RCNN_TF

Faster-RCNN in Tensorflow
MIT License

Failed to run demo.py with undefined symbol, how can I solve this problem? #232

Open soldatjiang opened 7 years ago

soldatjiang commented 7 years ago

    soldat@soldat:~/Program/Faster-RCNN_TF$ python ./tools/demo.py --model ./data/faster_rcnn_models/VGG16_faster_rcnn_final.caffemodel
    Traceback (most recent call last):
      File "./tools/demo.py", line 11, in <module>
        from networks.factory import get_network
      File "/home/soldat/Program/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
        from .VGGnet_train import VGGnet_train
      File "/home/soldat/Program/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
        from networks.network import Network
      File "/home/soldat/Program/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
        import roi_pooling_layer.roi_pooling_op as roi_pool_op
      File "/home/soldat/Program/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
        _roi_pooling_module = tf.load_op_library(filename)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
        lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
        c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.NotFoundError: /home/soldat/Program/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE
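
For readers unfamiliar with mangled C++ names, the symbol at the end of the error can be decoded by hand. The sketch below is a minimal, illustrative decoder; `demangle_nested` is a hypothetical helper, not part of TensorFlow or this repository, and it only handles simple nested names with an optional typeinfo/vtable prefix. The `c++filt` tool is the proper choice for anything more complex.

```python
# Prefixes for a few Itanium C++ ABI "special names".
_SPECIAL = {
    "TI": "typeinfo for ",
    "TV": "vtable for ",
    "TS": "typeinfo name for ",
}


def demangle_nested(symbol):
    """Decode _Z[TI|TV|TS]N<len><name>...E symbols; return None for anything else."""
    if not symbol.startswith("_Z"):
        return None
    rest = symbol[2:]
    prefix = ""
    if rest[:2] in _SPECIAL:
        prefix = _SPECIAL[rest[:2]]
        rest = rest[2:]
    if not rest.startswith("N"):
        return None
    rest = rest[1:]
    parts = []
    # Each component is a decimal length followed by that many characters.
    while rest and rest[0].isdigit():
        digits = 0
        while digits < len(rest) and rest[digits].isdigit():
            digits += 1
        length = int(rest[:digits])
        name = rest[digits:digits + length]
        if len(name) != length:
            return None
        parts.append(name)
        rest = rest[digits + length:]
    # A well-formed nested name ends with a single "E".
    if rest != "E" or not parts:
        return None
    return prefix + "::".join(parts)


print(demangle_nested("_ZTIN10tensorflow8OpKernelE"))
# typeinfo for tensorflow::OpKernel
```

The decoded name suggests that roi_pooling.so refers to TensorFlow's OpKernel class but was built without linking the library that defines it.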

awilliamson commented 6 years ago

I am getting the same issue. Perhaps https://github.com/tensorflow/tensorflow/issues/13607 is related?

apennisi commented 6 years ago

I was not able to solve the problem. Were you?

awilliamson commented 6 years ago

@apennisi It was the culmination of a few days' worth of bashing my head against a wall and collating information from many sources on the fly. I have my fork with the 2to3 conversion (which is what I presume caused your issue). Most of the changes were Makefile changes (here):

  1. Ensure the CUDA_PATH at the top is changed to your path, or alternatively replace it in-line. This way the CUDA section gets executed.
  2. Define TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())') and modify the g++ call to include -L$TF_LIB -ltensorflow_framework
  3. The default arch set by this repository does not account for 10-series cards. I was running on a few GTX Titan Xp cards (and a 1080), so I set -arch=sm_61.

I'm not claiming this will fix your issues. You may then encounter issues when running the demo. This is due to encoding issues caused by the 2to3 conversion. The solution to these was a combination of eragonruan/text-detection-ctpn and CharlesShang/TFFRCNN

This may be a little beyond the scope of your original error, but I believe the cause was your attempt at the 2to3 conversion, alongside the Makefile issues on your system. If you could give feedback on any of the above steps, that would be very useful; additionally, this may provide a single location for others who, like us, were struggling with these errors.
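
Step 2 in the list above can be sketched in Python as follows. This is only an illustration under assumptions: `detect_tf_lib`, `tf_link_flags` and `patched_link_command` are hypothetical helper names, and the only TensorFlow call used is `tf.sysconfig.get_lib()`, the same one as in the shell snippet.

```python
def detect_tf_lib():
    """Return TensorFlow's library directory, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.sysconfig.get_lib()


def tf_link_flags(tf_lib_dir):
    """Extra g++ arguments that link against libtensorflow_framework."""
    return ["-L", tf_lib_dir, "-ltensorflow_framework"]


def patched_link_command(base_command, tf_lib_dir):
    """Append the TensorFlow link flags to an existing g++ argument list."""
    return list(base_command) + tf_link_flags(tf_lib_dir)
```

For example, `patched_link_command(["g++", "-shared", "-o", "roi_pooling.so"], detect_tf_lib())` would return the original arguments followed by `-L <tf lib dir> -ltensorflow_framework`, provided TensorFlow is installed.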

apennisi commented 6 years ago

@awilliamson I already tried all these fixes without success; I always get that error. I already converted from Python 2 to Python 3, and it works on my MacBook (CPU). I am now trying on a server with a Tesla K80 and I get this error. Do you have any other suggestions?

awilliamson commented 6 years ago

@apennisi I'm not quite sure without more information about your environment. It does sound odd, as the fix for your specific undefined symbol is the TF_LIB linking in step 2. To my knowledge, you shouldn't be getting that error on a CPU-only build (ensure you pass the CPU-only flag to Faster-RCNN). Additionally, a K80 is a different architecture. This article shows some of the sm_XX codes for various cards and their respective CUDA variants. I admit it is a hard problem to solve, and it took me a day or two to collate enough information to solve it for my specific platform. Feel free to e-mail me at my institutional e-mail address (it shouldn't be hard to find or figure out ;) ) if you want to discuss this further. If we can figure out your problem, it would be good to post the answer here.
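
As a rough aid for step 3, the sketch below maps the cards mentioned in this thread to their nvcc `-arch` values. The table is a small hand-written assumption covering only these cards (compute capability 3.7 for the Tesla K80, 6.1 for the GTX 1080, GTX 1080 Ti and Titan Xp), and `nvcc_arch_flag` is a hypothetical helper name.

```python
# Compute capabilities for the cards mentioned in this thread only.
COMPUTE_CAPABILITY = {
    "Tesla K80": "37",
    "GTX 1080": "61",
    "GTX 1080 Ti": "61",
    "Titan Xp": "61",
}


def nvcc_arch_flag(card):
    """Return the nvcc -arch flag for a known card, or raise ValueError."""
    try:
        return "-arch=sm_" + COMPUTE_CAPABILITY[card]
    except KeyError:
        raise ValueError("unknown card: %r (look up its compute capability)" % card)
```

With this table, a K80 would use `-arch=sm_37` rather than the `-arch=sm_61` used for the 10-series cards.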

apennisi commented 6 years ago

Of course, I changed the architecture! Did you change anything else?

ambr89 commented 6 years ago

I solved it by downgrading TensorFlow to 1.3.

I have a GTX 1080 Ti, and I also changed demo.py at line 114:

    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)

However, your step 2 does not work for me. In make.sh:

    g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
        roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS \
        -D_GLIBCXX_USE_CXX11_ABI=0 -lcudart -L $CUDA_PATH/lib64 -L $TF_LIB -ltensorflow_framework

it fails with:

    /usr/bin/ld: cannot find -ltensorflow_framework
    collect2: error: ld returned 1 exit status

trikim commented 6 years ago

I think the problem is that your TensorFlow version is too new. My CUDA version is 8.0 and my cuDNN version is 6.0.

The first time, I installed TensorFlow with "pip install --user tensorflow-gpu", which gave me version 1.4.1, and I hit the same problem described above.

The second time, I downloaded the "Linux GPU: Python 2" package from https://github.com/tensorflow/tensorflow and installed it with "pip install tf_nightly_gpu-1.head-cp27-none-linux_x86_64.whl". This changed the TensorFlow version to 1.4.0-dev20170920.

In Faster-RCNN_TF/lib, before running "make", I edited ~/.local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/platform/default/mutex.h as described in https://github.com/smallcorgi/Faster-RCNN_TF/issues/245

Finally, I succeeded in running demo.py:

    python ./tools/demo.py --model ./models/VGGnet_fast_rcnn_iter_70000.ckpt

xtanitfy commented 6 years ago

awilliamson is right! I followed his approach and it solved the problem. Add these flags to the link step: LIBS_FLGAS=-L/usr/local/lib/python2.7/dist-packages/tensorflow -ltensorflow_framework

wtliao commented 6 years ago

@awilliamson Hi, thanks for your solution, but it does not work for me. I ran into a new error, which is only slightly different:

    tensorflow.python.framework.errors_impl.NotFoundError: /home/wtliao/work_space/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumES3_

Could you help me? Thanks.
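
The `B5cxx11` fragment in the symbol above is the mangled form of the `[abi:cxx11]` tag, which usually indicates that roi_pooling.so was compiled with the newer libstdc++ string ABI while the installed TensorFlow expects the older one. A minimal sketch of that check follows, assuming the usual remedy is the `-D_GLIBCXX_USE_CXX11_ABI=0` flag already seen in the g++ commands in this thread; `suggest_abi_flag` is a hypothetical helper name.

```python
# Mangled form of the [abi:cxx11] tag that GCC 5+ attaches to new-ABI symbols.
CXX11_ABI_TAG = "B5cxx11"


def suggest_abi_flag(undefined_symbol):
    """Return a compile flag to try if the symbol carries the cxx11 ABI tag, else None."""
    if CXX11_ABI_TAG in undefined_symbol:
        # Assumption: the installed TensorFlow was built with the old ABI,
        # so the custom op should be compiled the same way.
        return "-D_GLIBCXX_USE_CXX11_ABI=0"
    return None
```

The first error in this thread (`_ZTIN10tensorflow8OpKernelE`) carries no such tag, so this check would return `None` for it; that error points to a missing link library instead.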

wtliao commented 6 years ago

@awilliamson The only way I could fix this problem was to use TF 1.3 + CUDA 8.0 + cuDNN 6.0... so sad.

ChanChiChoi commented 6 years ago

My environment is CUDA 9.0, TensorFlow 1.8.0, Python 3.6. This is my solution; just change:

    g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
        roi_pooling_op.cu.o -I $TF_INC  -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS \
        -lcudart -L $CUDA_PATH/lib64

to

    TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
    g++ -std=c++11 -shared  -o roi_pooling.so  roi_pooling_op.cc  \
         roi_pooling_op.cu.o -I $TF_INC  -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS  \
         -lcudart -L $CUDA_PATH/lib64  -L $TF_LIB -ltensorflow_framework

cfh3c commented 6 years ago

You can use both the include and lib paths to solve it:

    TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
    TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')

    nvcc -std=c++11 -c -o roi_pooling_op_gpu.cu.o roi_pooling_op_gpu.cu.cc \
        -I $TF_INC -L $TF_LIB -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC $CXXFLAGS

    g++ -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -shared -o ./build/roi_pooling.so roi_pooling_op.cc \
        roi_pooling_op_gpu.cu.o -I $TF_INC -fPIC $CXXFLAGS -D_GLIBCXX_USE_CXX11_ABI=0 -lcudart -L $CUDA_HOME/lib64 -L $TF_LIB -ltensorflow_framework

    rm -rf roi_pooling_op_gpu.cu.o

chenyanyin commented 5 years ago

@ChanChiChoi Hello, my environment is the same as yours (CUDA 9.0 as well), but after applying your change to the g++ call (adding -L $TF_LIB -ltensorflow_framework) I got this error:

    ImportError: Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
        from tensorflow.python.pywrap_tensorflow_internal import *
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
        _pywrap_tensorflow_internal = swig_import_helper()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
        _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
    ImportError: libcusolver.so.8.0: cannot open shared object file: No such file or directory

emilyfy commented 5 years ago

@ambr89 I got the same error as you, compiling with -ltensorflow_framework didn't work. I tried to look for libtensorflow_framework.so and couldn't find it but found libtensorflow_framework.so.1 instead inside /usr/local/lib/python2.7/dist-packages/tensorflow. So I made a copy called libtensorflow_framework.so and that fixed it. Hope that helps!
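
The workaround described above (making `libtensorflow_framework.so` available when only `libtensorflow_framework.so.1` exists) can also be sketched as a small script. Note the swap: this sketch creates a symbolic link instead of a copy, and `ensure_unversioned_lib` is a hypothetical helper name. Writing into dist-packages may require elevated permissions.

```python
import glob
import os


def ensure_unversioned_lib(lib_dir, base="libtensorflow_framework.so"):
    """Make sure lib_dir/base exists so that -ltensorflow_framework can resolve.

    Returns the path to the unversioned library, or None if no versioned
    candidate (for example base + ".1") was found either.
    """
    target = os.path.join(lib_dir, base)
    if os.path.exists(target):
        return target
    candidates = sorted(glob.glob(target + ".*"))
    if not candidates:
        return None
    # Relative link, so it stays valid if the directory is moved.
    os.symlink(os.path.basename(candidates[0]), target)
    return target
```

A call such as `ensure_unversioned_lib("/usr/local/lib/python2.7/dist-packages/tensorflow")` would then leave the directory in a state where `-L $TF_LIB -ltensorflow_framework` can find the library.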

lijf138 commented 4 years ago

@soldatjiang I get the same error:

tensorflow.python.framework.errors_impl.NotFoundError: /home/ii/app/Faster-RCNN_TF-master/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE

@awilliamson I hope you can help! My environment is CUDA 9.0, cuDNN 7.1.2, TensorFlow 1.10.0, Python 3.5.5.