Open soldatjiang opened 7 years ago
I am getting the same issue. Perhaps https://github.com/tensorflow/tensorflow/issues/13607 is related?
I was not able to solve the problem. Were you?
@apennisi It was the culmination of a few days' worth of bashing my head against a wall and collating from many sources on the fly. I have my fork with the 2to3 conversion (which is what I presume caused your issue). Specifically, most changes were Makefile changes (here):
Add TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())') and modify the g++ call to include -L$TF_LIB -ltensorflow_framework
Use -arch=sm_61 (this has to match your GPU)
I'm not claiming this will fix your issues. You may then encounter issues when running the demo. These are due to encoding issues caused by the 2to3 conversion. The solution to these was a combination of eragonruan/text-detection-ctpn and CharlesShang/TFFRCNN.
This may be a little beyond the scope of your original error, but I believe the cause was your attempt at 2to3 conversion, alongside the Makefile issues with your system. If you could give feedback on any of the above steps, this would be very useful; additionally, this may provide a single location for others who, like us, were struggling with these errors.
@awilliamson I already tried all these fixes without success; I always get that error. I already converted from Python 2 to Python 3, and on my MacBook (CPU) it works. I am trying on a server with a Tesla K80 and I get this error. Do you have any other suggestions?
@apennisi Not quite sure without more information regarding your environment etc. It does sound odd, as the fix for your specific undefined symbol is the TF_LIB linking in step 2. You shouldn't be getting that error on a CPU-only implementation to my knowledge (ensure you pass the CPU-only flag to Faster-RCNN). Additionally, a K80 is a different architecture. This article shows some of the sm_XX codes for various cards and their respective CUDA variants.
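To make the sm_XX point concrete, here is a minimal sketch of a lookup from card name to nvcc flag. The table only lists a few cards and is not exhaustive, and the helper name arch_flag is made up for illustration; check NVIDIA's compute capability list for your own card.

```python
# Compute capability for a handful of NVIDIA cards (illustrative, not exhaustive).
COMPUTE_CAPABILITY = {
    "Tesla K80": "37",
    "Tesla P100": "60",
    "GTX 1080": "61",
    "GTX 1080 Ti": "61",
    "Tesla V100": "70",
}


def arch_flag(card):
    """Return the nvcc -arch flag for a card listed in COMPUTE_CAPABILITY."""
    if card not in COMPUTE_CAPABILITY:
        raise KeyError("unknown card %r; look up its compute capability" % card)
    return "-arch=sm_" + COMPUTE_CAPABILITY[card]


print(arch_flag("Tesla K80"))    # -arch=sm_37
print(arch_flag("GTX 1080 Ti"))  # -arch=sm_61
```

So a make.sh that was tuned for a GTX 1080 (sm_61) would need sm_37 on a K80.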
I admit, it is a hard problem to solve, and took me a day or two to collate enough information to solve it for my specific platform. Feel free to e-mail me on my institutional e-mail address ( shouldn't be hard to find / figure out ;) ) if you want to discuss this further. If we can figure out your problem, then it might be suitable to respond here once found.
Of course, I changed the architecture! Did you change anything else?
I solved it: I downgraded TensorFlow to 1.3 and changed demo.py at line 114 (I have a GTX 1080 Ti):
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
But your step 2 doesn't work for me. In make.sh,
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS \
-D_GLIBCXX_USE_CXX11_ABI=0 -lcudart -L $CUDA_PATH/lib64 -L $TF_LIB -ltensorflow_framework
fails with:
/usr/bin/ld: cannot find -ltensorflow_framework
collect2: error: ld returned 1 exit status
I think the problem is that your TensorFlow version is too new. My CUDA version is 8.0 and my cuDNN version is 6.0. The first time, I installed TensorFlow with "pip install --user tensorflow-gpu", which gave me version 1.4.1, and I hit the same problem described above. The second time, I downloaded the "Linux GPU: Python 2" package from https://github.com/tensorflow/tensorflow and installed it with "pip install tf_nightly_gpu-1.head-cp27-none-linux_x86_64.whl". This changed the TensorFlow version to 1.4.0-dev20170920. In Faster-RCNN_TF/lib, before running "make", I edited the file ~/.local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/platform/default/mutex.h following https://github.com/smallcorgi/Faster-RCNN_TF/issues/245. In the end, I managed to run demo.py:
python ./tools/demo.py --model ./models/VGGnet_fast_rcnn_iter_70000.ckpt
awilliamson is right! I used his approach and it solved the problem. Add this compile flag: LIBS_FLGAS=-L/usr/local/lib/python2.7/dist-packages/tensorflow -ltensorflow_framework
@awilliamson Hi, thanks for your solution, but it does not work for me. I ran into a new issue:
tensorflow.python.framework.errors_impl.NotFoundError: /home/wtliao/work_space/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumES3_
It is only a little different. Could you help me? Thanks.
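One hint in that symbol: B5cxx11 is how GCC encodes the [abi:cxx11] tag, so the op library was probably compiled with the new libstdc++ ABI while the installed TensorFlow exports the old-ABI version of StrCat. If that is the case, adding -D_GLIBCXX_USE_CXX11_ABI=0 to the g++ call is the usual remedy. A rough check, assuming the symbol is copied from the error message (the function name is hypothetical):

```python
def built_with_cxx11_abi(mangled_symbol):
    """True if the mangled name carries GCC's [abi:cxx11] tag (encoded as B5cxx11)."""
    return "B5cxx11" in mangled_symbol


# Symbol taken from the NotFoundError above.
symbol = "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumES3_"
if built_with_cxx11_abi(symbol):
    print("op library uses the new ABI; try adding -D_GLIBCXX_USE_CXX11_ABI=0")
```

This is only a string test on the symbol name; it says nothing about which ABI your TensorFlow build actually uses.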
@awilliamson The only way I could fix this problem was to use TF 1.3 + CUDA 8.0 + cuDNN 6.0... so sad.
My environment is CUDA 9.0, TensorFlow 1.8.0, Python 3.6. This is my solution; just change:
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS \
-lcudart -L $CUDA_PATH/lib64
to
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS \
-lcudart -L $CUDA_PATH/lib64 -L $TF_LIB -ltensorflow_framework
You can use both include and lib to solve it:
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
nvcc -std=c++11 -c -o roi_pooling_op_gpu.cu.o roi_pooling_op_gpu.cu.cc \
-I $TF_INC -L $TF_LIB -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC $CXXFLAGS
g++ -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -shared -o ./build/roi_pooling.so roi_pooling_op.cc \
roi_pooling_op_gpu.cu.o -I $TF_INC -fPIC $CXXFLAGS -D_GLIBCXX_USE_CXX11_ABI=0 -lcudart -L $CUDA_HOME/lib64 -L $TF_LIB -ltensorflow_framework
rm -rf roi_pooling_op_gpu.cu.o
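The two build steps above can also be assembled as argument lists, which avoids line-continuation mistakes in make.sh. This is a sketch under assumptions: the function name and the placeholder paths are made up, $CXXFLAGS and the nvcc -L option are left out for brevity, and in practice the paths would come from tf.sysconfig.get_include(), tf.sysconfig.get_lib() and your CUDA install.

```python
import shlex


def roi_pooling_build_commands(tf_inc, tf_lib, cuda_home):
    """Return (nvcc_cmd, gxx_cmd) as argument lists for the two build steps."""
    nvcc_cmd = [
        "nvcc", "-std=c++11", "-c",
        "-o", "roi_pooling_op_gpu.cu.o", "roi_pooling_op_gpu.cu.cc",
        "-I", tf_inc, "-D", "GOOGLE_CUDA=1",
        "-x", "cu", "-Xcompiler", "-fPIC",
    ]
    gxx_cmd = [
        "g++", "-std=c++11", "-D_GLIBCXX_USE_CXX11_ABI=0", "-shared",
        "-o", "./build/roi_pooling.so", "roi_pooling_op.cc",
        "roi_pooling_op_gpu.cu.o",
        "-I", tf_inc, "-fPIC", "-lcudart",
        "-L", cuda_home + "/lib64",
        # Libraries go after the objects that need them.
        "-L", tf_lib, "-ltensorflow_framework",
    ]
    return nvcc_cmd, gxx_cmd


# Placeholder paths for illustration only.
nvcc_cmd, gxx_cmd = roi_pooling_build_commands(
    "/path/to/tensorflow/include", "/path/to/tensorflow", "/usr/local/cuda")
for cmd in (nvcc_cmd, gxx_cmd):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) as is, so no shell quoting is involved.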
Hello, my environment is the same as yours (CUDA 9.0 too), but I got an error with the change you described. The error is:
ImportError: Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in
@ambr89 I got the same error as you; compiling with -ltensorflow_framework didn't work. I looked for libtensorflow_framework.so and couldn't find it, but found libtensorflow_framework.so.1 instead inside /usr/local/lib/python2.7/dist-packages/tensorflow. So I made a copy called libtensorflow_framework.so and that fixed it. Hope that helps!
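As an alternative to copying the file, GNU ld can link an exact file name with -l:FILENAME. The sketch below picks the flags depending on which file is present; the function name is hypothetical, and it assumes the versioned file sits directly in the TensorFlow lib directory as described above.

```python
import glob
import os


def framework_link_flags(tf_lib):
    """Pick linker flags for whichever libtensorflow_framework file exists in tf_lib."""
    plain = os.path.join(tf_lib, "libtensorflow_framework.so")
    if os.path.exists(plain):
        return ["-L" + tf_lib, "-ltensorflow_framework"]
    versioned = sorted(glob.glob(plain + ".*"))
    if versioned:
        # GNU ld accepts -l:FILENAME to link a file by its exact name.
        return ["-L" + tf_lib, "-l:" + os.path.basename(versioned[0])]
    raise FileNotFoundError("no libtensorflow_framework.so* found in " + tf_lib)
```

Calling framework_link_flags with the directory printed by tf.sysconfig.get_lib() gives the flags to append to the g++ line in make.sh.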
@soldatjiang I get the same error:
tensorflow.python.framework.errors_impl.NotFoundError: /home/ii/app/Faster-RCNN_TF-master/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE
@awilliamson I hope you can help! My environment is: CUDA 9.0; cuDNN 7.1.2; TensorFlow 1.10.0; Python 3.5.5
soldat@soldat:~/Program/Faster-RCNN_TF$ python ./tools/demo.py --model ./data/faster_rcnn_models/VGG16_faster_rcnn_final.caffemodel
Traceback (most recent call last):
  File "./tools/demo.py", line 11, in <module>
    from networks.factory import get_network
  File "/home/soldat/Program/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
    from .VGGnet_train import VGGnet_train
  File "/home/soldat/Program/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
    from networks.network import Network
  File "/home/soldat/Program/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
    import roi_pooling_layer.roi_pooling_op as roi_pool_op
  File "/home/soldat/Program/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
    _roi_pooling_module = tf.load_op_library(filename)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: /home/soldat/Program/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE
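For anyone reading the error itself: _ZTIN10tensorflow8OpKernelE is the mangled form of "typeinfo for tensorflow::OpKernel", which suggests roi_pooling.so was not linked against the library that defines OpKernel. That matches the -ltensorflow_framework fixes discussed in this thread. The decoder below is a toy sketch that handles only simple nested names of this shape (the function name is made up); c++filt is the proper tool for anything more complex.

```python
import re


def decode_simple_symbol(symbol):
    """Decode Itanium-ABI names of the form _Z[TI|TV]N<len><name>...E only."""
    labels = (("_ZTI", "typeinfo for "), ("_ZTV", "vtable for "), ("_Z", ""))
    for prefix, label in labels:
        if symbol.startswith(prefix):
            rest = symbol[len(prefix):]
            break
    else:
        raise ValueError("not a mangled C++ name: %r" % symbol)
    if not (rest.startswith("N") and rest.endswith("E")):
        raise ValueError("only simple nested names are handled")
    rest = rest[1:-1]
    parts = []
    while rest:
        match = re.match(r"\d+", rest)
        if match is None:
            raise ValueError("expected a length prefix in %r" % rest)
        length = int(match.group())
        name = rest[match.end():match.end() + length]
        if len(name) != length:
            raise ValueError("truncated name in %r" % symbol)
        parts.append(name)
        rest = rest[match.end() + length:]
    return label + "::".join(parts)


print(decode_simple_symbol("_ZTIN10tensorflow8OpKernelE"))
# typeinfo for tensorflow::OpKernel
```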