Open teddybearz opened 7 years ago
To reproduce (after downloading VGGnet_fast_rcnn_iter_70000.ckpt to ~/):
` docker run -v ~/VGGnet_fast_rcnn_iter_70000.ckpt:/VGGnet_fast_rcnn_iter_70000.ckpt -it -p 8888:8888 tensorflow/tensorflow bash
sudo apt-get update
sudo apt-get install -y git
sudo apt-get install -y python-opencv
sudo apt-get install -y python-tk
pip install cython
pip install easydict
pip install image
sudo ln /dev/null /dev/raw1394
git clone --recursive https://github.com/smallcorgi/Faster-RCNN_TF.git
cd Faster-RCNN_TF/lib
make
cd ..
python ./tools/demo.py --model /VGGnet_fast_rcnn_iter_70000.ckpt
`
I also encountered the same problem.
I have encountered the same fault, too. Is there a solution to this problem? Thanks!
me too
I am facing a similar problem when I start training on CPU or run the demo. Is there a solution for this?
Hi, I am getting the same error while trying to run demo.py:
tensorflow.python.framework.errors_impl.NotFoundError: /home/fmc/rcnn/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE
I added "-D_GLIBCXX_USE_CXX11_ABI=0" in make.sh. I use g++ version 5.4.0 and TF v0.12. By the way, I am trying to run this on CPU. Any help is highly appreciated. -Siva
Can we train this model using only the CPU?
I hit a similar issue on Ubuntu 16.04 with g++ version 5.4.0 and TF v0.12. Before adding "-D_GLIBCXX_USE_CXX11_ABI=0" to make.sh, running the demo reports "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE"; after adding it, running the demo reports "_Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE".
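The change in the reported symbol is itself a clue. Below is a minimal Python sketch, assuming the standard Itanium C++ name mangling used by g++, that reads the two mangled names quoted above. The `B5cxx11` marker is the `cxx11` ABI tag that libstdc++ attaches under its newer string ABI, so the first error points to an ABI mismatch, which is what `-D_GLIBCXX_USE_CXX11_ABI=0` addresses. `N5Eigen9GpuDeviceE` encodes `Eigen::GpuDevice`, so the second error points to a missing GPU-only function rather than to the ABI. The function name `classify_undefined_symbol` and the labels it returns are hypothetical, chosen only for illustration.

```python
def classify_undefined_symbol(mangled):
    """Guess why a mangled C++ symbol is undefined when loading roi_pooling.so.

    Heuristic only: it looks for two substrings seen in this thread.
    """
    if "B5cxx11" in mangled:
        # abi_tag "cxx11": the referencing object was compiled with the
        # newer libstdc++ ABI, while the library providing it was not.
        return "abi-mismatch"
    if "N5Eigen9GpuDeviceE" in mangled:
        # Eigen::GpuDevice in the signature: a GPU-only function is referenced.
        return "missing-gpu-code"
    return "unknown"


symbols = [
    "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE",
    "_Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE",
]
for symbol in symbols:
    print(classify_undefined_symbol(symbol), symbol)
```

This is why adding the flag changes the error instead of removing it: the flag fixes the first category but cannot supply the GPU code that the second symbol needs.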
I don't have a GPU. How can I run the demo in CPU-only mode?
Having the same problem (_Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE) when trying to train on CPU. Adding "-D_GLIBCXX_USE_CXX11_ABI=0" to the g++ command in make.sh and re-making didn't help. Thanks.
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS -D_GLIBCXX_USE_CXX11_ABI=0 \
-lcudart -L $CUDA_PATH/lib64
same problem here.
same problem here as well
me too :(
I had to modify the make.sh file to change the GPU architecture to match mine (sm_61), and then had to change the CUDA path (on Arch Linux it is /opt/cuda).
Same problem! Before adding "-D_GLIBCXX_USE_CXX11_ABI=0" to make.sh, running the demo reports "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE"; after adding it, running the demo reports "_Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE".
@googleios @raviv have you solved the problem?
Hi all, I've figured out a workaround to use only the CPU. I have only tested this method for the demo script, not sure if it will work for training, but it should.
Download and Install CUDA: https://developer.nvidia.com/cuda-downloads
Compile for GPU or copy my .so: you can download my .so file from here: https://drive.google.com/open?id=0B-0d5quIGY5XVEJvYU9XRkVJTWM Or you can run make.sh and compile with CUDA (not sure if this will work).
Include these lines of code at the top of your Python scripts:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''
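A minimal sketch of that last step, assuming the script imports TensorFlow only after these lines have run. `CUDA_VISIBLE_DEVICES` is read when the CUDA runtime initializes, so the variable has to be set before TensorFlow first touches the GPU. The helper name `hide_gpus` and its warning are hypothetical, added only for illustration.

```python
import os
import sys


def hide_gpus():
    """Hide all CUDA devices from libraries loaded later in this process."""
    if "tensorflow" in sys.modules:
        # Assumption: setting the variable this late may have no effect
        # if TensorFlow has already initialized CUDA.
        print("warning: tensorflow is already imported", file=sys.stderr)
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


hide_gpus()
# import tensorflow as tf  # import TensorFlow only after the call above
print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))
```

An empty string means no devices are visible, so TensorFlow falls back to the CPU.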
I succeeded in running another Faster R-CNN on CPU from this repo
@louisquinn I succeed with your method. thx~
@louisquinn, I would like to know how you managed to install and run the example you mentioned without a GPU.
@guotong1988, tf-faster-rcnn requires a GPU. How did you manage to install it without a GPU?
@louisquinn, I was able to reproduce your script. It worked.
How did you manage to install it without a GPU?
I installed the CUDA driver, although the machine does not have the card. Then I set it to use CPU only. It worked!
@louisquinn, hi, I added "import os" and "os.environ['CUDA_VISIBLE_DEVICES'] = ''" to the files "demo.py", "_init_paths.py" and "setup.py", but it does not seem to work. The error message is "RuntimeError: Invalid DISPLAY variable". Which file should "os.environ['CUDA_VISIBLE_DEVICES'] = ''" be added to?
The method of installing Cuda mentioned by @louisquinn works for me! Thanks! :smile:
@xiaoqo Apologies for the late reply! You should add the line to "demo.py", however it MUST be before the Tensorflow session is created, so before line 112.
Also, you guys will be interested in this: https://github.com/tensorflow/models/tree/master/object_detection Official API for deep learning object detection with various state of the art models and frameworks, no more VGG16! It's really easy to use. If you install Tensorflow for CPU it will run out of the box, however if you installed for GPU and wish to run CPU only, you will have to use the same method I mentioned in this thread.
Hi
I found the root cause of the issue. When building in CPU-only mode without CUDA installed, the library roi_pooling.so is still compiled with a reference to the function "ROIPoolBackwardLaucher". However, that function is implemented in the CUDA-related module and only for GPU. So when the demo runs, the implementation of ROIPoolBackwardLaucher cannot be found and the crash happens.
I prepared a patch for this issue and verified that the issue is gone after applying it. When I tried to push the patch, I found that a patch for it already exists but has not been merged:
You can refer to: https://github.com/smallcorgi/Faster-RCNN_TF/pull/183/commits/0dcb55cebeaa85c9f0a46ff62384bbeaae98323e
or use my patch: https://drive.google.com/file/d/0BxlQuWrSazOxd29PNjVIenZneHM/view?usp=sharing
Best wishes! Zhuojin
@louisquinn I did the following:
`python setup.py build_ext --inplace
running build_ext
skipping 'utils/bbox.c' Cython extension (up-to-date)
skipping 'utils/nms.c' Cython extension (up-to-date)
skipping 'nms/cpu_nms.c' Cython extension (up-to-date)
skipping 'nms/gpu_nms.cpp' Cython extension (up-to-date)
rm -rf build
bash make.sh
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1254): warning: calling a constexpr __host__ function("real") from a __host__ __device__ function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1254): warning: calling a constexpr __host__ function("imag") from a __host__ __device__ function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1254): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1254): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1259): warning: calling a constexpr __host__ function("real") from a __host__ __device__ function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1259): warning: calling a constexpr __host__ function("imag") from a __host__ __device__ function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1259): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1259): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(133): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(138): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(212): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(217): warning: calling a constexpr __host__ function from a __host__ __device__ function is not all`
And if I run sudo make, I receive the following:
Can anyone help me with that?
Kind Regards Igor
@louisquinn, hi, I followed your advice and copied your roi_pooling.so file to my repo. I also modified demo.py to add os.environ['CUDA_VISIBLE_DEVICES'] = ''. Then I ran the demo, but it displays:
Traceback (most recent call last):
File "./tools/demo.py", line 11, in
@liuqi05 It looks like you didn't install CUDA 8.0 and CuDNN 5.1. For this method to work, you have to set your system up as though you do have a GPU. Replacing the roi_pooling.so file is just so you don't have to compile it yourself.
I would like to refer you to the official Tensorflow Object Detection API: https://github.com/tensorflow/models/tree/master/object_detection All you need to do is add the os.environ['CUDA_VISIBLE_DEVICES'] = '' line to run on CPU with this framework.
@louisquinn, thank you for your quick reply. But I want to know which file I should add the os.environ['CUDA_VISIBLE_DEVICES'] = '' line to in order to run on CPU with the framework you suggest. The train.py and eval.py files?
For the official framework: If you installed Tensorflow without GPU support and you don't have a GPU, it will automatically process on the CPU.
If you have a GPU and installed with GPU support you will have to add the os.environ line.
If you add the os.environ line, it should be set at some point before you define your tf.Session.
@louisquinn, now I understand. I do not need to add the line to any files, because I installed Tensorflow without GPU support. Thank you for your patience. Now I am trying to run it locally step by step. If I encounter a problem, I may need your help again. Thank you again.
@liuqi05 No worries! I recommend starting with one of the pre-trained models to learn how the framework works. You can email me directly at louisquinn.contact@gmail.com
@louisquinn, thank you very much. I will send you an email.
Hi all, I want to know whether this project can be run with Anaconda3 and Python 3.6. In my environment, the error log reports the following:
ModuleNotFoundError: No module named 'easydict'
(tensorflow) dongdong@ubuntu:~/ai/tensorflow/Faster-RCNN_TF$ conda install -c https://conda.anaconda.org/auto easydict
Fetching package metadata .............
Solving package specifications: .
UnsatisfiableError: The following specifications were found to be in conflict:
Thanks!
@louisquinn Hi, I want to know whether this project can be run with Anaconda3 and Python 3.6. In my environment, the error log reports the following:
ModuleNotFoundError: No module named 'easydict'
(tensorflow) dongdong@ubuntu:~/ai/tensorflow/Faster-RCNN_TF$ conda install -c https://conda.anaconda.org/auto easydict
Fetching package metadata .............
Solving package specifications: .
UnsatisfiableError: The following specifications were found to be in conflict:
easydict -> python 2.7 -> openssl 1.0.1
python 3.6*
Use "conda info " to see the dependencies for each package.
Thanks!
@jhcruvinel, I have no NVIDIA card, but I ran make.sh, compiled with CUDA, and installed the CUDA driver. When I run "python demo.py --cpu --model /Faster-RCNN_TF-master/input_model/VGGnet_fast_rcnn_iter_70000.ckpt", the result is:
Loaded network /Faster-RCNN_TF-master/input_model/VGGnet_fast_rcnn_iter_70000.ckpt
NVIDIA: no NVIDIA devices found
unknown error
What should I do? Thanks!
Running inside the latest tensorflow Docker image:
docker run -it -p 8888:8888 tensorflow/tensorflow
`
root@f54905c5bdaf:/notebooks/Faster-RCNN_TF# python ./tools/demo.py --model /VGGnet_fast_rcnn_iter_70000.ckpt
Traceback (most recent call last):
  File "./tools/demo.py", line 11, in <module>
    from networks.factory import get_network
  File "/notebooks/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
    from .VGGnet_train import VGGnet_train
  File "/notebooks/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
    from networks.network import Network
  File "/notebooks/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
    import roi_pooling_layer.roi_pooling_op as roi_pool_op
  File "/notebooks/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
    _roi_pooling_module = tf.load_op_library(filename)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 63, in load_op_library
    raise errors._make_specific_exception(None, None, error_msg, error_code)
tensorflow.python.framework.errors.NotFoundError: /notebooks/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE
root@f54905c5bdaf:/notebooks/Faster-RCNN_TF# nm -gC lib/roi_pooling_layer/roi_pooling.so |grep GpuDevice
U ROIPoolForwardLaucher(float const*, float, int, int, int, int, int, int, float const*, float*, int*, Eigen::GpuDevice const&)
U ROIPoolBackwardLaucher(float const*, float, int, int, int, int, int, int, int, float const*, float*, int const*, Eigen::GpuDevice const&)
U Eigen::GpuDevice const& tensorflow::OpKernelContext::eigen_device<Eigen::GpuDevice>() const
`
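The `nm` check above can be scripted. Below is a rough Python sketch, assuming `nm` from binutils is on the PATH and that undefined symbols appear in its output with the type letter `U`. The helper names `undefined_symbols` and `undefined_in_library` are hypothetical, and the sample text is a shortened, illustrative stand-in for real `nm` output. Any `GpuDevice` entry it reports is a function that the library references but does not define.

```python
import subprocess


def undefined_symbols(nm_output, keyword="GpuDevice"):
    """Return undefined ("U") symbols in `nm -gC` output that mention keyword."""
    found = []
    for line in nm_output.splitlines():
        parts = line.strip().split(None, 1)
        # Undefined symbols carry no address, so the stripped line starts with "U".
        if len(parts) == 2 and parts[0] == "U" and keyword in parts[1]:
            found.append(parts[1])
    return found


def undefined_in_library(path, keyword="GpuDevice"):
    """Run `nm -gC` on a shared library and filter its undefined symbols."""
    output = subprocess.check_output(["nm", "-gC", path]).decode()
    return undefined_symbols(output, keyword)


# Shortened, illustrative sample; real signatures are longer.
sample = (
    "                 U ROIPoolBackwardLaucher(float const*, Eigen::GpuDevice const&)\n"
    "0000000000004a10 T some_defined_function\n"
)
print(undefined_symbols(sample))
```

Against a real build, `undefined_in_library("lib/roi_pooling_layer/roi_pooling.so")` would run the same check as the shell command in the post.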