smallcorgi / Faster-RCNN_TF

Faster-RCNN in Tensorflow
MIT License

cannot run demo on CPU mode #36

Open teddybearz opened 7 years ago

teddybearz commented 7 years ago

Running inside the latest tensorflow Docker image:

docker run -it -p 8888:8888 tensorflow/tensorflow

`

root@f54905c5bdaf:/notebooks/Faster-RCNN_TF# python ./tools/demo.py --model /VGGnet_fast_rcnn_iter_70000.ckpt
Traceback (most recent call last):
  File "./tools/demo.py", line 11, in <module>
    from networks.factory import get_network
  File "/notebooks/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
    from .VGGnet_train import VGGnet_train
  File "/notebooks/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
    from networks.network import Network
  File "/notebooks/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
    import roi_pooling_layer.roi_pooling_op as roi_pool_op
  File "/notebooks/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
    _roi_pooling_module = tf.load_op_library(filename)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 63, in load_op_library
    raise errors._make_specific_exception(None, None, error_msg, error_code)
tensorflow.python.framework.errors.NotFoundError: /notebooks/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE

root@f54905c5bdaf:/notebooks/Faster-RCNN_TF# nm -gC lib/roi_pooling_layer/roi_pooling.so |grep GpuDevice
U ROIPoolForwardLaucher(float const, float, int, int, int, int, int, int, float const, float, int, Eigen::GpuDevice const&)
U ROIPoolBackwardLaucher(float const, float, int, int, int, int, int, int, int, float const, float, int const, Eigen::GpuDevice const&)
U Eigen::GpuDevice const& tensorflow::OpKernelContext::eigen_device() const

`

teddybearz commented 7 years ago

To reproduce (after downloading VGGnet_fast_rcnn_iter_70000.ckpt to ~/):

`
docker run -v ~/VGGnet_fast_rcnn_iter_70000.ckpt:/VGGnet_fast_rcnn_iter_70000.ckpt -it -p 8888:8888 tensorflow/tensorflow bash

sudo apt-get update
sudo apt-get install -y git
sudo apt-get install -y python-opencv
sudo apt-get install -y python-tk

pip install cython
pip install easydict
pip install image

sudo ln /dev/null /dev/raw1394

git clone --recursive https://github.com/smallcorgi/Faster-RCNN_TF.git

cd Faster-RCNN_TF/lib
make
cd ..
python ./tools/demo.py --model /VGGnet_fast_rcnn_iter_70000.ckpt

`

tyyyang commented 7 years ago

I encountered the same problem too.

donnyyou commented 7 years ago

I have encountered the same error too. Is there a solution to this problem? Thanks!

jacobunderlinebenseal commented 7 years ago

me too

jaig commented 7 years ago

I am facing a similar problem when I start training on CPU or run the demo. Is there a solution for this?

nsivaramakrishnan commented 7 years ago

Hi, I am getting the same error while trying to run demo.py:

tensorflow.python.framework.errors_impl.NotFoundError: /home/fmc/rcnn/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE

I added "-D_GLIBCXX_USE_CXX11_ABI=0" in make.sh. I use g++ version 5.4.0 and TF v0.12. By the way, I am trying to run this on CPU. Any help is highly appreciated. -Siva

jaig commented 7 years ago

Can we train this model using only the CPU?

oplkqingy commented 7 years ago

I hit a similar issue on Ubuntu 16.04 with g++ version 5.4.0 and TF v0.12. Before adding "-D_GLIBCXX_USE_CXX11_ABI=0" to make.sh, running the demo reports "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE". After adding it, running the demo reports "_Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE".

I don't have a GPU. How can I run the demo in CPU-only mode?
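The two symbols reported above point to two different link problems. The sketch below is a hypothetical helper (the name `diagnose_undefined_symbol` and the hint texts are illustrative and not part of this repo) that pulls the mangled symbol out of a loader error and classifies it. `B5cxx11` is how GCC 5 and later mangle the `[abi:cxx11]` tag, which is why that symbol goes away with -D_GLIBCXX_USE_CXX11_ABI=0. A symbol that mentions `GpuDevice` is a GPU launcher, so the hint for it is only a likely cause, not a confirmed one.

```python
import re


# Hypothetical helper for illustration; the name and hint texts are assumptions.
def diagnose_undefined_symbol(error_message):
    """Return a short hint for an 'undefined symbol' error from tf.load_op_library."""
    match = re.search(r"undefined symbol: (\S+)", error_message)
    if match is None:
        # Some reports in this thread quote only the mangled symbol itself.
        match = re.search(r"(_Z\w+)", error_message)
    if match is None:
        return "no mangled symbol found"
    symbol = match.group(1)
    if "B5cxx11" in symbol:
        # GCC 5+ encodes the [abi:cxx11] tag as B5cxx11 in mangled names.
        return "C++11 ABI mismatch: try -D_GLIBCXX_USE_CXX11_ABI=0 in make.sh"
    if "GpuDevice" in symbol:
        # The symbol is a GPU launcher expected from the CUDA object file.
        return "GPU kernel missing: roi_pooling_op.cu.o was likely not built or linked"
    return "unclassified symbol: " + symbol


if __name__ == "__main__":
    print(diagnose_undefined_symbol(
        "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE"))
    print(diagnose_undefined_symbol(
        "roi_pooling.so: undefined symbol: "
        "_Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE"))
```

The second case is the one that the ABI flag cannot fix, which matches the reports in this thread that adding the flag only changes which symbol is missing.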

raviv commented 7 years ago

Having the same problem (_Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE) when trying to train on CPU. Adding "-D_GLIBCXX_USE_CXX11_ABI=0" to the g++ command in make.sh and re-making didn't help. Thanks.

civilman628 commented 7 years ago

g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
    roi_pooling_op.cu.o -I $TF_INC  -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS  -D_GLIBCXX_USE_CXX11_ABI=0 \
    -lcudart -L $CUDA_PATH/lib64

DiegoGLagash commented 7 years ago

same problem here.

pbarker commented 7 years ago

same problem here as well

EunmiKang commented 7 years ago

me too :(

andresrommier commented 7 years ago

I had to modify the make.sh file to change the GPU architecture to match mine (sm_61), and then change the CUDA path (on Arch Linux it is /opt/cuda).

wxwang0601 commented 7 years ago

Same problem! Before adding "-D_GLIBCXX_USE_CXX11_ABI=0" to make.sh, running the demo reports "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE". After adding it, running the demo reports "_Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE".

@googleios @raviv have you solved the problem?

louisquinn commented 7 years ago

Hi all, I've figured out a workaround to use only the CPU. I have only tested this method for the demo script, not sure if it will work for training, but it should.

  1. Download and install CUDA: https://developer.nvidia.com/cuda-downloads
  2. Compile for GPU OR copy my .so. You can download my .so file from here: https://drive.google.com/open?id=0B-0d5quIGY5XVEJvYU9XRkVJTWM Or you can run make.sh and compile with CUDA (not sure if this will work).
  3. Include these lines of code at the top of your Python scripts:

     import os
     os.environ['CUDA_VISIBLE_DEVICES'] = ''
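An alternative to editing each script is to set the variable in the environment of the process that runs it. The sketch below is a hypothetical launcher (the file name `run_cpu_only.py` and both function names are assumptions, not part of this repo); it copies the current environment, sets CUDA_VISIBLE_DEVICES to an empty string, and runs the given command. It does not remove the need for the CUDA libraries that roi_pooling.so links against.

```python
import os
import subprocess
import sys


def cpu_only_env():
    """Copy the current environment and hide all CUDA devices."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_cpu_only(argv):
    """Run a command with GPUs hidden and return its exit code."""
    return subprocess.call(argv, env=cpu_only_env())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical file name):
    #   python run_cpu_only.py python ./tools/demo.py --model /VGGnet_fast_rcnn_iter_70000.ckpt
    sys.exit(run_cpu_only(sys.argv[1:]))
```

Because the variable is already set when the child process starts, it takes effect before TensorFlow is imported there, whichever file the import happens in.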

guotong1988 commented 7 years ago

I succeeded in running another Faster R-CNN on CPU from this repo

shinyke commented 7 years ago

@louisquinn I succeed with your method. thx~

jhcruvinel commented 7 years ago

@louisquinn, I would like to know how you managed to install and run the example you mentioned without a GPU

jhcruvinel commented 7 years ago

@guotong1988, tf-faster-rcnn requires a GPU. How did you manage to install it without a GPU?

jhcruvinel commented 7 years ago

@louisquinn, I was able to reproduce your script. It worked.

BrahimMefgouda commented 7 years ago

How did you manage to install it without a GPU?

jhcruvinel commented 7 years ago

I installed the CUDA driver, although the machine does not have the card. Then I set it to use CPU only. It worked!

liydxl commented 7 years ago

@louisquinn, hi, I added import os and os.environ['CUDA_VISIBLE_DEVICES'] = '' to the files "demo.py", "_init_paths.py" and "setup.py", but it does not seem to work. The error message is "RuntimeError: Invalid DISPLAY variable". Which file should os.environ['CUDA_VISIBLE_DEVICES'] = '' be added to?

sidak commented 7 years ago

The method of installing Cuda mentioned by @louisquinn works for me! Thanks! :smile:

louisquinn commented 7 years ago

@xiaoqo Apologies for the late reply! You should add the line to "demo.py", however it MUST be before the Tensorflow session is created, so before line 112.

Also, you guys will be interested in this: https://github.com/tensorflow/models/tree/master/object_detection Official API for deep learning object detection with various state of the art models and frameworks, no more VGG16! It's really easy to use. If you install Tensorflow for CPU it will run out of the box, however if you installed for GPU and wish to run CPU only, you will have to use the same method I mentioned in this thread.

sunzj commented 6 years ago

Hi

I found the root cause of the issue. In CPU-only mode without CUDA installed, the library roi_pooling.so is compiled with a reference to the function "ROIPoolBackwardLaucher". However, that function is implemented in the CUDA-related module and is only for GPU. So when the demo executes, the implementation of ROIPoolBackwardLaucher cannot be found and the crash happens.
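The check used at the top of this thread (`nm -gC ... | grep GpuDevice`) can be scripted. The sketch below is a minimal, hypothetical helper (function names are assumptions) that filters `nm -gC` output for undefined (`U`) symbols mentioning GpuDevice. A `U` entry only means the symbol must come from some other library at load time, so the result is a list of candidates: the launcher functions are a problem when no CUDA object provides them, while an entry such as the `tensorflow::OpKernelContext` one is expected to be resolved by TensorFlow itself.

```python
import subprocess


def undefined_gpu_symbols(nm_output):
    """Return undefined ('U') symbols in `nm -gC` output that mention GpuDevice."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.strip().split(None, 1)
        # Undefined symbols have no address, so the line starts with the type "U".
        if len(parts) == 2 and parts[0] == "U" and "GpuDevice" in parts[1]:
            symbols.append(parts[1])
    return symbols


def check_library(path):
    """Run nm on a shared library; assumes binutils' nm is on PATH."""
    output = subprocess.check_output(["nm", "-gC", path]).decode("utf-8", "replace")
    return undefined_gpu_symbols(output)


if __name__ == "__main__":
    # Abbreviated, illustrative sample; real nm output lists full signatures.
    sample = (
        "                 U ROIPoolForwardLaucher(float const, float, int, Eigen::GpuDevice const&)\n"
        "0000000000003a10 T some_defined_function()\n"
    )
    for name in undefined_gpu_symbols(sample):
        print("undefined: " + name)
```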

I prepared a patch for the issue and verified that the issue is gone after applying it. When I tried to push the patch, I found there was already a patch that has not been merged:

you can refer to: https://github.com/smallcorgi/Faster-RCNN_TF/pull/183/commits/0dcb55cebeaa85c9f0a46ff62384bbeaae98323e

or use my patch: https://drive.google.com/file/d/0BxlQuWrSazOxd29PNjVIenZneHM/view?usp=sharing

Best wishes! Zhuojin

lfc87 commented 6 years ago

@louisquinn I did the following:

  1. Installed CUDA.
  2. Downloaded your .so file and replaced the one in Faster-RCNN_TF/lib/roi_pooling_layer.
  3. Pasted these two lines into setup.py in Faster-RCNN_TF/lib: import os and os.environ['CUDA_VISIBLE_DEVICES'] = ''
  4. Now I run make and receive an error:

`python setup.py build_ext --inplace
running build_ext
skipping 'utils/bbox.c' Cython extension (up-to-date)
skipping 'utils/nms.c' Cython extension (up-to-date)
skipping 'nms/cpu_nms.c' Cython extension (up-to-date)
skipping 'nms/gpu_nms.cpp' Cython extension (up-to-date)
rm -rf build
bash make.sh
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1254): warning: calling a constexpr host function("real") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1254): warning: calling a constexpr host function("imag") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1254): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1254): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1259): warning: calling a constexpr host function("real") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1259): warning: calling a constexpr host function("imag") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1259): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1259): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(133): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(138): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(212): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(217): warning: calling a constexpr host function from a host device function is not all`

And if I run sudo make, I receive the following:

`
python setup.py build_ext --inplace
running build_ext
skipping 'utils/bbox.c' Cython extension (up-to-date)
skipping 'utils/nms.c' Cython extension (up-to-date)
skipping 'nms/cpu_nms.c' Cython extension (up-to-date)
skipping 'nms/gpu_nms.cpp' Cython extension (up-to-date)
rm -rf build
bash make.sh
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named tensorflow
make.sh: line 13: nvcc: command not found
g++: error: GOOGLE_CUDA=1: No such file or directory
`

Can anyone help me with that?

Kind Regards Igor
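The `sudo make` log above shows two things missing from the environment that make.sh ran in: the `tensorflow` module and the `nvcc` compiler. One plausible cause, which the thread does not confirm, is that sudo runs with a different PATH and Python installation than the user's. The sketch below is a hypothetical preflight check (Python 3; the function name and the default lists are assumptions) that reports whether the current environment can find each of them before make is run.

```python
import importlib.util
import shutil


def preflight(modules=("tensorflow",), tools=("nvcc", "g++")):
    """Map each required module/tool to True if the current environment can find it."""
    report = {}
    for name in modules:
        report["module:" + name] = importlib.util.find_spec(name) is not None
    for name in tools:
        report["tool:" + name] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for item, found in sorted(preflight().items()):
        print("%-20s %s" % (item, "ok" if found else "MISSING"))
```

Running it once as the normal user and once under sudo would show whether the two environments differ.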

liuqi05 commented 6 years ago

@louisquinn, hi, I followed your advice and copied your roi_pooling.so file to my repo, then modified demo.py to add os.environ['CUDA_VISIBLE_DEVICES'] = ''. When I run the demo, it displays:

Traceback (most recent call last):
  File "./tools/demo.py", line 11, in <module>
    from networks.factory import get_network
  File "/home/joseph/test/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
    from .VGGnet_train import VGGnet_train
  File "/home/joseph/test/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
    from networks.network import Network
  File "/home/joseph/test/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
    import roi_pooling_layer.roi_pooling_op as roi_pool_op
  File "/home/joseph/test/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
    _roi_pooling_module = tf.load_op_library(filename)
  File "/home/joseph/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: libcudart.so.8.0: cannot open shared object file: No such file or directory

By the way, I am trying to run this on CPU and my computer has no GPU. Can you give me some advice about this error? Thank you in advance.

louisquinn commented 6 years ago

@liuqi05 It looks like you didn't install CUDA 8.0 and CuDNN 5.1. For this method to work, you have to set your system up as though you do have a GPU. Replacing the roi_pooling.so file is just so you don't have to compile it yourself.
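Whether the CUDA runtime is visible to the dynamic loader can be checked without TensorFlow. The sketch below is a minimal, hypothetical check (the function name is an assumption) that tries to open a shared library with `ctypes` and reports the result. `libcudart.so.8.0` is the file named in the error above; the cuDNN file name is an assumption based on the cuDNN 5.1 version mentioned here.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # The cuDNN file name below is an assumption, not taken from the thread.
    for name in ("libcudart.so.8.0", "libcudnn.so.5"):
        print("%-20s %s" % (name, "found" if can_load(name) else "NOT found"))
```

If the library is installed but still reported as not found, the directory that contains it is probably not on the loader's search path.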

I would like to refer you to the official Tensorflow Object Detection API: https://github.com/tensorflow/models/tree/master/object_detection All you need to do is add the os.environ['CUDA_VISIBLE_DEVICES'] = '' line to run on CPU with this framework

liuqi05 commented 6 years ago

@louisquinn, thank you for your quick reply. Which file should I add the os.environ['CUDA_VISIBLE_DEVICES'] = '' line to in order to run on CPU with the framework you suggest? The train.py and eval.py files?

louisquinn commented 6 years ago

For the official framework: If you installed Tensorflow without GPU support and you don't have a GPU, it will automatically process on the CPU.

If you have a GPU and installed with GPU support you will have to add the os.environ line.
If you add the os.environ line it should be defined at any point before you define your tf.Session
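One way to confirm that the line took effect is to ask TensorFlow which devices it can see. The sketch below is a minimal check, assuming `tensorflow.python.client.device_lib` is available in the installed TensorFlow version; the function name is hypothetical, and it returns None when TensorFlow is not installed.

```python
import os

# Must be set before TensorFlow is imported anywhere in the process.
os.environ["CUDA_VISIBLE_DEVICES"] = ""


def visible_device_types():
    """Return the device types TensorFlow reports, or None if it is not installed."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return sorted({d.device_type for d in device_lib.list_local_devices()})


if __name__ == "__main__":
    print(visible_device_types())
```

With the variable set to an empty string, the printed list should not contain a GPU entry.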

liuqi05 commented 6 years ago

@louisquinn, now I understand. I do not need to add the line to any files, because I installed Tensorflow without GPU support. Thank you for your patience. I am now trying to run it locally step by step. If I run into problems, I may need your help again. Thank you again.

louisquinn commented 6 years ago

@liuqi05 No worries! I recommend starting with one of the pre-trained models to learn how the framework works. You can email me direct at louisquinn.contact@gmail.com

liuqi05 commented 6 years ago

@louisquinn, Thank you very much. I will send mail to you.

dongdongrj commented 6 years ago

Hi all, I want to know whether this project can run with Anaconda3 and Python 3.6. In my environment the error log reports:

ModuleNotFoundError: No module named 'easydict'
(tensorflow) dongdong@ubuntu:~/ai/tensorflow/Faster-RCNN_TF$ conda install -c https://conda.anaconda.org/auto easydict
Fetching package metadata .............
Solving package specifications: .

UnsatisfiableError: The following specifications were found to be in conflict:

Thanks!

dongdongrj commented 6 years ago

@louisquinn Hi, I want to know whether this project can run with Anaconda3 and Python 3.6. In my environment the error log reports:

ModuleNotFoundError: No module named 'easydict'
(tensorflow) dongdong@ubuntu:~/ai/tensorflow/Faster-RCNN_TF$ conda install -c https://conda.anaconda.org/auto easydict
Fetching package metadata .............
Solving package specifications: .

UnsatisfiableError: The following specifications were found to be in conflict:

easydict -> python 2.7 -> openssl 1.0.1
python 3.6*
Use "conda info <package>" to see the dependencies for each package.

Thanks!

Nofcity commented 6 years ago

@jhcruvinel, I have no NVIDIA card, but I ran make.sh to compile with CUDA and installed the CUDA driver. When I run "python demo.py --cpu --model /Faster-RCNN_TF-master/input_model/VGGnet_fast_rcnn_iter_70000.ckpt", the result is:

Loaded network /Faster-RCNN_TF-master/input_model/VGGnet_fast_rcnn_iter_70000.ckpt
NVIDIA: no NVIDIA devices found
unknown error

What should I do? Thanks!