linkinpark213 / linkinpark213.github.io

Blog of linkinpark213
https://linkinpark213.com

[MineSweeping] The Long Struggle of DensePose Installation | Linkin213's Park #12

Open linkinpark213 opened 5 years ago

linkinpark213 commented 5 years ago

https://linkinpark213.com/2018/11/18/densepose-minesweeping/#more

DensePose is a great work in real-time human pose estimation, based on Caffe2 and the Detectron framework. It extracts a dense 3D surface of the human body from RGB images. The installation instructions are provided here. During my installation proce

Johnqczhang commented 5 years ago

Hi, I built the op successfully, but I still got stuck at the last step when running test_zero_even_op.py. I adopted your solution but it didn't work (TAT~~~). BTW, I didn't install caffe2 directly from source but only installed pytorch-1.0 from conda, since I found that caffe2 has been integrated into the latest stable pytorch. Before installing DensePose, I installed Detectron and ran the demo, both successfully. I added libprotobuf.a from my anaconda installation to the CMakeLists.txt. I also tried reinstalling protobuf-3.5.1 from Google and compiling it myself, but the same error occurred as the one in 2.9. So, can you find another solution, or the reason why this depressing error occurs?

Johnqczhang commented 5 years ago

Sorry to bother you. I found the problem: I forgot to replace the old libraries with the newly built ones, because the test program was loading these libraries from my $DETECTRON/build rather than $DENSEPOSE/build, which is weird. After replacing them with the new ones, the error no longer occurs. A better way to avoid this problem is to add $DENSEPOSE/build to your $PYTHONPATH environment variable and activate the change before running the test program.
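
To see which copy of a library a search-path lookup would hit first, a small check like the one below may help. It is only a sketch: the helper name find_library_candidates is made up for illustration, and it does not reproduce Detectron's own lookup logic. It simply lists every entry of a search path (sys.path by default) that contains the given file name, in search order.

```python
import os
import sys


def find_library_candidates(filename, search_dirs=None):
    """Return every path in search_dirs (default: sys.path) holding filename.

    Hypothetical helper for illustration. Results follow the search order,
    so the first hit is the copy a first-match lookup would pick up.
    """
    if search_dirs is None:
        search_dirs = sys.path
    hits = []
    for directory in search_dirs:
        # An empty entry in sys.path stands for the current directory.
        candidate = os.path.join(directory or os.getcwd(), filename)
        if os.path.isfile(candidate) and candidate not in hits:
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for path in find_library_candidates("libcaffe2_detectron_custom_ops_gpu.so"):
        print(path)
```

If more than one path is printed, the stale copy is probably the one listed first.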

Thanks again for your very helpful blog which saves my time a lot!

linkinpark213 commented 5 years ago

@Johnqczhang LOL, I was trying hard to think what the cause could be, when you finally made it. Congratulations!

Johnqczhang commented 5 years ago

However, even though test_zero_even_op.py now runs without the error mentioned above, the program seems to hang when it loads the compiled library, and I can't get any output to confirm that all is well. More specifically, I traced the problem to c2_utils.import_custom_ops(), and on deeper inspection I found that the program gets no further than this function in caffe2/python/dyndep.py:

def _init_impl(path):
    # path: /absolute/path/to/my/built/libcaffe2_detectron_custom_ops_gpu.so
    _IMPORTED_DYNDEPS.add(path)
    # extension_loader is imported from caffe2.python
    with extension_loader.DlopenGuard():
        # ctypes.CDLL creates an object representing a loaded DLL/shared
        # library that exports functions using the standard C calling convention.
        ctypes.CDLL(path)  # this is the line where my program hangs
    # reinitialize available ops
    core.RefreshRegisteredOperators()

I can't dig any deeper because this calls the system function dlopen, whose source code I can't find. Here is the configuration of my environment:

System: CentOS 7
Python: 3.6 (from Anaconda)
GPU: Tesla P100
CUDA: 9.0
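
One way to tell a hang inside dlopen apart from a load error is to load the library in a child process with a timeout. The following is a minimal sketch under some assumptions: try_dlopen is a hypothetical helper, it needs Python 3.5+ to run, and it calls plain ctypes.CDLL without the DlopenGuard context that caffe2 uses, so its behaviour may differ from the real import path.

```python
import subprocess
import sys


def try_dlopen(lib_path, python=sys.executable, timeout_s=60.0):
    """Load lib_path with ctypes.CDLL in a child process.

    Returns "ok", "timeout", or "error: <last line of stderr>".
    Hypothetical helper; `python` can point at another interpreter,
    for example the one inside a Python 2.7 conda env.
    """
    snippet = "import ctypes, sys; ctypes.CDLL(sys.argv[1])"
    try:
        proc = subprocess.run(
            [python, "-c", snippet, lib_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child was killed after timeout_s seconds: the load hung.
        return "timeout"
    if proc.returncode == 0:
        return "ok"
    lines = proc.stderr.strip().splitlines()
    return "error: " + (lines[-1] if lines else "exit code %d" % proc.returncode)
```

A "timeout" result would confirm that the load itself never returns, while an "error: ..." result carries the loader's own message, such as an undefined symbol.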

Can you show me the output you get from running this test program? Thanks in advance.

linkinpark213 commented 5 years ago

@Johnqczhang Sure. By running python detectron/tests/test_zero_even_op.py, I got the output below.

(caffe2py27)  ~/Source/densepose $ python detectron/tests/test_zero_even_op.py
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
............
----------------------------------------------------------------------
Ran 12 tests in 1.599s

OK

Please ignore the warnings. The test program ended after printing OK.

This may not be why you're stuck here, but DensePose requires a Python 2.7 environment. Would you consider creating a new conda env with Python 2.7?

Johnqczhang commented 5 years ago

I created a new conda env with Python 2.7 and followed all the steps. The compilation of the custom operator was fine. But when I ran the test program, which imports the built .so library, I hit this error:

libcaffe2_detectron_custom_ops_gpu.so: undefined symbol: _ZN6caffe219CPUOperatorRegistryB5cxx11Ev

So, any idea of how to solve this problem?

linkinpark213 commented 5 years ago

In my experience, an undefined symbol error is caused by an incompatibility between the build environment and the runtime environment. The culprit could be the default compiler, protobuf, or any other library. I'm most suspicious of protobuf, because it's not rare to have multiple versions of protobuf installed while only one (possibly not even the system default) is picked up by the program. It's also possible that another version of protobuf was installed when you created your new conda env.

Try whereis protoc under your new conda env and see what happens?
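
For a view of every protoc that the current PATH could resolve to, in search order, something like this sketch may be useful. The helper name find_all_on_path is an assumption for illustration; it only inspects PATH, so it will not see copies installed elsewhere.

```python
import os


def find_all_on_path(program, path_env=None):
    """List every executable named `program` along PATH, in search order."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    found = []
    for directory in path_env.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, program)
        # Keep only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in found:
                found.append(candidate)
    return found


if __name__ == "__main__":
    for exe in find_all_on_path("protoc"):
        print(exe)
```

Running each listed binary with --version then shows whether the copies differ.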

Johnqczhang commented 5 years ago

I'm quite sure that I have only one version of protobuf installed. I installed protobuf-3.5.0 from source because this version is specified at the top of the file /my-conda-env/lib/python2.7/site-packages/torch/lib/include/caffe2/proto/caffe2.pb.h. I tried whereis protoc and it showed the path of the copy I installed myself.
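
The version in a protoc-generated header is stored as a single packed integer (major * 1000000 + minor * 1000 + patch) inside a GOOGLE_PROTOBUF_VERSION guard. A minimal sketch for reading it back, assuming the header contains the usual guard line; the helper names and the sample text are made up for illustration:

```python
import re

# Matches the guard that protoc writes near the top of generated headers.
_GUARD = re.compile(r"#if\s+GOOGLE_PROTOBUF_VERSION\s*<\s*(\d+)")


def decode_protobuf_version(number):
    """Split protobuf's packed integer into (major, minor, patch)."""
    return (number // 1000000, (number // 1000) % 1000, number % 1000)


def header_protobuf_version(header_text):
    """Return (major, minor, patch) from the version guard, or None if absent."""
    match = _GUARD.search(header_text)
    if match is None:
        return None
    return decode_protobuf_version(int(match.group(1)))


# Illustrative sample, not copied from a real caffe2.pb.h.
sample = "#if GOOGLE_PROTOBUF_VERSION < 3005000\n#error ...\n#endif\n"
print(header_protobuf_version(sample))  # (3, 5, 0)
```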

Johnqczhang commented 5 years ago

Finally, I found the cause of this problem and solved it (TAT)~~~.

First, the error message undefined symbol: _ZN6caffe219CPUOperatorRegistryB5cxx11Ev told me that this symbol could not be found when the dependent libraries were linked at load time. I then checked my compiled ops library with ldd -r libcaffe2_detectron_custom_ops.so and found several undefined symbols with similar names. These symbols are declared in header files of the caffe2 and c10 modules, so they should already have been compiled into the official PyTorch build of libcaffe2.so. Running strings -a libcaffe2.so | grep _ZN6caffe219CPUOperator does show the corresponding symbol, but without B5cxx11 in front of the trailing Ev. That is why the error occurred when importing the compiled ops library.
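
The B5cxx11 fragment is how GCC mangles the [abi:cxx11] tag, so two names that differ only by it refer to the same function built against different library ABIs. The sketch below shows that comparison. The helper names are hypothetical, the untagged symbol in the usage example is inferred from the description above rather than copied from a real libcaffe2.so, and the plain substring match is a simplification, not a real demangler.

```python
ABI_TAG = "B5cxx11"  # mangled form of the [abi:cxx11] tag


def has_cxx11_abi_tag(symbol):
    """True if the mangled name carries the cxx11 ABI tag."""
    return ABI_TAG in symbol


def strip_cxx11_abi_tag(symbol):
    """Drop the tag so tagged and untagged names can be compared."""
    return symbol.replace(ABI_TAG, "")


def near_matches(wanted, exported):
    """Return exported symbols equal to `wanted` once ABI tags are ignored."""
    key = strip_cxx11_abi_tag(wanted)
    return [s for s in exported
            if s != wanted and strip_cxx11_abi_tag(s) == key]


wanted = "_ZN6caffe219CPUOperatorRegistryB5cxx11Ev"
exported = ["_ZN6caffe219CPUOperatorRegistryEv"]  # assumed, for illustration
print(near_matches(wanted, exported))
```

A non-empty result suggests the library and the ops were compiled against different ABIs rather than that the function is missing.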

Searching for the string B5cxx11, I found that it is added to the symbol because the ops are compiled with set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -O2 -fPIC -Wno-narrowing") from CMakeLists.txt (that is, with the C++11 standard) and my gcc/g++ version is 5.4.0, which uses the new C++11 library ABI by default. Since my caffe2 came from conda install pytorch, I checked /path/to/my/env/lib/python2.7/site-packages/torch/lib/libcaffe2.so by running strings -a libcaffe2.so | grep "GCC: (" and found that caffe2 had been pre-compiled with GCC 4.9.2. This is the essential reason why I could compile the ops without any error but could never pass the test program, always hitting the annoying "undefined symbol" error!
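
The strings | grep "GCC: (" check can also be done from Python, which makes it easy to compare several libraries at once. This is a rough sketch assuming Python 3 and a binary that still carries its compiler identification strings; gcc_comments is a made-up name, and the function reads the whole file into memory, which may be slow for a large libcaffe2.so.

```python
import re

# Printable-ASCII runs starting with the "GCC: (" marker that the compiler
# leaves in objects it builds.
_GCC_MARKER = re.compile(rb"GCC: \([\x20-\x7e]*")


def gcc_comments(binary_path):
    """Return the distinct "GCC: (...)" strings embedded in a file."""
    with open(binary_path, "rb") as handle:
        data = handle.read()
    seen = []
    for raw in _GCC_MARKER.findall(data):
        text = raw.decode("ascii")
        if text not in seen:
            seen.append(text)
    return seen
```

Calling it on both libcaffe2.so and the freshly built ops library and comparing the results shows whether the two were built with the same compiler.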

After compiling and installing gcc-4.9.2 from source (because I couldn't find this version of gcc in conda) and re-compiling the ops, I could finally pass all tests in test_zero_even_op.py without any error.

What a long long installation of DensePose! orz~~~

linkinpark213 commented 5 years ago

@Johnqczhang Good job! Would you mind if I added this case to the blog post? It will be really helpful.

Johnqczhang commented 5 years ago

Sure~ In fact, I'm also going to write up my bumpy journey through this installation and post it, which may be helpful to others.

Johnqczhang commented 5 years ago

So, I created a repository, densepose_installation, providing step-by-step instructions to install DensePose in less time and with less pain, for people who don't want to build the entire pytorch or caffe2 project from source code.

yzhq97 commented 5 years ago

Dude you are awesome!

icicle4 commented 5 years ago

Hello. I get an error when running detectron/tests/test_zero_even_op.py: OSError: "undefined symbol: _ZN6caffe219CPUOperatorRegistryB5cxx11Ev"

As mentioned in this link, it seems to be caused by a gcc version mismatch between the conda-installed caffe2 and make ops.

But my problem seems different: my caffe2 is built from source, so the gcc version should be the same.

yzhq97 commented 5 years ago

Did you clone Detectron and use Detectron/detectron to replace DensePose/detectron? I recently found out that this causes problems. The DensePose/detectron should work just fine.

icicle4 commented 5 years ago

Thanks for your answer. I did not replace it. I thought this problem might be caused by the wrong gcc or g++ version, since CMakeLists.txt does not specify a gcc/g++ version. So I reinstalled Caffe2 and ran make ops using gcc7 and g++7, which solved it.

Johnqczhang commented 5 years ago

So, in the end, it seems that your problem was still caused by using different versions of gcc for installing caffe2 and for compiling the custom ops. Keeping the same gcc across all installation steps is very important :joy:

sudiptabiswas22 commented 5 years ago

Hi, I have passed the caffe2 installation test as well as the $DENSEPOSE/detectron/tests/test_spatial_narrow_as_op.py test. However, when testing $DENSEPOSE/detectron/tests/test_zero_even_op.py, I get the following error:

[Screenshot from 2019-04-02 20-20-56]

Kindly help me resolve this issue. Thanks in advance :)

@Johnqczhang @linkinpark213 Any help will be highly appreciated.

Johnqczhang commented 5 years ago

@sudiptabiswas22 How did you install caffe2? For this error, my guess is that the c10 module is missing when compiling the custom ops.

sudiptabiswas22 commented 5 years ago

@Johnqczhang I built caffe2 exactly as described in your https://github.com/Johnqczhang/densepose_installation. How do I verify whether the c10 module is missing?

Johnqczhang commented 5 years ago

@sudiptabiswas22 Check this issue to see if it can solve your problem.

sudiptabiswas22 commented 5 years ago

Hey @Johnqczhang, if you look at the screenshot, the caffe2 package is loaded from "/usr/local/lib/python2.7/dist-packages" instead of from the virtual env: "anaconda2/envs/bodypose2/lib/python2.7/site-packages". Is that the source of the problem?

Johnqczhang commented 5 years ago

@sudiptabiswas22 I think you're right. Can you run the test program without sudo to see what will happen?
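
To check which copy of a package the interpreter would import without reading a traceback, a sketch along these lines can help. It assumes Python 3.5+ (on Python 2.7, printing caffe2.__file__ after import caffe2 gives the same information), and both helper names are made up for illustration.

```python
import importlib.util
import os
import sys


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def is_under(path, prefix):
    """True if path lies inside the directory tree rooted at prefix."""
    path = os.path.abspath(path)
    prefix = os.path.abspath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


if __name__ == "__main__":
    location = module_location("caffe2")
    print("caffe2 found at:", location)
    if location:
        print("inside current env:", is_under(location, sys.prefix))
```

Running it once normally and once under sudo shows whether the two invocations resolve to different installations.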

Kevinstiff commented 4 years ago

I encountered a problem while running make ops. Main error message:

/usr/bin/ld: cannot find -lcaffe2_library
/usr/bin/ld: cannot find -lcaffe2_gpu_library

(caffe2) kevin@ubuntu:~/project/DensePose$ make ops
mkdir -p build && cd build && cmake .. && make -j8
-- Caffe2: CUDA detected: 10.1
-- Caffe2: CUDA nvcc is: /usr/local/cuda-10.1/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda-10.1
-- Caffe2: Header version is: 10.1
-- Found cuDNN: v7.6.5 (include: /usr/local/cuda-10.1/include, library: /usr/local/cuda-10.1/lib64/libcudnn.so)
-- Autodetected CUDA architecture(s): 7.5 7.5
-- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75
-- Summary:
-- CMake version : 3.5.1
-- CMake command : /usr/bin/cmake
-- System name : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler version : 5.4.0
-- CXX flags : -std=c++14 -O2 -fPIC -Wno-narrowing
-- Caffe2 version : 1.4.0
-- Caffe2 include path : /home/kevin/anaconda3/envs/caffe2/lib/python2.7/site-packages/torch/include
-- Caffe2 found CUDA : True
-- CUDA version : 10.1
-- CuDNN version : 7.6.5
-- Configuring done
-- Generating done
-- Build files have been written to: /home/kevin/project/DensePose/build
make[1]: Entering directory '/home/kevin/project/DensePose/build'
make[2]: Entering directory '/home/kevin/project/DensePose/build'
make[3]: Entering directory '/home/kevin/project/DensePose/build'
make[3]: Leaving directory '/home/kevin/project/DensePose/build'
make[3]: Entering directory '/home/kevin/project/DensePose/build'
make[3]: Entering directory '/home/kevin/project/DensePose/build'
[ 12%] Linking CXX shared library libcaffe2_detectron_custom_ops.so
make[3]: Leaving directory '/home/kevin/project/DensePose/build'
make[3]: Entering directory '/home/kevin/project/DensePose/build'
/usr/bin/ld: cannot find -lcaffe2_library
[ 25%] Linking CXX shared library libcaffe2_detectron_custom_ops_gpu.so
collect2: error: ld returned 1 exit status
CMakeFiles/caffe2_detectron_custom_ops.dir/build.make:120: recipe for target 'libcaffe2_detectron_custom_ops.so' failed
make[3]: [libcaffe2_detectron_custom_ops.so] Error 1
make[3]: Leaving directory '/home/kevin/project/DensePose/build'
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/caffe2_detectron_custom_ops.dir/all' failed
make[2]: [CMakeFiles/caffe2_detectron_custom_ops.dir/all] Error 2
make[2]: Waiting for unfinished jobs....
/usr/bin/ld: cannot find -lcaffe2_gpu_library
collect2: error: ld returned 1 exit status
CMakeFiles/caffe2_detectron_custom_ops_gpu.dir/build.make:1335: recipe for target 'libcaffe2_detectron_custom_ops_gpu.so' failed
make[3]: [libcaffe2_detectron_custom_ops_gpu.so] Error 1
make[3]: Leaving directory '/home/kevin/project/DensePose/build'
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/caffe2_detectron_custom_ops_gpu.dir/all' failed
make[2]: [CMakeFiles/caffe2_detectron_custom_ops_gpu.dir/all] Error 2
make[2]: Leaving directory '/home/kevin/project/DensePose/build'
Makefile:127: recipe for target 'all' failed
make[1]: [all] Error 2
make[1]: Leaving directory '/home/kevin/project/DensePose/build'
Makefile:13: recipe for target 'ops' failed
make: *** [ops] Error 2

linkinpark213 commented 4 years ago

@Kevinstiff Hi! I'm sorry for my late reply. In another DensePose issue I found that compiling PyTorch code later than version 1.2 leaves you without libcaffe2.so or libcaffe2_gpu.so in the compiled lib directory (I guess PyTorch has stopped supporting Caffe2 compatibility). It looks like you were compiling the latest version of the PyTorch code?

Before you try anything, please check if libcaffe2.so and libcaffe2_gpu.so exist in your /path/to/pytorch/torch/lib/. If they do, then you'll only need to add /path/to/pytorch/torch/lib/ to your LD_LIBRARY_PATH by executing

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/pytorch/torch/lib/

If these files don't exist, I suggest rolling back to PyTorch version 1.1 and compiling it.
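
The existence check can be scripted as follows. This is a small sketch: missing_caffe2_libs is a hypothetical name, and /path/to/pytorch/torch/lib/ is the same placeholder as in the comment above, to be replaced with a real directory.

```python
import os

REQUIRED_LIBS = ("libcaffe2.so", "libcaffe2_gpu.so")


def missing_caffe2_libs(lib_dir, required=REQUIRED_LIBS):
    """Return the names in `required` that are not files inside lib_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(lib_dir, name))]


if __name__ == "__main__":
    # Placeholder path; replace it with your own PyTorch lib directory.
    print(missing_caffe2_libs("/path/to/pytorch/torch/lib/"))
```

An empty list means both libraries are present and only LD_LIBRARY_PATH needs adjusting.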

williamdwl commented 4 years ago

thanks a lot!

Kevinstiff commented 4 years ago

@linkinpark213 First, thanks a lot! Yes, I was using PyTorch version 1.3.1, and these files really don't exist! I rolled back to PyTorch version 1.1.0 and make ops succeeded! But the method you recommended did not link correctly for me, so I used the conda method instead: conda install pytorch=1.1

Second, after reading your blog, I am getting ready to use Detectron2 and Python 3 to build DensePose. Do you have any suggestions?

linkinpark213 commented 4 years ago

@Kevinstiff Good to hear that!

A new version of DensePose is now a feature of Detectron2, and getting started with it doesn't require compiling a whole lot of things. You only need to install a few requirements using conda or pip, run the build script and start your exploration. The INSTALL.md is all you need :D

Kevinstiff commented 4 years ago

@linkinpark213 Thank you for your suggestion!

I have questions about the old and new versions of the DensePose project. How do I configure them correctly? Do you clone Detectron2 to replace the Detectron files in DensePose?

linkinpark213 commented 4 years ago

@Kevinstiff Hi, you don't need to configure them or replace anything at all! The old DensePose can be deleted.

After installing Detectron2 (following its INSTALL.md), go to the detectron2/projects/DensePose directory, and simply follow the GETTING_STARTED.md guide of new DensePose (I think perhaps FAIR should have named it DensePose2?).

Or speaking in code,

# Forget about the old version
rm -rf DensePose

# Start from a new beginning
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
python setup.py build develop

# Find the new DensePose project in Detectron2
cd projects/DensePose

# Grab a pre-trained ResNet-50 model from its model zoo
wget https://dl.fbaipublicfiles.com/densepose/densepose_rcnn_R_50_FPN_s1x/143908701/model_final_dd99d2.pkl

# Run inference
python apply_net.py show configs/densepose_rcnn_R_50_FPN_s1x.yaml model_final_dd99d2.pkl image.jpg dp_contour,bbox --output image_densepose_contour.png

Kevinstiff commented 4 years ago

@linkinpark213 It's really an honor that you keep replying! I am a novice, and I want to ask you another question. If I want to run inference on a video, what should I do?

linkinpark213 commented 4 years ago

@Kevinstiff That might be a longer story to tell... How about we talk about this on WeChat? My WeChat ID is linkinpark213 too.

MikePelton commented 4 years ago

You are a hero for documenting this so extensively - I wish more people did the same for other projects! Many thanks! Am also a Linkin Park fan :-)