zengarden / light_head_rcnn

Light-Head R-CNN

compile lib_kernel/lib_fast_nms/fast_nms using GPU V100 #35

Closed emedinac closed 6 years ago

emedinac commented 6 years ago

Hi, I tried to compile this code for two V100 GPUs with sm_70. I get this warning during compilation and this error when I run test.py:

/usr/local/cuda-9.0/bin/../targets/x86_64-linux/include/sm_30_intrinsics.hpp(213): here was declared deprecated ("__shfl_down() is not valid on compute_70 and above, and should be replaced with __shfl_down_sync().To continue using __shfl_down(), specify virtual architecture compute_60 when targeting sm_70 and above, for example, using the pair of compiler options: -arch=compute_60 -code=sm_70.")
NotFoundError: /home/edgar/light_head_rcnn/lib/lib_kernel/lib_fast_nms/fast_nms.so: undefined symbol: _ZN10tensorflow7strings6StrCatERKNS0_8AlphaNumE

When I use -arch=compute_60 -code=sm_70 instead, I get this warning during compilation and the same error when I run test.py:

/usr/local/cuda/bin/../targets/x86_64-linux/include/sm_30_intrinsics.hpp(213): here was declared deprecated ("__shfl_down() is deprecated in favor of __shfl_down_sync() and may be removed in a future release (Use -Wno-deprecated-declarations to suppress this warning).")

The compile command is:

CUDA_PATH=/usr/local/cuda-9.0/
nvcc -std=c++11 -c -o nms_op.cu.o nms_op.cu.cc \
    -I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=compute_60 -code=sm_70 --expt-relaxed-constexpr -Wno-deprecated-declarations

bl0 commented 6 years ago

After a long time debugging, I found the solution: edit the file /src/detection/lib/lib_kernel/lib_fast_nms/make.sh, replace -D_GLIBCXX_USE_CXX11_ABI=0 with -D_GLIBCXX_USE_CXX11_ABI=1, and recompile. The error then disappears. The link command becomes:

g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=1 -o fast_nms.so nms_op.cc \
        nms_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64 -L$TF_LIB -ltensorflow_framework -I$TF_INC/external/nsync/public
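
Rather than hard-coding the ABI value, the flag can probably be read from the installed TensorFlow. A minimal sketch, assuming a TensorFlow version that provides `tf.sysconfig.get_compile_flags()` and that `python` on the PATH is the interpreter TensorFlow was installed into. The helper names `tf_compile_flags` and `extract_abi_flag` are hypothetical and not part of make.sh:

```shell
# Hypothetical helper: print the compile flags reported by the installed
# TensorFlow (assumes `python` can import tensorflow).
tf_compile_flags() {
    python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))'
}

# Hypothetical helper: pick the -D_GLIBCXX_USE_CXX11_ABI=<0|1> token
# out of a space-separated flag string passed as the first argument.
extract_abi_flag() {
    for flag in $1; do
        case "$flag" in
            -D_GLIBCXX_USE_CXX11_ABI=*) echo "$flag" ;;
        esac
    done
}
```

With these defined, `ABI_FLAG=$(extract_abi_flag "$(tf_compile_flags)")` yields the value to pass to g++ in place of the hard-coded -D_GLIBCXX_USE_CXX11_ABI setting, so the op is built with whatever ABI the local TensorFlow reports.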

emedinac commented 6 years ago

It worked, thanks. Also, I recommend working in root mode, because I initially installed TF using Conda.

fay0505 commented 6 years ago

@bl0 Hello, your solution worked, thanks! Can you explain why it works?

bl0 commented 6 years ago

I just searched the TensorFlow issue tracker. The following comment may help: https://github.com/tensorflow/tensorflow/issues/20899#issuecomment-408264523
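
One likely reading of the error, offered as an assumption rather than a summary of the linked comment: under GCC's dual ABI, a function that returns `std::string` gets a `B5cxx11` tag in its mangled name when built with `_GLIBCXX_USE_CXX11_ABI=1`. The missing symbol in the original report, `_ZN10tensorflow7strings6StrCatERKNS0_8AlphaNumE`, has no such tag, which suggests the op was built against the old ABI while the installed TensorFlow library exports the tagged variant. A small sketch that classifies a mangled name (the helper name `symbol_abi` is hypothetical):

```shell
# Hypothetical helper: report whether a mangled C++ symbol carries the
# cxx11 ABI tag. "untagged" only means the tag is absent; symbols that do
# not involve std::string or std::list are untagged under either ABI.
symbol_abi() {
    case "$1" in
        *B5cxx11*) echo "cxx11" ;;
        *)         echo "untagged" ;;
    esac
}
```

For example, `symbol_abi _ZN10tensorflow7strings6StrCatERKNS0_8AlphaNumE` prints `untagged`. The undefined symbols of the built op can be listed with `nm -D --undefined-only fast_nms.so` and compared against what the TensorFlow framework library exports.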

fay0505 commented 6 years ago

@bl0 Thanks!