pytorch / extension-cpp

C++ extensions in PyTorch

segmentation fault for pcl icp implementation in pytorch cpp extension #20

Closed onlytailei closed 6 years ago

onlytailei commented 6 years ago

I'm trying to build a C++ extension for point cloud iterative closest point (ICP) using the ICP implementation in pcl-1.7 (http://pointclouds.org/documentation/tutorials/iterative_closest_point.php).

Converting the data from at::Tensor to pcl::PointCloud works fine. However, as soon as I declare a new icp object, a segmentation fault occurs.

I also tried passing additional arguments to CppExtension, following https://github.com/strawlab/python-pcl/blob/master/setup.py, but it didn't help.
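For reference, a minimal sketch of what passing extra include paths and libraries to CppExtension can look like. The include directories and library names are copied from the compiler and linker commands in the build output further down this thread, so they are assumptions about one particular system and may differ elsewhere. The helper names icp_extension_kwargs and run_setup are hypothetical.

```python
# Hypothetical setup.py sketch. Paths and library names are taken from the
# gcc/g++ commands in the build log in this thread; adjust them per system.

def icp_extension_kwargs():
    """Collect the keyword arguments handed to CppExtension for the ICP op."""
    return dict(
        name="icp_cpp",
        sources=["icp_op.cpp"],
        include_dirs=[
            "/usr/include/pcl-1.7",
            "/usr/include/eigen3",
            "/usr/include/ni",
        ],
        libraries=[
            "pcl_registration", "pcl_segmentation", "pcl_features",
            "pcl_surface", "pcl_tracking", "pcl_filters",
            "pcl_sample_consensus", "pcl_visualization", "pcl_io",
            "OpenNI", "pcl_search", "pcl_kdtree", "flann_cpp",
            "pcl_octree", "pcl_common", "boost_system",
        ],
    )


def run_setup():
    """Build the extension; imports are local so the kwargs can be inspected without PyTorch."""
    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CppExtension

    setup(
        name="icp_cpp",
        ext_modules=[CppExtension(**icp_extension_kwargs())],
        cmdclass={"build_ext": BuildExtension},
    )
```

In an actual setup.py the file would end with a call to run_setup(); it is left out here so that the argument list can be checked on its own.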

To reproduce the bug, clone the related files from https://github.com/onlytailei/icp_extension and make sure PCL and Eigen are installed on the system:

sudo apt-get install libpcl-all
sudo apt-get install libeigen3-dev

Then build the extension through:

python setup.py install

Comment/Uncomment this line in icp_op.cpp.

pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;

Then rebuild the extension and run the test to see the difference:

python icp_test.py

goldsborough commented 6 years ago

I'll check out the code, but I don't think this is a bug with C++ extensions per se. Did you test your C++ code before binding it into Python, and make sure it was generally correct and does not segfault?

onlytailei commented 6 years ago

Yes. I tried the pure C++ example. There is no segfault. You can download this C++ example from here.

mkdir build
cd build
cmake ../
make 
./iterative_closest_point

goldsborough commented 6 years ago

https://github.com/onlytailei/icp_extension/blob/master/icp_op.cpp#L88 looks wrong to me. tensorFromBlob does not copy data. It only references the blob you give it. You have to call .clone() to deep copy the data.
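To illustrate the difference between referencing a buffer and deep-copying it, here is a minimal sketch in plain Python. It uses the standard-library array and memoryview as stand-ins (an assumption for illustration, not the PyTorch API): the memoryview plays the role of a tensor created by tensorFromBlob, and the copied array plays the role of the result of .clone(). The helper name make_view_and_copy is hypothetical.

```python
from array import array


def make_view_and_copy(buf):
    """Return (view, copy) for a float buffer.

    view shares memory with buf, like a tensor that only references a blob.
    copy owns its own memory, like the result of a deep copy.
    """
    view = memoryview(buf)           # no data is copied here
    copy = array(buf.typecode, buf)  # element-wise deep copy
    return view, copy


source = array("f", [1.0, 2.0, 3.0])
view, copy = make_view_and_copy(source)

# Overwrite the source buffer, standing in for a C++ array that is
# reused after the function that created it has returned.
source[0] = -1.0

print(view[0])  # the view reflects the overwritten value
print(copy[0])  # the copy still holds the original value
```

In C++ the consequences can be more severe than in this sketch: if the referenced array is freed or goes out of scope, the tensor is left pointing at invalid memory.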

Change

at::Tensor output = torch::CPU(at::kFloat).tensorFromBlob(output_array, {batch_size,p_cloud_size, 3});

to

at::Tensor output = torch::CPU(at::kFloat).tensorFromBlob(output_array, {batch_size,p_cloud_size, 3}).clone();

or even better, to

at::Tensor output = torch::from_blob(output_array, {batch_size, p_cloud_size, 3}).clone();

onlytailei commented 6 years ago

Thank you! And any idea about this line?

pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;

As soon as you uncomment it, the segfault happens. In the cpp_example, it works fine.

onlytailei commented 6 years ago

From the debug info, it seems that some members of icp cannot be released successfully through the Boost smart pointers when it is destroyed. However, I have no idea why nothing goes wrong in the pure C++ example. Maybe there is some conflict between Boost and Torch.

0 0x00007fffba96fd12 in boost::detail::atomic_exchange_and_add (dv=-1, pw=0x656572546453) at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:50
1 boost::detail::sp_counted_base::release (this=0x65657254644b) at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:144
2 boost::detail::shared_count::~shared_count (this=0x13d4660, __in_chrg=) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:443
3 boost::shared_ptr<pcl::KdTreeFLANN<pcl::PointXYZ, flann::L2_Simple > >::~shared_ptr (this=0x13d4658, __in_chrg=) at /usr/include/boost/smart_ptr/shared_ptr.hpp:323
4 pcl::search::KdTree<pcl::PointXYZ, pcl::KdTreeFLANN<pcl::PointXYZ, flann::L2_Simple > >::~KdTree (this=0x13d4620, __in_chrg=) at /usr/local/include/pcl-1.8/pcl/search/kdtree.h:99
5 pcl::search::KdTree<pcl::PointXYZ, pcl::KdTreeFLANN<pcl::PointXYZ, flann::L2_Simple > >::~KdTree (this=0x13d4620, __in_chrg=) at /usr/local/include/pcl-1.8/pcl/search/kdtree.h:99
6 boost::checked_delete<pcl::search::KdTree<pcl::PointXYZ, pcl::KdTreeFLANN<pcl::PointXYZ, flann::L2_Simple > > > (x=0x13d4620) at /usr/include/boost/core/checked_delete.hpp:34
7 boost::detail::sp_counted_impl_p<pcl::search::KdTree<pcl::PointXYZ, pcl::KdTreeFLANN<pcl::PointXYZ, flann::L2_Simple > > >::dispose (this=) at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:78
8 0x00007fffba96b8fa in boost::detail::sp_counted_base::release (this=0x13d4570) at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146
9 0x00007fffba976c5d in boost::detail::sp_counted_base::release (this=) at /usr/local/include/pcl-1.8/pcl/registration/correspondence_estimation.h:109
10 boost::detail::shared_count::~shared_count (this=0x13b3950, __in_chrg=) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:443
11 boost::shared_ptr<pcl::search::KdTree<pcl::PointXYZ, pcl::KdTreeFLANN<pcl::PointXYZ, flann::L2_Simple > > >::~shared_ptr (this=0x13b3948, __in_chrg=) at /usr/include/boost/smart_ptr/shared_ptr.hpp:323
12 pcl::registration::CorrespondenceEstimationBase<pcl::PointXYZ, pcl::PointXYZ, float>::~CorrespondenceEstimationBase (this=this@entry=0x13b3900, __in_chrg=) at /usr/local/include/pcl-1.8/pcl/registration/correspondence_estimation.h:109
13 0x00007fffba976d10 in pcl::registration::CorrespondenceEstimation<pcl::PointXYZ, pcl::PointXYZ, float>::~CorrespondenceEstimation (this=0x13b3900, __in_chrg=) at /usr/local/include/pcl-1.8/pcl/registration/correspondence_estimation.h:419
14 pcl::registration::CorrespondenceEstimation<pcl::PointXYZ, pcl::PointXYZ, float>::~CorrespondenceEstimation (this=0x13b3900, __in_chrg=) at /usr/local/include/pcl-1.8/pcl/registration/correspondence_estimation.h:419
15 boost::checked_delete<pcl::registration::CorrespondenceEstimation<pcl::PointXYZ, pcl::PointXYZ, float> > (x=0x13b3900) at /usr/include/boost/core/checked_delete.hpp:34
16 boost::detail::sp_counted_impl_p<pcl::registration::CorrespondenceEstimation<pcl::PointXYZ, pcl::PointXYZ, float> >::dispose (this=) at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:78
17 0x00007fffba96b8fa in boost::detail::sp_counted_base::release (this=0x13d4590) at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146
18 0x00007fffba973ed5 in boost::detail::sp_counted_base::release (this=) at /usr/include/boost/function/function_template.hpp:510
19 boost::detail::shared_count::~shared_count (this=0x13bfa50, __in_chrg=) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:443
20 boost::shared_ptr<pcl::registration::CorrespondenceEstimationBase<pcl::PointXYZ, pcl::PointXYZ, float> >::~shared_ptr (this=0x13bfa48, __in_chrg=) at /usr/include/boost/smart_ptr/shared_ptr.hpp:323
21 pcl::Registration<pcl::PointXYZ, pcl::PointXYZ, float>::~Registration (this=this@entry=0x13bf8c0, __in_chrg=) at /usr/local/include/pcl-1.8/pcl/registration/registration.h:132
22 0x00007fffba973fc5 in pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ, float>::~IterativeClosestPoint (this=0x13bf8c0, __in_chrg=) at /usr/local/include/pcl-1.8/pcl/registration/icp.h:155
23 pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ, float>::~IterativeClosestPoint (this=0x13bf8c0, __in_chrg=) at /usr/local/include/pcl-1.8/pcl/registration/icp.h:155
24 0x00007fffba96d062 in boost::movelib::default_delete<pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ, float> >::operator()<pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ, float> > (this=, ptr=) at /usr/include/boost/move/default_delete.hpp:181
25 boost::movelib::unique_ptr<pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ, float>, boost::movelib::default_delete<pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ, float> > >::~unique_ptr (this=, __in_chrg=) at /usr/include/boost/move/unique_ptr.hpp:559
26 icp_forward (p_cloud=..., q_cloud=...) at icp_op.cpp:77
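One detail visible in the trace is that the PCL headers come from /usr/local/include/pcl-1.8, while the issue description mentions pcl-1.7. Whether this plays any role in the crash is not established in this thread. As a sketch only, the following Python snippet shows one way to list the PCL version tags that appear in a piece of text such as a backtrace or a compiler command. The function name pcl_versions and the two sample strings (one copied from the trace, one from the issue description) are chosen for illustration.

```python
import re


def pcl_versions(text):
    """Return the sorted, de-duplicated 'X.Y' parts of every 'pcl-X.Y' tag in text."""
    return sorted(set(re.findall(r"pcl-(\d+\.\d+)", text)))


# Sample inputs: a path from the backtrace and a phrase from the issue text.
trace_line = "at /usr/local/include/pcl-1.8/pcl/registration/icp.h:155"
issue_text = "using the icp function in pcl-1.7"

versions = pcl_versions(trace_line + "\n" + issue_text)
if len(versions) > 1:
    print("more than one PCL version referenced:", versions)
```

The same function could be run over the full compiler and linker output of a build to see which PCL include directories are actually in use.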

Another question: you mentioned torch::from_blob. Which extra header file should I include to use it? With torch/torch.h, the function cannot be found.

goldsborough commented 6 years ago

Regarding from_blob: it was introduced in a later version of PyTorch and wasn't available in 0.4.0, my bad.

I spent some time today trying to reproduce your bug in a Docker container, but I could not. I used this Dockerfile:

FROM ubuntu:xenial

RUN apt-get update  -y \
  && apt-get install -y git cmake vim make wget gnupg build-essential software-properties-common gdb

RUN apt-get install -y libpcl-dev

RUN wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh \
  && chmod +x miniconda.sh \
  && ./miniconda.sh -b -p ~/local/miniconda

RUN . ~/local/miniconda/bin/activate && conda install -c pytorch pytorch==0.4.0

WORKDIR /home

and then everything seems to work pretty well:

(base) root@7354d80b0db7:/home# cat /home/
.git/        .gitignore   Dockerfile   README.md    __pycache__/ icp.py       icp_op.cpp   icp_test.py  setup.py
(base) root@7354d80b0db7:/home# cat /home/^C
(base) root@7354d80b0db7:/home# python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())'
/root/local/miniconda/lib/python3.6/site-packages
(base) root@7354d80b0db7:/home# find /root/local/miniconda/lib/python3.6/site-packages -name cpp_extension.pty
(base) root@7354d80b0db7:/home# find /root/local/miniconda/lib/python3.6/site-packages -name cpp_extension.py
/root/local/miniconda/lib/python3.6/site-packages/torch/utils/cpp_extension.py
(base) root@7354d80b0db7:/home# ^Cnd /root/local/miniconda/lib/python3.6/site-packages -name cpp_extension.py
(base) root@7354d80b0db7:/home# less /root/local/miniconda/lib/python3.6/site-packages/torch/utils/cpp_extension.py
(base) root@7354d80b0db7:/home# ls
Dockerfile  README.md  __pycache__  icp.py  icp_op.cpp  icp_test.py  setup.py
(base) root@7354d80b0db7:/home# python setup.py build develop
running build
running build_ext
building 'icp_cpp' extension
creating build
creating build/temp.linux-x86_64-3.6
gcc -pthread -B /root/local/miniconda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DEIGEN_YES_I_KNOW_SPARSE_MODULE_IS_NOT_STABLE_YET=1 -I/root/local/miniconda/lib/python3.6/site-packages/numpy/core/include -I/usr/include/pcl-1.7 -I/usr/include/ni -I/usr/include/eigen3 -I/usr/include/ni -I/root/local/miniconda/lib/python3.6/site-packages/torch/lib/include -I/root/local/miniconda/lib/python3.6/site-packages/torch/lib/include/TH -I/root/local/miniconda/lib/python3.6/site-packages/torch/lib/include/THC -I/root/local/miniconda/include/python3.6m -c icp_op.cpp -o build/temp.linux-x86_64-3.6/icp_op.o -DTORCH_EXTENSION_NAME=icp_cpp -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
creating build/lib.linux-x86_64-3.6
g++ -pthread -shared -B /root/local/miniconda/compiler_compat -L/root/local/miniconda/lib -Wl,-rpath=/root/local/miniconda/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/icp_op.o -lpcl_registration -lpcl_segmentation -lpcl_features -lpcl_surface -lpcl_tracking -lpcl_filters -lpcl_sample_consensus -lpcl_visualization -lpcl_io -lOpenNI -lpcl_search -lpcl_kdtree -lflann_cpp -lpcl_octree -lpcl_common -o build/lib.linux-x86_64-3.6/icp_cpp.cpython-36m-x86_64-linux-gnu.so -lboost_system
running develop
running egg_info
creating icp_cpp.egg-info
writing icp_cpp.egg-info/PKG-INFO
writing dependency_links to icp_cpp.egg-info/dependency_links.txt
writing top-level names to icp_cpp.egg-info/top_level.txt
writing manifest file 'icp_cpp.egg-info/SOURCES.txt'
reading manifest file 'icp_cpp.egg-info/SOURCES.txt'
writing manifest file 'icp_cpp.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.6/icp_cpp.cpython-36m-x86_64-linux-gnu.so ->
Creating /root/local/miniconda/lib/python3.6/site-packages/icp-cpp.egg-link (link to .)
Adding icp-cpp 1.0 to easy-install.pth file

Installed /home
Processing dependencies for icp-cpp==1.0
Finished processing dependencies for icp-cpp==1.0
(base) root@7354d80b0db7:/home# python
.git/                                    README.md                                icp.py                                   icp_op.cpp
.gitignore                               __pycache__/                             icp_cpp.cpython-36m-x86_64-linux-gnu.so  icp_test.py
Dockerfile                               build/                                   icp_cpp.egg-info/                        setup.py
(base) root@7354d80b0db7:/home# python icp_test.py
tensor([[[-1.2634e+00, -2.3912e-01,  3.1981e-01],
         [-7.7116e-01,  3.9494e-02, -3.2341e-01],
         [-2.0449e+00,  6.7875e-01, -9.5829e-01],
         ...,
         [-1.1826e-01, -1.0028e+00, -8.6894e-02],
         [ 2.6089e-01, -8.6151e-02,  3.6891e-01],
         [ 1.4749e-01,  9.5050e-01, -4.9166e-01]]]) tensor([[[-1.6041, -0.4577,  0.8348],
         [ 1.0041, -1.2082, -0.4258],
         [ 0.1405, -1.9008, -1.2343],
         ...,
         [-0.1316, -0.4407, -0.4610],
         [ 1.6476, -1.3544,  0.5584],
         [-0.1154,  0.6452, -0.3808]]])

Did you make any progress on your end?

onlytailei commented 6 years ago

Thank you @goldsborough! I tried your Docker setup and it really works! I will check my environment and close the issue. Many thanks!

joshi-bharat commented 5 years ago

@onlytailei I am having the same issue. Were you able to resolve this problem? I am getting a segmentation fault at the line IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp; and I am using stable PyTorch 1.1 with cuda-toolkit 10.0.

healthysong commented 4 years ago

I am having the same issue. Were you able to resolve this problem? I am getting a segmentation fault at kdtree.setInputCloud(clouds);