plasmodic / ecto

ecto is a dynamically configurable Directed Acyclic processing Graph (DAG) framework.
http://ecto.willowgarage.com/
BSD 3-Clause "New" or "Revised" License
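For readers unfamiliar with ecto: a processing graph (a "plasm") is assembled from cells and driven by a scheduler. Below is a minimal sketch based on ecto's tutorials; the cell and scheduler names are assumptions and vary between releases:

```python
# Minimal ecto plasm: a generator cell feeding a printer cell.
# Cell names (Generate, Printer) follow the upstream ecto tutorials;
# exact module/cell names may differ across ecto versions.
import ecto
import ecto.ecto_test as ecto_test  # ecto's built-in test cells (assumed available)

plasm = ecto.Plasm()
gen = ecto_test.Generate("gen", start=0, step=2)  # emits 0, 2, 4, ... on "out"
printer = ecto_test.Printer("printer")            # prints whatever arrives on "in"
plasm.connect(gen["out"] >> printer["in"])

sched = ecto.Scheduler(plasm)  # older releases: ecto.schedulers.Singlethreaded(plasm)
sched.execute(niter=3)         # tick the DAG three times
```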

segmentation fault in version 0.6.8 #272

Open lannersluc opened 9 years ago

lannersluc commented 9 years ago

Hi everyone,

I am working with the object recognition kitchen (ORK) package for detecting objects. This package uses ecto, and with the new version of ecto (0.6.8) it crashes with a segmentation fault.

First I will describe which packages I am running that cause this error; below that you'll find a GDB backtrace.

Basically I am following the procedure presented in the ORK tutorial. Executing the following command:

rosrun object_recognition_core detection -c $(rospack find object_recognition_tabletop)/conf/detection.table.ros.ork

results in a segmentation fault.

Running the same script with GDB gives the following backtrace:

```
$ gdb --args python ./detection -c $(rospack find object_detection)/conf/filter_kinect.object.detection.ros.ork
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /usr/bin/python...(no debugging symbols found)...done.
(gdb) r
Starting program: /usr/bin/python ./detection -c /home/luc/ws/src/pr2_tidyup/object_detection/conf/filter_kinect.object.detection.ros.ork
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffda7a5700 (LWP 8027)]
[New Thread 0x7fffd9fa4700 (LWP 8028)]
[New Thread 0x7fffd97a3700 (LWP 8029)]
[New Thread 0x7fffd8fa2700 (LWP 8036)]
[New Thread 0x7fffcbfff700 (LWP 8037)]
[New Thread 0x7fffcb7fe700 (LWP 8038)]
[New Thread 0x7fffcaffd700 (LWP 8039)]
[New Thread 0x7fffca7fc700 (LWP 8040)]
[New Thread 0x7fffc9ffb700 (LWP 8041)]
[New Thread 0x7fffc97fa700 (LWP 8042)]
[New Thread 0x7fffc8ff9700 (LWP 8043)]
[New Thread 0x7fffaffff700 (LWP 8044)]
[ INFO] [1426193567.081669851]: Initialized ROS. node_name: /object_recognition_server
[ INFO] [1426193568.665809328, 3478.890000000]: System already initialized. node_name: /object_recognition_server
[ INFO] [1426193571.401531358, 3480.321000000]: System already initialized. node_name: /object_recognition_server

Program received signal SIGSEGV, Segmentation fault.
0x000000000043a8a0 in PyObject_Call ()
(gdb) where
#0  0x000000000043a8a0 in PyObject_Call ()
#1  0x000000000048ae4e in PyObject_CallFunction ()
#2  0x00007ffff5b2ae40 in boost::python::detail::dict_base::dict_base(boost::python::api::object const&) () from /usr/lib/libboost_python-py27.so.1.46.1
#3  0x00007fffeb4012db in dict<boost::python::api::object> (data=..., this=0x7fffffffb960) at /usr/include/boost/python/dict.hpp:93
#4  ecto_ros::Synchronizer::configure (this=0x2114db0, p=..., in=..., out=...) at /home/luc/ws/src/ecto/ros/src/Synchronizer.cpp:75
#5  0x00007ffff50c2df6 in ecto::cell::configure (this=0x206d3c0) at /home/luc/ws/src/ecto/ecto/src/lib/cell.cpp:177
#6  0x00007ffff517d7e8 in ecto::graph::invoke_configuration (graph=..., vd=0) at /home/luc/ws/src/ecto/ecto/src/lib/graph/utilities.cpp:147
#7  0x00007ffff50e1965 in ecto::plasm::configure_all (this=0x20680e0) at /home/luc/ws/src/ecto/ecto/src/lib/plasm.cpp:285
#8  0x00007ffff2560601 in process (this=0x2080340) at /home/luc/ws/src/ecto/ecto/src/pybindings/cells/BlackBox.cpp:66
#9  process (outputs=..., inputs=..., this=<optimized out>) at /home/luc/ws/src/ecto/ecto/include/ecto/cell.hpp:422
#10 ecto::cell_<ecto::py::BlackBox>::dispatch_process (this=<optimized out>, inputs=..., outputs=...) at /home/luc/ws/src/ecto/ecto/include/ecto/cell.hpp:426
#11 0x00007ffff50c3673 in ecto::cell::process_with_only_these_inputs (this=0x2080a60, connected_inputs=...) at /home/luc/ws/src/ecto/ecto/src/lib/cell.cpp:253
#12 0x00007ffff517d46a in ecto::graph::invoke_process (graph=..., vd=0) at /home/luc/ws/src/ecto/ecto/src/lib/graph/utilities.cpp:181
#13 0x00007ffff5173f53 in ecto::scheduler::execute_iter (this=0x20841b0, cur_iter=0, num_iters=1, stack_idx=0) at /home/luc/ws/src/ecto/ecto/src/lib/scheduler.cpp:208
#14 0x00007ffff5178061 in operator() (a3=<optimized out>, a2=<optimized out>, p=<optimized out>, this=<synthetic pointer>, a1=<optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:393
#15 operator()<boost::_mfi::mf3<void, ecto::scheduler, unsigned int, unsigned int, long unsigned int>, boost::_bi::list0> (f=<optimized out>, this=<synthetic pointer>, a=...) at /usr/include/boost/bind/bind.hpp:457
#16 operator() (this=<optimized out>) at /usr/include/boost/bind/bind_template.hpp:20
#17 asio_handler_invoke<boost::_bi::bind_t<void, boost::_mfi::mf3<void, ecto::scheduler, unsigned int, unsigned int, unsigned long>, boost::_bi::list4<boost::_bi::value<ecto::scheduler*>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned long> > > > (function=...) at /usr/include/boost/asio/handler_invoke_hook.hpp:64
#18 invoke<boost::_bi::bind_t<void, boost::_mfi::mf3<void, ecto::scheduler, unsigned int, unsigned int, unsigned long>, boost::_bi::list4<boost::_bi::value<ecto::scheduler*>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned long> > >, boost::_bi::bind_t<void, boost::_mfi::mf3<void, ecto::scheduler, unsigned int, unsigned int, unsigned long>, boost::_bi::list4<boost::_bi::value<ecto::scheduler*>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned long> > > > (function=..., context=...) at /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:39
#19 boost::asio::detail::completion_handler<boost::_bi::bind_t<void, boost::_mfi::mf3<void, ecto::scheduler, unsigned int, unsigned int, unsigned long>, boost::_bi::list4<boost::_bi::value<ecto::scheduler*>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned long> > > >::do_complete (owner=0x2084260, base=<optimized out>) at /usr/include/boost/asio/detail/completion_handler.hpp:63
#20 0x00007ffff5176de9 in complete (owner=..., this=0x211af00) at /usr/include/boost/asio/detail/task_io_service_operation.hpp:35
#21 boost::asio::detail::task_io_service::do_one (this=0x2084260, lock=..., this_idle_thread=0x7fffffffc550) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:278
#22 0x00007ffff5176fa2 in boost::asio::detail::task_io_service::run (this=0x2084260, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:130
#23 0x00007ffff5174f73 in run (this=<optimized out>) at /usr/include/boost/asio/impl/io_service.ipp:57
#24 ecto::scheduler::run (this=0x20841b0) at /home/luc/ws/src/ecto/ecto/src/lib/scheduler.cpp:147
#25 0x00007ffff5175381 in ecto::scheduler::execute (this=0x20841b0, num_iters=<optimized out>) at /home/luc/ws/src/ecto/ecto/src/lib/scheduler.cpp:89
#26 0x00007ffff2560596 in process (this=0x2083b30) at /home/luc/ws/src/ecto/ecto/src/pybindings/cells/BlackBox.cpp:79
#27 process (outputs=..., inputs=..., this=<optimized out>) at /home/luc/ws/src/ecto/ecto/include/ecto/cell.hpp:422
#28 ecto::cell_<ecto::py::BlackBox>::dispatch_process (this=<optimized out>, inputs=..., outputs=...) at /home/luc/ws/src/ecto/ecto/include/ecto/cell.hpp:426
#29 0x00007ffff50c3673 in ecto::cell::process_with_only_these_inputs (this=0x2083040, connected_inputs=...) at /home/luc/ws/src/ecto/ecto/src/lib/cell.cpp:253
#30 0x00007ffff517d46a in ecto::graph::invoke_process (graph=..., vd=0) at /home/luc/ws/src/ecto/ecto/src/lib/graph/utilities.cpp:181
#31 0x00007ffff5173f53 in ecto::scheduler::execute_iter (this=0x1fadfb0, cur_iter=0, num_iters=0, stack_idx=0) at /home/luc/ws/src/ecto/ecto/src/lib/scheduler.cpp:208
#32 0x00007ffff5178061 in operator() (a3=<optimized out>, a2=<optimized out>, p=<optimized out>, this=<synthetic pointer>, a1=<optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:393
#33 operator()<boost::_mfi::mf3<void, ecto::scheduler, unsigned int, unsigned int, long unsigned int>, boost::_bi::list0> (f=<optimized out>, this=<synthetic pointer>, a=...) at /usr/include/boost/bind/bind.hpp:457
#34 operator() (this=<optimized out>) at /usr/include/boost/bind/bind_template.hpp:20
#35 asio_handler_invoke<boost::_bi::bind_t<void, boost::_mfi::mf3<void, ecto::scheduler, unsigned int, unsigned int, unsigned long>, boost::_bi::list4<boost::_bi::value<ecto::scheduler*>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned long> > > > (function=...) at /usr/include/boost/asio/handler_invoke_hook.hpp:64
#36 invoke<boost::_bi::bind_t<void, boost::_mfi::mf3<void, ecto::scheduler, unsigned int, unsigned int, unsigned long>, boost::_bi::list4<boost::_bi::value<ecto::scheduler*>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned long> > >, boost::_bi::bind_t<void, boost::_mfi::mf3<void, ecto::scheduler, unsigned int, unsigned int, unsigned long>, boost::_bi::list4<boost::_bi::value<ecto::scheduler*>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned long> > > > (function=..., context=...) at /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:39
#37 boost::asio::detail::completion_handler<boost::_bi::bind_t<void, boost::_mfi::mf3<void, ecto::scheduler, unsigned int, unsigned int, unsigned long>, boost::_bi::list4<boost::_bi::value<ecto::scheduler*>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned int>, boost::_bi::value<unsigned long> > > >::do_complete (owner=0x1fae060, base=<optimized out>) at /usr/include/boost/asio/detail/completion_handler.hpp:63
#38 0x00007ffff5176de9 in complete (owner=..., this=0x211af00) at /usr/include/boost/asio/detail/task_io_service_operation.hpp:35
#39 boost::asio::detail::task_io_service::do_one (this=0x1fae060, lock=..., this_idle_thread=0x7fffffffcdf0) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:278
#40 0x00007ffff5176fa2 in boost::asio::detail::task_io_service::run (this=0x1fae060, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:130
#41 0x00007ffff5174f73 in run (this=<optimized out>) at /usr/include/boost/asio/impl/io_service.ipp:57
#42 ecto::scheduler::run (this=0x1fadfb0) at /home/luc/ws/src/ecto/ecto/src/lib/scheduler.cpp:147
#43 0x00007ffff5175381 in ecto::scheduler::execute (this=0x1fadfb0, num_iters=<optimized out>) at /home/luc/ws/src/ecto/ecto/src/lib/scheduler.cpp:89
#44 0x00007ffff25c77da in invoke<boost::python::to_python_value<bool const&>, bool (*)(ecto::scheduler&, unsigned int), boost::python::arg_from_python<ecto::scheduler&>, boost::python::arg_from_python<unsigned int> > (ac1=..., ac0=<optimized out>, rc=..., f=<optimized out>) at /usr/include/boost/python/detail/invoke.hpp:75
#45 operator() (args_=<optimized out>, this=<optimized out>) at /usr/include/boost/python/detail/caller.hpp:223
#46 boost::python::objects::caller_py_function_impl<boost::python::detail::caller<bool (*)(ecto::scheduler&, unsigned int), boost::python::default_call_policies, boost::mpl::vector3<bool, ecto::scheduler&, unsigned int> > >::operator() (this=<optimized out>, args=<optimized out>, kw=<optimized out>) at /usr/include/boost/python/object/py_function.hpp:38
#47 0x00007ffff5b37c1f in boost::python::objects::function::call(_object*, _object*) const () from /usr/lib/libboost_python-py27.so.1.46.1
#48 0x00007ffff5b37e78 in ?? () from /usr/lib/libboost_python-py27.so.1.46.1
#49 0x00007ffff5b41203 in boost::python::detail::exception_handler::operator()(boost::function0<void> const&) const () from /usr/lib/libboost_python-py27.so.1.46.1
#50 0x00007ffff2583ee3 in operator() (translate=0x7ffff2584600 <boost::python::detail::translate_exception<...>::translate(ecto::except::NullTendril const&)>, f=..., handler=..., this=<optimized out>) at /usr/include/boost/python/detail/translate_exception.hpp:48
#51 operator()<...> (a=<optimized out>, this=<optimized out>, f=...) at /usr/include/boost/bind/bind.hpp:382
#52 operator()<...> (a2=..., a1=..., this=<optimized out>) at
```
When downgrading to version 0.6.7, everything works fine again. If any further information is needed to debug this issue, please let me know.

Regards,
Luc
stonier commented 9 years ago

Original discussion started here.

@vrabaud I can't kick off the problem with any simple tests, so I'm seeing if Luc can send me his source workspace so I can at least reproduce the problem... and hopefully get it fixed within a couple of days. Failing all that, we at least know we can roll back to 0.6.7.

lannersluc commented 9 years ago

In the appendix, you will find a .tar.gz file containing ros.rosinstall and the object_detection package.

Open ros.rosinstall; there you'll find a description of how to reproduce the error. (It is only a minimal installation to reproduce the error.)

I hope this will help you debug. If you need further help setting up the packages, please let me know.


stonier commented 9 years ago

Appendix of?

lannersluc commented 9 years ago

In the appendix of my previous mail. Don't you see the appendix? If not, I'll upload the files somewhere and send you the link.


stonier commented 9 years ago

On 17 March 2015 at 17:38, lannersluc wrote:

> In the appendix of my previous mail. Don't you see the appendix? If not, I'll upload the files somewhere and send you the link.

Nope, no attachments.

Daniel.


vrabaud commented 9 years ago

The Synchronizer is only used with the Kinect. I need to get my hands on one again...

lannersluc commented 9 years ago

I created a repository for the bug: https://github.com/lannersluc/ecto-SegFault

There you'll find a description of how to set up the environment in order to recreate the segfault. Hope this helps.

stonier commented 9 years ago

> The Synchronizer is only used with the Kinect. I need to get my hands on one again

There are two synchronizers, one in the openni code and another in the core ecto code, and they seem to be unrelated. The synchronizer in the backtrace is from the core ecto code.
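For anyone trying to isolate that cell, here is a minimal standalone sketch that instantiates just the core ecto_ros Synchronizer. The module, cell, and argument names are assumptions based on the backtrace and typical ecto_ros usage, so adjust them to whatever your installation actually exposes:

```python
# Hypothetical minimal exercise of the code path in frame #4 of the backtrace:
# Synchronizer::configure() converts its 'subs' parameter into a boost::python::dict.
# Module/cell/argument names follow common ecto_ros usage and may differ locally.
import sys
import ecto
import ecto_ros
import ecto_sensor_msgs

ecto_ros.init(sys.argv, "sync_test")  # bring up the ROS node behind ecto_ros

subs = dict(
    image=ecto_sensor_msgs.Subscriber_Image(topic_name="/camera/rgb/image_raw", queue_size=1),
    depth=ecto_sensor_msgs.Subscriber_Image(topic_name="/camera/depth/image_raw", queue_size=1),
)
sync = ecto_ros.Synchronizer("sync", subs=subs)

plasm = ecto.Plasm()
plasm.insert(sync)

# configure() on every cell runs when the scheduler starts executing the plasm
# (plasm::configure_all, frame #7 above), which is where the reported crash occurs.
ecto.Scheduler(plasm).execute(niter=1)
```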

stonier commented 9 years ago

Ok, I found a Kinect and a hydro installation and dropped in that rosinstall file, but the Freiburg pr2_tidyup has various missing or typo'd dependencies and wouldn't compile.

I noticed, though, that you have the object_detection package from pr2_tidyup in your ecto-SegFault repo, so I dropped pr2_tidyup and just installed the ecto-SegFault repo. Launching was fine:

```
$ roslaunch object_detection filtered_kinect_detection.launch --screen
... logging to /home/may/.ros/log/211dabd0-d520-11e4-9995-74d435132f21/roslaunch-mayheim-10282.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

SUMMARY
========

PARAMETERS
 * /rosdistro
 * /rosversion

NODES
  /
    filter_kinect (object_detection/filter_kinect)
    object_detection (object_recognition_core/detection)
    object_information_server (object_recognition_ros/object_information_server)
    table_detection (object_recognition_core/detection)

auto-starting new master
process[master]: started with pid [10296]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to 211dabd0-d520-11e4-9995-74d435132f21
process[rosout-1]: started with pid [10309]
started core service [/rosout]
process[filter_kinect-2]: started with pid [10312]
process[table_detection-3]: started with pid [10313]
process[object_detection-4]: started with pid [10314]
process[object_information_server-5]: started with pid [10315]
[ INFO] [1427529473.496856929]: Initialized ROS. node_name: /object_detection
[ INFO] [1427529473.496927186]: Initialized ROS. node_name: /table_detection
[ INFO] [1427529473.886672662]: System already initialized. node_name: /object_detection
[ INFO] [1427529474.116872713]: System already initialized. node_name: /table_detection
[ INFO] [1427529474.119896261]: Subscribed to topic:/head_mount_kinect/depth/camera_info with queue size of 1
[ INFO] [1427529474.120804791]: Subscribed to topic:/head_mount_kinect/rgb/camera_info with queue size of 1
[ INFO] [1427529474.121744887]: Subscribed to topic:/head_mount_kinect/depth/image_fixed with queue size of 1
[ INFO] [1427529474.122661952]: Subscribed to topic:/head_mount_kinect/rgb/image_raw with queue size of 1
[ INFO] [1427529474.125945109]: System already initialized. node_name: /object_detection
[ INFO] [1427529474.128506575]: Subscribed to topic:/head_mount_kinect/rgb/camera_info with queue size of 1
[ INFO] [1427529474.129054930]: Subscribed to topic:/head_mount_kinect/depth/camera_info with queue size of 1
[ INFO] [1427529474.129687675]: Subscribed to topic:/head_mount_kinect/depth/image_fixed with queue size of 1
[ INFO] [1427529474.130189453]: Subscribed to topic:/head_mount_kinect/rgb/image_raw with queue size of 1
```

No problems. Perhaps it's not quite the same as what you were running, though, and not quite getting to whatever part of the code you were crashing on. Thoughts?

stonier commented 9 years ago

Just to confirm: the scheduler was indeed looping around, calling invoke_process and processing all inputs (not only the connected inputs, as indicated in the backtrace). So something else is definitely awry here.

lannersluc commented 9 years ago

The original package, object_detection, is in GKIFreiburg/pr2_tidyup, which also contains a lot of other stuff. Therefore I created a minimal package, the ecto-SegFault package, which only contains the object_detection package that exhibits the segmentation fault. So you did it right by dropping pr2_tidyup and only using ecto-SegFault.

However, I just repeated the procedure, checked out the packages, and rebuilt everything. On my machine I still get a segfault when subscribing to the topics. But your output looks good: there is no segfault and the node is able to subscribe to the topics.

Are you sure you are using ecto/ecto version 0.6.8? Otherwise, I have no explanation for why it runs on your machine but not on mine.

stonier commented 9 years ago

Yeah, definitely 0.6.8. The last commit is the 0.6.8 one, and I edited/recompiled some sources with logging to check that it was picking up exactly this ecto.

You don't happen to have a binary ecto in /opt/ros?
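One quick way to check is to print where the imported ecto module actually lives (e.g. a binary install under /opt/ros shadowing the source workspace). A minimal sketch; the `__version__` attribute is an assumption and may not exist in every build:

```python
# Print which ecto module Python picks up, and its version if exposed.
import ecto
print(ecto.__file__)  # e.g. /opt/ros/hydro/... vs /home/luc/ws/...
print(getattr(ecto, "__version__", "no __version__ attribute"))
```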