jsk-ros-pkg / jsk_common

common programs for jsk-ros-pkg

Nodelets respawn unexpectedly and fail reloading #1531

Open furushchev opened 7 years ago

furushchev commented 7 years ago

When running nodelets, the whole process occasionally dies for some reason. With respawn="true" set, the respawn itself also seems to fail with some probability.

process[restaurant_perception_nodelet_manager-1]: started with pid [29789]
process[people_detection/input_image_relay-2]: started with pid [29795]
process[people_detection/throttle-3]: started with pid [29803]
process[people_detection/face_detection-4]: started with pid [29817]
process[in_shelf_object_detection_nodelet_manager-5]: started with pid [29852]
process[in_shelf_object_detection/input_relay-6]: started with pid [29856]
process[in_shelf_object_detection/floor_removal-7]: started with pid [29875]
process[in_shelf_object_detection/multi_plane_segmentation-8]: started with pid [29913]
process[in_shelf_object_detection/plane_reasoner-9]: started with pid [29939]
process[in_shelf_object_detection/plane_reasoner_decomposer-10]: started with pid [29963]
process[in_shelf_object_detection/robot_workspace_tf_publisher-11]: started with pid [30006]
process[in_shelf_object_detection/plane_distance_likelihood-12]: started with pid [30009]
process[in_shelf_object_detection/plane_likelihood_filter-13]: started with pid [30018]
process[in_shelf_object_detection/plane_magnifier-14]: started with pid [30067]
process[in_shelf_object_detection/polygon_array_transformer-15]: started with pid [30094]
process[in_shelf_object_detection/bilateral_filter-16]: started with pid [30106]
process[in_shelf_object_detection/voxel_grid-17]: started with pid [30120]
process[in_shelf_object_detection/plane_extraction-18]: started with pid [30124]
process[in_shelf_object_detection/euclidean_clustering-19]: started with pid [30126]
process[in_shelf_object_detection/cluster_decomposer-20]: started with pid [30136]
process[take_from_table_nodelet_manager-21]: started with pid [30145]
process[tabletop_object_detector/input_relay-22]: started with pid [30170]
process[tabletop_object_detector/passthrough-23]: started with pid [30189]
process[tabletop_object_detector/multi_plane_estimate-24]: started with pid [30211]
process[tabletop_object_detector/table_extractor-25]: started with pid [30221]
process[tabletop_object_detector/table_extractor_decomposer-26]: started with pid [30233]
process[tabletop_object_detector/table_polygon_likelihood_filter-27]: started with pid [30257]
process[tabletop_object_detector/filtering_table_polygon-28]: started with pid [30311]
process[tabletop_object_detector/polygon_to_polygon_array-29]: started with pid [30315]
process[tabletop_object_detector/polygon_array_transformer-30]: started with pid [30353]
[ INFO] [1494833581.120339038]: Initializing nodelet with 8 worker threads.
[ INFO] [1494833581.179103479]: Initializing nodelet with 8 worker threads.
process[tabletop_object_detector/voxel_filter-31]: started with pid [30369]
process[tabletop_object_detector/table_surface_object_extraction-32]: started with pid [30459]
process[tabletop_object_detector/clustering-33]: started with pid [30495]
process[tabletop_object_detector/cluster_decomposer-34]: started with pid [30524]
process[tabletop_object_detector/bbox_array_to_bbox-35]: started with pid [30576]
process[tabletop_object_detector/publish_tf_bbox-36]: started with pid [30619]
[ INFO] [1494833581.839976191]: Initializing nodelet with 8 worker threads.
[INFO] [WallTime: 1494833584.410268] launch bbox tf publisher
[ INFO] [1494833584.777466170]: instantiating tf::TransformListener
[ INFO] [1494833586.425024214]: instantiating tf::TransformListener
[ WARN] [1494833587.943898020]: '/people_detection/face_detection' subscribes topics only with child subscribers.
[ WARN] [1494833588.781218196]: ~output%02d are not published before subscribed, you should subscribe ~debug_output in debuging.
[ WARN] [1494833589.290510048]: '/in_shelf_object_detection/plane_likelihood_filter' subscribes topics only with child subscribers.
[WARN] [WallTime: 1494833589.419369] [/tabletop_object_detector/publish_tf_bbox] subscribes topics only with child subscribers. Set '~always_subscribe' as True to have it subscribe always.
[ WARN] [1494833589.572431583]: '/in_shelf_object_detection/plane_extraction' subscribes topics only with child subscribers.
[ WARN] [1494833589.754981714]: '/tabletop_object_detector/multi_plane_estimate' subscribes topics only with child subscribers.
[ WARN] [1494833590.179708182]: '/in_shelf_object_detection/plane_reasoner_decomposer' subscribes topics only with child subscribers.
[ WARN] [1494833590.567020005]: '/in_shelf_object_detection/polygon_array_transformer' subscribes topics only with child subscribers.
[ WARN] [1494833590.629797604]: '/in_shelf_object_detection/plane_magnifier' subscribes topics only with child subscribers.
[ WARN] [1494833591.423378086]: '/tabletop_object_detector/polygon_array_transformer' subscribes topics only with child subscribers.
[ WARN] [1494833591.623547605]: '/in_shelf_object_detection/euclidean_clustering' subscribes topics only with child subscribers.
[ WARN] [1494833592.419575686]: '/tabletop_object_detector/cluster_decomposer' subscribes topics only with child subscribers.
[ WARN] [1494833592.742550809]: '/in_shelf_object_detection/cluster_decomposer' subscribes topics only with child subscribers.
[ WARN] [1494833593.052135433]: '/tabletop_object_detector/clustering' subscribes topics only with child subscribers.
[ WARN] [1494833593.466370398]: '/tabletop_object_detector/bbox_array_to_bbox' subscribes topics only with child subscribers.
[ WARN] [1494833593.595885797]: '/tabletop_object_detector/table_polygon_likelihood_filter' subscribes topics only with child subscribers.
[ WARN] [1494833593.669023446]: '/tabletop_object_detector/table_extractor_decomposer' subscribes topics only with child subscribers.
[ WARN] [1494833594.177390984]: '/tabletop_object_detector/table_surface_object_extraction' subscribes topics only with child subscribers.
[ WARN] [1494833594.661611797]: '/tabletop_object_detector/table_extractor' subscribes topics only with child subscribers.
[ WARN] [1494833594.946915592]: '/tabletop_object_detector/polygon_to_polygon_array' subscribes topics only with child subscribers.
[ WARN] [1494833595.049390388]: '/tabletop_object_detector/filtering_table_polygon' subscribes topics only with child subscribers.
[in_shelf_object_detection/floor_removal-7] process has finished cleanly
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-floor_removal-7*.log
[in_shelf_object_detection/floor_removal-7] restarting process
process[in_shelf_object_detection/floor_removal-7]: started with pid [6411]
[ERROR] [1494833780.963584485]: Cannot load nodelet /in_shelf_object_detection/floor_removal for one exists with that name already
[FATAL] [1494833780.964170440]: Failed to load nodelet '/in_shelf_object_detection/floor_removal` of type `pcl/PassThrough` to manager `/in_shelf_object_detection_nodelet_manager'
[in_shelf_object_detection/floor_removal-7] process has died [pid 6411, exit code 255, cmd /opt/ros/indigo/lib/nodelet/nodelet load pcl/PassThrough /in_shelf_object_detection_nodelet_manager ~input:=input_relay/output __name:=floor_removal __log:=/home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-floor_removal-7.log].
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-floor_removal-7*.log
[in_shelf_object_detection/floor_removal-7] restarting process
process[in_shelf_object_detection/floor_removal-7]: started with pid [6440]
[ERROR] [1494833782.185400345]: Cannot load nodelet /in_shelf_object_detection/floor_removal for one exists with that name already
[FATAL] [1494833782.185789113]: Failed to load nodelet '/in_shelf_object_detection/floor_removal` of type `pcl/PassThrough` to manager `/in_shelf_object_detection_nodelet_manager'
[in_shelf_object_detection/floor_removal-7] process has died [pid 6440, exit code 255, cmd /opt/ros/indigo/lib/nodelet/nodelet load pcl/PassThrough /in_shelf_object_detection_nodelet_manager ~input:=input_relay/output __name:=floor_removal __log:=/home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-floor_removal-7.log].
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-floor_removal-7*.log
[in_shelf_object_detection/floor_removal-7] restarting process
process[in_shelf_object_detection/floor_removal-7]: started with pid [6484]
[ERROR] [1494833782.565997253]: Cannot load nodelet /in_shelf_object_detection/floor_removal for one exists with that name already
[FATAL] [1494833782.566426413]: Failed to load nodelet '/in_shelf_object_detection/floor_removal` of type `pcl/PassThrough` to manager `/in_shelf_object_detection_nodelet_manager'
[in_shelf_object_detection/floor_removal-7] process has died [pid 6484, exit code 255, cmd /opt/ros/indigo/lib/nodelet/nodelet load pcl/PassThrough /in_shelf_object_detection_nodelet_manager ~input:=input_relay/output __name:=floor_removal __log:=/home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-floor_removal-7.log].
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-floor_removal-7*.log
[in_shelf_object_detection/floor_removal-7] restarting process
process[in_shelf_object_detection/floor_removal-7]: started with pid [6532]
[ERROR] [1494833785.231884430]: Lookup would require extrapolation into the past.  Requested time 1494833783.202391739 but the earliest data is at time 1494833785.066103925, when looking up transform from frame [head_rgbd_sensor_rgb_frame] to frame [base_link]
[ERROR] [1494833785.245710413]: [/in_shelf_object_detection/floor_removal::input_indices_callback] Error converting input dataset from head_rgbd_sensor_rgb_frame to base_link.
[ERROR] [1494833785.719620659]: Lookup would require extrapolation into the past.  Requested time 1494833784.059441360 but the earliest data is at time 1494833785.066103925, when looking up transform from frame [head_rgbd_sensor_rgb_frame] to frame [base_link]
[ERROR] [1494833785.819443191]: [/in_shelf_object_detection/floor_removal::input_indices_callback] Error converting input dataset from head_rgbd_sensor_rgb_frame to base_link.
[ERROR] [1494833785.880720792]: Lookup would require extrapolation into the past.  Requested time 1494833784.775936540 but the earliest data is at time 1494833785.066103925, when looking up transform from frame [head_rgbd_sensor_rgb_frame] to frame [base_link]
[ERROR] [1494833785.880819505]: [/in_shelf_object_detection/floor_removal::input_indices_callback] Error converting input dataset from head_rgbd_sensor_rgb_frame to base_link.
[ERROR] [1494833786.059197347]: Lookup would require extrapolation into the past.  Requested time 1494833785.061303784 but the earliest data is at time 1494833785.066103925, when looking up transform from frame [head_rgbd_sensor_rgb_frame] to frame [base_link]
[ERROR] [1494833786.059992646]: [/in_shelf_object_detection/floor_removal::input_indices_callback] Error converting input dataset from head_rgbd_sensor_rgb_frame to base_link.
[in_shelf_object_detection/voxel_grid-17] process has finished cleanly
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-voxel_grid-17*.log
[in_shelf_object_detection/voxel_grid-17] restarting process
process[in_shelf_object_detection/voxel_grid-17]: started with pid [10372]
[ERROR] [1494833917.455049234]: Cannot load nodelet /in_shelf_object_detection/voxel_grid for one exists with that name already
[FATAL] [1494833917.456441748]: Failed to load nodelet '/in_shelf_object_detection/voxel_grid` of type `pcl/VoxelGrid` to manager `/in_shelf_object_detection_nodelet_manager'
[in_shelf_object_detection/voxel_grid-17] process has died [pid 10372, exit code 255, cmd /opt/ros/indigo/lib/nodelet/nodelet load pcl/VoxelGrid /in_shelf_object_detection_nodelet_manager ~input:=bilateral_filter/output __name:=voxel_grid __log:=/home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-voxel_grid-17.log].
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-voxel_grid-17*.log
[in_shelf_object_detection/voxel_grid-17] restarting process
process[in_shelf_object_detection/voxel_grid-17]: started with pid [10433]
[ERROR] [1494833918.140655873]: Cannot load nodelet /in_shelf_object_detection/voxel_grid for one exists with that name already
[FATAL] [1494833918.141456785]: Failed to load nodelet '/in_shelf_object_detection/voxel_grid` of type `pcl/VoxelGrid` to manager `/in_shelf_object_detection_nodelet_manager'
[in_shelf_object_detection/voxel_grid-17] process has died [pid 10433, exit code 255, cmd /opt/ros/indigo/lib/nodelet/nodelet load pcl/VoxelGrid /in_shelf_object_detection_nodelet_manager ~input:=bilateral_filter/output __name:=voxel_grid __log:=/home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-voxel_grid-17.log].
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-voxel_grid-17*.log
[in_shelf_object_detection/voxel_grid-17] restarting process
process[in_shelf_object_detection/voxel_grid-17]: started with pid [10468]
[ERROR] [1494833918.612126023]: Cannot load nodelet /in_shelf_object_detection/voxel_grid for one exists with that name already
[FATAL] [1494833918.612695311]: Failed to load nodelet '/in_shelf_object_detection/voxel_grid` of type `pcl/VoxelGrid` to manager `/in_shelf_object_detection_nodelet_manager'
[in_shelf_object_detection/voxel_grid-17] process has died [pid 10468, exit code 255, cmd /opt/ros/indigo/lib/nodelet/nodelet load pcl/VoxelGrid /in_shelf_object_detection_nodelet_manager ~input:=bilateral_filter/output __name:=voxel_grid __log:=/home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-voxel_grid-17.log].
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-voxel_grid-17*.log
[in_shelf_object_detection/voxel_grid-17] restarting process
process[in_shelf_object_detection/voxel_grid-17]: started with pid [10502]
[in_shelf_object_detection/plane_reasoner-9] process has finished cleanly
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-plane_reasoner-9*.log
[in_shelf_object_detection/plane_reasoner-9] restarting process
process[in_shelf_object_detection/plane_reasoner-9]: started with pid [13954]
[ERROR] [1494834027.099067796]: Cannot load nodelet /in_shelf_object_detection/plane_reasoner for one exists with that name already
[FATAL] [1494834027.099525436]: Failed to load nodelet '/in_shelf_object_detection/plane_reasoner` of type `jsk_pcl_utils/PlaneReasoner` to manager `/in_shelf_object_detection_nodelet_manager'
[in_shelf_object_detection/plane_reasoner_decomposer-10] process has finished cleanly
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-plane_reasoner_decomposer-10*.log
[in_shelf_object_detection/plane_reasoner_decomposer-10] restarting process
process[in_shelf_object_detection/plane_reasoner_decomposer-10]: started with pid [13981]
[in_shelf_object_detection/plane_reasoner-9] process has died [pid 13954, exit code 255, cmd /opt/ros/indigo/lib/nodelet/nodelet load jsk_pcl_utils/PlaneReasoner /in_shelf_object_detection_nodelet_manager ~input:=input_relay/output ~input_inliers:=multi_plane_segmentation/output_refined ~input_polygons:=multi_plane_segmentation/output_refined_polygon ~input_coefficients:=multi_plane_segmentation/output_refined_coefficients __name:=plane_reasoner __log:=/home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-plane_reasoner-9.log].
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-plane_reasoner-9*.log
[in_shelf_object_detection/plane_reasoner-9] restarting process
process[in_shelf_object_detection/plane_reasoner-9]: started with pid [14000]
[ERROR] [1494834027.415259800]: Cannot load nodelet /in_shelf_object_detection/plane_reasoner_decomposer for one exists with that name already
[FATAL] [1494834027.416571464]: Failed to load nodelet '/in_shelf_object_detection/plane_reasoner_decomposer` of type `jsk_pcl/ClusterPointIndicesDecomposer` to manager `/in_shelf_object_detection_nodelet_manager'
[ERROR] [1494834027.480746146]: Cannot load nodelet /in_shelf_object_detection/plane_reasoner for one exists with that name already
[FATAL] [1494834027.481500521]: Failed to load nodelet '/in_shelf_object_detection/plane_reasoner` of type `jsk_pcl_utils/PlaneReasoner` to manager `/in_shelf_object_detection_nodelet_manager'
[in_shelf_object_detection/plane_reasoner-9] process has died [pid 14000, exit code 255, cmd /opt/ros/indigo/lib/nodelet/nodelet load jsk_pcl_utils/PlaneReasoner /in_shelf_object_detection_nodelet_manager ~input:=input_relay/output ~input_inliers:=multi_plane_segmentation/output_refined ~input_polygons:=multi_plane_segmentation/output_refined_polygon ~input_coefficients:=multi_plane_segmentation/output_refined_coefficients __name:=plane_reasoner __log:=/home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-plane_reasoner-9.log].
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-plane_reasoner-9*.log
[in_shelf_object_detection/plane_reasoner_decomposer-10] process has died [pid 13981, exit code 255, cmd /opt/ros/indigo/lib/nodelet/nodelet load jsk_pcl/ClusterPointIndicesDecomposer /in_shelf_object_detection_nodelet_manager ~input:=input_relay/output ~target:=plane_reasoner/output_inliers ~align_planes:=plane_reasoner/output_polygons ~align_planes_coefficients:=plane_reasoner/output_coefficients __name:=plane_reasoner_decomposer __log:=/home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-plane_reasoner_decomposer-10.log].
log file: /home/m-takeda/.ros/log/20170509-180013_e9c27bf2-3495-11e7-acce-00306444a934/in_shelf_object_detection-plane_reasoner_decomposer-10*.log

furushchev commented 7 years ago

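I do not know yet why the whole process dies, but looking through the ROS logs as a whole, I get the impression that this happens with nodelets in general, not only with the JSK ones. (→ a problem in nodelet_core or bond?)

The gdb log from one such crash is shown below; it looks like unloading the nodelet is failing. The nodelet loader keeps a dictionary of the nodelets it has loaded and consults it whenever a load is requested. My guess is that when an unload fails, the nodelet is never removed from the dictionary, and the subsequent respawn then fails as shown above. https://github.com/ros/nodelet_core/blob/6c561224958a575b604a067e149a55feb07044dc/nodelet/src/loader.cpp#L269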

(gdb) 
#0  0x00007ffff60bfc37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff60c3028 in __GI_abort () at abort.c:89
#2  0x00007ffff66c7535 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff66c56d6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff66c5703 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff66c5922 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7bb0601 in void boost::throw_exception<boost::lock_error>(boost::lock_error const&) () from /opt/ros/indigo/lib/libnodeletlib.so
#7  0x00007ffff7bb0705 in boost::unique_lock<boost::mutex>::lock() () from /opt/ros/indigo/lib/libnodeletlib.so
#8  0x00007fffd71e2205 in unique_lock (m_=..., this=0x7fffffffa0f0) at /usr/include/boost/thread/lock_types.hpp:124
#9  message_filters::Signal1<jsk_recognition_msgs::PolygonArray_<std::allocator<void> > >::removeCallback (this=0x1486608, helper=...)
    at /opt/ros/indigo/include/message_filters/signal1.h:102
#10 0x00007fffd72634ca in disconnectAll (this=0x148cc00) at /opt/ros/indigo/include/message_filters/synchronizer.h:351
#11 message_filters::Synchronizer<message_filters::sync_policies::ExactTime<jsk_recognition_msgs::PolygonArray_<std::allocator<void> >, jsk_recognition_msgs::ModelCoefficientsArray_<std::allocator<void> >, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType> >::~Synchronizer (this=0x148cc00, __in_chrg=<optimized out>) at /opt/ros/indigo/include/message_filters/synchronizer.h:228
#12 0x00007fffd7263689 in destroy (this=0x148cbf8) at /usr/include/boost/smart_ptr/make_shared_object.hpp:57
#13 operator() (this=0x148cbf8) at /usr/include/boost/smart_ptr/make_shared_object.hpp:87
#14 boost::detail::sp_counted_impl_pd<message_filters::Synchronizer<message_filters::sync_policies::ExactTime<jsk_recognition_msgs::PolygonArray_<std::allocator<void> >, jsk_recognition_msgs::ModelCoefficientsArray_<std::allocator<void> >, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType> >*, boost::detail::sp_ms_deleter<message_filters::Synchronizer<message_filters::sync_policies::ExactTime<jsk_recognition_msgs::PolygonArray_<std::allocator<void> >, jsk_recognition_msgs::ModelCoefficientsArray_<std::allocator<void> >, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType> > > >::dispose (this=0x148cbe0)
    at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:153
#15 0x000000000040624e in boost::detail::sp_counted_base::release() ()
#16 0x00007fffd725aaf8 in ~shared_count (this=0x14865f8, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:371
#17 ~shared_ptr (this=0x14865f0, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/shared_ptr.hpp:328
#18 ~PolygonArrayLikelihoodFilter (this=0x1486470, __in_chrg=<optimized out>)
    at /home/m-takeda/catkin_ws/src/jsk-ros-pkg/jsk_recognition/jsk_pcl_ros_utils/include/jsk_pcl_ros_utils/polygon_array_likelihood_filter.h:53
#19 jsk_pcl_ros_utils::PolygonArrayLikelihoodFilter::~PolygonArrayLikelihoodFilter (this=0x1486470, __in_chrg=<optimized out>)
    at /home/m-takeda/catkin_ws/src/jsk-ros-pkg/jsk_recognition/jsk_pcl_ros_utils/include/jsk_pcl_ros_utils/polygon_array_likelihood_filter.h:53
#20 0x00007ffff7bb0831 in void class_loader::ClassLoader::onPluginDeletion<nodelet::Nodelet>(nodelet::Nodelet*) () from /opt/ros/indigo/lib/libnodeletlib.so
#21 0x000000000040624e in boost::detail::sp_counted_base::release() ()
#22 0x00007ffff7baac09 in nodelet::Loader::unload(std::string const&) () from /opt/ros/indigo/lib/libnodeletlib.so
#23 0x00007ffff7bb3efb in nodelet::LoaderROS::unload(std::string const&) () from /opt/ros/indigo/lib/libnodeletlib.so
#24 0x00007ffff77644ed in bond::Bond::flushPendingCallbacks() () from /opt/ros/indigo/lib/libbondcpp.so
#25 0x00007ffff776467b in bond::Bond::onHeartbeatTimeout() () from /opt/ros/indigo/lib/libbondcpp.so
#26 0x00007ffff748a0a0 in ros::TimerManager<ros::WallTime, ros::WallDuration, ros::WallTimerEvent>::TimerQueueCallback::call() () from /opt/ros/indigo/lib/libroscpp.so
#27 0x00007ffff74b3107 in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) () from /opt/ros/indigo/lib/libroscpp.so
#28 0x00007ffff74b3c53 in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/indigo/lib/libroscpp.so
#29 0x00007ffff74fc175 in ros::SingleThreadedSpinner::spin(ros::CallbackQueue*) () from /opt/ros/indigo/lib/libroscpp.so
#30 0x00007ffff74e3d9b in ros::spin() () from /opt/ros/indigo/lib/libroscpp.so
#31 0x000000000040494e in main ()
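
The hypothesis above (a failed unload leaves a stale entry in the loader's dictionary, so the respawned nodelet is rejected) can be illustrated with a toy model. This is only a sketch of the suspected mechanism, not the real nodelet::Loader code: `LoaderModel`, `FlakyNodelet` and `shutdown` are hypothetical names invented for this example, and the real loader may order its steps differently.

```python
class FlakyNodelet:
    """Hypothetical nodelet whose teardown fails (stand-in for the crash in the backtrace)."""

    def shutdown(self):
        raise RuntimeError("teardown failed")


class LoaderModel:
    """Toy name -> nodelet registry; NOT the real nodelet::Loader API."""

    def __init__(self):
        self._nodelets = {}

    def load(self, name, nodelet):
        if name in self._nodelets:
            # Corresponds to "Cannot load nodelet ... for one exists with that name already".
            return False
        self._nodelets[name] = nodelet
        return True

    def unload(self, name):
        nodelet = self._nodelets[name]
        nodelet.shutdown()        # if this raises, the next line never runs
        del self._nodelets[name]  # so the entry stays registered


loader = LoaderModel()
name = "/in_shelf_object_detection/floor_removal"
print(loader.load(name, FlakyNodelet()))  # True: the first load succeeds

try:
    loader.unload(name)
except RuntimeError:
    pass  # unload failed; the stale entry is still in the registry

print(loader.load(name, FlakyNodelet()))  # False: the respawned nodelet is rejected
```

In this model, removing the entry before running the teardown would let the respawn succeed even when the teardown fails. Whether the same holds for the real loader is not something the log above confirms.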

furushchev commented 7 years ago

@k-okada @YoheiKakiuchi @mmurooka I believe you used nodelets a lot for recognition during the DRC. Did this problem occur back then? (I vaguely remember that it did.)

mmurooka commented 7 years ago

cc @wkentaro

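Do you mean that the process sometimes dies suddenly while running, after startup has finished, rather than during startup? We had this problem during the DRC as well. We could not solve it no matter what we tried, so @garaemon wrote standalone_complexed_nodelet and we used that instead. https://github.com/jsk-ros-pkg/jsk_demos/blob/master/jsk_2015_06_hrp_drc/drc_task_common/launch/fc/valve_recognition.launch is the launch file for the DRC valve recognition, and it uses standalone_complexed_nodelet. http://jsk-docs.readthedocs.io/en/latest/jsk_common/doc/jsk_topic_tools/lib/standalone_complexed_nodelet.html explains, at some length, why ordinary nodelets die.

That is how we dealt with it for the DRC, but the approach has not really been carried forward since then, and I think the best outcome would be to keep ordinary nodelets from dying.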

YoheiKakiuchi commented 7 years ago

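That is how we dealt with it for the DRC, but the approach has not really been carried forward since then, and I think the best outcome would be to keep ordinary nodelets from dying.

They are not being maintained, but almost everything that uses nodelets for multisense-related point clouds and the like has been switched to standalone. https://github.com/jsk-ros-pkg/jsk_common/blob/master/jsk_tilt_laser/launch/multisense_laser_pipeline.launch https://github.com/jsk-ros-pkg/jsk_robot/blob/master/jsk_robot_common/jsk_robot_startup/launch/multisense_local.launch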

YoheiKakiuchi commented 7 years ago

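It seems to me that there are two problems here. Couldn't they have been treated separately?

  1. An unload/load is triggered by bond
  2. The process dies during unload/load

If 2 were solved, topics would occasionally drop out during an unload/load, but wouldn't things more or less keep running? Also, couldn't the time after which the heartbeat is judged to have stopped, as described in @garaemon's document, be made sufficiently large?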
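
The effect of a larger heartbeat timeout can be sketched with a small watchdog model. This is an illustration under assumptions, not the bond implementation: `bond_broken_at` is a hypothetical helper, and the arrival times and timeout values are made up for the example.

```python
def bond_broken_at(heartbeat_times, timeout):
    """Return when a watchdog would declare the bond broken, or None.

    heartbeat_times: ascending arrival times (seconds) of received heartbeats.
    timeout: maximum allowed gap between consecutive heartbeats (seconds).
    """
    last = heartbeat_times[0]
    for t in heartbeat_times[1:]:
        if t - last > timeout:
            # The watchdog fires `timeout` seconds after the last heartbeat,
            # before the late heartbeat arrives.
            return last + timeout
        last = t
    return None


# Hypothetical trace: heartbeats every 0.5 s with one 2.5 s stall
# (for example, while the manager is busy).
arrivals = [0.0, 0.5, 1.0, 3.5, 4.0]

print(bond_broken_at(arrivals, timeout=1.0))  # 2.0  (the stall exceeds the timeout)
print(bond_broken_at(arrivals, timeout=4.0))  # None (the stall is tolerated)
```

With the short timeout the bond would be declared broken and an unload triggered, even though heartbeats resume shortly afterwards. With the longer timeout the same stall passes unnoticed, at the cost of detecting a truly dead peer later.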

furushchev commented 7 years ago

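@mmurooka @YoheiKakiuchi Thank you for your comments. Based on what is written at the link @mmurooka posted, it does look like there are two problems, as you point out.

couldn't the time after which the heartbeat is judged to have stopped be made sufficiently large?

Judging from the source code, this appears to be possible by recompiling the nodelet package. I will try that and see how it behaves. (The default timeout is 1 second.)

The process dies during unload/load

For this one, I will first work out whether an error somewhere in the program (a leak, perhaps?) made unloading impossible or whether the unload procedure itself is at fault, and then debug as far as I can.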

k-okada commented 5 years ago

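For this one, I will first work out whether an error somewhere in the program (a leak, perhaps?) made unloading impossible or whether the unload procedure itself is at fault

@furushchev Let's isolate the problem and write test code for it.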