osrf / mbari_wec_gz

Simulation of wave energy harvesting buoy
Apache License 2.0

fix batch loop to exit gracefully on sigint #149

Closed andermi closed 1 year ago

andermi commented 1 year ago

Should now be able to control-c once when batching and have the sim exit gracefully!
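As a rough illustration of the idea (not the code in this PR), a batch loop can trap SIGINT, finish or abandon the current run, and then stop scheduling further runs instead of dying mid-iteration. The `BatchRunner` class and its `jobs` argument below are hypothetical names made up for this sketch; the real batch runner launches Gazebo and ROS 2 processes, which this sketch leaves out.

```python
import signal


class BatchRunner:
    """Hypothetical sketch: run jobs in order, stop cleanly after the first SIGINT."""

    def __init__(self, jobs):
        self.jobs = list(jobs)        # callables, one per simulated run
        self.stop_requested = False   # set by the SIGINT handler
        self.completed = []           # indices of jobs that ran to completion

    def _on_sigint(self, signum, frame):
        # Only record the request; the loop decides when it is safe to stop.
        self.stop_requested = True

    def run(self):
        # Install our handler for the duration of the batch (main thread only).
        previous = signal.signal(signal.SIGINT, self._on_sigint)
        try:
            for index, job in enumerate(self.jobs):
                if self.stop_requested:
                    break             # a single Ctrl-C ends the whole batch
                job()
                self.completed.append(index)
        finally:
            # Restore whatever handler was active before the batch started.
            if previous is None:
                previous = signal.SIG_DFL
            signal.signal(signal.SIGINT, previous)
        return self.completed
```

With this shape a single Ctrl-C lets the run in progress return and then skips the remaining runs, rather than requiring one interrupt per run.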

hamilton8415 commented 1 year ago

For me this is a little too good at quitting. See below for what it's doing without me hitting Ctrl-C...

[mbari_wec_batch-1] [ruby $(which gz) sim-17] #10 Object "/home/hamilton/mbari_wec_ws/install/buoy_gazebo/lib/libNoOpController.so", at 0x7f508022336b, in void gnu_cxx::new_allocator::construct<rclcpp::Node, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&>(rclcpp::Node*, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&)
[mbari_wec_batch-1] [ruby $(which gz) sim-17] #9 Object "/opt/ros/humble/lib/librclcpp.so", at 0x7f508048ffd6, in rclcpp::Node::Node(std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, rclcpp::NodeOptions const&)
[mbari_wec_batch-1] [ruby $(which gz) sim-17] #8 Object "/opt/ros/humble/lib/librclcpp.so", at 0x7f50804982f0, in rclcpp::node_interfaces::NodeBase::NodeBase(std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::shared_ptr, rcl_node_options_s const&, bool, bool)
[mbari_wec_batch-1] [ruby $(which gz) sim-17] #7 Object "/opt/ros/humble/lib/librclcpp.so", at 0x7f5080471838, in rclcpp::exceptions::throw_from_rcl_error(int, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, rcutils_error_state_s const, void ()())
[mbari_wec_batch-1] [ruby $(which gz) sim-17] #6 Object "/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7f50b2eae23d, in std::rethrow_exception(std::__exception_ptr::exception_ptr)
[mbari_wec_batch-1] [ruby $(which gz) sim-17] #5 Object "/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7f50b2eae2b6, in std::terminate()
[mbari_wec_batch-1] [ruby $(which gz) sim-17] #4 Object "/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7f50b2eae24b, in
[mbari_wec_batch-1] [ruby $(which gz) sim-17] #3 Object "/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7f50b2ea2bbd, in
[mbari_wec_batch-1] [ruby $(which gz) sim-17] #2 Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f50b76287f2, in abort
[mbari_wec_batch-1] [ruby $(which gz) sim-17] #1 Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f50b7642475, in raise
[mbari_wec_batch-1] [ruby $(which gz) sim-17] #0 Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f50b7696a7c, in pthread_kill
[mbari_wec_batch-1] [ruby $(which gz) sim-17] Aborted (Signal sent by tkill() 2862466 11967)
[mbari_wec_batch-1] [ERROR] [robot_state_publisher-20]: process has died [pid 2862463, exit code -6, cmd '/opt/ros/humble/lib/robot_state_publisher/robot_state_publisher --ros-args -r __node:=robot_state_publisher --params-file /tmp/launch_params_l59darmb --params-file /tmp/launch_params_nmkm6df1'].
^C[WARNING] [launch]: user interrupted with ctrl-c (SIGINT)
[mbari_wec_batch-1] [WARNING] [launch]: user interrupted with ctrl-c (SIGINT)

andermi commented 1 year ago

A crash in the noop? That's.... different

andermi commented 1 year ago

I can't reproduce your crash... could you try sudo apt upgrade, sudo apt dist-upgrade, reboot, rebuild? Also, ensure you're on main for mbari_wec_utils and make sure you've checked out commit 7c62a1c in ros_gz: git checkout 7c62a1c

hamilton8415 commented 1 year ago

Hmm, I checked those items but problems persist, we may have to look at this together...

andermi commented 1 year ago

Is this the only branch that fails for you?

And just checking all the boxes, did you do an update before the upgrade (can throw a dist-upgrade in there for good measure too)?

$ sudo apt update
$ sudo apt upgrade
$ sudo apt dist-upgrade
$ sudo reboot

I can be in the office this afternoon, if you are available.

andermi commented 1 year ago

@quarkytale can you see if you get the same crash as @hamilton8415 ?

quarkytale commented 1 year ago

Tried this branch on docker with other repos updated and ros_gz on 7c62a1c, and ran the batch launch file. Apart from sim_pblog failing with:

[mbari_wec_batch-1] [sim_pblog-14]     from tf_transformations import euler_from_quaternion
[mbari_wec_batch-1] [sim_pblog-14] ModuleNotFoundError: No module named 'tf_transformations'
[mbari_wec_batch-1] [ERROR] [sim_pblog-14]: process has died [pid 5957, exit code 1, cmd '/home/developer/mbari_wec_ws/install/sim_pblog/lib/sim_pblog/sim_pblog --loghome batch_results_20230607113627 --logdir results_run_2_20230607113709/pblog --ros-args -r __node:=sim_pblog --params-file /home/developer/mbari_wec_ws/install/sim_pblog/share/sim_pblog/config/sim_pblog.yaml'].

(which is weird, since all rosdeps are installed)
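One quick way to narrow down a ModuleNotFoundError like the one above is to ask the same Python interpreter whether it can locate the module at all. The sketch below is only a diagnostic aid and assumes it is run with the interpreter and environment that the failing node uses; `missing_modules` is a hypothetical helper name, not part of any package in this workspace.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot locate."""
    # find_spec returns None when a top-level module is not on the import path.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # tf_transformations is the module named in the traceback above.
    missing = missing_modules(["tf_transformations"])
    if missing:
        print("Not importable from this environment:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

If the module shows up as missing here even though the dependencies were installed, the environment running the node likely differs from the one the dependencies were installed into.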

I am not getting any crashes with the batch run; it is on sim run 33 and still counting.

[mbari_wec_batch-1] Sim run [33] for 5.0 seconds: door state='closed', scale factor=0.5, battery state=0.5, mean piston position=0.9, IncidentWaveSpectrumType=Bretschneider;Hs:3.0;Tp:14.0

andermi commented 1 year ago

@quarkytale @hamilton8415 I merged this into my other PR #150 and have some more fixes there. Please try that one.