Open craigh92 opened 4 years ago
As far as I can tell, there is no way of doing this through the rclpy, rmw, or rclcpp API, so changes may need to be made in other ros2 projects to make this possible.
I moved this issue to the design repo because it seems like it would be beneficial to write a design document first. This feature could touch a lot of things.
ros1 had the ability to kill a node from the command line using rosnode kill
, or kill all nodes using rosnode kill -a. This would end the process running each ros node. It would be usefull to have this feature in ros2.
ROS 1 also only had one node per process. There are a few more cases to consider in ROS 2:
I moved this issue to the design repo because it seems like it would be beneficial to write a design document first. This feature could touch a lot of things.
I'm guessing I just follow these instructions to do that? I will try and convince the powers that be at my workplace that this is worth the effort (we are using ros2 in our project), and if successful I will brainstorm with a few other ros users at my workplace and then put this together. If not I may be able to do this in my own time but I will be considerably slower.
What to do if two nodes are manually composed in a process? It would be undesirable to kill one node and have the whole process die.
How about instead of ros2 node kill
, we add a new command to the CLI ros2 process
, and one of the verbs for this command is ros2 process kill
. If users want individual control over a node in a process with multiple nodes they will have to use the ManagedNode
and lifecycle
tools. As I mentioned earlier this is not yet available in the python API but I believe they will eventually come.
This utility then just serves as a simpler way to manage the processes without the boilerplate of using a ManagedNode
(and for the cases where you are not using a ManagedNode
).
What if the node was dynamically composed in a process? It would be undesirable to kill the container and have every node inside of it die.
Sometimes this is desirable. For example, if you encounter a bug and just need to restart the process. If using launch files this is currently difficult as you do not have an easy handle to each process. Again, if the user wants more fine-grained control they should use a ManagedNode
.
What if a single node is in a single process, but that process is on a different machine?
For ros2 process kill
, then the intuitive behaviour would be for it to only work on the machine running the process. Perhaps an optional argument could be added to find and kill remote processes, but this could be a separate feature and out of scope for this design.
I can see value in a simpler ros2 process kill
tool, which is clearly meant to support debugging.
@sloretz Do you think ros2 process kill
requires a design document? Or since it is smaller scope should I just reopen a new issue in the ros2/rcli
repo, and close this issue?
I think that directly managing system processes via ros2
is a dangerous game. Trying to interact with running nodes via the ros2 node
command is more reasonable, but probably when you're wanting to kill a node it's already not responding to anything over the ROS interfaces, and unfortunately ros2 node
doesn't know anything about the PID of the node.
There are two situations, I think:
ros2 run
In the case of 1., you have direct access to the process that started the node via the shell. It should be trivial to force it to stop, so I don't think a new command is needed in this situation.
Number 2 is the situation that is relevant to this discussion.
I think that this situation is best resolved by making the launch system more capable for managing running systems, not just starting them. This is something I've wanted to do for quite a while: build a full-fledged system orchestrator that provides a runtime interface to managing a complete system, or even individual sub-systems, with ros2 launch
becoming the interface for management commands. Most of that is beyond the scope of this issue, probably, but I can imagine that one of those commands could be to kill a container.
Currently if you use Python-based launch scripts, rather than the XML interface, you could build a way into your launch scripts to let you respond to keyboard events that kill individual nodes. It would be very custom and probably very hacky, but it would work.
Number 2 is the situation that is relevant to this discussion.
Yep. This is what prompted me to make this issue. I would like to be able to start ros2 launch
in the background and not have to keep hold of it's PID in my bash session. Also, sending a SIGINT to the launch process sometimes killed all of the nodes, and sometimes didn't. It just wasn't a very friendly way of process orchestration.
I think that this situation is best resolved by making the launch system more capable for managing running systems, not just starting them.
I agree. Is it possible for the launch system to add services to the nodes that it launches? If it is, one way of doing this could be that every node launched with ros2 launch
has a service added to it that calls self.executor.shutdown
when triggered. The launch command could also start an extra node that hosts a service that can shut down all nodes (by calling the aforementioned service of each). This behavior could be optionally added using a flag, e.g ros2 launch --killable
.
I think that this situation is best resolved by making the launch system more capable for managing running systems, not just starting them. This is something I've wanted to do for quite a while: build a full-fledged system orchestrator that provides a runtime interface to managing a complete system, or even individual sub-systems, with
ros2 launch
becoming the interface for management commands.
While I agree with the general desire, I'll strongly recommend looking into accomplishing this by hooking into existing process monitoring mechanisms (namely systemd on Linux, launchd on macOS, and whatever Windows uses for services). There are a lot of corner cases here, and it would be a real shame to essentially reimplement those system services in launch
again.
That is the way I would go about it, yes.
I think that directly managing system processes via
ros2
is a dangerous game. Trying to interact with running nodes via theros2 node
command is more reasonable, but probably when you're wanting to kill a node it's already not responding to anything over the ROS interfaces, and unfortunatelyros2 node
doesn't know anything about the PID of the node.There are two situations, I think:
- Nodes launched via
ros2 run
- Nodes launched via the launch system
In the case of 1., you have direct access to the process that started the node via the shell. It should be trivial to force it to stop, so I don't think a new command is needed in this situation.
Number 2 is the situation that is relevant to this discussion.
I think that this situation is best resolved by making the launch system more capable for managing running systems, not just starting them. This is something I've wanted to do for quite a while: build a full-fledged system orchestrator that provides a runtime interface to managing a complete system, or even individual sub-systems, with
ros2 launch
becoming the interface for management commands. Most of that is beyond the scope of this issue, probably, but I can imagine that one of those commands could be to kill a container.Currently if you use Python-based launch scripts, rather than the XML interface, you could build a way into your launch scripts to let you respond to keyboard events that kill individual nodes. It would be very custom and probably very hacky, but it would work.
Hi, sorry if I sound like a noob. I'm new to this stuff. For the 1st case, what is the recommended way to kill such node (started with ros2 run). Should I use subprocess popen to get pid of the initiated thread and kill it subsequently? If so, can someone give example for killing a ROS2 executable (single node) started with such method?
Thanks in advance.
the use cases that i think of,
and to support the 1st one to control each node instance, i think eventually we need to have something similar with ManagedNode
to stop/start the node.
For the 1st case, what is the recommended way to kill such node (started with ros2 run). Should I use subprocess popen to get pid of the initiated thread and kill it subsequently?
i am not sure if i understand the question, how about the followng?
root@12956542b338:~/ros2_ws/colcon_ws# ros2 run demo_nodes_cpp talker
[INFO] [1627068976.974343304] [talker]: Publishing: 'Hello World: 1'
[INFO] [1627068977.974083996] [talker]: Publishing: 'Hello World: 2'
...
[ros2run]: Terminated
root@12956542b338:~/ros2_ws/colcon_ws# ps -ef | grep talker
root 5134 5066 0 12:36 pts/3 00:00:00 /usr/bin/python3 /root/ros2_ws/colcon_ws/install/ros2cli/bin/ros2 run demo_nodes_cpp talker
root 5135 5134 0 12:36 pts/3 00:00:00 /root/ros2_ws/colcon_ws/install/demo_nodes_cpp/lib/demo_nodes_cpp/talker
root@12956542b338:~/ros2_ws/colcon_ws# kill 5135
Sorry @fujitatomoya for the late reply. Thanks for sharing your solution. I found a similar solution which uses the built-in pidof function to get the pid of talker and then kill the process using the kill command. I was able to make this work using 'popen function'
c++ code reference : stackoverflow answer
std::string exec(const char* cmd)
{
std::array<char, 128> buffer;
std::string result;
auto pipe = popen(cmd, "r"); // get rid of shared_ptr
if (!pipe) throw std::runtime_error("popen() failed!");
while (!feof(pipe)) {
if (fgets(buffer.data(), 128, pipe) != nullptr)
result += buffer.data();
}
auto rc = pclose(pipe);
if (rc == EXIT_SUCCESS) { // == 0
}
else if (rc == EXIT_FAILURE) { // EXIT_FAILURE is not used by all programs, maybe needs some adaptation.
}
return result;
}
I was wondering the performance comparison between this approach and a ROS2 lifecycle node. From my understanding, from efficiency perspective, the overhead of killing a process and associated resources is much greater compared to how ROS2 manages states of lifecycle node. Please correct if I'm wrong in my assumptions.
Thanks in advance.
@suryajayaraman
ROS 2 Managed Node is not to manage the process, but ROS 2 Node. Process space can be composed with multiple ROS 2 Node.
@suryajayaraman
ROS 2 Managed Node is not to manage the process, but ROS 2 Node. Process space can be composed with multiple ROS 2 Node.
Sorry, I jumped the gun. Now I get your point. Thanks for the clarifiation
Hi, @craigh92 and @gbiggs and @suryajayaraman . I too see desire for a ros 2 process kill
node for use cases. IMO lifecycle nodes are not a beginner level topic and for that reason maybe have not yet translated to some projects. Also regarding:
There are two situations, I think:
- Nodes launched via
ros2 run
- Nodes launched via the launch system
I have actually found a use case for #1 . Some nodes when launched create PID's outside of the most recent terminal command. For example:
ros2 runimage_view video_recorder
echo $!
The returned PID number, is one of several processes spun up by this exe. And therefore a simple o.s. level:
kill $!
does not capture required pid's and opens up for additional complexities.
I would like to add my two cent here.
Killing the process through ps | grep ros
or other generic process searching method is not ideal. This involves the risk of killing a wrong process that happen to have ros in the middle of its name.
My argument of having the kill feature in ros2 command is :
ros2 node list
, the same method should be able to find out the PID info about it, or if the node doesn't belong to localhost. Killing the process through ps | grep ros or other generic process searching method is not ideal.
no, it is not. i just showed what we can do at this moment, https://github.com/ros2/design/issues/287#issuecomment-885872263
With the PID, the kill feature could be easily added.
we can do this, but that comes back to the question https://github.com/ros2/design/issues/287#issuecomment-649731329
i think at lease user understands the which nodes are going to be destroyed if we can kill the process ID.
i think at lease user understands the which nodes are going to be destroyed if we can kill the process ID. Agree, it does have these question to be answered before implementing the
ros2 node kill
I think a quick alternative is to provided the PID s of each node. User can using these to figure out what to do if nodes are in the same process (same PID), node on other machine: no PID or PID with hostname.
you are welcome to create PR based on that, probably we might need to add rcl
and rmw
interfaces for doing that.
besides that, i believe PID
is not really unique identification.
for example, what if we have the container running with the different pid
namespace from host system? there would be the same PID
but different process space. (Similar problem actually happened before with DDS GUID, now it has been addressed with adding random number field.)
i am more inclined to consider the user experience with ros2 node kill
with <node name>
or/with --all
, --dry-run
--force
?
showing PID
and user is responsible to figure out where to log in, which nodes are affected... does not look useful and does not scale as distributed system.
that is just my opinion, i would like to hear more.
you are welcome to create PR based on that, probably we might need to add
rcl
andrmw
interfaces for doing that.
I think we should come to agreement here in the design before we start opening pull requests. Otherwise we may end up with only partial solutions.
In the case of 1., you have direct access to the process that started the node via the shell. It should be trivial to force it to stop, so I don't think a new command is needed in this situation.
I don't see any other approach than PID to kill the node. I recently encountered the issue... for example, I use docker and don't want to open a new instance (terminal) each time I want to run the node. Instead, I prefer to use the same terminal to run multiple nodes (can't use launch because of some reasons) or even to check the other things while a node is running in the background and node doesn't have a launch file associated with it.
If I run node using a command like this ros2 run turtlesim turtlesim_node &
(note the &
in the end)
I can't ctrl+c it as it is no longer running in the current terminal after the starting and I am using the same terminal to observe the data from the process.
I can't ctrl+c it as it is no longer running in the current terminal after the starting and I am using the same terminal to observe the data from the process.
That shouldn't be the case. Putting an &
at the end of a command just runs it in the background; it is still attached to the terminal. You should be able to run fg
to bring it to the foreground, then kill it.
I don't see any other approach than PID to kill the node.
The big problem is that in ROS 2 a node does not equal a PID. You can have many nodes inside of a process, so running ros2 node kill
, in my opinion, should not kill them all.
You should be able to run
fg
to bring it to the foreground, then kill it.
Ah, didn't know fg
. Thank you for the info.
in my opinion, should not kill them all
Yes, it shouldn't.
But we do need ros2 node kill
in some sense. Maybe it can be started with the killing PID and if some new idea comes up, it can be replaced later.
Feature request
Feature description
ros1 had the ability to kill a node from the command line using
rosnode kill <node_name>
, or kill all nodes usingrosnode kill -a
. This would end the process running each ros node. It would be usefull to have this feature in ros2.We currently have
ros2 lifecycle set
which can be used for a similar purpose, but this only works forManagedNodes
which is currently a c++ only feature. Even when this feature does come to python, not all nodes will be managed nodes so it makes sence to have a simpler kill utility that doesn't come with the complexity of the lifecycle states.Implementation considerations
As far as I can tell, there is no way of doing this through the rclpy, rmw, or rclcpp API, so changes may need to be made in other ros2 projects to make this possible.
As a test I created new workspace
ros2kill_ws
with two packages,test_py_pkg
andtest_cpp_pkg
, each containing a minimal publisher written in the corresponding language. I then created a python launch file that launches these nodes, and launched it withros2 launch -a launch/launch.py
. After investigating withhtop
, I found this created 3 new processes:If the nodes are instead started with
ros2 run
, there is no third process that hangs around, and they do not have the --ros-args argument passed to them.So maybe it would be possible to kill the nodes with the following method:
ros2 pkg prefix <pkg_name>
to get the path to where the package isros2 node list -t
to get the currently running nodes and the name of the executable for that node-t
option doesn't exist yet, but could be added)ros2 pkg executables
to get the executable nameskillall
command on linux, ortaskkill
command on windows.ros2 node kill
would then trigger this trough asys
call.