v-lopez commented 2 years ago

Bug report

Required Info:

Operating System:
- Ubuntu 20.04
Installation type:
- binaries
Version or commit hash:
- Foxy 0.10.6-1focal.20210901.033555

DDS implementation:

Fast

Steps to reproduce issue


import launch_testing
import unittest
import os

from launch import LaunchDescription from launch.actions import DeclareLaunchArgument, ExecuteProcess from launch.substitutions import PathJoinSubstitution, LaunchConfiguration from launch_ros.actions import Node from launch.actions import IncludeLaunchDescription from launch.substitutions import PathJoinSubstitution from launch_ros.substitutions import FindPackageShare from launch.launch_description_sources import PythonLaunchDescriptionSource

def generate_test_description():

This is necessary to get unbuffered output from the process under test

proc_env = os.environ.copy()
proc_env['PYTHONUNBUFFERED'] = '1'

dut_process = ExecuteProcess(cmd=['/usr/bin/sleep', '5']) 
pkg_dir = FindPackageShare('gazebo_ros')
full_path = PathJoinSubstitution([pkg_dir] + ['launch', 'gazebo.launch.py'])

gazebo_launch = IncludeLaunchDescription(
    PythonLaunchDescriptionSource(
        full_path))

return LaunchDescription([
    gazebo_launch,
    dut_process,

    launch_testing.actions.ReadyToTest(),
]), {'dut_process': dut_process}

class TestWait(unittest.TestCase):

def test_wait_for_end(self, proc_output, proc_info, dut_process):
    # Wait until process ends
    proc_info.assertWaitForShutdown(process=dut_process, timeout=10)

@ launch_testing.post_shutdown_test() class TestSuccessfulExit(unittest.TestCase):

def test_exit_code(self, proc_info, dut_process):
    # Check that dut_process finishes with code 0
    launch_testing.asserts.assertExitCodes(proc_info, process=dut_process)



Execute it with `launch_test` or as part of `add_launch_test` in CMake.

#### Expected behavior
When it ends, gazebo should be killed.

#### Actual behavior
Gazebo is not killed.

#### Additional information
Happens with other nodes besides gazebo. But gazebo is particularly troublesome because I cannot have 2 tests running sequentially using gazebo, because the first test didn't kill the server, and only one server may be running in a machine.

Might be related to https://github.com/ros2/launch/issues/495 because if instead of `launch_testing` I use `ros2 launch gazebo_ros gazebo.launch.py`, I can kill it properly with Ctrl+C, but not with `kill -SIGINT`. Even if I use `ros2 launch gazebo_ros gazebo.launch.py --noninteractive`
<!-- If you are reporting a bug delete everything below
     If you are requesting a feature deleted everything above this line -->

hidmic commented 2 years ago

Hmm, @v-lopez this looks like Gazebo not forwarding signals properly. Would you mind trying to force launch_testing's LaunchService instance to be non-interactive? If that fixes the issue, we can submit it as a patch.

v-lopez commented 2 years ago

I tried that (and verified that it was not being overwritten to False), and the behavior was the same.

benjinne commented 2 years ago

I have the same problem with a standalone unity executable not being killed, but I can kill all the processes correctly if I Ctrl+C before the test is complete

benjinne commented 2 years ago

My current workaround is to add a post_shutdown_test to kill the remaining processes

@launch_testing.post_shutdown_test()
class TestShutdown(unittest.TestCase):
    def test_kill_sim(self):
        unity_pid = check_output(["pidof", "-z", "unity_executable.x86_64"])
        print('got unity pid')
        unity_pid = int(unity_pid)
        print('unity pid is: ' + str(unity_pid))
        os.kill(unity_pid, signal.SIGKILL)
        print('Killed unity')

I think the reason this is happening is because the executable is ran from a series of launch files, then the last launch file calls a bash script that uses ros2 run unity_executable.x86_64. I can provide more details if needed

jacobperron commented 2 years ago

I'm experiencing this issue with gzserver, gzclient, and rqt. Here's a minimal reproducible example with Gazebo:

import sys
import time
import unittest

import launch
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch_ros.substitutions import FindPackageShare
from launch_testing.actions import ReadyToTest
import launch_testing.markers
import pytest

@pytest.mark.launch_test
def generate_test_description():
    return launch.LaunchDescription(
        [
            IncludeLaunchDescription(
                PythonLaunchDescriptionSource([FindPackageShare('gazebo_ros'), '/launch/gazebo.launch.py'])
            ),
            ReadyToTest(),
        ])

class MyTestFixture(unittest.TestCase):

    def test_something(self):
        time.sleep(10.0)
        print('END TEST', file=sys.stderr)

adityapande-1995 commented 2 years ago

We were facing a similar issue in ignition fortress, where launch was not able to kill the gazebo process. As @hidmic mentioned, it was related to Gazebo not forwarding signals. That was fixed using this PR, not sure if that is related : https://github.com/gazebosim/gz-launch/pull/151

adityapande-1995 commented 2 years ago

I'm experiencing this issue with gzserver, gzclient, and rqt. Here's a minimal reproducible example with Gazebo:

import sys
import time
import unittest

import launch
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch_ros.substitutions import FindPackageShare
from launch_testing.actions import ReadyToTest
import launch_testing.markers
import pytest

@pytest.mark.launch_test
def generate_test_description():
    return launch.LaunchDescription(
        [
            IncludeLaunchDescription(
                PythonLaunchDescriptionSource([FindPackageShare('gazebo_ros'), '/launch/gazebo.launch.py'])
            ),
            ReadyToTest(),
        ])

class MyTestFixture(unittest.TestCase):

    def test_something(self):
        time.sleep(10.0)
        print('END TEST', file=sys.stderr)

Fix here : https://github.com/ros-simulation/gazebo_ros_pkgs/pull/1376

adityapande-1995 commented 2 years ago

Another minimal example :

import sys
import time
import unittest

import launch
from launch.actions import IncludeLaunchDescription, ExecuteProcess
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch_ros.substitutions import FindPackageShare
from launch_testing.actions import ReadyToTest
import launch_testing.markers
import pytest

@pytest.mark.launch_test
def generate_test_description():
    return launch.LaunchDescription(
        [
            ExecuteProcess(
                cmd=['python3','test1.py'],
                shell=True,
                output='both'
            ),
            ExecuteProcess(
                cmd=['python3','test2.py'],
                shell=False,
                output='both'
            ),
            ReadyToTest(),
        ])

class MyTestFixture(unittest.TestCase):

    def test_something(self):
        time.sleep(10.0)
        print('END TEST', file=sys.stderr)

Where both test files are just :

import time

while 1 :
    time.sleep(0.5)
    print("Hello 1")

test1.py does not terminate, yet test2.py does.

ivanpauno commented 2 years ago

This is not a problem IMO. When you use shell=True, you're running a shell process, which in this case will run python3 test1.py as a subprocess. Launch is correctly sending a signal to the shell process, but that one is not forwarding it to its subprocess.

You probably can fix that be trapping SIGINT/SIGTERM and killing subprocesses before exiting the shell (e.g. with kill -$SIGNAL $(jobs -p)).

I don't think we can do anything by default in launch that's reasonable, people should be careful when using shell=True.

ivanpauno commented 2 years ago

I have updated the issue title, as I don't think the original one was correct.

ivanpauno commented 2 years ago

This is easy to reproduce from a pure launch file:

ros2 launch ./test.launch.py  &
kill -SIGINT $!

# ./test.launch.py
import launch
from launch.actions import ExecuteProcess

PYTHON_SCRIPT="""\
import time

while 1:
    time.sleep(0.5)
"""

def generate_launch_description():
    return launch.LaunchDescription(
        [
            ExecuteProcess(
                cmd=['python3', '-c', f'"{PYTHON_SCRIPT}"'],
                shell=True,
                output='both'
            ),
            ExecuteProcess(
                cmd=['python3', '-c', f'{PYTHON_SCRIPT}'],
                shell=False,
                output='both'
            ),
        ])

after waiting the launch process to finish (it will take some seconds and log some stuff), it's easy to check that the second process was killed and the first wasn't (e.g. using ps -elf).

Note:

The issue isn't noticed if instead of using kill -SIGINT $! the user presses ctrl+c. That's because pressing ctrl+c will send a SIGINT signal to all the processes of the group (all the processes launch from the launch file).
Currently if SIGTERM is sent, instead of SIGINT, no process is being killed. That seems like a separate issue that should be handled as well.
We could maybe try to send a signal to the whole group (in the platforms that support this), or to find all the children of the launch process recursively (e.g. with psutil) to solve the problem.

I will give it a try to the last option, but I'm not sure how multiplatform that's going to be.

cc @jacobperron

ivanpauno commented 2 years ago

I have opened https://github.com/ros2/launch/pull/632, which will fix the issue.

ros-discourse commented 1 year ago

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/sequentially-starting-launch-files-using-ros-2-python-when-a-specific-log-message-is-detected/30633/3

ros2 / launch

launch not killing processes started with shell=True #545

Bug report

Steps to reproduce issue

This is necessary to get unbuffered output from the process under test