ros2 / rmw_fastrtps

Implementation of the ROS Middleware (rmw) Interface using eProsima's Fast RTPS.
Apache License 2.0
157 stars 117 forks source link

Discovery Server: node.get_service_names_and_types() behaving differently #639

Closed kledom closed 2 years ago

kledom commented 2 years ago

Bug report

Required Info:

I've encountered a problem when using rosbridge_suite in combination with Fast DDS Discovery Server. You can find the corresponding bug report at https://github.com/RobotWebTools/rosbridge_suite/issues/791. As I believe the underlying problem is more likely to be with the Discovery Server, I decided to move the discussion to this repository.

Steps to reproduce issue

Run the following lines in 3 different terminals:

Terminal 1

$ fastdds discovery -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:1181

Terminal 2

$ export ROS_DISCOVERY_SERVER=127.0.0.1:11811
$ ros2 run demo_nodes_cpp add_two_ints_server

Terminal 3

Run the following node (modified version of add_two_ints_client):

from example_interfaces.srv import AddTwoInts
import rclpy

def main(args=None):
    rclpy.init(args=args)

    node = rclpy.create_node('get_service_names_and_types')
    result = node.get_service_names_and_types()
    node.get_logger().info(f"{result}")

    node.destroy_node()
    rclpy.try_shutdown()

if __name__ == '__main__':
    main()
$ export ROS_DISCOVERY_SERVER=127.0.0.1:11811
$ ros2 run demo_nodes_py get_service_names_and_types

Expected behavior

Terminal 3:

[INFO] [1664462944.015011255] [get_service_names_and_types]:
[
('/add_two_ints', ['example_interfaces/srv/AddTwoInts']),
('/add_two_ints_server/describe_parameters', ['rcl_interfaces/srv/DescribeParameters']),
('/add_two_ints_server/get_parameter_types', ['rcl_interfaces/srv/GetParameterTypes']),
('/add_two_ints_server/get_parameters', ['rcl_interfaces/srv/GetParameters']),
('/add_two_ints_server/list_parameters', ['rcl_interfaces/srv/ListParameters']),
('/add_two_ints_server/set_parameters', ['rcl_interfaces/srv/SetParameters']),
('/add_two_ints_server/set_parameters_atomically', ['rcl_interfaces/srv/SetParametersAtomically']),
('/get_service_names_and_types/describe_parameters', ['rcl_interfaces/srv/DescribeParameters']),
('/get_service_names_and_types/get_parameter_types', ['rcl_interfaces/srv/GetParameterTypes']),
('/get_service_names_and_types/get_parameters', ['rcl_interfaces/srv/GetParameters']),
('/get_service_names_and_types/list_parameters', ['rcl_interfaces/srv/ListParameters']),
('/get_service_names_and_types/set_parameters', ['rcl_interfaces/srv/SetParameters']),
('/get_service_names_and_types/set_parameters_atomically', ['rcl_interfaces/srv/SetParametersAtomically'])]

Actual behavior

Terminal 3:

[INFO] [1664462815.719237027] [get_service_names_and_types]:
[
('/get_service_names_and_types/describe_parameters', ['rcl_interfaces/srv/DescribeParameters']),
('/get_service_names_and_types/get_parameter_types', ['rcl_interfaces/srv/GetParameterTypes']),
('/get_service_names_and_types/get_parameters', ['rcl_interfaces/srv/GetParameters']),
('/get_service_names_and_types/list_parameters', ['rcl_interfaces/srv/ListParameters']),
('/get_service_names_and_types/set_parameters', ['rcl_interfaces/srv/SetParameters']),
('/get_service_names_and_types/set_parameters_atomically', ['rcl_interfaces/srv/SetParametersAtomically'])
]

Additional information

I'm not quite sure which of the behaviors is correct. The documentation implies that only the node's services shall be returned.

EduPonz commented 2 years ago

Hi @kledom ,

I do not see how could this be related with Discovery Server, as that is just a different discovery protocol for DDS, but in the end the ROS 2 layer discovers the same DDS entities, so I guess the problem may lay there. Do you see the same behaviour when using the out-of-the-box discovery?

kledom commented 2 years ago

Hi @EduPonz ,

the output with the out-of-the-box discovery is what I assumed to be the 'expected behavior'. So this must be somehow related to the discovery protocol I guess.

Without setting the ROS_DISCOVERY_SERVER variable:

Terminal 1

$ ros2 run demo_nodes_cpp add_two_ints_server

Terminal 2

(modified version of add_two_ints_client):

from example_interfaces.srv import AddTwoInts
import rclpy

def main(args=None):
    rclpy.init(args=args)

    node = rclpy.create_node('get_service_names_and_types')
    result = node.get_service_names_and_types()
    node.get_logger().info(f"{result}")

    node.destroy_node()
    rclpy.try_shutdown()

if __name__ == '__main__':
    main()
$ ros2 run demo_nodes_py get_service_names_and_types

Output

[INFO] [1664481307.348203733] [get_service_names_and_types]:
[
('/add_two_ints', ['example_interfaces/srv/AddTwoInts']),
('/add_two_ints_server/describe_parameters', ['rcl_interfaces/srv/DescribeParameters']), 
('/add_two_ints_server/get_parameter_types', ['rcl_interfaces/srv/GetParameterTypes']), 
('/add_two_ints_server/get_parameters', ['rcl_interfaces/srv/GetParameters']),
('/add_two_ints_server/list_parameters', ['rcl_interfaces/srv/ListParameters']),
('/add_two_ints_server/set_parameters', ['rcl_interfaces/srv/SetParameters']),
('/add_two_ints_server/set_parameters_atomically', ['rcl_interfaces/srv/SetParametersAtomically']),
('/get_service_names_and_types/describe_parameters', ['rcl_interfaces/srv/DescribeParameters']),
('/get_service_names_and_types/get_parameter_types', ['rcl_interfaces/srv/GetParameterTypes']),
('/get_service_names_and_types/get_parameters', ['rcl_interfaces/srv/GetParameters']),
('/get_service_names_and_types/list_parameters', ['rcl_interfaces/srv/ListParameters']),
('/get_service_names_and_types/set_parameters', ['rcl_interfaces/srv/SetParameters']),
('/get_service_names_and_types/set_parameters_atomically', ['rcl_interfaces/srv/SetParametersAtomically'])
]
clalancette commented 2 years ago

    node = rclpy.create_node('get_service_names_and_types')
    result = node.get_service_names_and_types()
    node.get_logger().info(f"{result}")

    node.destroy_node()
    rclpy.try_shutdown()

For what it is worth, this sequence is never guaranteed to work (in either Python or C++, with or without the Discovery Server).

The problem is that discovery happens asynchronously, and so it may be the case that your node starts up, calls get_service_names_and_types, and quits before discovery has ever happened. (This, incidentally, is the reason we have a daemon running for the ros2 command-line tools. The daemon is always listening and doing discovery, and when the command-line tools run they actually ask the daemon for the list of services.)

If you wait around and call rclpy.spin() for a little while before calling get_service_names_and_types(), does it ever return the list in the discovery server case?

kledom commented 2 years ago

Hi @clalancette,

This is because I have tried to keep my example as minimal as possible. With rosbridge_suite, where I encountered the problem for the first time, the error still occurs after several minutes.

Same result when waiting:

import threading

import rclpy

def main(args=None):
    rclpy.init(args=args)

    node = rclpy.create_node('get_service_names_and_types')

    # Spin in a separate thread
    thread = threading.Thread(target=rclpy.spin, args=(node, ), daemon=True)
    thread.start()

    rate = node.create_rate(100)
    for i in range(500):
        rate.sleep()

    result = node.get_service_names_and_types()
    node.get_logger().info(f"{result}")

    node.destroy_node()
    rclpy.try_shutdown()
    thread.join()

if __name__ == '__main__':
    main()

Output:

[INFO] [1664484402.768147980] [get_service_names_and_types]:
[
('/get_service_names_and_types/describe_parameters', ['rcl_interfaces/srv/DescribeParameters']),
('/get_service_names_and_types/get_parameter_types', ['rcl_interfaces/srv/GetParameterTypes']),
('/get_service_names_and_types/get_parameters', ['rcl_interfaces/srv/GetParameters']),
('/get_service_names_and_types/list_parameters', ['rcl_interfaces/srv/ListParameters']),
('/get_service_names_and_types/set_parameters', ['rcl_interfaces/srv/SetParameters']),
('/get_service_names_and_types/set_parameters_atomically', ['rcl_interfaces/srv/SetParametersAtomically'])
]

Please note that, contrary to the documentation, node.get_service_names_and_types() with out-of-the-box discovery returns any service running, not just the services related to the node. So maybe the error is in the default discovery. However, rosbridge_suite seems to depend on it to work properly.

kledom commented 2 years ago

@EduPonz Is this problem now considered a bug of FastDDS or should I create another issue in the ros2 repo?

kledom commented 2 years ago

Using the SUPER_CLIENT discovery protocol fixes the issue:

<?xml version="1.0" encoding="UTF-8"?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <participant profile_name="super_client_profile" is_default_profile="true">
        <rtps>
            <builtin>
                <discovery_config>
                    <discoveryProtocol>SUPER_CLIENT</discoveryProtocol>
                    <discoveryServersList>
                        <RemoteServer prefix="44.53.00.5f.45.50.52.4f.53.49.4d.41">
                            <metatrafficUnicastLocatorList>
                                <locator>
                                    <udpv4>
                                        <address>127.0.0.1</address>
                                        <port>11811</port>
                                    </udpv4>
                                </locator>
                            </metatrafficUnicastLocatorList>
                        </RemoteServer>
                    </discoveryServersList>
                </discovery_config>
            </builtin>
        </rtps>
    </participant>
</profiles>

This somehow makes sense considering the filter feature of discovery server v2. However, I don't know if an optimisation that breaks the existing API makes so much sense. Also, I'm not quite sure about the side effects of running the node as a super client?