Senderport Subscriber Overflow

jeising commented 4 years ago

Description

While running ros2 run demo_nodes_cpp talker and ros2 run demo_nodes_py listner, RouDi terminates ungracefully

Error message

ICEORYX error! POSH__SENDERPORT_SUBSCRIBER_LIST_OVERFLOW
RouDi: /opt/iceoryx/src/iceoryx/iceoryx_utils/source/error_handling/error_handling.cpp:53: static void iox::ErrorHandler::ReactOnErrorLevel(iox::ErrorLevel, const char*): Assertion `false' failed.
Aborted (core dumped)

How to reproduce

Versions & Setup

Using osrf/ros:eloquent-desktop with git clone --branch v0.16.0 https://github.com/eclipse/iceoryx.git src/iceoryx git clone --branch v0.16.0 https://github.com/ros2/rmw_iceoryx.git src/rmw_iceoryx

The docker container gets 2 GB of shared memory.

For all terminals

source "/opt/ros/$ROS_DISTRO/setup.bash"
source "iceoryx/install/local_setup.bash"

export RMW_IMPLEMENTATION=rmw_iceoryx_cpp

Terminal 1

RouDi

Terminal 2

ros2 run demo_nodes_cpp talker

Output of this command is:

[Warning]: Cache size for subscribe too large 1000, limiting to MAX_RECEIVER_QUEUE_SIZE = 256

Terminal 3

ros2 run demo_nodes_py listner

Error

After a few transmissions, which work as expected, RouDi prints the mentioned error message and terminates. It does not stop the connected processes.

[ Error ]: ICEORYX error! POSH__SENDERPORT_SUBSCRIBER_LIST_OVERFLOW
RouDi: /opt/iceoryx/src/iceoryx/iceoryx_utils/source/error_handling/error_handling.cpp:53: static void iox::ErrorHandler::ReactOnErrorLevel(iox::ErrorLevel, const char*): Assertion `false' failed.
Aborted (core dumped)

RouDi in verbose mode

Starting the cpp process makes RouDi print

2020-02-05 13:55:51.898 [ Debug ]: Registered new application /talker_11724
2020-02-05 13:55:51.900 [ Debug ]: Created new ApplicationPort for application /talker_11724
2020-02-05 13:55:51.900 [ Debug ]: Created new ReceiverPortImpl for application /talker_11724
2020-02-05 13:55:51.902 [ Debug ]: Created new ReceiverPortImpl for application /talker_11724
2020-02-05 13:55:51.902 [ Debug ]: Created new runnable /talker for application /talker_11724
2020-02-05 13:55:51.903 [ Debug ]: Created new SenderPortImpl for application /talker_11724
2020-02-05 13:55:51.904 [ Debug ]: Created new SenderPortImpl for application /talker_11724
2020-02-05 13:55:51.905 [ Debug ]: Created new ReceiverPortImpl for application /talker_11724
2020-02-05 13:55:51.906 [ Debug ]: Created new SenderPortImpl for application /talker_11724

From start of the python process RouDi prints below text

2020-02-05 13:57:38.166 [ Debug ]: Registered new application /python3_12438                                                                                                                                                                
2020-02-05 13:57:38.167 [ Debug ]: Created new ApplicationPort for application /python3_12438                                                                                                                                               
2020-02-05 13:57:38.168 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438                                                                                                                                              
2020-02-05 13:57:38.168 [ Debug ]: Created new runnable /listener for application /python3_12438                                                                                                                                            
2020-02-05 13:57:38.168 [Warning]: Multiple sender ports with same CaPro ID currently not supported                                                                                                                                         
 rcl_interfaces/msg/Log /rosout data                                                                                                                                                                                                        
2020-02-05 13:57:38.168 [ Debug ]: Created new SenderPortImpl for application /python3_12438                                                                                                                                                
2020-02-05 13:57:38.194 [Warning]: Multiple sender ports with same CaPro ID currently not supported                                                                                                                                         
 rcl_interfaces/msg/ParameterEvent /parameter_events data                                                                                                                                                                                   
2020-02-05 13:57:38.194 [ Debug ]: Created new SenderPortImpl for application /python3_12438                                                                                                                                                
2020-02-05 13:57:38.196 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438                                                                                                                                              
2020-02-05 13:57:38.196 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438                                                                                                                                              
2020-02-05 13:57:38.245 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438                                                                                                                                              
2020-02-05 13:57:38.346 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:57:38.447 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:57:38.548 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:57:38.649 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:57:38.750 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:57:38.851 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:57:38.984 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
...
2020-02-05 13:58:02.904 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:58:03.007 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:58:03.106 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:58:03.207 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:58:03.308 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:58:03.410 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:58:03.511 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:58:03.612 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:58:03.713 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:58:03.814 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:58:03.917 [ Debug ]: Created new ReceiverPortImpl for application /python3_12438
2020-02-05 13:58:04.015 [ Error ]: ICEORYX error! POSH__SENDERPORT_SUBSCRIBER_LIST_OVERFLOW
RouDi: /opt/iceoryx/src/iceoryx/iceoryx_utils/source/error_handling/error_handling.cpp:53: static void iox::ErrorHandler::ReactOnErrorLevel(iox::ErrorLevel, const char*): Assertion `false' failed.
Aborted (core dumped)

Variants of this issue

Running:

RouDi, ros2 run demo_nodes_py talker and ros2 run demo_nodes_py listner
RouDi, ros2 run demo_nodes_py talker and ros2 run demo_nodes_cpp listner
RouDi and ros2 run demo_nodes_py talker
RouDi and ros2 run demo_nodes_py listner
RouDi, ros2 run demo_nodes_cpp talker and ros2 topic echo chatter std_msgs/msg/String

Results in the same error

How not to reproduce

For using only cpp demo nodes ros2 run demo_nodes_cpp talker and ros2 run demo_nodes_cpp listner there seems to be no issue.

budrus commented 4 years ago

Thanks @jeising for the good summary. The Python process seems to create a subscriber every 100ms. Is there a 100 ms loop with the spin(). So far we destroy the publisher and subscriber resources in the shared memory not before cleaning up the whole process. So if the python side would create and destroy the publishers and subscribers with every spin I could explain this behaviour (and would also know what we have to do).

jeising commented 4 years ago

I feel there are two tasks:

Fix the many calls to open a subscriber upstream in ros2's rclpy
Be more robust against opening and discarding subscribers in iceoryx itself

What would you suggest to do, @michael-poehnl? :)

jeising commented 4 years ago

This seems to be the place where the wait_set is recreated on every spin.

https://github.com/ros2/rclpy/blob/c25ac225e4bcb98e29bf33dd0813e71704eb10b1/rclpy/rclpy/executors.py#L499

Is this by intention? @wjwwood @Karsten1987

jeising commented 4 years ago

For the iceoryx side I see, that we could implement the functionality to remove ports from applications or the functionality to remove runnables from applications and put the subscriber for this wait_set in its own runnable. The later functionality is already mentioned in the code.

https://github.com/eclipse/iceoryx/blob/4dac23846febb1040a43856dc04ce12669e7cae0/iceoryx_posh/source/roudi/shared_memory_manager.cpp#L418

@michael-poehnl

budrus commented 4 years ago

rcl_py creates and destroys a waitset with every spin. The question is if this is really necessary but nevertheless the rmw_iceoryx must be able to handle this. In the past iceoryx was used only in setups where resources were created once and not destroyed before the termination of the user process. So it somehow is a bug in the context of rmw_iceoryx and an extension of iceoryx. I created an issue in iceoryx for this https://github.com/eclipse/iceoryx/issues/51

wjwwood commented 4 years ago

Is this by intention? @wjwwood @Karsten1987

I'm not sure if it is required or just an unintentional issue that could be optimized. Maybe @sloretz can comment, he worked on the executor design in Python.

rcl_py creates and destroys a waitset with every spin. The question is if this is really necessary but nevertheless the rmw_iceoryx must be able to handle this.

This, even if rclpy could be changed to avoid this, iceoryx must be able to handle it, imo.

sloretz commented 4 years ago

This seems to be the place where the wait_set is recreated on every spin. I'm not sure if it is required or just an unintentional issue that could be optimized.

@jeising @wjwwood rclpy creating a new wait set every spin() isn't required, just unoptimized. It should be possible to reuse the wait set structure when the guard conditions/timers/clients/service servers/etc are the same between calls.

jeising commented 4 years ago

(..) rclpy creating a new wait set every spin() isn't required, just unoptimized. It should be possible to reuse the wait set structure when the guard conditions/timers/clients/service servers/etc are the same between calls.

@sloretz: Should we create an issue in the ros2 rclpy repo to keep track of this?

jeising commented 4 years ago

I would close this issue for it seems to be fixed on our side

ros2 / rmw_iceoryx