micro-ROS / micro_ros_platformio

micro-ROS library for Platform.IO
Apache License 2.0

Crashing when connected to Agent over UDP #122

Closed patrickwasp closed 8 months ago

patrickwasp commented 11 months ago

Issue template

Steps to reproduce the issue

I made an example repo with Serial and UDP versions of the same basic node, publishing the milliseconds after boot.

The serial version works as expected, but the UDP version reboots every few seconds. While it is running, the UDP version still publishes correctly.

https://github.com/patrickwasp/esp32-ros-examples

In my platformio.ini I set board_microros_transport = custom and I initialize the connection using

set_microros_ethernet_transports(kMicroControllerIp, kSubnetMask, kAgentIp,
                                   kAgentPort);

When I don't have the agent running, the microcontroller doesn't crash.

The return code error I get is 101 (RCL_RET_NOT_INIT). Is there anything wrong with my setup?

Expected behavior

No restarts

Actual behavior

Restarts after a few seconds

Additional information

Serial output

Ethernet Connected, MAC: 40:22:D8:17:99:2F
Setup complete
...
Waiting for connection
Waiting for connection
Available
Connected
Connected
...
Connected
Connected
Disconnected
Waiting for connection
Available
Connected
...
5594
Connected
...
6441
Connected
...
Connected
Disconnected
Waiting for connection
Available
Failed status on line: 100 with ROS error code: 101
ets Jul 29 2019 12:21:46

rst:0xc (SW_CPU_RESET),boot:0x1b (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:1184
load:0x40078000,len:13192
load:0x40080400,len:3028
entry 0x400805e4

Line 100 corresponds to:

RC_CHECK(rclc_node_init_default(&g_ros_node, kNodeName, kNamespace,
                                &g_ros_support));

So it looks like it's disconnecting from the agent regularly and sometimes fails to reconnect during node initialization.
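The reset reason in the log, rst:0xc (SW_CPU_RESET), is a software-triggered restart. That is consistent with the check macro rebooting the board on any non-OK return code, although the macro's definition is not shown in this thread, so that is an assumption. A minimal sketch of an alternative that reports the failure to the caller instead, so the main loop could fall back to "Waiting for connection" rather than reset. TRY_OR_RETURN_FALSE, FakeInitNode, RetCode and kRetOk are hypothetical stand-ins; in real code the return type would be rcl_ret_t and the OK value RCL_RET_OK.

```cpp
#include <cstdio>

// Stand-ins for rcl_ret_t / RCL_RET_OK so the sketch compiles on its own.
using RetCode = int;
constexpr RetCode kRetOk = 0;
constexpr RetCode kRetNotInit = 101;  // value reported in the log above

// Hypothetical macro: log the failing line and make the enclosing
// bool-returning function return false instead of resetting the board.
#define TRY_OR_RETURN_FALSE(expr)                                    \
  do {                                                               \
    const RetCode rc_ = (expr);                                      \
    if (rc_ != kRetOk) {                                             \
      std::printf("Failed status on line: %d with error code: %d\n", \
                  __LINE__, static_cast<int>(rc_));                  \
      return false;                                                  \
    }                                                                \
  } while (0)

// Hypothetical stand-in for a call such as rclc_node_init_default().
RetCode FakeInitNode(bool agent_ready) {
  return agent_ready ? kRetOk : kRetNotInit;
}

// Shape of an entity-creation function that fails softly.
bool CreateEntitiesSketch(bool agent_ready) {
  TRY_OR_RETURN_FALSE(FakeInitNode(agent_ready));
  return true;
}
```

With this shape, the caller decides what a failure means: it can clean up and go back to pinging the agent instead of restarting the microcontroller.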

pablogs9 commented 11 months ago

Hello, if you are working with ESP32, the recommended and tested platform is the micro-ROS component for ESP-IDF. Could you test whether the basic example there (a UDP example) works properly?

patrickwasp commented 11 months ago

Okay, I'll have to figure out how to use IDF to test that out. Any tips on how to set up IDF with ROS on PlatformIO?

In the meantime, I have added more debugging output, and it looks like the agent regularly disconnects and sometimes fails to reinitialize. So there are two problems: (1) it is disconnecting, and (2) it cannot reliably reconnect.

This is the function I call when the agent is available, and it fails the check on rclc_node_init_default with the error RCL_RET_NOT_INIT. Are there any options to make that call more robust? Would putting that line in a while loop to wait until it does connect be a good solution?

bool CreateEntities() {
  g_ros_allocator = rcl_get_default_allocator();

  g_ros_init_options = rcl_get_zero_initialized_init_options();
  RC_CHECK(rcl_init_options_init(&g_ros_init_options, g_ros_allocator));
  RC_CHECK(rcl_init_options_set_domain_id(&g_ros_init_options, kDomainId));
  rclc_support_init_with_options(&g_ros_support, 0, NULL, &g_ros_init_options,
                                 &g_ros_allocator);

  RC_CHECK(rclc_node_init_default(&g_ros_node, kNodeName, kNamespace,
                                  &g_ros_support));

  g_ros_executor = rclc_executor_get_zero_initialized_executor();
  RC_CHECK(rclc_executor_init(&g_ros_executor, &g_ros_support.context,
                              kNumberOfHandles, &g_ros_allocator));

  InitializePublishersAndTimers();

  UpdateTimeOffsetFromAgent();
  return true;
}
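On the while-loop question, an unbounded loop around the failing call could block the firmware forever if the agent never comes back. A bounded retry is one option, sketched below. RetryBounded is a hypothetical helper, not part of micro-ROS; the attempt and the wait are passed in as callables so the sketch compiles without the library. Separately, the function above does not check the result of rclc_support_init_with_options, which returns an rcl_ret_t like the other calls. If that call fails, the support context may be left uninitialized, which is one plausible but unconfirmed reason for the next call reporting RCL_RET_NOT_INIT.

```cpp
#include <functional>

// Hypothetical helper: call `attempt` up to `max_attempts` times, invoking
// `wait` between failed tries. Returns true on the first success and false
// once the attempt budget is exhausted.
bool RetryBounded(const std::function<bool()>& attempt,
                  const std::function<void()>& wait,
                  int max_attempts) {
  for (int i = 0; i < max_attempts; ++i) {
    if (attempt()) {
      return true;
    }
    if (i + 1 < max_attempts) {
      // On Arduino this could be a lambda calling delay(); it is injected
      // here so the sketch does not depend on any platform header.
      wait();
    }
  }
  return false;
}
```

In the code above, the attempt would wrap the whole initialization sequence rather than a single line, and each failed attempt would probably need to release whatever was partially created before trying again.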

I'm not sure what I'm doing with IDF, but this is what I tried; it failed to compile:

  1. Download the humble branch from https://github.com/micro-ROS/micro_ros_espidf_component/tree/humble
  2. extract and cd into the repo
  3. run docker run -it --rm --user espidf --volume="/etc/timezone:/etc/timezone:ro" -v $(pwd):/micro_ros_espidf_component -v /dev:/dev --privileged --workdir /micro_ros_espidf_component microros/esp-idf-microros:latest /bin/bash -c "cd examples/int32_publisher; idf.py menuconfig build flash monitor"
  4. go through the menus to configure ethernet.
    
    #
    # micro-ROS example-app settings
    #
    CONFIG_MICRO_ROS_APP_STACK=16000
    CONFIG_MICRO_ROS_APP_TASK_PRIO=5
    # end of micro-ROS example-app settings

    #
    # micro-ROS Settings
    #
    CONFIG_MICRO_ROS_ESP_XRCE_DDS_MIDDLEWARE=y
    # CONFIG_MICRO_ROS_ESP_EMBEDDEDRTPS_MIDDLEWARE is not set
    # CONFIG_MICRO_ROS_ESP_NETIF_WLAN is not set
    CONFIG_MICRO_ROS_ESP_NETIF_ENET=y
    # CONFIG_MICRO_ROS_ESP_UART_TRANSPORT is not set

    #
    # Ethernet Configuration
    #
    CONFIG_MICRO_ROS_USE_INTERNAL_ETHERNET=y
    # CONFIG_MICRO_ROS_USE_DM9051 is not set
    # CONFIG_MICRO_ROS_USE_W5500 is not set
    # CONFIG_MICRO_ROS_ETH_PHY_IP101 is not set
    # CONFIG_MICRO_ROS_ETH_PHY_RTL8201 is not set
    CONFIG_MICRO_ROS_ETH_PHY_LAN8720=y
    # CONFIG_MICRO_ROS_ETH_PHY_DP83848 is not set
    # CONFIG_MICRO_ROS_ETH_PHY_KSZ8041 is not set
    CONFIG_MICRO_ROS_ETH_MDC_GPIO=23
    CONFIG_MICRO_ROS_ETH_MDIO_GPIO=18
    CONFIG_MICRO_ROS_ETH_PHY_RST_GPIO=5
    CONFIG_MICRO_ROS_ETH_PHY_ADDR=1
    # end of Ethernet Configuration

    CONFIG_MICRO_ROS_AGENT_IP="10.4.4.227"
    CONFIG_MICRO_ROS_AGENT_PORT="8888"
    # end of micro-ROS Settings

After I save and quit, the build fails with this error:

...
micro_ros_espidf_component/network_interfaces/uros_ethernet_netif.c:96:26: error: implicit declaration of function 'esp_eth_phy_new_lan8720'; did you mean 'esp_eth_phy_new_lan87xx'? [-Werror=implicit-function-declaration]
   96 |     esp_eth_phy_t *phy = esp_eth_phy_new_lan8720(&phy_config);
      |                          ^~~~~~~
      |                          esp_eth_phy_new_lan87xx
...

patrickwasp commented 11 months ago

I may have found a workaround for the disconnecting problem, but not the crashing yet.

I changed this part of my code while checking for connectivity to the agent:

rmw_uros_ping_agent(100, 1)

to

rmw_uros_ping_agent(200, 3)

This increased the timeout from 100 ms to 200 ms and the number of attempts from 1 to 3. My network must be too slow to handle the faster check.
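To illustrate why the two arguments matter, here is a small model of a ping with a per-attempt timeout and an attempt budget. It is a sketch of the semantics described above, not the library's implementation; PingWithAttempts, WorstCaseBlockMs and the ping_once callable are hypothetical names introduced for the example.

```cpp
#include <functional>

// Hypothetical model: `ping_once(timeout_ms)` stands in for sending one ping
// and waiting up to `timeout_ms` for the reply. The check succeeds as soon as
// any single attempt gets a reply.
bool PingWithAttempts(const std::function<bool(int)>& ping_once,
                      int timeout_ms, int attempts) {
  for (int i = 0; i < attempts; ++i) {
    if (ping_once(timeout_ms)) {
      return true;  // first reply wins; remaining attempts are skipped
    }
  }
  return false;
}

// Worst-case time spent in the check when no reply ever arrives,
// under the model above.
int WorstCaseBlockMs(int timeout_ms, int attempts) {
  return timeout_ms * attempts;
}
```

Under this model, a link whose replies take around 150 ms fails every (100, 1) check but passes (200, 3), and a single dropped packet no longer counts as a disconnect. The trade-off is that a check against an absent agent can block for up to 600 ms instead of 100 ms.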

It used to disconnect every 2-5 seconds, but it has now stayed connected for the past hour. I'll let it run for a few more hours to see if this solves problem 1. Edit: now up for 14+ hours, so I'll consider this solved.

For problem 2, reconnecting, I tried putting a delay before rclc_node_init_default but that didn't work. It still crashed after calling it.