micro-ROS / micro-ROS-Agent

ROS 2 package using Micro XRCE-DDS Agent.
Apache License 2.0

Agent crashes with `malloc()` error #168

Closed aditya2592 closed 1 year ago

aditya2592 commented 1 year ago

Describe the bug

Agent crashes with the traceback below:

0000: 81 00 00 00 0B 01 05 00 00 00 00 00 80
[1660679983.827294] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0x6A92BFC5, len: 13, data: 
0000: 81 00 00 00 0B 01 05 00 00 00 00 00 80
[1660679983.927371] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0x6A92BFC5, len: 13, data: 
0000: 81 00 00 00 0B 01 05 00 00 00 00 00 80
[1660679984.027447] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0x6A92BFC5, len: 13, data: 
0000: 81 00 00 00 0B 01 05 00 00 00 00 00 80
[1660679984.127570] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0x6A92BFC5, len: 13, data: 
0000: 81 00 00 00 0B 01 05 00 00 00 00 00 80
[1660679984.227674] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0x6A92BFC5, len: 13, data: 
0000: 81 00 00 00 0B 01 05 00 00 00 00 00 80
[1660679984.327956] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0x6A92BFC5, len: 13, data: 
0000: 81 00 00 00 0B 01 05 00 00 00 00 00 80
[1660679984.427964] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0x6A92BFC5, len: 13, data: 
0000: 81 00 00 00 0B 01 05 00 00 00 00 00 80
[1660679984.528197] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0x6A92BFC5, len: 12, data: 
0000: 81 00 00 00 03 01 04 00 00 02 FF FE
[1660679984.528275] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0x00000000, len: 16, data: 
0000: 80 00 00 00 02 01 08 00 00 0A FF FD 02 00 00 00
[1660679985.028510] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0x00000000, len: 16, data: 
0000: 80 00 00 00 02 01 08 00 00 0A FF FD 02 00 00 00
[1660679985.172802] info     | ProxyClient.cpp    | create_participant       | participant created    | client_key: 0x6A92BFC5, participant_id: 0x000(1)
[1660679985.173052] debug    | UDPv4AgentLinux.cpp | send_message             | [** <<UDP>> **]        | client_key: 0x6A92BFC5, len: 14, data: 
0000: 81 80 00 00 05 01 06 00 00 0A 00 01 00 00
[1660679985.173099] debug    | UDPv4AgentLinux.cpp | send_message             | [** <<UDP>> **]        | client_key: 0x6A92BFC5, len: 13, data: 
0000: 81 00 00 00 0A 01 05 00 01 00 00 00 80
[1660679985.173153] debug    | UDPv4AgentLinux.cpp | send_message             | [** <<UDP>> **]        | client_key: 0x6A92BFC5, len: 13, data: 
0000: 81 00 00 00 0A 01 05 00 01 00 00 00 80
[1660679985.191223] info     | Root.cpp           | delete_client            | delete                 | client_key: 0x6A92BFC5
[1660679985.191305] info     | SessionManager.hpp | destroy_session          | session closed         | client_key: 0x6A92BFC5, address: 10.42.0.3:14285
[1660679985.191472] debug    | UDPv4AgentLinux.cpp | send_message             | [** <<UDP>> **]        | client_key: 0x00000000, len: 14, data: 
0000: 81 00 00 00 05 01 06 00 00 02 FF FE 00 00
[1660679985.191513] debug    | UDPv4AgentLinux.cpp | send_message             | [** <<UDP>> **]        | client_key: 0x00000000, len: 36, data: 
0000: 80 00 00 00 06 01 1C 00 00 0A FF FD 00 00 01 0D 58 52 43 45 01 00 01 0F 00 01 0D 00 01 00 00 00
0020: 00 00 00 00
[1660679985.191556] debug    | UDPv4AgentLinux.cpp | send_message             | [** <<UDP>> **]        | client_key: 0x00000000, len: 36, data: 
0000: 80 00 00 00 06 01 1C 00 00 0A FF FD 00 00 01 0D 58 52 43 45 01 00 01 0F 00 01 0D 00 01 00 00 00
0020: 00 00 00 00
[1660679985.192146] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0x00000000, len: 24, data: 
0000: 80 00 00 00 00 01 10 00 58 52 43 45 01 00 01 0F 1D 56 AD 52 81 00 FC 03
[1660679985.192296] info     | Root.cpp           | create_client            | create                 | client_key: 0x1D56AD52, session_id: 0x81
[1660679985.192331] info     | SessionManager.hpp | establish_session        | session established    | client_key: 0x1D56AD52, address: 10.42.0.3:12456
[1660679985.194511] debug    | UDPv4AgentLinux.cpp | send_message             | [** <<UDP>> **]        | client_key: 0x1D56AD52, len: 19, data: 
0000: 81 00 00 00 04 01 0B 00 00 00 58 52 43 45 01 00 01 0F 00
[1660679985.196132] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0x1D56AD52, len: 40, data: 
0000: 81 80 00 00 01 07 1E 00 00 0A 00 01 01 03 00 01 0F 00 00 00 00 01 00 00 07 00 00 00 6E 75 63 6C
0020: 65 6F 00 00 00 00 00 00
malloc(): unaligned tcache chunk detected
[ros2run]: Aborted
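
For anyone reading the hex dumps above, a small decoder can make the sequence easier to follow. This is only a sketch: `decode_xrce_header` and `SUBMESSAGE_NAMES` are hypothetical names, and the layout (session id, stream id, 16-bit sequence number, then a submessage id / flags / length header) and the id table reflect my reading of the DDS-XRCE wire format, not anything taken from the agent's source. The sequence number is assumed to be little-endian, which cannot be checked here because every packet in the log has sequence number 0.

```python
# Submessage ids as I understand them from the DDS-XRCE specification (assumption).
SUBMESSAGE_NAMES = {
    0x00: "CREATE_CLIENT", 0x01: "CREATE", 0x02: "GET_INFO", 0x03: "DELETE",
    0x04: "STATUS_AGENT", 0x05: "STATUS", 0x06: "INFO", 0x07: "WRITE_DATA",
    0x08: "READ_DATA", 0x09: "DATA", 0x0A: "ACKNACK", 0x0B: "HEARTBEAT",
    0x0C: "RESET", 0x0D: "FRAGMENT", 0x0E: "TIMESTAMP", 0x0F: "TIMESTAMP_REPLY",
}


def decode_xrce_header(hex_dump: str) -> dict:
    """Decode the message header and the first submessage header of one packet.

    `hex_dump` is the space-separated byte string printed by the agent,
    without the leading "0000:" offset column.
    """
    data = bytes.fromhex(hex_dump)
    session_id, stream_id = data[0], data[1]
    sequence_nr = int.from_bytes(data[2:4], "little")  # assumed little-endian
    offset = 4

    # Assumption: session ids below 0x80 carry a 4-byte client key in the header.
    client_key = None
    if session_id < 0x80:
        client_key = data[4:8].hex().upper()
        offset = 8

    submessage_id, flags = data[offset], data[offset + 1]
    # Bit 0 of the flags selects the byte order of the length field.
    byteorder = "little" if flags & 0x01 else "big"
    length = int.from_bytes(data[offset + 2:offset + 4], byteorder)

    return {
        "session_id": session_id,
        "stream_id": stream_id,
        "sequence_nr": sequence_nr,
        "client_key": client_key,
        "submessage": SUBMESSAGE_NAMES.get(submessage_id, hex(submessage_id)),
        "flags": flags,
        "length": length,
    }
```

Under those assumptions, the repeated 13-byte packets at the top decode as HEARTBEAT, the 12-byte packet just before `delete_client` as DELETE, and the last packet received before the abort as CREATE on stream 0x80.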

To Reproduce

Steps to reproduce the behaviour:

  1. The microcontroller is running and pinging the agent to check its availability
  2. The agent is started on the host PC and runs for a few seconds
  3. The agent connects to the microcontroller and data is exchanged normally for a few seconds
  4. The agent crashes on its own with the `malloc()` error shown above

Expected behaviour

The agent shouldn't crash.

System information (please complete the following information):

Additional context

This doesn't happen always, only sometimes. We also didn't face this issue on Galactic. I saw that a new version of Fast DDS, 2.7.1, was released on 29th July, which is around the time we started seeing this. Could that be a cause?

Acuadros95 commented 1 year ago

Could you share the code to replicate this? Does it happen without SHM running?

> This doesn't happen always, only sometimes.

How often does this problem occur?

> I saw that a new version of Fast DDS, 2.7.1, was released on 29th July, which is around the time we started seeing this. Could that be a cause?

It could be. We need to replicate this on our side to trace the bug.

pablogs9 commented 1 year ago

Any update on this? @Acuadros95

Acuadros95 commented 1 year ago

Closing as we could not replicate this.

@aditya2592 Please update us if the issue is still around, as issues like this are a priority for us.

aditya2592 commented 1 year ago

Sorry for not replying earlier. This is the workaround we used:

If the agent is disconnected from the client, the client's session is destroyed. At the same time, we now always kill the agent as well. For a new session, the agent restarts and the client reconnects. With this, the crash above is avoided. So the issue was occurring in cases where the session on the client was restarted but the agent was not.
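
A rough sketch of what such a restart loop could look like on the host side, in case it helps someone else. This is not the code we run; `supervise`, `should_restart`, `AGENT_CMD` and the port number are made up for illustration. It assumes the agent prints its `session closed` log line (as in the log above) to stdout or stderr at the configured verbosity, and that it runs on Linux, since the whole process group is signalled so the `ros2 run` wrapper and the agent go down together.

```python
import os
import signal
import subprocess
import time

# Hypothetical launch command; transport and port are assumptions.
AGENT_CMD = ["ros2", "run", "micro_ros_agent", "micro_ros_agent", "udp4", "--port", "8888"]

# Text the agent logs when a client session is destroyed (taken from the log above).
SESSION_CLOSED_MARKER = "session closed"


def should_restart(log_line: str) -> bool:
    """Return True when a log line reports that a client session was closed."""
    return SESSION_CLOSED_MARKER in log_line


def supervise(cmd, max_launches=None, restart_delay=1.0) -> int:
    """Run `cmd`, killing and relaunching it whenever a session is closed.

    The command is also relaunched if it exits on its own (for example after
    a crash). Returns the number of launches; loops forever when
    `max_launches` is None.
    """
    launches = 0
    while max_launches is None or launches < max_launches:
        launches += 1
        # start_new_session=True puts the child in its own process group,
        # so killpg() reaches any processes it spawns as well (POSIX only).
        with subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            start_new_session=True,
        ) as proc:
            for line in proc.stdout:
                print(line, end="")
                if should_restart(line):
                    os.killpg(proc.pid, signal.SIGTERM)
                    break
        time.sleep(restart_delay)
    return launches
```

Calling `supervise(AGENT_CMD)` would then keep the agent paired one-to-one with client sessions: every time the client's session goes away, the agent is torn down and started fresh before the client reconnects.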