ros-industrial / industrial_core

ROS-Industrial core communication packages (http://wiki.ros.org/industrial_core)

TcpClient won't re-establish connection if physical connection is unplugged, then restored #238

Closed PGlaubitzWork closed 3 years ago

PGlaubitzWork commented 5 years ago

Observed while running on ROS Indigo. The Ethernet cable between the PC and the Fanuc controller was disconnected. isConnected() correctly reported the loss of connection. After the cable was reconnected, makeConnect() could not re-establish the connection because the socket handle still belonged to the lost connection.
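The failure mode described above can be reproduced in isolation with plain POSIX sockets. The following is a minimal sketch, not the code from simple_message or from any PR: the helper names `openLoopbackListener`, `tryConnect`, and `reconnectWithFreshSocket` are hypothetical, and a loopback listener stands in for the robot controller. It assumes Linux, where calling `connect()` again on a handle that was already connected fails with `EISCONN` (errno 106), and shows that closing the stale handle and creating a new socket lets the connection succeed.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

// Hypothetical helper: opens a TCP listener on 127.0.0.1 with an ephemeral
// port (stand-in for the robot controller). Fills addr with the bound address.
// Returns the listening fd, or -1 on failure.
int openLoopbackListener(sockaddr_in& addr) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // let the kernel pick a free port
  socklen_t len = sizeof(addr);
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      listen(fd, 4) != 0 ||
      getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Attempts connect() on an existing handle. Returns 0 on success,
// otherwise the errno value set by connect().
int tryConnect(int fd, const sockaddr_in& addr) {
  if (connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr)) == 0) {
    return 0;
  }
  return errno;
}

// Hypothetical reconnect strategy: discard the stale handle, create a new
// socket, then connect. Returns 0 on success, otherwise an errno value.
int reconnectWithFreshSocket(int& fd, const sockaddr_in& addr) {
  if (fd >= 0) close(fd);  // release the handle tied to the lost connection
  fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return errno;
  return tryConnect(fd, addr);
}
```

In this sketch, a second `tryConnect()` on the same handle reports `EISCONN`, while `reconnectWithFreshSocket()` succeeds against the same listener. Whether the actual client should recreate the socket inside makeConnect() or elsewhere is a design decision for the maintainers.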

PGlaubitzWork commented 5 years ago

Proposed fix in PR #239

haudren commented 4 years ago

Sorry to do some archaeology here, but I just encountered a similar scenario when running fanuc_driver against a robot in an unstable network setting. I would sometimes get the following warnings and errors:

1599809178.461369439 WARN /joint_state [industrial_core/simple_message/src/socket/simple_socket.cpp:140(simple_socket::SimpleSocket::receiveBytes)] [topics: /rosout, /feedback_states, /joint_states, /robot_status] Recieved zero bytes: 0
1599809178.461382919 ERROR /joint_state [industrial_core/simple_message/src/smpl_msg_connection.cpp:118(smpl_msg_connection::SmplMsgConnection::receiveMsg)] [topics: /rosout, /feedback_states, /joint_states, /robot_status] Failed to receive message length
1599809178.461388687 ERROR /joint_state [industrial_core/simple_message/src/message_manager.cpp:166(message_manager::MessageManager::spinOnce)] [topics: /rosout, /feedback_states, /joint_states, /robot_status] Failed to receive incoming message
1599809178.461393753 WARN /joint_state [industrial_core/simple_message/include/simple_message/simple_comms_fault_handler.h:87(simple_comms_fault_handler::SimpleCommsFaultHandler::sendFailCB)] [topics: /rosout, /feedback_states, /joint_states, /robot_status] Send failure, no callback support

That would then be followed by a flurry of:

1599209056.621394184 ERROR /joint_state [include/simple_message/socket/simple_socket.h:256(simple_socket::SimpleSocket::logSocketError)] [topics: /rosout, /feedback_states, /joint_states, /robot_status] Failed to connect to server, rc: -1. Error: 'Transport endpoint is already connected' (errno: 106)

I tried enabling address reuse (SO_REUSEADDR) on the client socket, but that only changed the error to the following:

1599805480.228408018 ERROR /joint_state [industrial_core/simple_message/include/simple_message/socket/simple_socket.h:256(simple_socket::SimpleSocket::logSocketError)] [topics: /rosout, /feedback_states, /joint_states, /robot_status] Failed to connect to server, rc: -1. Error: 'Connection refused' (errno: 111)

However, once I applied #239, the robot connection was successfully restored across multiple failures.

Tested on:

@gavanderhoorn: I hope you are the right person to tag here. Could you let me know, here or in #239, what would need to be done to merge it?

haudren commented 4 years ago

@gavanderhoorn @Levi-Armstrong: I see you are both listed as maintainers of simple_message. Could you give your opinion on #239?

PGlaubitzWork commented 4 years ago

To the best of my knowledge, this change has now been used in our systems to control UR, Yaskawa, and probably ABB robots.

gavanderhoorn commented 3 years ago

Closing as #263 should address this.