roboticslab-uc3m / yarp-devices

A place for YARP devices
https://robots.uc3m.es/yarp-devices/
9 stars 7 forks source link

Improve resilience to faulty wiring (disable heartbeat) #252

Closed PeterBowman closed 3 years ago

PeterBowman commented 3 years ago

As reported in https://github.com/roboticslab-uc3m/teo-hardware-issues/issues/57, TEO is still experiencing degraded communications due to faulty wiring.

All iPOS-PC CAN connections are monitored via heartbeat protocol (https://github.com/roboticslab-uc3m/yarp-devices/issues/223#issuecomment-572320252): the iPOS drive is configured to send dummy messages in regular intervals (currently every 100 milliseconds), and the TechnosoftIpos device spawns a dedicated thread at start to measure delays between consecutive heartbeat signals. In case the last signal was received more than half a second ago (ref), the drive is instructed to reset itself.

Idea: consider supporting special (e.g. negative?) values to the heartbeat period parameter in TechnosoftIpos. For instance, if the period is set to zero, the device would not try to spawn and use a monitoring thread as described above.

PeterBowman commented 3 years ago

Not doing... This behavior feature has been supported from the very beginning, I have just improved a bit the way common and driver-specific parameters are handled: https://github.com/roboticslab-uc3m/yarp-devices/commit/31df0a2a8f1c5c0ea4d13e8f5e45902a700947ee. Either omit the optional monitorPeriod parameter, or (especially if you have defined a global one), override it in your specific drive's .ini with a negative or zero value. Also, heartbeat is disabled whenever heartbeatPeriod is equal to zero (not negative!) or the monitor thread has been disabled as well.

jgvictores commented 3 years ago

Sorry for re-using a closed thread, but just a quick question! Is the monitorPeriod parameter, if desired, intended to be added at files such as https://github.com/roboticslab-uc3m/teo-configuration-files/tree/3ee7fef05c325b756d5fb963d8196131aa86c81b/share/robots/teo/hardware/drivers ? Thanks!

PeterBowman commented 3 years ago

Nope, please check the ref in:

In case the last signal was received more than half a second ago (ref), the drive is instructed to reset itself.

This is an iPOS-related configuration parameter. Therefore, you can put it in nodes/common-ipos.ini in case you want to reuse the same value across all nodes (or just set a default value), and optionally override it within individual nodes/idX-ipos.ini files for the specific iPOS nodes they relate to.

jgvictores commented 3 years ago

Oh! Found it at https://github.com/roboticslab-uc3m/teo-configuration-files/blob/3ee7fef05c325b756d5fb963d8196131aa86c81b/share/robots/teo/nodes/common-ipos.ini#L10 via your comment. I should have just grepped it or used the GitHub search field. ^^

Thank you so much!

PeterBowman commented 3 years ago

Also, heartbeat is disabled whenever heartbeatPeriod is equal to zero (not negative!) or the monitor thread has been disabled as well.

Not tested (yet) due to COVID, but https://github.com/roboticslab-uc3m/yarp-devices/commit/31a33617549dcc5f2715b63876e95a3076bf87a1 should have made possible still having a worker thread for online boot-up signal treatment while disabling heartbeat checks. In this scenario, provide a zero value for heartbeatPeriod and keep monitorPeriod unchanged.

PeterBowman commented 3 years ago

Not tested (yet) due to COVID, but 31a3361 should have made possible still having a worker thread for online boot-up signal treatment while disabling heartbeat checks.

Tested today. The driver attends to a boot-up signal once, just on start. Successive emergency stop&release actions (i.e. pressing the big red button of death) accomplish nothing, so this change has a pretty narrow use case anyway.