roboticslab-uc3m / yarp-devices

A place for YARP devices
https://robots.uc3m.es/yarp-devices/
Investigate CAN bus load #231

Closed PeterBowman closed 4 years ago

PeterBowman commented 4 years ago

Issue https://github.com/roboticslab-uc3m/yarp-devices/issues/217 targets the saturation of CAN buses due to the Cui encoders being configured in continuous send (i.e. push) mode. It is proposed that polling (i.e. pull) mode should be used instead, although we'd like to measure bus traffic beforehand to estimate the actual RX/TX load.

If we take a look into the PCAN user manual, we learn it supports event-like incoming CAN frames (in addition to actual data):

All of this concerns the Peak CAN bus device in particular. Device monitoring is proposed in https://github.com/roboticslab-uc3m/yarp-devices/issues/225 by means of the ICanBusErrors class. This solution entails dealing with YARP interfaces and propagating the desired data through the CanBusControlboard layer and beyond, assuming we build an EasySetup-like GUI to rule everything status-related in a global context (having access to each node in all CAN buses).

Now, other solutions exist and do not meddle with the CanBusControlboard architecture. Credits to @rsantos88 for his research on these tools (read https://github.com/roboticslab-uc3m/yarp-devices/issues/217#issuecomment-528969785 and watch https://youtu.be/GRwmkx_YIIc):

This issue aims to extract as much information as we can from CAN (hey, I'm a wordplay ninja) using available tools. Then, tweak RX/TX rates (https://github.com/roboticslab-uc3m/yarp-devices/issues/217 regarding Cuis) and decide how monitoring should be carried out (https://github.com/roboticslab-uc3m/yarp-devices/issues/225).

jgvictores commented 4 years ago

Tiny grain of salt: technically, the Yokogawa oscilloscope we have can be used to monitor CAN messages (maybe https://www.youtube.com/watch?v=FqDPMUk1KQQ).

PeterBowman commented 4 years ago

Martin Rostan, Configuration Guideline for CANopen Networks, provides very useful pointers on determining CAN bus load, along with several hints and guidelines on this matter. Our CAN design decisions should extend to issue https://github.com/roboticslab-uc3m/yarp-devices/issues/232.

PeterBowman commented 4 years ago

Idea: drop the whole zero-delay thing in CanBusControlboard's RW threads, assume users will set a tiny yet non-micro-nor-zero-delay (e.g. microseconds?), check that CPU usage does not rocket up, and embrace yarp::os::PeriodicThread for clarity (https://github.com/roboticslab-uc3m/yarp-devices/issues/191#issuecomment-572216933).

https://github.com/roboticslab-uc3m/yarp-devices/blob/71803985c6719e781063ef9e247be05553812a6c/libraries/YarpPlugins/CanBusControlboard/CanRxTxThreads.cpp#L38-L41

PeterBowman commented 4 years ago

Idea: drop the whole zero-delay thing in CanBusControlboard's RW threads, assume users will set a tiny yet non-micro-nor-zero-delay (e.g. microseconds?), check that CPU usage does not rocket up, and embrace yarp::os::PeriodicThread for clarity (#191 (comment)).

Done at https://github.com/roboticslab-uc3m/yarp-devices/commit/908bc29b802da63dc4886dcf03a81048b61f5d5e, currently delaying 100 microseconds for both read/write threads (https://github.com/roboticslab-uc3m/teo-configuration-files/commit/8c69d4c43341f1391082856f39c0b54bc19fd329).

For completeness: the previous yarp::os::Thread implementation raised CPU usage to 25-35% on a single CAN bus (0-1 millisecond), and to almost 50% on a dual-bus configuration. The periodic thread alternative reaches around 20% and 35%, respectively (100-100 microseconds). Both tests were run at a SYNC period of 30 milliseconds.

Edit: which raises up to 200% with a SYNC period of 5 milliseconds on a single arm...

PeterBowman commented 4 years ago

I managed to perform position control via joystick with a SYNC period of just 2 milliseconds on TEO's right arm. CSP mode was running under the hood (https://github.com/roboticslab-uc3m/yarp-devices/issues/222). Setup (https://github.com/roboticslab-uc3m/kinematics-dynamics/issues/173#issuecomment-493400560):

PeterBowman commented 4 years ago

Edit: which raises up to 200% with a SYNC period of 5 milliseconds on a single arm...

I'm unable to replicate this today. I decided to revert https://github.com/roboticslab-uc3m/yarp-devices/commit/908bc29b802da63dc4886dcf03a81048b61f5d5e since this is a classical save-CPU-resources problem. I don't really need to run stuff at precise intervals nor any of the statistical data yarp::os::PeriodicThread provides. Also, it seems the yarp::os::Thread solution can spare a bit of CPU %.
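
To illustrate the kind of loop discussed above, here is a minimal sketch of a non-periodic worker that sleeps a fixed delay after each pass instead of running at precise intervals. It uses std::thread rather than yarp::os::Thread so that it stays self-contained; DelayedWorker and its members are hypothetical names, not the actual CanRxTxThreads code.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical stand-in for a CAN read/write worker (not the real CanRxTxThreads code).
// It calls `step` repeatedly and sleeps `delay` after each pass, so the loop never
// spins at full speed. No attempt is made to keep a precise period.
class DelayedWorker
{
public:
    DelayedWorker(std::function<void()> stepFn, std::chrono::microseconds delayUs)
        : step(std::move(stepFn)), delay(delayUs)
    {}

    ~DelayedWorker() { stop(); }

    void start()
    {
        running = true;

        worker = std::thread([this] {
            while (running)
            {
                step(); // e.g. drain the RX buffer or flush the TX queue
                std::this_thread::sleep_for(delay);
            }
        });
    }

    void stop()
    {
        running = false;

        if (worker.joinable())
        {
            worker.join();
        }
    }

private:
    std::function<void()> step;
    std::chrono::microseconds delay;
    std::atomic<bool> running {false};
    std::thread worker;
};
```

The delay only bounds how often the loop wakes up; the real interval is the delay plus whatever time the step itself takes, which is acceptable when precise timing is not needed.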

PeterBowman commented 4 years ago

I implemented yarp::dev::ICanBusErrors in order to inhibit CAN TX operations, most notably the periodic SYNC messages:

https://github.com/roboticslab-uc3m/yarp-devices/blob/40e212fca63a535851c9d93ce117dcf11dcddcac/libraries/YarpPlugins/CanBusControlboard/CanRxTxThreads.cpp#L147-L153

On a power-off condition (i.e. emergency button pressed), we observed a constantly increasing CAN error count and a nearly 100% bus load via the lspcan utility. It takes a single message written to the bus to saturate it. I presume there is some kind of "bouncing" across dead nodes, or perhaps the power-off state is electrically detrimental to the bus. The canGetErrors check is therefore not enough, but I think there is no other means to diagnose whether power has gone off at any instant, and at least it prevents us from commencing pointless CAN transfers.
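
A rough sketch of the gating idea, assuming bus-off is the condition that inhibits transmission (the linked code may check something different). CanErrorState and txAllowed are hypothetical names that only mimic the sort of data an error query returns; they are not the YARP types.

```cpp
// Hypothetical snapshot of the CAN controller error state (not the YARP struct).
struct CanErrorState
{
    unsigned int txErrors; // transmit error counter
    unsigned int rxErrors; // receive error counter
    bool busOff;           // controller has gone bus-off
};

// Returns true if it still makes sense to write frames (e.g. the periodic SYNC).
// Assumption: only the bus-off flag inhibits TX; the counters are informative.
inline bool txAllowed(const CanErrorState & state)
{
    return !state.busOff;
}
```

A TX loop would query the error state on each pass and skip the write whenever txAllowed returns false.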

PeterBowman commented 4 years ago

Regarding how to obtain/estimate CAN bus load:

PeterBowman commented 4 years ago

Added new /<robot>/load:o port for streaming CAN bus load info, see https://github.com/roboticslab-uc3m/yarp-devices/commit/30a29275d454b4188d41b55f13f6efc8dba0cdec and https://github.com/roboticslab-uc3m/teo-configuration-files/commit/eb8f6d2b05f2597170aa74d662e4d74026d51192. The yarpscope utility comes in quite handy:

yarpscope --remote /teo/can/rightArm/load:o --min 0 --max 1 --persistent

PeterBowman commented 4 years ago

I managed to perform position control via joystick with a SYNC period of just 2 milliseconds on TEO's right arm.

On this setup, (estimated) bus load is stable around 90%.

PeterBowman commented 4 years ago

ASWJ we pick the following defaults: 20 ms for SYNC period, 10 ms for YARP's controlboardwrapper period (https://github.com/roboticslab-uc3m/teo-configuration-files/commit/933370a874e444c493421331e3d99dd832d9771b).

PeterBowman commented 4 years ago

Added new /<robot>/load:o port for streaming CAN bus load info

Split into RX bus load, TX bus load, and overall bus load: https://github.com/roboticslab-uc3m/yarp-devices/commit/cf1aeaa192fdf77bc0f5f8b18c962c6609d465fb. We learned a CAN bus is half-duplex, therefore it actually makes sense to sum both terms: https://stackoverflow.com/a/58505043.
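
For reference, a minimal sketch of how such an estimate can be computed, assuming classical CAN frames with 11-bit identifiers and ignoring stuff bits. The names nominalFrameBits, BusLoad and estimateBusLoad are illustrative and do not come from the commit above.

```cpp
#include <cstdint>

// Nominal length of a classical CAN data frame with an 11-bit identifier:
// 44 framing bits + 3 interframe-space bits + 8 bits per payload byte.
// Stuff bits are not counted here.
constexpr unsigned int nominalFrameBits(unsigned int payloadBytes)
{
    return 47 + 8 * payloadBytes;
}

struct BusLoad
{
    double rx;      // fraction of bus capacity used by received frames
    double tx;      // fraction of bus capacity used by transmitted frames
    double overall; // sum of both, valid because the bus is half-duplex
};

// rxBits/txBits: bits counted in each direction during the sampling window
// bitrate: bus speed in bits per second; window: sampling window in seconds
inline BusLoad estimateBusLoad(std::uint64_t rxBits, std::uint64_t txBits, double bitrate, double window)
{
    const double capacity = bitrate * window; // bits the bus can carry in the window
    const double rx = rxBits / capacity;
    const double tx = txBits / capacity;
    return {rx, tx, rx + tx};
}
```

Since only one node can drive the bus at a time, RX and TX frames compete for the same capacity, hence the overall figure is simply their sum.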

yarpscope --remote /teo/can/rightArm/load:o --min 0 --max 1.1 --index "(0 1 2)" --color "(Red Green Blue)" --persistent

PeterBowman commented 2 years ago

Currently 68% CAN bus load at much lower CPU usage (albeit on a new and better PC) with a SocketCAN implementation: https://github.com/roboticslab-uc3m/yarp-devices/issues/251#issuecomment-920270954. For local inspection, the canbusload command is preferable and more accurate (accounts for stuffed bits if enabled).
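
To give a sense of how much stuff bits can matter, this sketch computes the commonly used worst-case length of a classical CAN data frame with an 11-bit identifier. It is an upper bound for illustration only and makes no claim about how canbusload counts bits internally; worstCaseFrameBits is a hypothetical helper name.

```cpp
// Worst-case length of a classical CAN data frame with an 11-bit identifier,
// including 3 interframe-space bits. Only the 34 + 8n bits from start-of-frame
// through the CRC sequence are subject to stuffing, and in the worst case one
// stuff bit is inserted every four bits after the first.
constexpr unsigned int worstCaseFrameBits(unsigned int payloadBytes)
{
    return 47 + 8 * payloadBytes + (34 + 8 * payloadBytes - 1) / 4;
}
```

For an 8-byte payload this yields 135 bits against 111 nominal bits, so a load estimate that ignores stuffing may underestimate the real figure by up to roughly 20%.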