roboticslab-uc3m / yarp-devices

A place for YARP devices
https://robots.uc3m.es/yarp-devices/
9 stars 7 forks source link

Avoid bloat of CuiAbsolute CAN messages #217

Closed PeterBowman closed 4 years ago

PeterBowman commented 5 years ago

CuiAbsolute nodes are configured in CanBusControlboard initialization step to start in continuous publishing mode: ref. We only need these devices to query absolute encoder reads on start, all subsequent measures of built-in relative encoders would take into account this initial offset.

Options:

PeterBowman commented 5 years ago

ASWJμ (ASW Juanmi), make sure that a relative encoder-only configuration does not lead to accumulation of errors on the long run, which would be mitigated with periodic checks by reading absolute encoders.

PeterBowman commented 5 years ago

If I'm correct, each Cui publishes encoder reads every 8 microseconds:

https://github.com/roboticslab-uc3m/yarp-devices/blob/89e22a7a896ce48ea82dd3791ab94b08b39e0a2f/firmware/CuiAbsolute/Pic_source/main.c#L76-L82

https://github.com/roboticslab-uc3m/yarp-devices/blob/89e22a7a896ce48ea82dd3791ab94b08b39e0a2f/firmware/CuiAbsolute/Pic_source/main.c#L180-L190

PeterBowman commented 5 years ago

The ICuiAbsolute interface allows setting a Cui encoder device in either push or pull mode, that is, continuous or on-request message publishing, respectively:

https://github.com/roboticslab-uc3m/yarp-devices/blob/6f5c2da92bbf20cda288ff62de88cdf530355926/libraries/YarpPlugins/ICuiAbsolute.h#L24-L26

The default behavior is to enable push mode on device init, since https://github.com/roboticslab-uc3m/yarp-devices/commit/2ed1bffbabfc9492b8c83af3e7ae4692fbd57951. @smcdiaz, @jgvictores why wasn't pull mode chosen instead? Perhaps due to https://github.com/roboticslab-uc3m/yarp-devices/issues/217#issuecomment-514974018?

jgvictores commented 5 years ago

Perhaps due to #217 (comment)?

Nope. Something like this sequence happened:

  1. @smcdiaz had some control schemes in mind that involved using both relative and absolute encoders
  2. We set absolute encoders to push continuously (to potentially save bandwith on constantly polling)
  3. Had to take a design decision because YARP methods involve only reading from 1 type of encoder per joint, decided to provide the reading of the relative encoder (specially since some absolute encoders fail sometimes), and decided to use the absolute stream only on init (and by then we had already forgotten that we should be using the pull mechanism for that use case)

Feel free to switch to the pull mechanism, IMHO it's more efficient for the current workflow.

PeterBowman commented 5 years ago

Thanks! I guess https://github.com/roboticslab-uc3m/yarp-devices/commit/5f662863902f24cee57b0589889b641f1cc598f6 was an attempt to test the behavior of relative vs absolute encoders. Can we now delete the printRelEncs branch?

jgvictores commented 5 years ago

We can delete the printRelEncs branch. I do not recall the exact functionality, but it was definitely a hack for debugging. If we ever need a similar mechanism, we can try to open a specific issue and work on a better solution.

PeterBowman commented 5 years ago

@rsantos88 has measured the amount of Cui and non-Cui messages on normal operation per 60 seconds to estimate the CAN bus load:

This adds up to 288755 CAN messages. If we assume that all incoming frames consist of 108 bits (ref), the overall RX transfer rate is ~520 kbits, far from the 1 Mpbs but not accounting for TX nor filtered ids. Also, 252255 frames transferred in 60 seconds equals one frame per 240 microseconds, also far from the 8 microsecond period we extract from the firmware code.

PS would you mind checking this again with a minor change, @rsantos88? There is a potential pitfall in https://github.com/roboticslab-uc3m/yarp-devices/commit/70b1fa9a12ee5c914a5cd5b8150d26e356945f56: you are never initializing the cuiMessages and otherMessages class members, therefore they may contain some garbage value before first use. It wouls suffice to put cuiMessages = otherMessages = 0; in DeviceDriverImpl.cpp, for instance.

rsantos88 commented 5 years ago

PS would you mind checking this again with a minor change, @rsantos88? There is a potential pitfall in 70b1fa9: you are never initializing the cuiMessages and otherMessages class members, therefore they may contain some garbage value before first use. It wouls suffice to put cuiMessages = otherMessages = 0; in DeviceDriverImpl.cpp, for instance.

Ok, thanks for your advice. That has been fixed now (https://github.com/roboticslab-uc3m/yarp-devices/commit/f90abd0302c881b707cb0c026ce73bbccb56ae21). I've taken measures three times to be sure of the results and doing this with one thread, only with one can bus of right arm + ID28(head) using oneCanBusOneWrapper --from oneCanBusOneWrapper-right-arm.ini (oneCanBusOneWrapper-right-arm.ini).

Here are the results:

Time: 60.000303
[CUI messages (198215)] 
[NON-CUI messages (39990)] 

Time: 60.000152
[CUI messages (200496)] 
[NON-CUI messages (40214)]

Time: 60.001211
[CUI messages (200176)] 
[NON-CUI messages (39380)] 
rsantos88 commented 5 years ago

IDEA: We could use the software PCAN-View for Linux (Software for Displaying CAN and CAN FD Messages, page 4 of website driver) to monitor can packages

rsantos88 commented 5 years ago

Other IDEA: use USB-to-CAN compact interface and canAnalyser 3 on windows

rsantos88 commented 5 years ago

Related to the latest tests carried out today I comment on the following points: 1) I have performed tests with the PCAN-View for Linux software and have encountered problems when I try to read the information from the CAN network while theoneCanBusOneWrapper software is running, causing it to be blocked at the time of the connection to the can network . This program allows you to read packages that are being received over the network (continuous publishing mode of the absolute encoders), as well as the last packages received from the drivers status. You can perfectly differentiate the ID of the messages of each node, the header of your message, number of messages that are arriving, content of the same... but apparently the software is not compatible with real-time monitoring and sending messages , all through the use of the same CanBusPeak device. 2) As a result of this, I have carried out tests connecting the USB-to-CAN compact interface to the can network of the left arm, serving this device as another node of the network that allows monitoring all traffic. It may be interesting to analyze some of the results obtained by this software. I have recorded a small video of a few seconds so that some details can be appreciated:

480

PS: I don't know if it's the best location for this extensive issue. If necessary, you can move it to a better place

PeterBowman commented 5 years ago

Great analysis!

PS: I don't know if it's the best location for this extensive issue. If necessary, you can move it to a better place

I'm forking this issue into https://github.com/roboticslab-uc3m/yarp-devices/issues/231 to delegate bus load discussions. Let's focus here on Cui devices, push/pull modes and send rates.

rsantos88 commented 5 years ago

I have been researching the improvement obtained by reducing the frequency of sending messages from CUIs (which is the same, increasing the period of sending messages) and the effects produced in the statistics we obtain in canAnaliser3-mini. FYI, there are 3 versions of this software and we have the simplest version: mini. Here you can see the limitations of this version. Anyway, mini is free. With respect to the period of sending messages of the PICs, these are regulated with the function Delay10TCYc (unsigned char), being Delay10TCYc (1) equivalent to 8µs = 10 clock cycles. This is the current default delay between sending messages by sending 0 as a parameter of startContinuousPublishing (0). In this code we can appreciate that the number of times we´ll call Delay10TCYc (1) will be the parameter sent to startContinuousPublishing +1, detecting a possible interruption between each wait iteration.

Here I´ll show you the results obtained, doing diferent tests of pcan2 (left arm + ID27):

You can conclude that with the default option, the sending frequency of the CUIs is so high that it produces a large number of errors in the CAN, leading to a large number of packages being lost (in 1 minute a total of 126536 correct packages are received). While the following tests receive correct packages between 300,000 and 400,000 in 1 minute without errors or overruns. It could be concluded that the optimal configuration for a correct push send without errors and losses would be with a period of 24µs per message, changing startContinuousPublishing (0) to startContinuousPublishing (3).

PeterBowman commented 4 years ago

https://mrchunckuee.blogspot.com/2014/09/mplab-x-y-c18-uso-de-la-libreria-delaysh.html

rsantos88 commented 4 years ago

Going deeper, I am trying to understand the results I get (total number of packages) depending on the period of sending between messages (push type sending), controlled by the Delay10TCYx (unsigned char); located in the CUI code . Regarding the code executed in the CUIs, it has been cleaned and simplified (thanks @PeterBowman for the help) here are the result (WIP yet). By testing with testCuiAbsolute modified for this purpose and varying the period of sending messages with startContinuousPublishing, I've obtained the following results:

Note: Debugger CUI was a way to confirm that USB-CAN didn't lose packets, using an internal counter in the CUI code itself and checking the number of times it called the send () function. this value is the same, adding two packages (push and stop)

startContinupusPublishing(x) testCuiAbsolute USB-CAN interface Debugger CUI
1 949 1048 1046
10 958 1058 1056
100 1049 1158 1156
200 1175 1297 1295
startContinupusPublishing(x) testCuiAbsolute USB-CAN interface Debugger CUI
1 957 1057 1055
10 1049 1158 1156
100 425 470 468
rsantos88 commented 4 years ago

After these tests and talking with @PeterBowman the possibility of not understanding how the Delay10TCYx (unsigned char) function really works with respect to the oscillation frequency of the crystal, (20.000Hz ??), I've simplified the PIC code to the maximum, eliminating functions of reception and construction of the sending message, getting the least number of cycles consumed per instruction:

while(1)
{
  ECANSendMessage(canId, &degrees, sizeof(degrees), txFlags); 
  Delay100TCYx(delay);
} 

The results are:

delay packages / sec
100 474
10 3363
1 7400
rsantos88 commented 4 years ago

Another test that I've done has been remove the delay function in order to find out the number of packets that I would send in a second, taking into account the time spent processing the functions of ECANReceiveMessage and the send() function dedicated of constructing the message and sending it. The result has been 1132 packages in 1 second, which would be equivalent to a period of 0.88 ms per message sent.

rsantos88 commented 4 years ago

Testing this code :

    while(1)
    {
    ECANSendMessage(canId, &degrees_1, sizeof(degrees_1), txFlags); 
    Delay10KTCYx(100);
    ECANSendMessage(canId, &degrees_2, sizeof(degrees_2), txFlags); 
    }

the result is 200ms of difference between the two messages. Taking into account the equation of this link, the crystal oscillation frequency is 20Mhz

PeterBowman commented 4 years ago

the crystal oscillation frequency is 20Mhz

Verified with the Yokogawa oscilloscope.

PeterBowman commented 4 years ago

By applying the formula given at ref, I assembled a table for this particular cristal (20 MHz) unifying all available delay functions. First, I obtain the i iterations necessary to (theoretically) perform a 1 second delay. On the third and four columns, the minimum and maximum delays for i=1 and i=255 are shown, respectively (since the input parameter type is unsigned char):

DelayKTCYx(i) i T (i=1) [ms] T (i=255) [ms]
1 5000000 0.0002 0.051
10 500000 0.002 0.51
100 50000 0.02 5.1
1000 5000 0.2 51
10000 500 2 510

As @rsantos88 pointed out, even with i=0 there is an implicit ~1 millisecond delay due to low-level PIC state transitions and hardcoded delays. The i=500 case would fit; however, I doubt we are ever going to need a 510 millisecond delay. Since 51 milliseconds is already high enough, and the 0.2 millisecond resolution is also fine, I'd choose the Delay1KTCYx function.

PS currently using Delay10TCYx(1), which translates to 2 microseconds (not accounting for the implicit PIC delay).

rsantos88 commented 4 years ago

Send code (permalink) based on documentation (page 3,4,5) (copy, copy (permalink))

PeterBowman commented 4 years ago

Another test that I've done has been remove the delay function in order to find out the number of packets that I would send in a second, taking into account the time spent processing the functions of ECANReceiveMessage and the send() function dedicated of constructing the message and sending it. The result has been 1132 packages in 1 second, which would be equivalent to a period of 0.88 ms per message sent.

We learned today that this result is unpredictable, but seems to entail no more than a 1 ms delay due to internal hardware message buffering/queueing. By adding back the delay function call, messages are sent with the expected frequency.

Another issue arised: sometimes, the start push command renders no response from the PIC firmware. Also, we noticed that one or two error CAN frames are observed by the traffic analyzer right on application start. I presume some queued message originating from the previous run is malformed and sent in such shape by whatever device.

PeterBowman commented 4 years ago

Current state of the apocanlypse branch fulfills the goals of the CAN-TEO project as stated in the issue description:

rsantos88 commented 4 years ago

with i=5, the total number of packets are between 810-819 received in 1 second. Taking the highest value, the period is 1,22ms per message sent.

rsantos88 commented 4 years ago

Due to the new code modifications, tests will be carried out on a single TEO member (for example, left arm) to check with a startPushPublishing (5) value, the CAN traffic level. Rescued issues with problems encountered when reprogramming PICs (https://github.com/roboticslab-uc3m/teo-hardware-issues/issues/38)

jgvictores commented 4 years ago
PeterBowman commented 4 years ago

Pull command (re)implemented and successfully tested by @rsantos88, see https://github.com/roboticslab-uc3m/yarp-devices/issues/233#issuecomment-537983432.

PeterBowman commented 4 years ago

Commit https://github.com/roboticslab-uc3m/yarp-devices/commit/8bba682c83b57f60cc0e722e4e1fa7575e22df4d introduced retries on failed CAN transfers (current default: 5 retries).