Closed PeterBowman closed 4 years ago
ASWJμ (ASW Juanmi), make sure that a relative encoder-only configuration does not lead to accumulation of errors on the long run, which would be mitigated with periodic checks by reading absolute encoders.
If I'm correct, each Cui publishes encoder reads every 8 microseconds:
The ICuiAbsolute interface allows setting a Cui encoder device in either push or pull mode, that is, continuous or on-request message publishing, respectively:
The default behavior is to enable push mode on device init, since https://github.com/roboticslab-uc3m/yarp-devices/commit/2ed1bffbabfc9492b8c83af3e7ae4692fbd57951. @smcdiaz, @jgvictores why wasn't pull mode chosen instead? Perhaps due to https://github.com/roboticslab-uc3m/yarp-devices/issues/217#issuecomment-514974018?
Perhaps due to #217 (comment)?
Nope. Something like this sequence happened:
Feel free to switch to the pull mechanism, IMHO it's more efficient for the current workflow.
Thanks! I guess https://github.com/roboticslab-uc3m/yarp-devices/commit/5f662863902f24cee57b0589889b641f1cc598f6 was an attempt to test the behavior of relative vs absolute encoders. Can we now delete the printRelEncs
branch?
We can delete the printRelEncs
branch. I do not recall the exact functionality, but it was definitely a hack for debugging. If we ever need a similar mechanism, we can try to open a specific issue and work on a better solution.
@rsantos88 has measured the amount of Cui and non-Cui messages on normal operation per 60 seconds to estimate the CAN bus load:
This adds up to 288755 CAN messages. If we assume that all incoming frames consist of 108 bits (ref), the overall RX transfer rate is ~520 kbits, far from the 1 Mpbs but not accounting for TX nor filtered ids. Also, 252255 frames transferred in 60 seconds equals one frame per 240 microseconds, also far from the 8 microsecond period we extract from the firmware code.
PS would you mind checking this again with a minor change, @rsantos88? There is a potential pitfall in https://github.com/roboticslab-uc3m/yarp-devices/commit/70b1fa9a12ee5c914a5cd5b8150d26e356945f56: you are never initializing the cuiMessages
and otherMessages
class members, therefore they may contain some garbage value before first use. It wouls suffice to put cuiMessages = otherMessages = 0;
in DeviceDriverImpl.cpp, for instance.
PS would you mind checking this again with a minor change, @rsantos88? There is a potential pitfall in 70b1fa9: you are never initializing the cuiMessages and otherMessages class members, therefore they may contain some garbage value before first use. It wouls suffice to put cuiMessages = otherMessages = 0; in DeviceDriverImpl.cpp, for instance.
Ok, thanks for your advice. That has been fixed now (https://github.com/roboticslab-uc3m/yarp-devices/commit/f90abd0302c881b707cb0c026ce73bbccb56ae21).
I've taken measures three times to be sure of the results and doing this with one thread, only with one can bus of right arm + ID28(head) using oneCanBusOneWrapper --from oneCanBusOneWrapper-right-arm.ini
(oneCanBusOneWrapper-right-arm.ini).
Here are the results:
Time: 60.000303
[CUI messages (198215)]
[NON-CUI messages (39990)]
Time: 60.000152
[CUI messages (200496)]
[NON-CUI messages (40214)]
Time: 60.001211
[CUI messages (200176)]
[NON-CUI messages (39380)]
IDEA: We could use the software PCAN-View for Linux (Software for Displaying CAN and CAN FD Messages, page 4 of website driver) to monitor can packages
Other IDEA: use USB-to-CAN compact interface and canAnalyser 3 on windows
Related to the latest tests carried out today I comment on the following points:
1) I have performed tests with the PCAN-View for Linux
software and have encountered problems when I try to read the information from the CAN network while theoneCanBusOneWrapper
software is running, causing it to be blocked at the time of the connection to the can network . This program allows you to read packages that are being received over the network (continuous publishing mode of the absolute encoders), as well as the last packages received from the drivers status. You can perfectly differentiate the ID of the messages of each node, the header of your message, number of messages that are arriving, content of the same... but apparently the software is not compatible with real-time monitoring and sending messages , all through the use of the same CanBusPeak device.
2) As a result of this, I have carried out tests connecting the USB-to-CAN compact
interface to the can network of the left arm, serving this device as another node of the network that allows monitoring all traffic. It may be interesting to analyze some of the results obtained by this software. I have recorded a small video of a few seconds so that some details can be appreciated:
Busload
a maximum value of 63% in the execution period.PS: I don't know if it's the best location for this extensive issue. If necessary, you can move it to a better place
Great analysis!
PS: I don't know if it's the best location for this extensive issue. If necessary, you can move it to a better place
I'm forking this issue into https://github.com/roboticslab-uc3m/yarp-devices/issues/231 to delegate bus load discussions. Let's focus here on Cui devices, push/pull modes and send rates.
I have been researching the improvement obtained by reducing the frequency of sending messages from CUIs (which is the same, increasing the period of sending messages) and the effects produced in the statistics we obtain in canAnaliser3-mini
. FYI, there are 3 versions of this software and we have the simplest version: mini. Here you can see the limitations of this version. Anyway, mini is free. With respect to the period of sending messages of the PICs, these are regulated with the function Delay10TCYc (unsigned char)
, being Delay10TCYc (1)
equivalent to 8µs = 10 clock cycles. This is the current default delay between sending messages by sending 0 as a parameter of startContinuousPublishing (0).
In this code we can appreciate that the number of times we´ll call Delay10TCYc (1)
will be the parameter sent to startContinuousPublishing
+1, detecting a possible interruption between each wait iteration.
Here I´ll show you the results obtained, doing diferent tests of pcan2 (left arm + ID27):
Video 1: Period between sending CUI messages: 8µs using startContinuousPublishing (0)
It's the example seen above. Current operation of the encoders, sending messages at the maximum frequency allowed by the PIC of CUI.
Results:
Receive counter: 126536
Error counter: 67410
Overruns: 1403
Video 2: Period between sending CUI messages: 16µs using startContinuousPublishing (1)
Results:
Receive counter: 405496
Error counter: 0
Overruns: 818
Video 3 : Period between sending CUI messages: 24µs using startContinuousPublishing (2)
Results:
Receive counter: 398003
Error counter: 2
Overruns: 0
Video 4 : Period between sending CUI messages: 32µs using startContinuousPublishing (3)
no errors or overruns
Results:
Receive counter: 320016
Error counter: 0
Overruns: 0
Video 5: Without CUIs no errors or overruns . As you can see, the maximum Busload value is 17%
You can conclude that with the default option, the sending frequency of the CUIs is so high that it produces a large number of errors in the CAN, leading to a large number of packages being lost (in 1 minute a total of 126536 correct packages are received). While the following tests receive correct packages between 300,000 and 400,000 in 1 minute without errors or overruns. It could be concluded that the optimal configuration for a correct push send without errors and losses would be with a period of 24µs per message, changing startContinuousPublishing (0)
to startContinuousPublishing (3)
.
Going deeper, I am trying to understand the results I get (total number of packages) depending on the period of sending between messages (push type sending), controlled by the Delay10TCYx (unsigned char);
located in the CUI code . Regarding the code executed in the CUIs, it has been cleaned and simplified (thanks @PeterBowman for the help) here are the result (WIP yet).
By testing with testCuiAbsolute modified for this purpose and varying the period of sending messages with startContinuousPublishing, I've obtained the following results:
Note: Debugger CUI was a way to confirm that USB-CAN didn't lose packets, using an internal counter in the CUI code itself and checking the number of times it called the send () function. this value is the same, adding two packages (push and stop)
delay10TCYx(unsigned char)
:startContinupusPublishing(x) | testCuiAbsolute | USB-CAN interface | Debugger CUI |
---|---|---|---|
1 | 949 | 1048 | 1046 |
10 | 958 | 1058 | 1056 |
100 | 1049 | 1158 | 1156 |
200 | 1175 | 1297 | 1295 |
delay100TCYx(unsigned char)
:startContinupusPublishing(x) | testCuiAbsolute | USB-CAN interface | Debugger CUI |
---|---|---|---|
1 | 957 | 1057 | 1055 |
10 | 1049 | 1158 | 1156 |
100 | 425 | 470 | 468 |
After these tests and talking with @PeterBowman the possibility of not understanding how the Delay10TCYx (unsigned char)
function really works with respect to the oscillation frequency of the crystal, (20.000Hz ??), I've simplified the PIC code to the maximum, eliminating functions of reception and construction of the sending message, getting the least number of cycles consumed per instruction:
while(1)
{
ECANSendMessage(canId, °rees, sizeof(degrees), txFlags);
Delay100TCYx(delay);
}
The results are:
delay | packages / sec |
---|---|
100 | 474 |
10 | 3363 |
1 | 7400 |
Another test that I've done has been remove the delay function in order to find out the number of packets that I would send in a second, taking into account the time spent processing the functions of ECANReceiveMessage and the send() function dedicated of constructing the message and sending it. The result has been 1132 packages in 1 second, which would be equivalent to a period of 0.88 ms per message sent.
Testing this code :
while(1)
{
ECANSendMessage(canId, °rees_1, sizeof(degrees_1), txFlags);
Delay10KTCYx(100);
ECANSendMessage(canId, °rees_2, sizeof(degrees_2), txFlags);
}
the result is 200ms of difference between the two messages. Taking into account the equation of this link, the crystal oscillation frequency is 20Mhz
the crystal oscillation frequency is 20Mhz
Verified with the Yokogawa oscilloscope.
By applying the formula given at ref, I assembled a table for this particular cristal (20 MHz) unifying all available delay functions. First, I obtain the i
iterations necessary to (theoretically) perform a 1 second delay. On the third and four columns, the minimum and maximum delays for i=1
and i=255
are shown, respectively (since the input parameter type is unsigned char
):
DelayKTCYx(i) | i | T (i=1) [ms] | T (i=255) [ms] |
---|---|---|---|
1 | 5000000 | 0.0002 | 0.051 |
10 | 500000 | 0.002 | 0.51 |
100 | 50000 | 0.02 | 5.1 |
1000 | 5000 | 0.2 | 51 |
10000 | 500 | 2 | 510 |
As @rsantos88 pointed out, even with i=0
there is an implicit ~1 millisecond delay due to low-level PIC state transitions and hardcoded delays. The i=500
case would fit; however, I doubt we are ever going to need a 510 millisecond delay. Since 51 milliseconds is already high enough, and the 0.2 millisecond resolution is also fine, I'd choose the Delay1KTCYx
function.
PS currently using Delay10TCYx(1)
, which translates to 2 microseconds (not accounting for the implicit PIC delay).
Send code (permalink) based on documentation (page 3,4,5) (copy, copy (permalink))
Another test that I've done has been remove the delay function in order to find out the number of packets that I would send in a second, taking into account the time spent processing the functions of ECANReceiveMessage and the send() function dedicated of constructing the message and sending it. The result has been 1132 packages in 1 second, which would be equivalent to a period of 0.88 ms per message sent.
We learned today that this result is unpredictable, but seems to entail no more than a 1 ms delay due to internal hardware message buffering/queueing. By adding back the delay function call, messages are sent with the expected frequency.
Another issue arised: sometimes, the start push command renders no response from the PIC firmware. Also, we noticed that one or two error CAN frames are observed by the traffic analyzer right on application start. I presume some queued message originating from the previous run is malformed and sent in such shape by whatever device.
Current state of the apocanlypse branch fulfills the goals of the CAN-TEO project as stated in the issue description:
https://github.com/roboticslab-uc3m/yarp-devices/commit/a123bc981bbb7153b79874621065603bb6fbd3e5 refactors the CuiAbsolute device and wraps it in TechnosoftIpos so that absolute encoder reads are only requested on initial configuration; then, this value is used on normal operation
https://github.com/roboticslab-uc3m/yarp-devices/commit/a9d0ed14a33f14326d96ef13bacff17197c71307 accomplishes https://github.com/roboticslab-uc3m/yarp-devices/issues/233#issuecomment-533717255:
We want to store absolute encoder readings on device init (https://github.com/roboticslab-uc3m/yarp-devices/issues/217) so that the resulting offset value is added to the relative encoder reads on runtime. It's certainly a bit less of a trouble to call
setEncoder
on init to synchronize relative and absolute encoders instead. There is no need to involve a local variable (which should be properly set/reset bysetEncoder
in that scenario).
with i=5, the total number of packets are between 810-819 received in 1 second. Taking the highest value, the period is 1,22ms per message sent.
Due to the new code modifications, tests will be carried out on a single TEO member (for example, left arm) to check with a startPushPublishing (5)
value, the CAN traffic level.
Rescued issues with problems encountered when reprogramming PICs (https://github.com/roboticslab-uc3m/teo-hardware-issues/issues/38)
amt203-dmk-appnote.pdf
(a.k.a. AMT203_AppNote.pdf
): was exported at commit https://github.com/roboticslab-uc3m/yarp-devices/commit/d8b2c2bd5a3c3c9bb1b6a70a3d5c604ecdb7b695, imported at commit https://github.com/roboticslab-uc3m/datasheets-and-manuals/commit/f7354a4e80a362cebcc75526a61d7594d7239083.
New location: https://github.com/roboticslab-uc3m/datasheets-and-manuals/blob/master/cui/amt203-dmk-appnote.pdf (permalink)Pull command (re)implemented and successfully tested by @rsantos88, see https://github.com/roboticslab-uc3m/yarp-devices/issues/233#issuecomment-537983432.
Commit https://github.com/roboticslab-uc3m/yarp-devices/commit/8bba682c83b57f60cc0e722e4e1fa7575e22df4d introduced retries on failed CAN transfers (current default: 5 retries).
CuiAbsolute nodes are configured in CanBusControlboard initialization step to start in continuous publishing mode: ref. We only need these devices to query absolute encoder reads on start, all subsequent measures of built-in relative encoders would take into account this initial offset.
Options:
Configure Cui nodes in pull (aka RPC) mode, that is, don't send an encoder read unless requested. This option is preferable, reads would be performed on CanBusControlboard start only.
Enable Cui firmware to handle a stop publishing command.