robotology / icub-tech-support

Virtual repository that provides support requests for individual robots
GNU General Public License v2.0

ergoCub 1.1 S/N:001 – Left arm suddenly stopped streaming data #1865

Closed S-Dafarra closed 1 month ago

S-Dafarra commented 2 months ago

Robot Name 🤖

ergoCub 1.1 S/N:001

Request/Failure description

During normal teleoperation, one arm suddenly stopped, with errors like:

[ERROR] |yarp.devices.controlBoard_nws_yarp|left_arm-mc_nws_yarp| Encoder timestamps are not consistent! Data will not be published.

Detailed context

Here is a portion of the log: left_arm_issue.txt

Here is the full log: left_arm_issue_full.zip

Additional context

No response

How does it affect you?

No response

S-Dafarra commented 2 months ago

Rebooting the robot seemed to be enough. Still, it should not stop streaming the data.

S-Dafarra commented 2 months ago

Ciao @Gandoo, @maggia80. I imagine that this issue is related to the faulty CAN cable of https://github.com/robotology/icub-tech-support/issues/1867#issuecomment-2258585871, right?

S-Dafarra commented 2 months ago

Unfortunately, it is still happening. By editing the error message, we found that the wrist yaw is the first joint for which the timestamp is not consistent. This might indicate that it is not possible to communicate with the AMC board.

@AntonioAzocar also noted a particular LED lighting pattern on the board, and we are not sure it is normal.

cc @MSECode @marcoaccame

AntonioAzocar commented 2 months ago

Hello everyone. For completeness, I attach the video of the LED:

https://github.com/user-attachments/assets/d1dbdbc8-1706-44aa-8ffe-4dd70a0ad9e7

cc @S-Dafarra @MSECode @marcoaccame

MSECode commented 2 months ago

Looking at the logs, I saw that in multiple sections there are these errors and warnings, which seem to be related to the ETH communication, and they actually refer to the board we have issues with:

[Image: log excerpt showing the errors and warnings]

Therefore, it might be that the board is not communicating well over ETH. I would then suggest doing the following:

In the meantime, I'll double-check the issue and the logs with @marcoaccame this afternoon to get a better idea of the problem.

S-Dafarra commented 2 months ago

I did check, and it was possible to ping it. I noticed that those errors appear when trying to close the yarprobotinterface. My hunch is that the board gets blocked somehow, and when we try to close the device the network communication also gets blocked. In fact, if we want to restart the robot, we also need to restart the motors.

traversaro commented 2 months ago

Not directly related to the issue (the root issue is indeed in EMS communication) but just for reference this is related to https://github.com/robotology/yarp/issues/2939 .

marcoaccame commented 2 months ago

hi all, @MSECode and I will analyze all the available information in more detail later today. for now we saw:

long story short: second point may explain the problem. we need to understand why it happens and if it happened before.

marcoaccame commented 2 months ago

@MSECode and I have had a first analysis:

  • fw version is not the latest

if possible we advise upgrading to the latest devel of icub-main and icub-firmware-shared, and flashing the latest binaries

  • yri often loses contact with board eb31 for times ranging from 40 ms to a few seconds.

long story short: second point may explain the problem. we need to understand why it happens and if it happened before.

yri loses contact because the link between eb25 and eb31 continually goes down and up again. see:

[ERROR] from BOARD 10.0.1.25 (left_arm-eb25-j11_12) time=4108s 545m 416u :  ETH monitor: link goes down  in port ETH output (P3/P12/J5). Application state is unknown.
 .....
 .....
[ERROR] from BOARD 10.0.1.25 (left_arm-eb25-j11_12) time=4110s 446m 416u :  ETH monitor: link goes up. in port ETH output (P3/P12/J5). Application state is unknown.

we shall try to re-crimp the cable.
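The link down/up messages quoted above can be paired to measure how long each interruption lasted. The following is only a minimal sketch: the regular expression is inferred from the two log lines in this thread, the function names are hypothetical, and the `time=...s ...m ...u` field is assumed to mean seconds, milliseconds and microseconds.

```python
import re
from typing import Iterable, List, Tuple

# Pattern inferred from the two log lines quoted in this thread (assumption);
# other firmware messages may use a different layout.
LINK_EVENT = re.compile(
    r"from BOARD (?P<ip>\d+\.\d+\.\d+\.\d+) \((?P<name>[^)]*)\)\s+"
    r"time=(?P<s>\d+)s (?P<ms>\d+)m (?P<us>\d+)u\s*:\s*"
    r"ETH monitor: link goes (?P<state>down|up)"
)


def parse_link_events(lines: Iterable[str]) -> List[Tuple[str, float, str]]:
    """Return (board_ip, time_in_seconds, 'down' or 'up') for each link event."""
    events = []
    for line in lines:
        m = LINK_EVENT.search(line)
        if m is None:
            continue  # not a link event, skip it
        # Assumption: s / m / u are seconds, milliseconds, microseconds.
        t = int(m["s"]) + int(m["ms"]) / 1e3 + int(m["us"]) / 1e6
        events.append((m["ip"], t, m["state"]))
    return events


def link_outages(events: Iterable[Tuple[str, float, str]]) -> List[Tuple[str, float, float]]:
    """Pair each 'down' with the next 'up' of the same board.

    Returns (board_ip, start_time, duration) in seconds for each closed pair.
    """
    down_since = {}
    outages = []
    for ip, t, state in events:
        if state == "down":
            down_since.setdefault(ip, t)  # keep the first 'down' of a run
        elif ip in down_since:
            start = down_since.pop(ip)
            outages.append((ip, start, t - start))
    return outages


if __name__ == "__main__":
    sample = [
        "[ERROR] from BOARD 10.0.1.25 (left_arm-eb25-j11_12) time=4108s 545m 416u :"
        "  ETH monitor: link goes down  in port ETH output (P3/P12/J5).",
        "[ERROR] from BOARD 10.0.1.25 (left_arm-eb25-j11_12) time=4110s 446m 416u :"
        "  ETH monitor: link goes up. in port ETH output (P3/P12/J5).",
    ]
    for ip, start, duration in link_outages(parse_link_events(sample)):
        print(f"{ip}: link down at {start:.6f} s for {duration:.3f} s")
```

Under the time-unit assumption above, the two quoted messages correspond to an interruption of roughly 1.9 s.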

S-Dafarra commented 2 months ago

> we shall try to re-crimp the cable.

@AntonioAzocar tried to re-crimp the cable some days ago, but the problem also occurred afterwards. He also mentioned that the connector on the board is not very firm.

Discussing with @AntonioConsilvio, we also thought that a good test could be to ping the AMC board (10.0.1.31) when the failure happens.

cc @CarlottaSartore
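The ping test proposed above could be scripted so that reachability changes are logged with a timestamp while the failure is being reproduced. This is a minimal sketch, not an official tool: the function names are hypothetical, and the `-c` and `-W` options are those of the Linux iputils `ping`. A reply only shows that the board answered at that instant.

```python
import subprocess
import time
from typing import Callable, List, Tuple


def build_ping_command(host: str, timeout_s: int = 1) -> List[str]:
    # -c 1: send a single echo request; -W: reply timeout in seconds
    # (options of the Linux iputils ping, assumed to be the one installed).
    return ["ping", "-c", "1", "-W", str(timeout_s), host]


def run_command(cmd: List[str]) -> int:
    """Run the command quietly and return its exit code (0 means a reply arrived)."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def is_reachable(host: str, runner: Callable[[List[str]], int] = run_command) -> bool:
    return runner(build_ping_command(host)) == 0


def monitor(
    host: str,
    period_s: float = 0.2,
    samples: int = 50,
    runner: Callable[[List[str]], int] = run_command,
) -> List[Tuple[float, bool]]:
    """Probe the host periodically and record every change of reachability."""
    changes = []
    last = None
    for _ in range(samples):
        ok = is_reachable(host, runner)
        if ok != last:
            stamp = time.time()
            changes.append((stamp, ok))
            print(f"{stamp:.3f} {host} {'reachable' if ok else 'NOT reachable'}")
            last = ok
        time.sleep(period_s)
    return changes


if __name__ == "__main__":
    # 10.0.1.31 is the AMC board address mentioned in this thread.
    monitor("10.0.1.31")
```

The `runner` argument exists only so that the loop can be exercised without a robot on the network, for example by passing a function that returns canned exit codes.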

marcoaccame commented 2 months ago

> > we shall try to re-crimp the cable.
>
> @AntonioAzocar tried to re-crimp the cable some days ago, but the problem also occurred afterwards. He also mentioned that the connector on the board is not very firm.

The link must not go down and up. If it does, you lose all UDP frames beyond the line interruption. The link goes down when one of the four wires of the ETH cable is interrupted. In this case it may be due to the movement or the bending of the link. The fault can be in the cable or in the crimp, but it may also be in the connector on the board.

> Discussing with @AntonioConsilvio, we also thought that a good test could be to ping the AMC board (10.0.1.31) when the failure happens.
>
> cc @CarlottaSartore

the ping may say that all is OK even if the link goes down for a short time. Sometimes I see messages reporting that the link was down for just a short time, while the ping still reports OK with only a few ms more round-trip time. The HW check of the link status on the ETH boards, on the other hand, is done every 100 ms and tells whether we have link or not.

AntonioConsilvio commented 2 months ago

Hi @S-Dafarra! With the help of @marcoaccame and @MSECode, we realised that the problem was with the power supply of the AMC EB31 board, which was rebooting occasionally.

Trying to find the cause, @fgarini noticed that something was touching the back of the board.

In fact, the thumb tendon was broken and touching the back of the board, occasionally shorting out the power supply:

[Image: IMG_20240808_170751]

I replaced the tendon and, together with @AntonioAzocar, we applied Kapton to the back of the AMC to prevent it from short-circuiting again.

The robot has been tested and works fine now! ✅

Thank you all for your help! 🚀

AntonioConsilvio commented 1 month ago

Since the problem did not reoccur, I am closing the issue! ✅