robotology / icub-tech-support

Virtual repository that provides support requests for individual robots
GNU General Public License v2.0

ergoCub 1.0 S/N:000 – The robot sometimes completely blacks out when in battery mode #1688

Open S-Dafarra opened 10 months ago

S-Dafarra commented 10 months ago

Robot Name 🤖

ergoCub 1.0 S/N:000

Request/Failure description

During two demos, the robot suddenly shut down completely while running on battery power.

Detailed context

This happened twice in similar conditions. The robot starts in the following state:

Then, we remove the battery charger and the power supply. After 30 s or so, the robot falls down and is completely off, including the display.

It is then necessary to power cycle the battery to get the robot working again. At that point the battery is still charged.

Additional context

An educated guess is that the battery BMS goes into protection. We suspect an overheating issue. Indeed, the second time the robot was quite hot, but the first time we were with the robot "outside" and the weather was quite cool.

How does it affect you?

No response

maggia80 commented 10 months ago

Yes, the BMS of the battery goes into protection. To recover the BMS from the fault status, the battery has to be switched off and on again.
There are 4 possible faults:

The problem is that we cannot read these status bits. We can keep monitoring this issue and wait for more insight.

SimoneMic commented 4 months ago

Hello!

Something really similar happened to ergoCubSN001 during the tests for the demo in Florence. After an extended time of operation in battery mode, the robot completely shut down and restarted, as if power had suddenly been lost. At the moment of the power outage, the robot was (luckily) suspended on the crane and standing still, with the motors always on and controlled.

The differences from the original post, although the situation was really similar, were:

@AntonioAzocar @AntonioConsilvio were present, maybe they can add some details that I've missed. cc @S-Dafarra

S-Dafarra commented 4 months ago

The differences from the original post, although the situation was really similar, were:

  • The battery level was around 35%
  • The robot was operating normally for an extended period of time.
  • The robot display on the back of the robot was on, but both the motors and PCs were off (red leds)

It happened again today. It seems like the BAT rebooted for some reason. Could it be a FW issue on the BAT side? @maggia80 @valegagge @MSECode

MSECode commented 4 months ago

OK. @S-Dafarra, was the robot powered by the battery as in the previous occurrence, by the power supply, or by both? Moreover, do we have a log of the YRI? Another thing: were the bms and/or bat devices enabled, and if so, do we have some data collected by the TelemetryDeviceDumper? It would be good to check those, since the issue may be on the bms side or even in the battery pack. Since this has already happened a couple of times lately and the behavior is similar, I think there might be issues with the battery pack. Since the robot was powered by the battery pack, I'm not sure there's something wrong with the BAT fw. If that were the case, you should see a similar problem even when using the power supply, and more frequently. So please add here whatever data you have, and I'll look for any meaningful information.

S-Dafarra commented 4 months ago

@S-Dafarra, was the robot powered by the battery as in the previous occurrence, by the power supply, or by both?

Only battery.

Moreover, do we have a log of the YRI?

I am afraid not 😢

Another thing: were the bms and/or bat devices enabled, and if so, do we have some data collected by the TelemetryDeviceDumper?

For this, I am tagging @carloscp3009 and @GiulioRomualdi, but I am not sure we were logging the data. I think the battery device was active though, since the YRI throws errors about it (something like it detected that some boards are not streaming).

Since the robot was powered by the battery pack, I'm not sure there's something wrong with the BAT fw. If that were the case, you should see a similar problem even when using the power supply, and more frequently.

I am thinking about this because it is happening on three different robots in very similar conditions, while it never happened on iCub3 (at least before it got 🚀 mounted on it). The probability that three battery packs are faulty in the same way seems low, and iCub3 had the same BMS. The only difference is the BAT, which is common between ergoCubs.

MSECode commented 4 months ago

For this, I am tagging @carloscp3009 and @GiulioRomualdi, but I am not sure we were logging the data. I think the battery device was active though, since the YRI throws errors about it (something like it detected that some boards are not streaming).

OK, the errors about boards not streaming happen when a device is enabled in the xml configuration files but is not streaming anything on CAN. They are generated by the 'can monitor', which pings the CAN boards every 1 s.
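To make the behavior described above concrete, here is a minimal Python sketch of that kind of check. It is not the actual 'can monitor' code: the class name, the board labels, and the way timestamps are passed in are all assumptions made for illustration. It only shows the idea of flagging a board that is enabled in the configuration but has not been heard from within the monitoring period.

```python
from dataclasses import dataclass, field


@dataclass
class CanStreamMonitor:
    """Hypothetical watchdog: tracks when each enabled board last streamed."""

    enabled_boards: set          # boards enabled in the configuration (assumed labels)
    period_s: float = 1.0        # monitoring period, 1 s as mentioned in the thread
    last_seen: dict = field(default_factory=dict)

    def on_frame(self, board, t):
        """Record that `board` streamed a frame at time `t` (seconds)."""
        if board in self.enabled_boards:
            self.last_seen[board] = t

    def check(self, now):
        """Return the enabled boards that were silent for more than one period."""
        silent = []
        for board in sorted(self.enabled_boards):
            seen = self.last_seen.get(board)
            if seen is None or now - seen > self.period_s:
                silent.append(board)
        return silent


monitor = CanStreamMonitor(enabled_boards={"bat", "bms"})
monitor.on_frame("bat", 0.2)
print(monitor.check(now=1.0))  # only the board that never streamed is reported
```

In this sketch a board that is enabled but never streams is reported at every check, which matches the symptom of repeated "not streaming" errors in the YRI output.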

I am thinking about this because it is happening on three different robots in very similar conditions, while it never happened on iCub3 (at least before it got 🚀 mounted on it). The probability that three battery packs are faulty in the same way seems low, and iCub3 had the same BMS. The only difference is the BAT, which is common between ergoCubs.

Yeah, that's reasonable. I'll dig into that to see if there's something wrong in the fw. Anyway, since this issue seems to be happening quite often lately and it is difficult to reproduce on a test setup, let's work together and have the DeviceDumper collect data during all the experiments, so it's easier for us to find the source of the problem.

S-Dafarra commented 4 months ago

Anyway, since this issue seems to be happening quite often lately and it is difficult to reproduce on a test setup, could you please have the DeviceDumper collect data during all the experiments, so it's easier for us to find the source of the problem?

Makes sense! Just to be sure, which data are you interested in?

S-Dafarra commented 4 months ago

Another thing: were the bms and/or bat devices enabled, and if so, do we have some data collected by the TelemetryDeviceDumper?

For this, I am tagging @carloscp3009 and @GiulioRomualdi, but I am not sure we were logging the data. I think the battery device was active though, since the YRI throws errors about it (something like it detected that some boards are not streaming).

I checked with them and we inadvertently deleted all the logs because we ran out of HD. We did not think of keeping that one.

valegagge commented 4 months ago

Remember to dump the data on a laptop! 😉

valegagge commented 4 months ago

Anyway, since this issue seems to be happening quite often lately and it is difficult to reproduce on a test setup, could you please have the DeviceDumper collect data during all the experiments, so it's easier for us to find the source of the problem?

Makes sense! Just to be sure, which data are you interested in?

We need the output of the battery devices (both for BAT and BMS), but @MSECode can add more details and ideas.

MSECode commented 4 months ago

Exactly. Since those device ports, if enabled, are already open when the YRI starts, we can add them to your TelemetryDeviceDumper and get the data out.
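Once the BAT and BMS streams are logged, one simple way to locate a blackout in the recorded data is to look for gaps in the sample timestamps and then inspect the last samples before each gap. The following Python sketch assumes the log has already been loaded into plain lists and dictionaries. The field names (`t`, `voltage`, `temperature`) and the sample values are hypothetical and do not reflect the actual TelemetryDeviceDumper output format.

```python
def find_stream_gaps(timestamps, max_gap_s=1.0):
    """Return (last_before, first_after) pairs where the stream went silent."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > max_gap_s:
            gaps.append((prev, curr))
    return gaps


def samples_before(samples, t_end, window_s=5.0):
    """Return the samples recorded in the `window_s` seconds up to `t_end`."""
    return [s for s in samples if t_end - window_s <= s["t"] <= t_end]


# Made-up samples for illustration only: the stream stops at t = 1.0 s
# and resumes at t = 12.0 s, as it would after a power cycle.
samples = [
    {"t": 0.0, "voltage": 39.5, "temperature": 41.0},
    {"t": 0.5, "voltage": 39.4, "temperature": 41.5},
    {"t": 1.0, "voltage": 39.4, "temperature": 42.0},
    {"t": 12.0, "voltage": 39.6, "temperature": 40.0},
]

for last_before, first_after in find_stream_gaps([s["t"] for s in samples]):
    print("stream silent between", last_before, "and", first_after)
    for sample in samples_before(samples, last_before, window_s=1.0):
        print(sample)
```

Comparing the last samples before a gap across several occurrences would show whether the shutdowns share a common condition, such as a temperature or voltage level.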

S-Dafarra commented 4 months ago

Remember to dump the data on a laptop! 😉

Unfortunately, it was the laptop that was full. We were focused on an experiment and were logging a lot of data, so to continue with the experiment we decided to delete the past logs.

Exactly. Since those device ports, if enabled, are already open when the YRI starts, we can add them to your TelemetryDeviceDumper and get the data out.

Yeah, I think that at the moment that data is not logged.
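To avoid losing the battery logs the next time the disk fills up, one option is to free space by deleting the oldest logs first while skipping anything that looks like a battery log. The Python sketch below only decides which files to delete and does not touch the disk. The file names, the `(name, mtime, size)` tuple layout, and the "bat"/"bms" naming convention are assumptions for illustration, not the naming used on the robot.

```python
def select_logs_to_delete(logs, bytes_to_free, protected=("bat", "bms")):
    """Pick the oldest unprotected logs until `bytes_to_free` is reached.

    `logs` is a list of (name, mtime, size_bytes) tuples. Files whose name
    contains one of the `protected` tags are never selected.
    """
    to_delete = []
    freed = 0
    for name, _mtime, size in sorted(logs, key=lambda entry: entry[1]):
        if freed >= bytes_to_free:
            break
        if any(tag in name.lower() for tag in protected):
            continue  # keep battery-related logs for debugging this issue
        to_delete.append(name)
        freed += size
    return to_delete


# Hypothetical log listing: (name, modification time, size in bytes).
logs = [
    ("walking_01.mat", 100, 500),
    ("bat_bms_01.mat", 50, 10),
    ("walking_02.mat", 200, 500),
]
print(select_logs_to_delete(logs, bytes_to_free=400))
```

Keeping the selection separate from the actual deletion makes it easy to review the list before removing anything.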

S-Dafarra commented 3 months ago

Hi all, soon we are going to have a demo on a stage without any cable. Do you think there is something we can do to reduce the likelihood of this happening?

Can this be related to https://github.com/robotology/icub-firmware/pull/421 since it started happening afterward?

valegagge commented 3 months ago

Hi @S-Dafarra, today we are going to check whether those changes caused the issue.

In the meantime, did you dump some data?

cc @MSECode

S-Dafarra commented 3 months ago

In the meantime, did you dump some data?

I could not since I was busy all week in Washington for a demo. Moreover, to the best of my knowledge, it did not appear again lately.

MSECode commented 3 months ago

Hi @S-Dafarra, I checked the code and it all seems fine to me. The changes made in that PR should be fine; I did not notice anything strange. Anyway, is the latest version of the BAT fw, i.e. 1.3.4, flashed on all the robots you are using? In any case, I'll keep monitoring this issue.