meshtastic / firmware

Meshtastic device firmware
https://meshtastic.org
GNU General Public License v3.0

[Bug]: Low Battery voltage detection threshold is way too low for most LDOs being used on ESP32 boards #5259

Open dm5tt opened 2 weeks ago

dm5tt commented 2 weeks ago

Category

Other

Hardware

Other

Firmware Version

2.5.x

Description

Did a bit of code reviewing.. still hunting our ESP32 LittleFS corruption bug.

Problem

If I understand this piece of code correctly:

Power.cpp, 714:
    // If we have a battery at all and it is less than 0%, force deep sleep if we have more than 10 low readings in
    // a row. NOTE: min LiIon/LiPo voltage is 2.0 to 2.5V, current OCV min is set to 3100 that is large enough.
    //
    if (batteryLevel && powerStatus2.getHasBattery() && !powerStatus2.getHasUSB()) {
        if (batteryLevel->getBattVoltage() < OCV[NUM_OCV_POINTS - 1]) {
            low_voltage_counter++;
            LOG_DEBUG("Low voltage counter: %d/10", low_voltage_counter);
            if (low_voltage_counter > 10) {
#ifdef ARCH_NRF52
                // We can't trigger deep sleep on NRF52, it's freezing the board
                LOG_DEBUG("Low voltage detected, but not triggering deep sleep");
#else
                LOG_INFO("Low voltage detected, triggering deep sleep");
                powerFSM.trigger(EVENT_LOW_BATTERY);
#endif
            }
        } else {
            low_voltage_counter = 0;
        }
    }
}

So our ESP32 devices only go to deep sleep once the battery has read below 3100mV, the last entry of OCV_ARRAY, more than 10 times in a row:

#else // LiIon
#define OCV_ARRAY 4190, 4050, 3990, 3890, 3800, 3720, 3630, 3530, 3420, 3300, 3100
#endif

Most (>90%) of our devices use a 3V3 LDO that will crap out as soon as the battery goes below 3.4V (the good ones), or at an even higher voltage if bad LDOs (looking at you, AMS1117) are used. This effect is almost binary.

This renders the Low Battery protection basically useless.

Suggested Solution

  1. Increase the generic minimum detection threshold to at least 3420mV or even 3530mV. I'd vote for 3530mV, as the device should then survive a while in deep sleep.

  2. The threshold should be configurable from the board definition files
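
A minimal sketch of how the two suggestions could fit together, not the actual firmware change. LOW_BATTERY_SHUTDOWN_MV and LowBatteryDetector are hypothetical names; the idea is that a board's variant file defines the macro and everything else falls back to the generic default proposed above. The counter mirrors the "more than 10 low readings in a row" logic from the quoted Power.cpp snippet.

```cpp
#include <cstdint>

// Hypothetical macro: a board definition (variant) file could define this
// before the power code is compiled. Otherwise use the generic default.
#ifndef LOW_BATTERY_SHUTDOWN_MV
#define LOW_BATTERY_SHUTDOWN_MV 3530 // generic default suggested in this issue
#endif

// Hypothetical helper that debounces low-voltage readings.
struct LowBatteryDetector {
    uint16_t thresholdMv = LOW_BATTERY_SHUTDOWN_MV;
    uint8_t requiredReadings = 10;
    uint8_t lowCount = 0;

    // Returns true once more than requiredReadings consecutive samples
    // are below the threshold. Any sample at or above it resets the count.
    bool update(uint16_t battMv)
    {
        if (battMv < thresholdMv) {
            if (lowCount < 255)
                lowCount++;
        } else {
            lowCount = 0;
        }
        return lowCount > requiredReadings;
    }
};
```

A board with a weak LDO could then be built with a higher value for the macro without touching the shared code.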

Relevant log output

No response

thebentern commented 2 weeks ago

I'm in favor of at least bumping this on at least a subset of variants in a generic way if it makes a more reliable experience with device flash persistence.

HarukiToreda commented 2 weeks ago

I can run tests on most devices to find the exact voltage that causes a reboot, from which we could calculate the minimum percentage at which the node should enter deep sleep. Just need a few days to gather that data.

dm5tt commented 2 weeks ago

I can run tests on most devices to find the exact voltage that causes a reboot, from which we could calculate the minimum percentage at which the node should enter deep sleep. Just need a few days to gather that data.

Sounds good. I roughly expect a brown-out below 3.3V+100..200mV.

dm5tt commented 2 weeks ago

Directly related ticket: #5199

Dude5101 commented 2 weeks ago

Directly related ticket: #5199

Thanks for picking this up and logging a bug

todd-herbert commented 2 weeks ago

Separately to the shutdown voltage, I'd like to suggest adjusting the (default Li Ion) curve which estimates battery % based on voltage. As a possible starting point, I have visually fitted a curve to match general observations by @ianmcorvidae.

[Figure: plotted SoC estimation curve suggestion]

#define OCV_ARRAY 4190, 4100, 4090, 4080, 4040, 3990, 3920, 3850, 3760, 3600, 3200

Update: doesn't seem quite right yet. Needs some tweaking.


Currently, the shutdown voltage is defined by the final value in that OCV_ARRAY. I think it would make sense to define it independently of the OCV_ARRAY (at least for Li Ion, for now). One consideration is that a separate shutdown voltage of, for example, 3530mV would see the device power off while still reporting a percentage of 5%. This might be desirable, as it keeps the user informed of the genuine state of the battery. Alternatively, we could remap the reported percentages so that 0% corresponds to 3530mV etc.

What's the opinion on these two thoughts?
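
To make the two options concrete, here is a rough sketch. It assumes the percentage is a piecewise-linear interpolation over OCV_ARRAY with evenly spaced 10% steps; socFromVoltage and remappedSoc are illustrative names, not existing firmware functions. The array is the default LiIon curve quoted in the issue description.

```cpp
#include <cstddef>
#include <cstdint>

// Default LiIon curve from the issue description: 100% down to 0% in 10% steps.
static const uint16_t OCV[] = {4190, 4050, 3990, 3890, 3800, 3720, 3630, 3530, 3420, 3300, 3100};
static const size_t NUM_OCV_POINTS = sizeof(OCV) / sizeof(OCV[0]);

// Option 1: report the "genuine" percentage, whatever the shutdown voltage is.
// Piecewise-linear state-of-charge estimate in percent (0..100).
float socFromVoltage(uint16_t mv)
{
    if (mv >= OCV[0])
        return 100.0f;
    if (mv <= OCV[NUM_OCV_POINTS - 1])
        return 0.0f;
    const float step = 100.0f / (NUM_OCV_POINTS - 1);
    for (size_t i = 1; i < NUM_OCV_POINTS; i++) {
        if (mv >= OCV[i]) {
            // Fraction of the way from OCV[i] up to OCV[i - 1]
            float frac = float(mv - OCV[i]) / float(OCV[i - 1] - OCV[i]);
            return step * ((NUM_OCV_POINTS - 1 - i) + frac);
        }
    }
    return 0.0f;
}

// Option 2: rescale so that the separate shutdown voltage reads as 0%.
float remappedSoc(uint16_t mv, uint16_t shutdownMv)
{
    float floorPct = socFromVoltage(shutdownMv);
    if (floorPct >= 100.0f)
        return 0.0f; // degenerate configuration, avoid dividing by zero
    float pct = (socFromVoltage(mv) - floorPct) * 100.0f / (100.0f - floorPct);
    return pct < 0.0f ? 0.0f : pct;
}
```

Under these assumptions the default curve places 3530mV at 30% and 3420mV at 20%, so with option 1 the node would power off well above 0% unless the curve itself is also adjusted.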

thebentern commented 2 weeks ago

What's the opinion on these two thoughts?

Perhaps for the OCV_ARRAY we should ensure that the device state is saved to the file system (once) while in the N-1 voltage stage of runtime, and then shut down without saving to the file system at the N stage?

I anticipate that we'll also need a way to ensure that, when we start back up, the device is no longer in the risky voltage range. Unfortunately, loading all of the protos from the file system is one of the first operations. Long term, we'll pretty much need a captive "Low power, please charge" screen, IMO.
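
The N-1 / N idea could look roughly like this. It is only a sketch: StagedShutdown and LowPowerAction are made-up names, the two thresholds are passed in rather than taken from OCV_ARRAY, and the caller is assumed to perform the actual flash write and shutdown.

```cpp
#include <cstdint>

enum class LowPowerAction { None, SaveStateOnce, ShutdownWithoutSave };

// Hypothetical helper: decides what to do for each (already debounced) voltage reading.
struct StagedShutdown {
    uint16_t saveBelowMv;     // "N-1" stage: still safe to write flash
    uint16_t shutdownBelowMv; // "N" stage: too risky to write, just power down
    bool saved;

    StagedShutdown(uint16_t saveMv, uint16_t shutdownMv)
        : saveBelowMv(saveMv), shutdownBelowMv(shutdownMv), saved(false)
    {
    }

    LowPowerAction update(uint16_t battMv)
    {
        if (battMv < shutdownBelowMv)
            return LowPowerAction::ShutdownWithoutSave;
        if (battMv < saveBelowMv) {
            if (!saved) {
                saved = true; // request exactly one save per discharge
                return LowPowerAction::SaveStateOnce;
            }
            return LowPowerAction::None;
        }
        // Voltage recovered (e.g. charging): re-arm the one-time save.
        // Real code would probably want some hysteresis here.
        saved = false;
        return LowPowerAction::None;
    }
};
```

For example, `StagedShutdown s(3300, 3100)` would request a single save somewhere between 3300mV and 3100mV and then a save-free shutdown below 3100mV.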

dm5tt commented 2 weeks ago

Can't we enforce a "Read Only"?

HarukiToreda commented 2 weeks ago

Brownout voltage

Here are some of the results I've got so far for brownout voltages

dm5tt commented 2 weeks ago

Good work!

Was this voltage measured under load?

HarukiToreda commented 2 weeks ago

Good work!

Was this voltage measured under load?

Yes, the battery was connected with the nodes active, and I measured at the power connector leads. I logged the last voltage before the nodes turned off.

Dude5101 commented 2 weeks ago

Brownout voltage

Here are some of the results I've got so far for brownout voltages

Nice, thanks for testing

todd-herbert commented 2 weeks ago

@thebentern

Perhaps for the OCV_ARRAY we should ensure that the device state is saved to the file system (once) while in the N-1 voltage stage of runtime, and then shut down without saving to the file system at the N stage?

Makes sense to me! I assume the cause of the reboot loop and corruption is brown out because of that final flash write, yeah?

One concern is that the OCV_ARRAY might not be granular enough to simply pick the N or N-1 element as the ideal shutdown voltage. Although the curve I suggested in https://github.com/meshtastic/firmware/issues/5259#issuecomment-2458217232 does seem to be too extreme, it does highlight the potential issue here, with the final two elements having been set at 3600mV and 3200mV.

I anticipate that we'll also need a way to ensure that, when we start back up, the device is no longer in the risky voltage range. Unfortunately, loading all of the protos from the file system is one of the first operations. Long term, we'll pretty much need a captive "Low power, please charge" screen, IMO.

I've had success doing something similar to this with RAK4631 before; checking voltage immediately at boot, and entering a low power delay() loop (similar to tracker role) for 30 minutes to conserve power. This was to allow a solar panel to recharge the battery to a stable level before normal boot is permitted.

It might be an idea to allow a manual override of a captive "please charge" screen: "press user button within 5 seconds to continue boot".
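
A rough sketch of such a boot gate, with the hardware access injected as callbacks so the logic can be tried on a desktop. BootGateHooks and waitForSafeVoltage are hypothetical names, and the wait period, safe voltage and override handling are all parameters rather than anything the firmware currently defines.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical hooks: on real hardware these would wrap the ADC read,
// the user button and a low-power delay.
struct BootGateHooks {
    std::function<uint16_t()> readBatteryMv;
    std::function<bool()> overridePressed;
    std::function<void(uint32_t)> lowPowerDelayMs;
};

// Blocks normal boot while the battery is below safeMv.
// Returns how many wait periods elapsed before boot was allowed to continue.
uint32_t waitForSafeVoltage(const BootGateHooks &hw, uint16_t safeMv, uint32_t waitMs, uint32_t maxCycles)
{
    uint32_t cycles = 0;
    while (cycles < maxCycles && hw.readBatteryMv() < safeMv) {
        if (hw.overridePressed())
            break; // user chose to continue booting anyway
        hw.lowPowerDelayMs(waitMs); // e.g. 30 minutes, to let a solar panel recharge
        cycles++;
    }
    return cycles;
}
```

The important part is that this runs before anything is loaded from or written to the file system.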

todd-herbert commented 2 weeks ago

One other quick thought: the low voltage counter requires more than 10 readings below the 3.1V threshold before triggering. This, along with the delaying effect of the smoothing filter, can mean that the true voltage has dropped significantly below 3.1V before shutdown actually occurs, especially with lower capacity batteries.
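
A small simulation can illustrate the lag. This is a sketch under stated assumptions: the discharge is modelled as a constant drop per sample, and the smoothing is modelled as a simple exponential moving average with factor alpha, which may not match the filter the firmware actually uses. simulateShutdownLag is an illustrative name.

```cpp
#include <cstdint>

struct LagResult {
    int samplesPastThreshold; // samples where the true voltage was already below the threshold
    float trueMvAtShutdown;   // true (unfiltered) voltage when shutdown finally triggers
};

// Assumed model: linear discharge, exponential moving average filter,
// shutdown after more than requiredLowReadings consecutive filtered-low samples.
LagResult simulateShutdownLag(float startMv, float dropPerSampleMv, float thresholdMv, float alpha,
                              int requiredLowReadings)
{
    float trueMv = startMv;
    float filteredMv = startMv;
    int lowCount = 0;
    int samplesPast = 0;
    for (int i = 0; i < 100000; i++) { // bounded so a zero drop rate cannot loop forever
        trueMv -= dropPerSampleMv;
        filteredMv += alpha * (trueMv - filteredMv);
        if (trueMv < thresholdMv)
            samplesPast++;
        if (filteredMv < thresholdMv)
            lowCount++;
        else
            lowCount = 0;
        if (lowCount > requiredLowReadings)
            break;
    }
    return {samplesPast, trueMv};
}
```

Because the filtered value trails the true voltage on a falling battery, shutdown in this model always happens at least 11 samples after the true voltage crossed the threshold. The faster the battery sags, the further below 3.1V it ends up.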

thebentern commented 2 weeks ago

One other quick thought: the low voltage counter requires more than 10 readings below the 3.1V threshold before triggering. This, along with the delaying effect of the smoothing filter, can mean that the true voltage has dropped significantly below 3.1V before shutdown actually occurs, especially with lower capacity batteries.

The background on this is that we had a ton of false positive shutdowns in the past, so I'd encourage being very careful with relaxing anything there.

todd-herbert commented 1 week ago

@HarukiToreda @Dude5101 Have either of you adjusted the ADC multiplier from the default? If possible, it'd be interesting to hear if the voltage reported by the firmware matches a multimeter reading.

Dude5101 commented 1 week ago

@HarukiToreda @Dude5101 Have either of you adjusted the ADC multiplier from the default? If possible, it'd be interesting to hear if the voltage reported by the firmware matches a multimeter reading.

With stock firmware it is pretty close on the Heltec V3, as far as I remember. I modified the ADC multiplier to 4.6 on the Heltec to get it to go into deep sleep at 3.3V so that it doesn't brown out. The actual voltage measured with a multimeter is 3.3V; the reported voltage is 3.1V.

The LilyGo T3S3 does not brown out at 3.1V, but with stock firmware the voltage reading is not accurate. After I adjusted the ADC multiplier to 2.011, the voltage reading is accurate.
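
For anyone wanting to do the same calibration, the arithmetic can be sketched as below. It assumes the reported voltage scales linearly with the ADC multiplier; calibratedAdcMultiplier is an illustrative helper, not a firmware function.

```cpp
// Returns the multiplier that would make the firmware-reported voltage
// match a multimeter reading, assuming reported = raw * multiplier.
float calibratedAdcMultiplier(float currentMultiplier, float reportedV, float measuredV)
{
    if (reportedV <= 0.0f)
        return currentMultiplier; // no usable reading, leave unchanged
    return currentMultiplier * (measuredV / reportedV);
}
```

With made-up numbers: a multiplier of 2.0, a reported 4.0V and a measured 4.2V give a new multiplier of 2.1. Deliberately choosing a lower multiplier than this makes the firmware read low, which is how the Heltec above ends up sleeping at a true 3.3V while reporting 3.1V.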

todd-herbert commented 1 week ago

I've been testing with Heltec Wireless Paper. The reported ADC voltages are correct; however, only 80% is reported on full charge. My reason for asking was in case you had noticed something similar with Heltec V3 and bumped the ADC multiplier manually to reach 100%.

I haven't seen any issues yet with this hardware entering deep sleep at 3.1V, but I appreciate that this is more likely to affect certain models.

Dude5101 commented 1 week ago

I've been testing with Heltec Wireless Paper. The reported ADC voltages are correct; however, only 80% is reported on full charge. My reason for asking was in case you had noticed something similar with Heltec V3 and bumped the ADC multiplier manually to reach 100%.

I haven't seen any issues yet with this hardware entering deep sleep at 3.1V, but I appreciate that this is more likely to affect certain models.

Good check, but I didn't modify the ADC and experience brownout at 3.1/3.2V.

I think this will become a rabbit hole especially with small batteries/cables and voltage sag.

In my related bug I requested a user-adjustable deep sleep voltage. This will probably solve the problem for most users, and it can be the recommendation if users still experience brownout after your efforts to protect memory from corruption.

"My board still experiences brownout issues": the solution is to bump the deep sleep voltage up and see if it solves the problem. If it does, check your installation for bad contacts or a small/old battery.