Open dm5tt opened 2 weeks ago
This issue has been mentioned on Meshtastic. There might be relevant details there:
https://meshtastic.discourse.group/t/heltec-v3-random-factory-resets/13769/16
Played around a bit more.
It seems that when the ESP32 loses power during the early phase of the first file system operations, it resets the configuration:
INFO | ??:??:?? 0 Booted, wake cause 0 (boot count 1), reset_reason=reset
DEBUG | ??:??:?? 0 Filesystem files (16384/1048576 Bytes):
DEBUG | ??:??:?? 0 /prefs/channels.proto (57 Bytes)
DEBUG | ??:??:?? 0 /prefs/config.proto (173 Bytes)
DEBUG | ??:??:?? 0 /prefs/db.proto.tmp (365 Bytes)
DEBUG | ??:??:?? 0 /prefs/module.proto (95 Bytes)
[ 473][I][esp32-hal-i2c.c:75] i2cInit(): Initialising I2C Master: sda=18 scl=19 freq=100000
DEBUG | ??:??:?? 0 Using analog input 1 for battery level
INFO | ??:??:?? 0 ADCmod: ADC characterization based on Two Point values stored in eFuse
INFO | ??:??:?? 0 Scanning for i2c devices...
DEBUG | ??:??:?? 0 Scanning for I2C devices on port 1
INFO | ??:??:?? 0 No I2C devices found
DEBUG | ??:??:?? 0 acc_info = 0
INFO | ??:??:?? 0 S:B:53,2.5.0.d420433c
DEBUG | ??:??:?? 0 Total heap: 231364
DEBUG | ??:??:?? 0 Free heap: 199860
DEBUG | ??:??:?? 0 Total PSRAM: 0
DEBUG | ??:??:?? 0 Free PSRAM: 0
DEBUG | ??:??:?? 0 NVS: UsedEntries 137, FreeEntries 493, AllEntries 630, NameSpaces 7
DEBUG | ??:??:?? 0 Setup Preferences in Flash Storage
DEBUG | ??:??:?? 0 Number of Device Reboots: 26
ESP_ERROR_CHECK_WITHOUT_ABORT failed: esp_err_t 0x105 (ESP_ERR_NOT_FOUND) at 0x4203701b
file: "src/platform/esp32/BleOta.cpp" line 16
func: static const esp_partition_t* BleOta::findEspOtaAppPartition()
expression: esp_ota_get_partition_description(part, &app_desc)
ESP_ERROR_CHECK_WITHOUT_ABORT failed: esp_err_t 0x102 (ESP_ERR_INVALID_ARG) at 0x4203b043
file: "src/platform/esp32/BleOta.cpp" line 30
func: static String BleOta::getOtaAppVersion()
expression: esp_ota_get_partition_description(part, &app_desc)
!Power-loss roughly happened here!
DEBUG | ??:??:?? 0 No OTA firmware available
INFO | ??:??:?? 0 Initializing NodeDB
[ 637][E][vfs_api.cpp:105] open(): /littlefs/prefs/db.proto does not exist, no permits for creation
ERROR | ??:??:?? 0 Could not open / read /prefs/db.proto
WARN | ??:??:?? 0 Devicestate 0 is old, discarding
INFO | ??:??:?? 0 Performing factory reset!
DEBUG | ??:??:?? 0 Deleting /prefs/channels.proto
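One detail in the log above: the file listing shows /prefs/db.proto.tmp but no /prefs/db.proto. That would be consistent with a save that was interrupted after the temporary file was written but before it was renamed into place, though the log alone does not prove it. Below is a minimal sketch of a recovery check for that case. It is written against std::filesystem on a desktop, not the firmware's LittleFS wrapper. `recoverInterruptedSave`, `RecoverResult` and `hasContent` are hypothetical names, and the "non-empty" validator is only a placeholder for a real check such as a successful protobuf decode.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>

namespace fs = std::filesystem;

// Outcome of the recovery check (hypothetical names, not from the firmware).
enum class RecoverResult { FinalPresent, PromotedTmp, NothingUsable };

// Placeholder validator: true if the file can be opened and is not empty.
// A real check would try to decode the contents.
bool hasContent(const fs::path &p)
{
    std::ifstream in(p, std::ios::binary);
    return in.good() && in.peek() != std::ifstream::traits_type::eof();
}

// If `finalPath` is missing but `finalPath + ".tmp"` exists and passes
// `isValid`, rename the temporary file into place instead of giving up.
RecoverResult recoverInterruptedSave(const fs::path &finalPath,
                                     const std::function<bool(const fs::path &)> &isValid)
{
    // A cut-off write can leave a truncated .tmp, so a validator is required.
    assert(isValid && "validator must be callable");

    if (fs::exists(finalPath))
        return RecoverResult::FinalPresent;

    fs::path tmpPath = finalPath;
    tmpPath += ".tmp";

    if (fs::exists(tmpPath) && isValid(tmpPath)) {
        fs::rename(tmpPath, finalPath); // promote the leftover temporary file
        return RecoverResult::PromotedTmp;
    }
    return RecoverResult::NothingUsable;
}
```

Only the `NothingUsable` outcome would then need to fall through to the existing "discard and factory reset" path.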
As far as I can see, on every boot the node looks itself up in the internal DB, finds itself, and generates a new node number (src/mesh/NodeDB.cpp#L622-L627). Log example:
WARN | ??:??:?? 1 NOTE! Our desired nodenum 0x51e8f344 is invalid or in use, so trying for 0x41412e1
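To illustrate the behaviour described here, this is a minimal sketch of a nodenum check that treats a stored entry as a conflict only when it belongs to a different device. It is not the code at the referenced lines. `NodeTable`, `isTakenByOther` and `pickNodeNum` are hypothetical names, comparing by MAC address is an assumption about how "self" could be recognised, and the probing strategy is deliberately simplistic.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>

using MacAddr = std::array<uint8_t, 6>;
using NodeNum = uint32_t;

// Hypothetical stand-in for the node database: nodenum -> owner's MAC address.
using NodeTable = std::map<NodeNum, MacAddr>;

// A stored entry only counts as a conflict if it belongs to a different device.
bool isTakenByOther(const NodeTable &db, NodeNum candidate, const MacAddr &ourMac)
{
    auto it = db.find(candidate);
    return it != db.end() && it->second != ourMac;
}

// Keep the desired nodenum unless another device owns it; otherwise probe upward.
// Sketch only: real firmware would also have to skip reserved values.
NodeNum pickNodeNum(const NodeTable &db, NodeNum desired, const MacAddr &ourMac)
{
    NodeNum n = desired;
    while (isTakenByOther(db, n, ourMac))
        ++n;
    return n;
}
```

With this check, a node that finds its own entry under its desired nodenum keeps that number instead of picking a new one on every boot.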
I just tested this rather ugly hack:
NodeDB.cpp, loadProto(...):
uint32_t retry = 0;
while (state == LoadFileResult::OTHER_FAILURE && retry < 5)
{
    // Note: f has to be (re)opened on each pass for a retry to help; the open call is not part of this excerpt.
    if (f) {
        LOG_INFO("Loading %s\n", filename);
        pb_istream_t stream = {&readcb, &f, protoSize};
        memset(dest_struct, 0, objSize);
        if (!pb_decode(&stream, fields, dest_struct)) {
            LOG_ERROR("Error: can't decode protobuf %s\n", PB_GET_ERROR(&stream));
            state = LoadFileResult::DECODE_FAILED;
        } else {
            LOG_INFO("Loaded %s successfully\n", filename);
            state = LoadFileResult::LOAD_SUCCESS;
        }
        f.close();
        break;
    } else {
        LOG_ERROR("Could not open / read %s, attempt %u\n", filename, retry);
        retry++;
        delay(500); // wait before the next attempt
    }
}
This basically gives it a few retries to read the files. The trick is to be slow enough that we can be 100% sure we have left the short window in which the ESP32 is dying from the power loss. It will most likely still fail if the device hovers in that state long enough.
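The retry idea can be separated from the protobuf loading. The following is a minimal, platform-independent sketch, not the patch itself: `retryWithDelay` is a hypothetical helper, and it uses std::this_thread::sleep_for where the firmware would use delay(). The important point is that the callable performs the whole attempt, including reopening the file, so each pass starts fresh.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Calls `attempt` until it returns true or `maxAttempts` is reached, sleeping
// `pause` between failed attempts. Returns the 1-based number of the attempt
// that succeeded, or 0 if every attempt failed.
unsigned retryWithDelay(const std::function<bool()> &attempt, unsigned maxAttempts,
                        std::chrono::milliseconds pause)
{
    assert(maxAttempts > 0);
    for (unsigned n = 1; n <= maxAttempts; ++n) {
        if (attempt())
            return n;
        if (n < maxAttempts)
            std::this_thread::sleep_for(pause); // no need to wait after the last try
    }
    return 0;
}
```

A caller would pass a lambda that opens the file and returns whether the open succeeded. A decode failure is better reported separately than retried, as the loop above does by breaking out after the decode.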
Really great investigation @dm5tt! I think you've found two problems.
1. (probably) the ESP32 init code isn't enabling the brownout detector properly. If Vcc is getting low enough that the SPI flash is crapping out, the brown-out circuit (if enabled) should be holding the main CPU in reset. (This is sorta related to #4378 on nrf52, though with a different cause.)
2. Repeatedly generating a new node ID due to (our) nodenum being found in the internal db is super bad. I've made a new issue to track that separately: #4559.
I'm finishing up some heltec tracker power improvements today (and possibly tomorrow). But I'll eagerly work on these tue or weds.
The most glaring problem is the controller freezing when it is powered by solar. The effect is not widely known; it is mostly encountered by people who build weather stations or small repeaters. At night, or during cloudy weeks, the power source eventually runs down and the supply voltage drops too low for stable operation. When the sun returns, the supply voltage rises slowly (not in a step), yet the controller does not return to normal operation. Power supervisors exist to eliminate this effect... But this is a hardware problem caused by ill-conceived Chinese circuitry. Software, too, can hang for various reasons. When the device is physically nearby, that is a small problem: press RESET or cycle the power and it is back on the air. With remote deployments, especially outdoors where repeated physical access (to a roof, for example) is not possible, the problem becomes critical. The solutions are as old as the hills and have been known since the days of outdoor access points, weather stations and repeaters.
> At night, or during cloudy weeks, the power source eventually runs down and the supply voltage drops too low for stable operation. When the sun returns, the supply voltage rises slowly (not in a step), yet the controller does not return to normal operation. Power supervisors exist to eliminate this effect...
The ESP32 has integrated brown-out detection for this. It can be set high enough (~2.9-3.2V) that it puts the entire device into a safe reset state while it is still fully functional, well before the voltage comes close to the dangerous range where the SPI flash starts dying.
I know about brown-out detection. It has false positives. To increase reliability and avoid cyclic reboots in my projects, I disable brown-out detection altogether:
(*((uint32_t volatile *)ETS_UNCACHED_ADDR((DR_REG_RTCCNTL_BASE + 0xd4)))) = 0;
And judging by this study https://www.esp32.com/viewtopic.php?t=38178 the on-chip low-voltage reset is practically useless. An external hardware supervisor (3-pin package) will provide better reliability and a more dependable reset than the crooked implementation in the ESP32 itself.
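The cyclic-reboot concern and the slow voltage ramp both come down to hysteresis: a supervisor typically releases reset at a higher voltage than the one at which it asserts it. Below is a minimal software analogue of that behaviour, for illustration only. `SupplyGate` is a hypothetical class, the thresholds are made-up example values, and nothing here reflects a specific supervisor part or the ESP32's internal detector.

```cpp
#include <cassert>

// Software analogue of a voltage supervisor with hysteresis.
// Thresholds are illustrative; pick them from the actual hardware limits.
class SupplyGate
{
  public:
    SupplyGate(float releaseVolts, float holdVolts) : release_(releaseVolts), hold_(holdVolts)
    {
        assert(releaseVolts > holdVolts); // the gap between the two is the hysteresis
    }

    // Feed one voltage sample; returns true while the load is allowed to run.
    bool update(float volts)
    {
        if (running_ && volts < hold_)
            running_ = false; // fell below the lower threshold: hold in reset
        else if (!running_ && volts >= release_)
            running_ = true; // climbed above the upper threshold: release
        return running_;
    }

  private:
    float release_;
    float hold_;
    bool running_ = false;
};
```

Because the release threshold sits above the hold threshold, a supply that creeps up slowly or dips briefly under load does not toggle the state on every sample.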
btw - I'm starting work on investigating/fixing this one tomorrow. will be a few days before a PR or resolution tho.
ESP32C3?
Alas - I ran out of time for meshtastic futzing before leaving on a long bike trip (with no computer/minimum cell-phone access).
I won't be able to work on this until I return to my desk (on Oct 1, but realistically I need a few days to complete a move, so I should be back at meshtastic about Oct 7). If someone wants to work on it before then, great, but if it is still open when I return I'll continue my investigation. Have a nice September y'all!
I'm trying to force the Brownout-detection right now:
REG_SET_BIT(RTC_CNTL_BROWN_OUT_REG, RTC_CNTL_BROWN_OUT_ENA); // Enable brownout
WRITE_PERI_REG(RTC_CNTL_BROWN_OUT_REG, 7); // Set threshold to max (3.0V)
But as the device is placed on the roof, there are no live logs from it, just rough uptime values going into my InfluxDB.
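One thing worth checking in the two register lines above: WRITE_PERI_REG stores a complete 32-bit value, so writing 7 after setting the enable bit would replace the whole register, not just a threshold field. The sketch below simulates that on a plain variable so it can be run anywhere. The bit positions (enable at bit 30, a 3-bit threshold field at bits 29:27) are assumptions taken from the classic ESP32 register header and may differ on other chips such as the ESP32-C3. `writeWhole` and `setThreshold` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (classic ESP32 header values; verify for the actual chip).
constexpr uint32_t kEnableBit = 1u << 30;            // brown-out enable
constexpr uint32_t kThresShift = 27;                 // threshold field position
constexpr uint32_t kThresMask = 0x7u << kThresShift; // 3-bit threshold field

// Full-register write: every bit not present in `value` is cleared.
inline void writeWhole(uint32_t &reg, uint32_t value)
{
    reg = value;
}

// Read-modify-write: only the threshold field changes, other bits are kept.
inline void setThreshold(uint32_t &reg, uint32_t level)
{
    reg = (reg & ~kThresMask) | ((level << kThresShift) & kThresMask);
}
```

Under these assumptions, the full write leaves the enable bit cleared and the threshold field untouched, while the field-wise update keeps the detector enabled. On real hardware, ESP-IDF's REG_SET_FIELD macro performs this kind of field-wise update, provided the target chip's headers define the threshold field.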
@dm5tt any luck?
Not Yet.
As I'm using pretty big batteries (3000...4000mAh) this doesn't happen that often to me. The patch above (RTC_CNTL_BROWN_OUT_REG) is integrated into my fork but not properly tested yet.
Let's wait for geeksville to return. Then maybe we can start a coordinated effort to hunt this issue down.
This issue has been mentioned on Meshtastic. There might be relevant details there:
https://meshtastic.discourse.group/t/three-rak4631-lose-their-minds/11928/8
Looks like the RAK4631 has a similar problem.
Category
Other
Hardware
Other
Firmware Version
v2.5.0.ab7de7f and 2.4.x
Description
Device under test
Heltec HT-CT62, but other ESP32 devices seem to be affected too. See Meshtastic Discourse.
I only modified the pin configuration a bit - no other functional changes.
Affected versions
2.4.x and v2.5.0.ab7de7f
People wrote that the last working version was 2.3.15. I haven't tested this yet.
How to reproduce
The device resets itself to a default configuration after ~2-3 attempts.
Is this bug relevant
Yes. With solar-powered stations we quite often run into a brown-out situation when the battery isn't yet fully charged and/or the sunlight isn't yet strong enough to fully power the station.
Relevant log output