Closed btsimonh closed 2 years ago
hmm thinking about this - Q: what NimBLE callbacks originate in the ipc0 task? - probably one of MY callback functions is using too much stack, and the canary was only found corrupt by the above. I'm doing a test with a hand-modified sdkconfig.h - I'm not sure if this will actually increase the IPC stack size or not - 1024->2048 (no experience with sdkconfig here). Time will tell. Crash is now confirmed on other devices, but we also have devices with uptime > 20 hours, where some are barely managing 5 mins :(.
Keep in mind that the device with the 22 hours uptime is only listening to the BLE adverts, nothing else. My other device which queries the TRV's every minutes crashes indeed on a regular basis.
You need commit 08be9140c313af980331d7d8967563ee8c254489 from this repo.
@h2zero - thankyou. Will go for the NimBLE upgrade now :). Was resisting just because we had enough issues of our own, but we're now pretty clear of them, so that time has arrived .... Expect some feedback later today.
It's a very ugly hack but I could see no other way to maintain support for older arduino versions and the latest master. Will be removed in the future if arduino core ever does a new release.
ok, it's running with latest master, without my 'getResult' in NimBLEClient. only 15 mins so far. Will report.
looking good so far.
Q: in Tasmota, we had these modifications in place to support a unicore ESP32 (ESP32solo1): https://github.com/arendst/Tasmota/commit/b11f1b5483a9baa264adb1ef49688d471f6fea80
Looking at your new commit, I'm not sure we need them anymore? Your thoughts? br,
S
Glad it's working for you so far. You shouldn't need those mods anymore, as it will always use the same core as NimBLE stack for commands except when using arduino rc5/6, but if you're running unicore then that shouldn't matter.
Working on a better solution to this than the current implementation, does not require idf commit checks. promising results so far.
hi h2zero, just popping back to report general success... most people are experiencing good uptime. One report today:
16:10:33.996 Mode a: bat updated
16:10:34.056 MI32: scancomplete
E (7737992) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (7737992) task_wdt: - IDLE0 (CPU 0)
E (7737992) task_wdt: Tasks currently running:
E (7737992) task_wdt: CPU 0: btController
E (7737992) task_wdt: CPU 1: IDLE1
E (7737992) task_wdt: Aborting.
abort() was called at PC 0x40188188 on core 0
ELF file SHA256: 0000000000000000
Backtrace: 0x40091054:0x3ffbea80 0x400912d1:0x3ffbeaa0 0x40188188:0x3ffbeac0 0x40084ac9:0x3ffbeae0 0x400158ad:0x3ffb5ef0 0x401b95cb:0x3ffb5f10 0x401b8a8b:0x3ffb5f30 0x4009226a:0x3ffb5f60
Rebooting...
ets Jun 8 2016 00:22:57
very little to go on - except that 'btController' is hogging the CPU for a long time. This could be in esp library, your code, or a BT callback :(. I take it that there is nothing to be gained from any of the backtrace numbers here - we don't get the PC of the actual looping code...?
anyway, so far, much better - thanks again for all your help.
Glad it's working well, sadly that exception you posted is an upstream issue. I have reported it here https://github.com/espressif/esp-idf/issues/6417 and there are a couple other reports of this there as well.
Which arduino commit is this on? I have only seen it in 1.0.5-rc6 but not often enough to find a reliable way to reproduce.
this one: 5dc72ab10d9f928442a25ef3bdcf8a31a7e16301e089fae - 17th jan Was master on the day you told me to update :). I hear you did a release version - anything significant above this? Sorry, with RC6.
There's a couple improvements to client in 1.1.0 that you may want.
Yep rc6 seems to have this issue instead of the disconnect bug. I haven't seen it happen yet with https://github.com/espressif/arduino-esp32/commit/cee659563d38434abbe7fb3523f186f0f308bffc.
I just started getting this randomly on scans (seems to be a random core every time too). I think it's related to increasing my scan timings to:
pBLEScan->setInterval(550);
pBLEScan->setWindow(500);
I'll try lowering them and see if it fixes the issue.
Okay, so mine probably isn't related to this.
A long callback with a bunch of if tests in it causes the crash every single time when it his the scan request.
A short callback with the scan request never crashes.
I'll just set up a non blocking flag for the scan. Weirdly it never had this issue before and I haven't changed that part of my code in a bit.
Have you updated the arduino core sources? I'm curious what commits do this so I can try to pinpoint the location of the issue.
Thanks for the input though, I'll have to try making my callback take some time and see if it triggers.
Yes.
I need to switch to using a release.
It just asked me to sync to master a couple days ago.
I should also point out that it was within a webserver callback where I had issues.
This long callback crashes every single time as soon as it hits the scan with the latest arduino: (btw) I even tried setting a 10 second vtaskdelay (maybe clear the radio is my thought) prior to the scan and then it just waits 10 seconds before the crash). I also tried moving it before and after the html send.
server.on("/send_settings", []() { String tString; bool wasBTUpdate = false; if (!server.arg("ssid").isEmpty()) { tString = server.arg("ssid"); tString.trim(); userConfig.setSsid(tString); } if (!server.arg("password").isEmpty()) { tString = server.arg("password"); tString.trim(); userConfig.setPassword(tString); } if (!server.arg("deviceName").isEmpty()) { tString = server.arg("deviceName"); tString.trim(); userConfig.setDeviceName(tString); } if (!server.arg("shiftStep").isEmpty()) { userConfig.setShiftStep(server.arg("shiftStep").toInt()); } if (!server.arg("stepperPower").isEmpty()) { userConfig.setStepperPower(server.arg("stepperPower").toInt()); updateStepperPower(); } //checkboxes don't report off, so need to check using another parameter that's always present on that page if (!server.arg("stepperPower").isEmpty()) { if (!server.arg("autoUpdate").isEmpty()) { userConfig.setAutoUpdate(true); } else { userConfig.setAutoUpdate(false); } if (!server.arg("stealthchop").isEmpty()) { userConfig.setStealthChop(true); updateStealthchop(); } else { userConfig.setStealthChop(false); updateStealthchop(); } } if (!server.arg("inclineMultiplier").isEmpty()) { userConfig.setInclineMultiplier(server.arg("inclineMultiplier").toFloat()); } if (!server.arg("blePMDropdown").isEmpty()) { wasBTUpdate = true; if (server.arg("blePMDropdown")) { tString = server.arg("blePMDropdown"); userConfig.setConnectedPowerMeter(server.arg("blePMDropdown")); } else { userConfig.setConnectedPowerMeter("any"); } } if (!server.arg("bleHRDropdown").isEmpty()) { wasBTUpdate = true; if (server.arg("bleHRDropdown")) { tString = server.arg("bleHRDropdown"); userConfig.setConnectedHeartMonitor(server.arg("bleHRDropdown")); } else { userConfig.setConnectedHeartMonitor("any"); } }
if (!server.arg("session1HR").isEmpty()) //Needs checking for unrealistic numbers.
{
userPWC.session1HR = server.arg("session1HR").toInt();
}
if (!server.arg("session1Pwr").isEmpty())
{
userPWC.session1Pwr = server.arg("session1Pwr").toInt();
}
if (!server.arg("session2HR").isEmpty())
{
userPWC.session2HR = server.arg("session2HR").toInt();
}
if (!server.arg("session2Pwr").isEmpty())
{
userPWC.session2Pwr = server.arg("session2Pwr").toInt();
if (!server.arg("hr2Pwr").isEmpty())
{
userPWC.hr2Pwr = true;
}
else
{
userPWC.hr2Pwr = false;
}
}
String response = "<!DOCTYPE html><html><body><h2>";
if (wasBTUpdate) //Special BT update response
{
response += "Selections Saved!</h2></body><script> setTimeout(\"location.href = 'http://" + String(userConfig.getDeviceName()) + ".local/bluetoothscanner.html';\",1000);</script></html>";
spinBLEClient.serverScan(true);
}
else
{ //Normal response
response += "Network settings will be applied at next reboot. <br> Everything else is availiable immediatly.</h2></body><script> setTimeout(\"location.href = 'http://" + String(userConfig.getDeviceName()) + ".local/index.html';\",1000);</script></html>";
}
server.send(200, "text/html", response);
debugDirector("Config Updated From Web");
userConfig.saveToSPIFFS();
userConfig.printFile();
userPWC.saveToSPIFFS();
userPWC.printFile();
});
And this short callback never causes an issue:
server.on("/BLEScan", []() { debugDirector("Scanning for BLE Devices"); String response = "<!DOCTYPE html>
Scanning for BLE Devices. Please wait 5 seconds."; spinBLEClient.serverScan(true); server.send(200, "text/html", response); });I'm setting a flag now with a separate non-blocking task initiating the scan without issue.
Thanks, looks like the issue has been pinpointed by another posted here https://github.com/espressif/esp-idf/issues/6358
Looks like you found a workaround for your use at least. I'm curious if you changed the 5 second delay to just a taskYIELD() if it would alleviate this.
@h2zero, next report from my testing guy....:
I'm sorry: but the test ESP (connected to serial) crashed a few hours ago (the others are still up and running)
13:13:20.047 WIF: Checking connection...
13:13:40.032 WIF: Checking connection...
13:14:00.026 WIF: Checking connection...
E (78418071) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (78418071) task_wdt: - IDLE0 (CPU 0)
E (78418071) task_wdt: Tasks currently running:
E (78418071) task_wdt: CPU 0: btController
E (78418071) task_wdt: CPU 1: IDLE1
E (78418071) task_wdt: Aborting.
abort() was called at PC 0x40163394 on core 0
ELF file SHA256: 0000000000000000
Backtrace: 0x40091054:0x3ffbfde0 0x400912d1:0x3ffbfe00 0x40163394:0x3ffbfe20 0x400860c9:0x3ffbfe40 0x400158ad:0x3ffe2380 0x401c03a3:0x3ffe23a0 0x401bf867:0x3ffe23c0 0x400922d2:0x3ffe23f0
Rebooting...
ets Jun 8 2016 00:22:57
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DOUT, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:8896
load:0x40080400,len:5828
entry 0x400806ac
That is running your build, so you should have the proper map file. On the other hand, looks like this is not in your code anyway (btController)
BuildDateTime":"2021-01-21T16:46:49"
same issue.... release nimble + that arduino commit :(.....
but the reporter also said: "the others are still up and running" - about 1 1/2 days.
E (78418071) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time: E (78418071) task_wdt: - IDLE0 (CPU 0) E (78418071) task_wdt: Tasks currently running: E (78418071) task_wdt: CPU 0: btController
That says it all lol, not much we can do until espressif issues a fix. older arduino cores don't seem to have this bug, but they have the disconnect bug that we can at least work around.
it could still be in your code or mine (: - 'cos they call us on that task? or not?
It's not in our code as this has been reported by others using IDF directly (not this library). Here is the report from a IDF user, same backtrace: espressif/esp-idf#6358
good call :).
does the released 1.0.5 bring anything over rc6/rc7? s
Possibly, looks like there might have been some controller updates, I don't know what they are though.
Update to this; Looks like there may be a fix soon but might not be in ardruino for a while https://github.com/espressif/esp32-bt-lib/commit/950f7886b6a28734c625f979c2a73752db1dfb19
Though it says wifi, I'm hopeful that it solves the same issue as it's in the BT repo. Closed source sucks, not knowing what's actually fixed is frustrating.
Update to this; Looks like there may be a fix soon but might not be in ardruino for a while espressif/esp32-bt-lib@950f788
Though it says wifi, I'm hopeful that it solves the same issue as it's in the BT repo. Closed source sucks, not knowing what's actually fixed is frustrating.
Obviously it's the issue contained in 950f788 that was fixed.....
:grin:
Damn, no idea how I missed that, thanks for pointing it out!
😃
Closing as this should be resolved. Please reopen if not.
hi @h2zero,
with the vanilla latest release version of arduino, I'm getting these regularly:
Although not directly NimBLE related, do you have any advice? (still in an older NimBLE)
br,
s