homieiot / homie-esp8266

💡 ESP8266 framework for Homie, a lightweight MQTT convention for the IoT
http://homieiot.github.io/homie-esp8266
MIT License
1.36k stars 308 forks source link

Deepsleep causes segfault when WiFi fails #355

Open top4ek opened 7 years ago

top4ek commented 7 years ago

Made sketch with going to Deep Sleep when WiFi connection fails (like here) When homie triggers READY_TO_SLEEP event with

case HomieEventType::READY_TO_SLEEP:
  Homie.getLogger() << "Ready to sleep." << endl;
  ESP.deepSleep(sleepSetting.get() * 1000000);
break;

it tries to connect once more and crashes.

↕ Attempting to connect to Wi-Fi... ✖ Wi-Fi disconnected Triggering WIFI_DISCONNECTED event... Wi-Fi disconnected, reason: 15 Could not connect to WIFI. Event 1. ↕ Attempting to connect to Wi-Fi... ✖ Wi-Fi disconnected Triggering WIFI_DISCONNECTED event... Wi-Fi disconnected, reason: 2 Could not connect to WIFI. Event 2. ↕ Attempting to connect to Wi-Fi... ✖ Wi-Fi disconnected Triggering WIFI_DISCONNECTED event... Wi-Fi disconnected, reason: 15 Could not connect to WIFI. Event 3. Could not connect to WLAN. Preparing for deep sleep... Triggering READY_TO_SLEEP event... Ready to sleep. ↕ Attempting to connect to Wi-Fi...

Exception (28): epc1=0x4000df2f epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000033 depc=0x00000000

ctx: sys sp: 3ffffa90 end: 3fffffb0 offset: 01a0

Tried WiFi.disconnect() before firing ESP.deepSleep but it didn's worked. Using dev version of Homie.

Gulaschcowboy commented 7 years ago

I observed this final "↕ Attempting to connect to Wi-Fi..." after "Ready to sleep." as well, but didn't get that exception yet. Must be related to failed WiFi connection.

top4ek commented 7 years ago

Yes, for testing purposes i've set wrong password. I need to go to Deepsleep if I unable to connect to WiFi.

Gulaschcowboy commented 7 years ago

yes, fairly legit for battery powered devices. But as I'm just a user and not a developer of this library, I don't know what causes this exception. We have to wait for @marvinroger or somebody else with more in-sight.

jbliesener commented 7 years ago

Same problem here on reboot after successful OTA update:

Receiving OTA firmware (357024/357024)...
✔ OTA succeeded
Triggering OTA_SUCCESSFUL event...
Device is idle
↻ Rebooting...

✖ MQTT disconnected
Triggering MQTT_DISCONNECTED event...
↕ Attempting to connect to MQTT...
✖ Wi-Fi disconnected
Triggering WIFI_DISCONNECTED event...
↕ Attempting to connect to Wi-Fi...

Exception (28):
epc1=0x4000df2f epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000033 depc=0x00000000

ctx: sys
sp: 3ffffa90 end: 3fffffb0 offset: 01a0

>>>stack>>>
3ffffc30:  00000000 3fff340c 4024d7c0 40107604
3ffffc40:  4023112b 00000000 00000000 4023fa41
3ffffc50:  00000033 00000010 402319c5 3ffe9fe4
3ffffc60:  3fff340c 00000000 3ffe9fe4 3fff242c
3ffffc70:  3fff340c 40231911 3ffe9fe4 00000000
3ffffc80:  3ffe89e6 3ffeac0c 3ffea790 00000064
3ffffc90:  3fff36e4 3ffea704 3ffe89e6 3ffe89e6
3ffffca0:  00000033 3ffea010 3fff2184 00000006
3ffffcb0:  40233547 3ffea6b8 3ffea790 0000010f
3ffffcc0:  00000001 00040000 7fffffff 00000000
3ffffcd0:  402331fc 402331d0 00040000 000000fd
3ffffce0:  40234f99 402331d0 40104836 3ffeb800
3ffffcf0:  3ffea068 40234f60 00000001 00000012
3ffffd00:  402331c6 4023312d 00000002 3ffea6b8
3ffffd10:  4023316d 00000001 00000012 0000010f
3ffffd20:  40233660 00000001 3ffea6b8 0000010f
3ffffd30:  402330e8 3ffea158 0000000d 007f8000
3ffffd40:  3ffea074 00000000 402355c0 4023537c
3ffffd50:  0000010f 3fff17f2 00000000 00000004
3ffffd60:  40235256 3fff1813 3fff17f2 00000000
3ffffd70:  40220c34 3fff1813 3fff17f2 007f8000
3ffffd80:  3fff2184 40213b89 3fff1813 40213a73
3ffffd90:  4c57424a 10004e41 21fe8e40 0003ff40
3ffffda0:  24a3e000 21feab40 ff295c40 21ff513f
3ffffdb0:  62726f4b 33686361 37393434 73617242

3ffffdc0:  61696c69 36373037 39372d30 ff1d0035
3ffffdd0:  ff17703f 1004d83f 21fb7640 ff177040
3ffffde0:  ff1df83f 1076003f 220b9940 00000140
3ffffdf0:  cd8b7000 4a9036ad 414c5742 0000004e
3ffffe00:  40201081 00000000 00000076 0000007f
3ffffe10:  0000007e 4b00001c 6162726f 34336863
3ffffe20:  42373934 69736172 3761696c 30363730
3ffffe30:  3539372d 3fff1e00 3fff17d0 401004d8
3ffffe40:  4021fe92 3fff17d0 3fff1e58 40107604
3ffffe50:  40220eb5 00ff27a0 00000001 40220e9a
3ffffe60:  00000000 3fff1e58 00000001 00000000
3ffffe70:  40213320 000000e0 3fff178c 3ffe9ce8
3ffffe80:  3fff17f2 3fff1e58 00000000 402070e6
3ffffe90:  3ffffea0 00000000 3fff3934 40107614
3ffffea0:  401009f4 01097235 3fff369c 00000000
3ffffeb0:  3ffe9b50 00000001 3ffe859d 4021c794
3ffffec0:  00000001 0104ad81 40217300 3ffe9ce8
3ffffed0:  3fffff40 3fff178c 3fff178c 4021ce34
3ffffee0:  40203ebc 3fffff58 3fffff58 40201305
3ffffef0:  3ffe859c 00000024 4024cb1c 4021ce16
3fffff00:  3fff1c20 3f800000 3fff0cf8 3ffe9ce8
3fffff10:  3fffff40 3fff1c20 3fff0cf8 4020789d
3fffff20:  3fff1e5c 3fff3304 3fff376c 4021ccc5
3fffff30:  000005e0 00000000 3fff376c 40213274
3fffff40:  3fff279c 0000000f 00000006 adcd8b70
3fffff50:  00009036 00000008 00000000 00000000
3fffff60:  00000000 40104c16 3fff0e4c 40214521
3fffff70:  3ffeaf58 3fff376c 3fff332c 4021347f
3fffff80:  40104a22 00000000 00000002 3ffe9ce8
3fffff90:  3fffdc10 00000000 3fff376c 4022205c
3fffffa0:  40000f49 3fffdab0 3fffdab0 40000f49
<<<stack<<<

 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d

csum 0x2d
v3ffe8558
@cp:0
ld

💡 Firmware bare-minimum (1.0.9)
🔌 Booting into normal mode 🔌
{} Stored configuration
  • Hardware device ID: 5ccf7f82f8e5
  • Device ID: 5ccf7f82f8e5
  • Name: Homie-1
  • Wi-Fi:
    ◦ SSID: JBWLAN
    ◦ Password not shown
  • MQTT:
    ◦ Host: 10.0.0.253
    ◦ Port: 1883
    ◦ Base topic: homie/
    ◦ Auth? no
  • OTA:
    ◦ Enabled? yes
↕ Attempting to connect to Wi-Fi...
✖ Wi-Fi disconnected
Triggering WIFI_DISCONNECTED event...
↕ Attempting to connect to Wi-Fi...
✖ Wi-Fi disconnected
Triggering WIFI_DISCONNECTED event...

↕ Attempting to connect to Wi-Fi...
✖ Wi-Fi disconnected
Triggering WIFI_DISCONNECTED event...
↕ Attempting to connect to Wi-Fi...
✖ Wi-Fi disconnected
Triggering WIFI_DISCONNECTED event...
↕ Attempting to connect to Wi-Fi...
✖ Wi-Fi disconnected
Triggering WIFI_DISCONNECTED event...
↕ Attempting to connect to Wi-Fi...
✖ Wi-Fi disconnected
Triggering WIFI_DISCONNECTED event...
↕ Attempting to connect to Wi-Fi...
✖ Wi-Fi disconnected
Triggering WIFI_DISCONNECTED event...
↕ Attempting to connect to Wi-Fi...
✖ Wi-Fi disconnected
Triggering WIFI_DISCONNECTED event...
↕ Attempting to connect to Wi-Fi...
✖ Wi-Fi disconnected
Triggering WIFI_DISCONNECTED event...
↕ Attempting to connect to Wi-Fi...
✔ Wi-Fi connected, IP: 10.0.0.5
Triggering WIFI_CONNECTED event...
↕ Attempting to connect to MQTT...
Sending initial information...
✔ MQTT ready
Triggering MQTT_READY event...
Calling setup function...
〽 Sending statistics...
  • Wi-Fi signal quality: 100%
  • Uptime: 20s

Decoded stack trace:

Exception Cause: 28  [LoadProhibited: A load referenced a page mapped with an attribute that does not permit loads]

0x4000df2f: ?? ??:0
0x4024d7c0: chip_v6_unset_chanfreq at ??:?
0x40107604: pvPortMalloc at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/heap.c:13
0x4023112b: ieee80211_send_setup at ??:?
0x4023fa41: esf_buf_alloc at ??:?
0x402319c5: ieee80211_send_probereq at ??:?
0x40231911: ieee80211_send_probereq at ??:?
0x40233547: scan_remove_probe_ssid at ??:?
0x402331fc: scan_pm_channel_op_cb at ??:?
0x402331d0: scan_pm_channel_op_cb at ??:?
0x40234f99: chm_start_op at ??:?
0x402331d0: scan_pm_channel_op_cb at ??:?
0x40104836: lmacTxFrame at ??:?
0x40234f60: chm_start_op at ??:?
0x402331c6: scan_pm_channel_op_cb at ??:?
0x4023312d: scan_start at ??:?
0x4023316d: scan_pm_channel_op_cb at ??:?
0x40233660: scan_clear_channles at ??:?
0x402330e8: scan_start at ??:?
0x402355c0: cnx_start_handoff_cb at ??:?
0x4023537c: cnx_sta_scan_cmd at ??:?
0x40235256: cnx_sta_connect_cmd at ??:?
0x40220c34: wifi_station_connect at ??:?
0x40213b89: ESP8266WiFiSTAClass::begin(char const*, char const*, int, unsigned char const*, bool) at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/libraries/ESP8266WiFi/src/ESP8266WiFiSTA.cpp:510
0x40213a73: ESP8266WiFiSTAClass::begin(char const*, char const*, int, unsigned char const*, bool) at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/libraries/ESP8266WiFi/src/ESP8266WiFiSTA.cpp:510
0x40201081: HardwareSerial::write(unsigned char) at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/HardwareSerial.cpp:193
0x401004d8: malloc at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.c:1664
0x4021fe92: wifi_get_opmode at ??:?
0x40107604: pvPortMalloc at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/heap.c:13
0x40220eb5: wifi_station_set_hostname at ??:?
0x40220e9a: wifi_station_set_hostname at ??:?
0x40213320: ESP8266WiFiGenericClass::getMode() at /home/jbliesener/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/4.8.2/functional:1954
0x402070e6: HomieInternals::BootNormal::_wifiConnect() at /home/jbliesener/platformio/src/homietest/.piolibdeps/Homie/src/Homie/Boot/BootNormal.cpp:149
0x40107614: vPortFree at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/heap.c:18
0x401009f4: ets_timer_arm_new at ??:?
0x4021c794: Print::write(unsigned char const*, unsigned int) at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/Print.cpp:38
0x40217300: _M_manager at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/libraries/ESP8266mDNS/ESP8266mDNS.cpp:384
0x4021ce34: HomieInternals::Logger::write(unsigned char const*, unsigned int) at /home/jbliesener/platformio/src/homietest/.piolibdeps/Homie/src/Homie/Logger.cpp:24
0x40203ebc: HomieInternals::Blinker::_tick(unsigned char) at /home/jbliesener/platformio/src/homietest/.piolibdeps/Homie/src/Homie/Blinker.cpp:24
0x40201305: Print::write(char const*) at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/Print.cpp:188
0x4024cb1c: chip_v6_unset_chanfreq at ??:?
0x4021ce16: HomieInternals::Logger::write(unsigned char) at /home/jbliesener/platformio/src/homietest/.piolibdeps/Homie/src/Homie/Logger.cpp:20
0x4020789d: HomieInternals::BootNormal::_onWifiDisconnected(WiFiEventStationModeDisconnected const&) at /home/jbliesener/platformio/src/homietest/.piolibdeps/Homie/src/Homie/Boot/BootNormal.cpp:179
0x4021ccc5: std::_Function_handler<void (WiFiEventStationModeDisconnected const&), std::_Bind<std::_Mem_fn<void (HomieInternals::BootNormal::*)(WiFiEventStationModeDisconnected const&)> (HomieInternals::BootNormal*, std::_Placeholder<1>)> >::_M_invoke(std::_Any_data const&, WiFiEventStationModeDisconnected const&) at /home/jbliesener/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/4.8.2/functional:2073
0x40213274: ~WiFiEventStationModeDisconnected at /home/jbliesener/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/4.8.2/functional:1954
 (inlined by) operator() at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/libraries/ESP8266WiFi/src/ESP8266WiFiGeneric.cpp:121
 (inlined by) _M_invoke at /home/jbliesener/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/4.8.2/functional:2071
0x40104c16: ppEnqueueRxq at ??:?
0x40214521: WiFiClient::stopAll() at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/libraries/ESP8266WiFi/src/WiFiClient.cpp:149
0x4021347f: std::_List_iterator<std::shared_ptr<WiFiEventHandlerOpaque> >::operator++() at /home/jbliesener/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/4.8.2/functional:1954
 (inlined by) ESP8266WiFiGenericClass::_eventCallback(void*) at /home/jbliesener/.platformio/packages/framework-arduinoespressif8266/libraries/ESP8266WiFi/src/ESP8266WiFiGeneric.cpp:219
0x40104a22: ppProcessTxQ at ??:?
0x4022205c: wifi_set_status_led_output_level at ??:?
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0
CapnBry commented 7 years ago

I ran into the same problem when making my node go to sleep if it can't connect to wifi after a short amount of time. What happens is ESP.deepSleep() triggers WiFi.onStationModeDisconnected() event as part of its shutdown routine (even though the wifi never connected). This calls BootNormal::_onWifiDisconnected which immediately calls BootNormal::_wifiConnect which bombs out when it starts trying to reconnect WiFi while preparing to enter deep sleep, I don't believe the ESP core is designed to be re-entrant while entering deep sleep.

One problem is that Homie triggers the READY_TO_SLEEP event immediately if not connected but does not set the flaggedForSleep member which allows it to try to reconnect even as it is shutting down. This can be avoided by changing HomieClass::prepareToSleep to always set flaggedForSleep instead of just when connected:

void HomieClass::prepareToSleep() {
  Interface::get().flaggedForSleep = true;
  if (!Interface::get().connected) {
    Interface::get().getLogger() << F("Triggering READY_TO_SLEEP event...") << endl;
    Interface::get().event.type = HomieEventType::READY_TO_SLEEP;
    Interface::get().eventHandler(Interface::get().event);
  }
}

That prevents the exception but you still get the logging that says that it is reconnecting, so I also changed _onWifiDisconnected to bail if flaggedForSleep is set:

void BootNormal::_onWifiDisconnected(const WiFiEventStationModeDisconnected& event) {
  Interface::get().connected = false;
  if (Interface::get().flaggedForSleep)
    return;
  if (Interface::get().led.enabled) Interface::get().getBlinker().start(LED_WIFI_DELAY);

Note I've included that last line about the LED because it is redundant with the one in _wifiConnect() which is always called so I believe it should be removed.

CapnBry commented 7 years ago

I should add that this is also a problem when you flag for reset by holding down the reset trigger button in much the same way as @jbliesener mentioned about rebooting after OTA.

Flagged for reset by pin
Device is idle
Configuration erased
Triggering ABOUT_TO_RESET event...
↻ Rebooting into config mode...
✖ Wi-Fi disconnected
Triggering WIFI_DISCONNECTED event...
↕ Attempting to connect to Wi-Fi...

Exception (28):
epc1=0x4000df2f epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000033 depc=0x00000000

Homie tries to reconnect wifi even though it is flagged for reset and reboot so it seems that in _onWifiDisconnected it should check HomieClass::_flaggedForReboot as well somehow. It also is odd that _wifiConnect sets WiFi.setAutoconnect(true) and WiFi.setAutoReconnect(true) because Homie forces connect on setup() so it shouldn't autoconnect, and Homie does its own manual reconnect on disconnect so it shouldn't auto reconnect. I'm not sure if these cause problems if the ESP core is trying to do its own connect and Homie is setting wifi parameters at the same time.

jmozmoz commented 6 years ago

I also see this problem. Is there any update on how to fix it?

Is this relevant: Most of the times, I see such an exception, there is this in the log:

Offline message acknowledged. Disconnecting MQTT...

I never see this offline message log output if there is no crash in case of a deepSleep

This might be interesting, too: Besides the change in prepareToSleep, I added a flag before the call to prepareToSleep() so that homie.loop() is not called anymore. This seems to avoid the crash.

x821938 commented 6 years ago

I have also been fighting with this problem. My solution was to revert to 0.8.1 of AsyncMqtt: https://github.com/marvinroger/async-mqtt-client/releases/tag/v0.8.1 (remember to build clean and rebuild everything)

beckenc commented 5 years ago

Same problem here. @timpur @marvinroger any solution ?

This is my workaround for going into deep sleep

void onHomieEvent(const HomieEvent& event) {
  static bool readyToSleep = false;
  switch (event.type) {
    case HomieEventType::READY_TO_SLEEP:
      WiFi.disconnect();    // Calling doDeepSleep() at this point sometimes causes exceptions due to asynchronous arrival 
                            // of WIFI_DISCONNECTED. WDT wakes up the Device immediatly after the exception. So deep sleep does not work.
                            // To work around this, I decided to inherit this task to the WIFI_DISCONNECTED event.
      readyToSleep = true;
      break;
    case HomieEventType::WIFI_DISCONNECTED:
      if(readyToSleep)
        Homie.doDeepSleep(publishIntervalSetting.get() * 1000000);
      break;
  }
}

For a workaround / hotfix within the framework, I'd suggest to disable the onStationModDisconnected callback of the WiFi before every restart and deepSleep.

WiFi.onStationModeDisconnected([](const WiFiEventStationModeDisconnected&){});
ESP.restart();
WiFi.onStationModeDisconnected([](const WiFiEventStationModeDisconnected&){});
ESP.deepSleep();
nougad commented 5 years ago

I can confirm the WiFi is not correctly stopped and @nuccleon patch helps. Is a PR with the proposed workaround welcomed? I'm happy to change the two lines.

I got a Exception Cause: 28 as well:

0x4024f89e: tcp_output at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_out.c:1262
0x40100211: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at ??:?
0x40228780: AsyncClient::close(bool) at ??:?
0x402287ad: AsyncClient::close(bool) at ??:?
0x402284bd: AsyncClient::send() at ??:?
0x4021a121: AsyncMqttClient::subscribe(char const*, unsigned char) at ??:?
0x4020689a: HomieInternals::BootNormal::_onMqttPublish(unsigned short) at ??:?
0x40229492: std::_Function_handler<void (char*, char*, AsyncMqttClientMessageProperties, unsigned int, unsigned int, unsigned int), std::_Bind<std::_Mem_fn<void (HomieInternals::BootNormal::*)(char*, char*, AsyncMqttClientMessageProperties, unsigned int, unsigned int, unsigned int)> (HomieInternals::BootNormal*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>, std::_Placeholder<6>)> >::_M_invoke(std::_Any_data const&, char*, char*, AsyncMqttClientMessageProperties, unsigned int, unsigned int, unsigned int) at ??:?
0x40100103: std::function<void (unsigned short)>::operator()(unsigned short) const at ??:?
0x4021a975: AsyncMqttClient::_onPubAck(unsigned short) at ??:?
0x40204ebc: std::_Function_base::_Base_manager<std::_Bind<std::_Mem_fn<void (HomieInternals::BootNormal::*)(char*, char*, AsyncMqttClientMessageProperties, unsigned int, unsigned int, unsigned int)> (HomieInternals::BootNormal*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>, std::_Placeholder<6>)> >::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) at ??:?

which is related to a double send() on close and fixed in async-mqtt-client: https://github.com/marvinroger/async-mqtt-client/commit/8bd918bfeb2f197afa8b0cb5d278ab2942c62ccb

But the commit before that changes the behavior of onDisconnect callback: https://github.com/marvinroger/async-mqtt-client/issues/160 and homie is stuck at Offline message acknowledged. Disconnecting MQTT...

luebbe commented 5 years ago

A PR is always welcome. Please make it against develop-v3, because we decided to stop development on the develop (v2.x) branch.

VyacheslavKorotach commented 4 years ago

.. In my case, the same issue was solved in that way: wifi_set_opmode(NULL_MODE); delay(100); ESP.deepSleep(sleep_duration * 1000, WAKE_RF_DEFAULT);

stritti commented 4 years ago

A PR is always welcome. Please make it against develop-v3, because we decided to stop development on the develop (v2.x) branch.

we have refactored the repository currently. #develop is now as expected the latest development. Please create PR against develop now.