theelims / ESP32-sveltekit

A simple and extensible framework for ESP32-based IoT projects with a feature-rich, beautiful, and responsive front end built with SvelteKit, Tailwind CSS and DaisyUI. This is a project template to get you started in no time with a fully integrated build chain.
https://theelims.github.io/ESP32-sveltekit/
Other
90 stars, 15 forks

Server stops responding #11

Closed: jetpax closed this issue 4 months ago

jetpax commented 8 months ago

Thanks for building this great project!

I'm not sure whether this is related to #8, so I'm opening a new issue.

I did a build with the web UI in LittleFS. The system works, but it is quite unstable and the server stops responding after a while.

ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
ERROR: Too many SSE messages queued for [192.168.1.125]
[879753][V][WiFiGeneric.cpp:362] _arduino_event_cb(): STA Disconnected: SSID: Paxnet, BSSID: b4:fb:e4:d7:d4:ee, Reason: 200
[879754][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[879761][W][WiFiGeneric.cpp:955] _eventCallback(): Reason: 200 - BEACON_TIMEOUT
[879768][D][WiFiGeneric.cpp:975] _eventCallback(): WiFi Reconnect Running
WiFi Disconnected. Reason code=200
WiFi connection dropped, stopping NTP.
[879787][V][WiFiGeneric.cpp:343] _arduino_event_cb(): STA Stopped
[879789][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 3 - STA_STOP
Starting software access point
[881615][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 0 - WIFI_READY
[881617][V][WiFiGeneric.cpp:392] _arduino_event_cb(): AP Started
[881617][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 10 - AP_START
[881618][V][WiFiGeneric.cpp:97] set_esp_interface_ip(): Configuring SoftAP static IP: 192.168.4.1, MASK: 255.255.255.0, GW: 192.168.4.1
[881636][V][WiFiGeneric.cpp:143] set_esp_interface_ip(): SoftAP: 192.168.4.1 | Gateway: 192.168.4.1 | DHCP Start: 0.0.0.0 | Netmask: 255.255.255.0
[881649][V][WiFiGeneric.cpp:190] set_esp_interface_ip(): DHCP Server Range: 192.168.4.2 to 192.168.4.12
[882343][V][WiFiGeneric.cpp:395] _arduino_event_cb(): AP Stopped
[882343][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 11 - AP_STOP
[882344][V][WiFiGeneric.cpp:392] _arduino_event_cb(): AP Started
[882351][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 10 - AP_START
Starting captive portal on 192.168.4.1
Connecting to WiFi.
[900901][V][WiFiGeneric.cpp:340] _arduino_event_cb(): STA Started
[900901][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 2 - STA_START
[900901][V][WiFiGeneric.cpp:97] set_esp_interface_ip(): Configuring Station static IP: 0.0.0.0, MASK: 0.0.0.0, GW: 0.0.0.0
[900952][V][WiFiGeneric.cpp:355] _arduino_event_cb(): STA Connected: SSID: Paxnet, BSSID: f4:92:bf:a5:99:f1, Channel: 6, Auth: WPA2_PSK
[900953][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 4 - STA_CONNECTED
WiFi Connected.
[902371][V][WiFiGeneric.cpp:369] _arduino_event_cb(): STA Got Same IP:192.168.1.153
[902371][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 7 - STA_GOT_IP
[902374][D][WiFiGeneric.cpp:996] _eventCallback(): STA IP: 192.168.1.153, MASK: 255.255.255.0, GW: 192.168.1.1
WiFi Got IP. localIP=192.168.1.153, hostName=Svelte-RetroVMS
Got IP address, starting NTP Synchronization
Starting NTP...
[908358][E][vfs_api.cpp:105] open(): /littlefs/www/index.html does not exist, no permits for creation
[908389][E][vfs_api.cpp:105] open(): /littlefs/www/index.html does not exist, no permits for creation
[908419][E][vfs_api.cpp:105] open(): /littlefs/www/index.html does not exist, no permits for creation
[908488][E][vfs_api.cpp:105] open(): /littlefs/www/index.html does not exist, no permits for creation
[908518][E][vfs_api.cpp:105] open(): /littlefs/www/index.html does not exist, no permits for creation
[908548][E][vfs_api.cpp:105] open(): /littlefs/www/index.html does not exist, no permits for creation
[908644][E][vfs_api.cpp:105] open(): /littlefs/www/_app/immutable/entry/start.js does not exist, no permits for creation
[909278][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[909279][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[909279][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[909281][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[909286][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[909304][E][vfs_api.cpp:105] open(): /littlefs/www/favicon.png does not exist, no permits for creation
New client connected to Event Source: 1 Clients connected
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3808,len:0x44c
load:0x403c9700,len:0xbec
load:0x403cc700,len:0x2920
entry 0x403c98d8
[   317][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 0 - WIFI_READY
[   352][V][WiFiGeneric.cpp:340] _arduino_event_cb(): STA Started
[   353][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 2 - STA_START
[   355][E][WiFiGeneric.cpp:940] _eventCallback(): esp_wifi_set_ps failed
E (452) wifi_init_default: esp_wifi_get_mac failed with 12289
[   367][V][WiFiGeneric.cpp:343] _arduino_event_cb(): STA Stopped
[   373][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 3 - STA_STOP
Running Firmware Version: 0.2.2
Connecting to WiFi.
[   542][D][esp32-hal-rmt.c:615] rmtInit():  -- TX RMT - CH 0 - 1 RAM Blocks - pin 48
[   555][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 0 - WIFI_READY
[   557][V][WiFiGeneric.cpp:340] _arduino_event_cb(): STA Started
[   557][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 2 - STA_START
[   557][V][WiFiGeneric.cpp:97] set_esp_interface_ip(): Configuring Station static IP: 0.0.0.0, MASK: 0.0.0.0, GW: 0.0.0.0
Starting software access point
[   732][V][WiFiGeneric.cpp:392] _arduino_event_cb(): AP Started
[   732][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 10 - AP_START
[   733][V][WiFiGeneric.cpp:97] set_esp_interface_ip(): Configuring SoftAP static IP: 192.168.4.1, MASK: 255.255.255.0, GW: 192.168.4.1
[   746][V][WiFiGeneric.cpp:143] set_esp_interface_ip(): SoftAP: 192.168.4.1 | Gateway: 192.168.4.1 | DHCP Start: 0.0.0.0 | Netmask: 255.255.255.0
[   758][V][WiFiGeneric.cpp:190] set_esp_interface_ip(): DHCP Server Range: 192.168.4.2 to 192.168.4.12
[  1454][V][WiFiGeneric.cpp:395] _arduino_event_cb(): AP Stopped
[  1455][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 11 - AP_STOP
[  1455][V][WiFiGeneric.cpp:392] _arduino_event_cb(): AP Started
[  1462][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 10 - AP_START
Starting captive portal on 192.168.4.1
[  2455][V][WiFiGeneric.cpp:362] _arduino_event_cb(): STA Disconnected: SSID: Paxnet, BSSID: 68:ff:7b:06:87:15, Reason: 2
[  2455][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[  2462][W][WiFiGeneric.cpp:955] _eventCallback(): Reason: 2 - AUTH_EXPIRE
[  2469][D][WiFiGeneric.cpp:975] _eventCallback(): WiFi Reconnect Running
WiFi Disconnected. Reason code=2
WiFi connection dropped, stopping NTP.
[  2487][V][WiFiGeneric.cpp:362] _arduino_event_cb(): STA Disconnected: SSID: Paxnet, BSSID: f4:92:bf:a5:99:f1, Reason: 202
[  2493][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[  2493][V][WiFiGeneric.cpp:343] _arduino_event_cb(): STA Stopped
[  2500][W][WiFiGeneric.cpp:955] _eventCallback(): Reason: 202 - AUTH_FAIL
WiFi Disconnected. Reason code=202
WiFi connection dropped, stopping NTP.
[  2523][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 3 - STA_STOP
Connecting to WiFi.
[ 30548][V][WiFiGeneric.cpp:340] _arduino_event_cb(): STA Started
[ 30548][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 2 - STA_START
[ 30549][V][WiFiGeneric.cpp:97] set_esp_interface_ip(): Configuring Station static IP: 0.0.0.0, MASK: 0.0.0.0, GW: 0.0.0.0
[ 30734][V][WiFiGeneric.cpp:355] _arduino_event_cb(): STA Connected: SSID: Paxnet, BSSID: f4:92:bf:a5:99:f1, Channel: 6, Auth: WPA2_PSK
[ 30735][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 4 - STA_CONNECTED
WiFi Connected.
[ 31266][V][WiFiGeneric.cpp:369] _arduino_event_cb(): STA Got New IP:192.168.1.153
[ 31267][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 7 - STA_GOT_IP
[ 31270][D][WiFiGeneric.cpp:996] _eventCallback(): STA IP: 192.168.1.153, MASK: 255.255.255.0, GW: 192.168.1.1
WiFi Got IP. localIP=192.168.1.153, hostName=Svelte-RetroVMS
Got IP address, starting NTP Synchronization
Starting NTP...
New client connected to Event Source: 1 Clients connected
[ 39401][E][vfs_api.cpp:105] open(): /littlefs/www/_app/version.json does not exist, no permits for creation
Stopping captive portal
Stopping software access point
[ 40758][V][WiFiGeneric.cpp:395] _arduino_event_cb(): AP Stopped
[ 40758][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 11 - AP_STOP
[ 40759][V][WiFiGeneric.cpp:392] _arduino_event_cb(): AP Started
[ 40769][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 10 - AP_START
[ 40770][V][WiFiGeneric.cpp:395] _arduino_event_cb(): AP Stopped
[ 40782][D][WiFiGeneric.cpp:931] _eventCallback(): Arduino Event: 11 - AP_STOP
[ 42864][W][AsyncTCP.cpp:999] _poll(): rx timeout 4
[ 42866][W][AsyncTCP.cpp:999] _poll(): rx timeout 4
[ 42867][W][AsyncTCP.cpp:999] _poll(): rx timeout 4
[ 42868][W][AsyncTCP.cpp:999] _poll(): rx timeout 4
[ 45573][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 45936][E][vfs_api.cpp:105] open(): /littlefs/www/favicon.png does not exist, no permits for creation
[ 45979][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 47043][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 47043][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 47513][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 47514][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 48053][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 48053][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 48605][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 48605][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
New client connected to Event Source: 1 Clients connected
[105819][E][vfs_api.cpp:105] open(): /littlefs/www/index.html does not exist, no permits for creation
[105848][E][vfs_api.cpp:105] open(): /littlefs/www/index.html does not exist, no permits for creation
[105879][E][vfs_api.cpp:105] open(): /littlefs/www/index.html does not exist, no permits for creation
[106689][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[106689][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[106689][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[106691][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[107279][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[107279][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[107677][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[107677][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[107678][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[107680][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[107684][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[108408][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[108408][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[108408][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[108410][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[108415][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[108872][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[108872][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[108872][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[108874][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[108879][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
New client connected to Event Source: 2 Clients connected
New client connected to Event Source: 3 Clients connected
[119216][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[119216][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[119216][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
E (127354) tsens: Do not configure the temp sensor when it's running!
E (157350) tsens: Do not configure the temp sensor when it's running!
[324237][I][WebSocketServer.h:144] onWSEvent(): [WebSocket Server] ws[/ws/lightState][1] connect
[342419][I][WebSocketServer.h:253] onWSEvent(): [WebSocket Server] ws[/ws/lightState][1] disconnect: 1070549080
Starting NTP...
Connecting to MQTT...
[509382][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
Connected to MQTT, without persistent session
Disconnected from MQTT reason: 0
Connecting to MQTT...
Connected to MQTT, without persistent session
[530068][I][WebSocketServer.h:144] onWSEvent(): [WebSocket Server] ws[/ws/lightState][2] connect
[537240][I][WebSocketServer.h:253] onWSEvent(): [WebSocket Server] ws[/ws/lightState][2] disconnect: 1070549080
[551064][I][WebSocketServer.h:144] onWSEvent(): [WebSocket Server] ws[/ws/lightState][3] connect
[557424][I][WebSocketServer.h:253] onWSEvent(): [WebSocket Server] ws[/ws/lightState][3] disconnect: 97
E (4545344) tsens: Do not configure the temp sensor when it's running!
E (4585341) tsens: Do not configure the temp sensor when it's running!
E (4775339) tsens: Do not configure the temp sensor when it's running!
[4805411][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
E (10791926) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (10791926) task_wdt:  - IDLE (CPU 0)
E (10791926) task_wdt: Tasks currently running:
E (10791926) task_wdt: CPU 0: Analytics Servi
E (10791926) task_wdt: CPU 1: IDLE
E (10791926) task_wdt: Aborting.

abort() was called at PC 0x4202f5d8 on core 0

Backtrace: 0x403778de:0x3fc962b0 0x4037cf39:0x3fc962d0 0x40383611:0x3fc962f0 0x4202f5d8:0x3fc96370 0x40378c55:0x3fc96390 0x4202a76f:0x3fca7280 0x4202a7df:0x3fca72b0 0x420254c6:0x3fca7300 0x4201676c:0x3fca7330 0x42016979:0x3fca7870

  #0  0x403778de:0x3fc962b0 in panic_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/panic.c:402
  #1  0x4037cf39:0x3fc962d0 in esp_system_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/esp_system.c:128
  #2  0x40383611:0x3fc962f0 in abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/newlib/abort.c:46
  #3  0x4202f5d8:0x3fc96370 in task_wdt_isr at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/task_wdt.c:176 (discriminator 3)
  #4  0x40378c55:0x3fc96390 in _xt_lowint1 at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/freertos/port/xtensa/xtensa_vectors.S:1111
  #5  0x4202a76f:0x3fca7280 in temp_sensor_read_raw at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/driver/esp32s3/rtc_tempsensor.c:122 (discriminator 2)
  #6  0x4202a7df:0x3fca72b0 in temp_sensor_read_celsius at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/driver/esp32s3/rtc_tempsensor.c:158
  #7  0x420254c6:0x3fca7300 in temperatureRead at /Users/jep/.platformio/packages/framework-arduinoespressif32/cores/esp32/esp32-hal-misc.c:73
  #8  0x4201676c:0x3fca7330 in AnalyticsService::_loop() at lib/framework/AnalyticsService.h:64 (discriminator 3)
  #9  0x42016979:0x3fca7870 in AnalyticsService::_loopImpl(void*) at lib/framework/AnalyticsService.h:48
EvEggelen commented 8 months ago

@jetpax I also noticed that the WiFi connection was very unstable with my AliExpress NodeMCU-32S board. When I added a small capacitor (e.g. 10 µF or more) on the 5V rail, the server became stable with the same software. I suspect ESP32-sveltekit draws a lot of current in short peaks, and the inductance of my USB cable limited the peak current reaching the board. You could also say that the capacitor on the cloned NodeMCU-32S board is too small.

One solution is to use a breakout board with extra capacitors on the supply rail. I used the one below and clamped the capacitor leads into the screw terminal at the top left (GND and VIN are next to each other).

[Image: the breakout board used, with the added capacitor]

The log you posted is similar to what I had without the capacitor. Still, I hope we can look into the code and figure out why the server stops responding. Although the capacitor improved WiFi stability in my case, I expect this also happens when the WiFi is really bad, and in my view bad WiFi should not result in a deadlock of the server. Without the capacitor, I was able to reproduce this with multiple devices connected to the ESP32 over WiFi.

EvEggelen commented 8 months ago

@theelims

Would it be an idea to add a setting to control the WiFi TX power in the WiFi setup? If I am not mistaken, there is also a power-saving option for WiFi (Default, Off, Minimum and Maximum).
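A minimal sketch of how such a setting could be validated before it reaches the radio. It assumes the UI stores the TX power in dBm and converts it to the 0.25 dBm units that ESP-IDF's esp_wifi_set_max_tx_power() takes, clamped to the [8, 84] range documented for that call. The four power-save choices map onto ESP-IDF's WIFI_PS_NONE, WIFI_PS_MIN_MODEM and WIFI_PS_MAX_MODEM plus "leave as is". The names toQuarterDbm, PowerSaveSetting and parsePowerSave are hypothetical, not part of ESP32-sveltekit.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <string>

// Hypothetical power-save choice offered in the WiFi settings page.
// Off/Minimum/Maximum would map to WIFI_PS_NONE / WIFI_PS_MIN_MODEM /
// WIFI_PS_MAX_MODEM; Default means "keep what the firmware configured".
enum class PowerSaveSetting { Default, Off, Minimum, Maximum };

// Convert a user-facing dBm value to 0.25 dBm units and clamp it to the
// [8, 84] range documented for esp_wifi_set_max_tx_power().
int8_t toQuarterDbm(float dbm) {
    long quarters = std::lround(dbm * 4.0f);
    quarters = std::max(8L, std::min(84L, quarters));
    return static_cast<int8_t>(quarters);
}

// Parse the string a web UI might send; unknown values fall back to Default.
PowerSaveSetting parsePowerSave(const std::string &s) {
    if (s == "off") return PowerSaveSetting::Off;
    if (s == "minimum") return PowerSaveSetting::Minimum;
    if (s == "maximum") return PowerSaveSetting::Maximum;
    return PowerSaveSetting::Default;
}
```

For example, 19.5 dBm becomes 78 and an out-of-range 30 dBm is clamped to 84. On the device, ESP-IDF expects the TX power to be set only after WiFi has started; the "setTxPower(): Neither AP or STA has been started" warning near the start of the later log shows the Arduino core rejecting a change made before that point.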

theelims commented 8 months ago

@jetpax please try to stabilize the supply voltage first. ESPs are known to be very power-hungry, and I had similar issues with one board as well. You need to be able to supply 500 mA peaks. You can hook up a scope to your power rail and check whether the voltage dips every few ms.
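As a back-of-the-envelope illustration of why a local capacitor helps, the droop while the capacitor alone supplies a current peak is roughly I·Δt/C. This ignores the capacitor's ESR, the cable and the regulator. It assumes the 500 mA peak mentioned above and an assumed 10 µs before the upstream supply catches up through the cable inductance:

```latex
\Delta V \approx \frac{I_{\text{peak}}\,\Delta t}{C},\qquad
\frac{(0.5\,\text{A})(10\,\mu\text{s})}{10\,\mu\text{F}} = 0.5\,\text{V},\qquad
\frac{(0.5\,\text{A})(10\,\mu\text{s})}{100\,\mu\text{F}} = 0.05\,\text{V}
```

Under these assumptions, a tenfold larger capacitor cuts the dip tenfold, which is consistent with the stabilizing effect EvEggelen saw from adding capacitance on the 5V rail.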

The crash you observed comes from reading the temperature sensor. I haven't seen this in my tests, but I'll watch my own builds for traces of it, such as "E (127354) tsens: Do not configure the temp sensor when it's running!". However, since one ADC is tightly coupled to the WiFi hardware and is likely used for all internal ADC measurements, it could be a brownout as well.
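The thread does not establish the cause of the tsens error. If it turns out to be the sensor being reconfigured by more than one caller (an assumption, not confirmed here), one mitigation is to funnel every read through a single mutex-guarded accessor that also limits how often the driver is touched. The following is a minimal host-side sketch. CachedSensor and its read callback are hypothetical names. On the device, the callback would wrap temperatureRead() (the call in the backtrace above), and a FreeRTOS mutex would serve equally well.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical wrapper: all tasks read the on-chip temperature through one
// instance, so the driver is never entered by two callers at once and is
// polled at most once per minIntervalMs.
class CachedSensor {
public:
    using ReadFn = std::function<float()>;

    CachedSensor(ReadFn read, uint32_t minIntervalMs)
        : read_(std::move(read)), minIntervalMs_(minIntervalMs) {}

    // nowMs is passed in (e.g. millis() on the device) so the logic can be
    // exercised on a host. Unsigned subtraction tolerates millis() wrap-around.
    float get(uint32_t nowMs) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!hasValue_ || nowMs - lastReadMs_ >= minIntervalMs_) {
            cached_ = read_();  // only one caller can be here at a time
            lastReadMs_ = nowMs;
            hasValue_ = true;
        }
        return cached_;
    }

private:
    ReadFn read_;
    uint32_t minIntervalMs_;
    std::mutex mutex_;
    float cached_ = 0.0f;
    uint32_t lastReadMs_ = 0;
    bool hasValue_ = false;
};
```

This only prevents concurrent or overly frequent access. It would not help if a single read hangs, as it does inside temp_sensor_read_raw in the backtrace, and it does not rule out the brownout explanation.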

@EvEggelen I'm not sure this is the right approach to compensate for poor hardware design.

EvEggelen commented 8 months ago

@theelims I agree with you that the supply voltage should be stable. This needs to be addressed.

Regarding the setting for changing the TX power: it is not related to this problem and not intended to solve it. To be honest, I should have opened a separate ticket as a feature request. My main motivation is to reduce radiated and consumed power. When possible, I reduce the WiFi power to the level required to function correctly (with some margin). I do this for my access point as well, and I see more IoT devices offering this feature.

Regarding the deadlock of the server: I expect this can also happen with a stable power supply when the WiFi signal is bad. Even with my stable power supply, I see errors that I do not understand. Luckily, they do not result in a deadlock of the server. When I have something concrete, I will open a ticket for it.

theelims commented 8 months ago

There are a number of error messages coming from the AsyncTCP and AsyncWebServer libraries. Unfortunately, they are largely unmaintained and rather buggy. I included the most stable versions and all pull requests I could gather from various forks that seemed usable, plus some of my own. However, their queuing mechanism has a memory leak that results in a crash if too many of these show up: "ERROR: Too many SSE messages queued for [192.168.1.125]". The same goes for WebSockets. Also, increasing any message rate to more than 20 Hz is rather dangerous and results in a crash, too. But I haven't been able to locate the bug yet.
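For state-sync traffic, where only the newest value matters, one way to keep a slow or dead client from accumulating memory is a hard-capped per-client queue that drops the oldest message. This is a generic sketch under that assumption. BoundedMessageQueue is a hypothetical class, not AsyncTCP's or ESPAsyncWebServer's actual queue, and it does not locate the leak described above.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical per-client outbound queue with a hard cap. When full, it
// discards the oldest message instead of growing, so memory per client stays
// bounded even if the client stops draining. Not thread-safe: guard it with a
// mutex if one task pushes and another drains.
class BoundedMessageQueue {
public:
    explicit BoundedMessageQueue(std::size_t capacity) : capacity_(capacity) {}

    void push(std::string msg) {
        if (capacity_ == 0) { ++dropped_; return; }
        if (queue_.size() >= capacity_) {
            queue_.pop_front();  // drop the stalest message
            ++dropped_;
        }
        queue_.push_back(std::move(msg));
    }

    // Moves the oldest message into out; returns false when the queue is empty.
    bool pop(std::string &out) {
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    std::size_t size() const { return queue_.size(); }
    std::size_t dropped() const { return dropped_; }

private:
    std::size_t capacity_;
    std::deque<std::string> queue_;
    std::size_t dropped_ = 0;
};
```

Pushing five messages into a queue of capacity 3 leaves the last three and reports two drops. That is acceptable for a toggle like LightState but not for traffic where every message must arrive.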

EvEggelen commented 7 months ago

@theelims Great to hear that you included the latest fixes. I took a quick look at AsyncTCP, and indeed it seems unmaintained, but I see forks popping up left and right. One of them is https://github.com/TienHuyIoT/AsyncTCP, which has fixes from 2 days ago. I assume you are aware of the "new" forks. Do you know if they are just collections of the unhandled pull requests, or do they include new fixes?

But issues like these, related to message rate, are not nice in my view. I'm not sure what the exact root cause is, but based on what you describe, I would not be surprised if this also happens at lower rates when the WiFi connection is really poor.
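This intuition can be made concrete with simple arithmetic. A per-client queue only overflows when messages arrive faster than the link drains them, so a poor connection (lower drain rate) moves the overflow point to lower message rates. The following sketch assumes a fixed queue capacity. secondsUntilFull is a hypothetical helper, and the capacity of 32 used below is an illustrative value, not a confirmed library setting.

```cpp
#include <limits>

// Seconds until a queue of `capacity` messages overflows when messages are
// produced at inHz but the link only drains outHz. Returns +infinity when the
// link keeps up (outHz >= inHz), i.e. the queue never fills.
double secondsUntilFull(double capacity, double inHz, double outHz) {
    const double growth = inHz - outHz;  // net messages added per second
    if (growth <= 0.0) return std::numeric_limits<double>::infinity();
    return capacity / growth;
}
```

With a 32-message queue, updates published at 20 Hz over a link that drains only 4 messages per second overflow after 32 / (20 - 4) = 2 s. At 2 Hz with a 1 Hz drain, the same queue lasts 32 s, so it still overflows, just later.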

As I wanted to use ESP32-sveltekit as the foundation for my new project, I was hoping for a stable and maintained core. Do you know if there are stable and maintained alternatives to AsyncTCP and AsyncWebServer?

theelims commented 7 months ago

I follow what I think are the best-maintained forks: https://github.com/OttoWinter/AsyncTCP and https://github.com/esphome/ESPAsyncWebServer

Since many big ESP open-source projects use these libraries, I'm not overly concerned. As long as you stay out of the danger zone of fast WebSocket and SSE messaging, the code should be pretty reliable.
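One way to stay out of that danger zone regardless of how often the application changes state is to coalesce updates before they reach the SSE/WebSocket layer. The following is a minimal sketch that assumes only the latest state matters (true for a toggle like LightState, not for event logs). UpdateThrottle is a hypothetical helper, and the 100 ms interval (10 Hz) is an illustrative choice below the ~20 Hz mentioned above.

```cpp
#include <cstdint>
#include <string>

// Hypothetical throttle that caps outgoing updates at one per minIntervalMs.
// Updates arriving inside the window are not queued: only the newest one is
// kept and released by poll() once the window has passed.
class UpdateThrottle {
public:
    explicit UpdateThrottle(uint32_t minIntervalMs) : minIntervalMs_(minIntervalMs) {}

    // Record the newest state. Returns true and fills out if it may be sent now.
    bool offer(const std::string &payload, uint32_t nowMs, std::string &out) {
        pending_ = payload;  // overwrite: older unsent states are discarded
        hasPending_ = true;
        return poll(nowMs, out);
    }

    // Call periodically (e.g. from a loop) to flush a held-back update.
    bool poll(uint32_t nowMs, std::string &out) {
        if (!hasPending_) return false;
        if (sentOnce_ && nowMs - lastSentMs_ < minIntervalMs_) return false;
        out = pending_;
        hasPending_ = false;
        sentOnce_ = true;
        lastSentMs_ = nowMs;
        return true;
    }

private:
    uint32_t minIntervalMs_;
    uint32_t lastSentMs_ = 0;
    bool sentOnce_ = false;
    bool hasPending_ = false;
    std::string pending_;
};
```

For example, with a 100 ms interval, states "a", "b" and "c" offered at 0, 30 and 60 ms result in "a" being sent immediately and only "c" being sent at 100 ms.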

EvEggelen commented 7 months ago

After testing, I notice that the current software is not stable, even with a "low" number of messages. I noticed that the WiFi connection is bad, and I've seen CORRUPT HEAP and ERROR: Too many SSE messages queued for [�␌�?␟] in the log. I initially expected it was the hardware or WiFi reception. I tested different boards and different (lab) power supplies, but this does not seem to fix the issue.

As a test, I compiled the original ESP8266 React project for the NodeMCU-32S board and flashed it on the same hardware (the functionality is similar). So the only difference is the software on the device (WiFi signal, power supply, board and physical location are all the same). In this case, the software is rock-solid: I had 4 devices connected, went crazy with browser refreshes and let it run for hours (with the LightState toggle in main). I could not find any problems with that software.

I find it hard to reproduce the issues deterministically with ESP32-sveltekit, but refreshing the page while connected via both the AP and WiFi (so with 2 devices) normally triggers them. I also sometimes notice that the reported number of connected clients keeps increasing while the actual number of connected devices stays the same. Is there also code to disconnect clients (and release their memory)? Could the bad WiFi be caused by a scheduling problem that conflicts with the WiFi reception framework (and could this trigger other issues)? I also noticed that power consumption is very different from ESP8266 React (a stable 130 mA), while with ESP32-sveltekit I have seen it drop to ~20 mA.
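A "Clients connected" count that keeps growing (3 to 7 in the logs below) is consistent with clients being added on every connect but not always removed. The bookkeeping that would prevent this can be sketched as follows. ClientRegistry, the numeric client ids and the timeout-based pruning are illustrative assumptions, not how ESP32-sveltekit or ESPAsyncWebServer actually track clients.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical bookkeeping for event-stream clients. Every connect must be
// matched by a disconnect; clients that vanish without a clean disconnect
// (e.g. the WiFi link dropped) are pruned after being silent for timeoutMs.
class ClientRegistry {
public:
    void onConnect(uint32_t id, uint32_t nowMs) { lastSeen_[id] = nowMs; }

    void onActivity(uint32_t id, uint32_t nowMs) {
        auto it = lastSeen_.find(id);
        if (it != lastSeen_.end()) it->second = nowMs;
    }

    void onDisconnect(uint32_t id) { lastSeen_.erase(id); }

    // Remove clients not heard from within timeoutMs; returns how many were removed.
    std::size_t pruneStale(uint32_t nowMs, uint32_t timeoutMs) {
        std::size_t removed = 0;
        for (auto it = lastSeen_.begin(); it != lastSeen_.end();) {
            if (nowMs - it->second > timeoutMs) {
                it = lastSeen_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t count() const { return lastSeen_.size(); }

private:
    std::unordered_map<uint32_t, uint32_t> lastSeen_;  // client id -> last activity (ms)
};
```

Calling pruneStale() periodically (e.g. from the same task that sends updates) keeps the count honest even when a disconnect event is lost. On the device, the map would need the same locking as any other state shared between tasks.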

I would love to solve this issue. What would be the best next step? I can create more logs. Would that help?

Here is the code I added for the automatic LightState toggle:

void loop()
{
    // Desired LED state; flipped every 2 s to generate a steady stream of updates
    static bool bLocalState = true;

    lightStateService.update([&](LightState &state)
                             {
                                 if (bLocalState == state.ledOn)
                                 {
                                     return StateUpdateResult::UNCHANGED; // state already matches, nothing to propagate
                                 }

                                 state.ledOn = bLocalState;
                                 return StateUpdateResult::CHANGED; // notify StatefulService by returning CHANGED
                             },
                             "timer"); // origin id of this update

    delay(2000); // Arduino delay(): blocks this task for 2 s and lets other FreeRTOS tasks run
    bLocalState = !bLocalState;
    // Delete the Arduino loop task if it is not needed in this example:
    // vTaskDelete(NULL);
}

Logs:

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:184
load:0x40078000,len:12732
␛[33m=> 0x40078000: ?? ??:0␛[0m
ho 0 tail 12 room 4
load:0x40080400,len:2908
␛[33m=> 0x40080400: _init at ??:?␛[0m
entry 0x400805c4
␛[33m=> 0x400805c4: ?? ??:0␛[0m
[    34][D][esp32-hal-cpu.c:244] setCpuFrequencyMhz(): PLL: 480 / 2 = 240 Mhz, APB: 80000000 Hz
[    71][W][WiFiGeneric.cpp:1403] setTxPower(): Neither AP or STA has been started
[    92][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 0 - WIFI_READY
[   190][V][WiFiGeneric.cpp:340] _arduino_event_cb(): STA Started
[   193][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: [   STA][V][RTFiGener c94]p:][W] _iGeuinic.vpnt1042): evA tCalped
(): esp_wifi_set_ps failed
[   201][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 3 - STA_STOP
Running Firmware Version: 0.2.2
Connecting to WiFi.
[   532][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 0 - WIFI_READY
[   541][V][WiFiGeneric.cpp:340] _arduino_event_cb(): STA Started
[   542][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 2 - STA_START
[   543][V][WiFiGeneric.cpp:97] set_esp_interface_ip(): Configuring Station static IP: 0.0.0.0, MASK: 0.0.0.0, GW: 0.0.0.0
Starting software access point
[   581][V][WiFiGeneric.cpp:392] _arduino_event_cb(): AP Started
[   582][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 10 - AP_START
[   586][V][WiFiGeneric.cpp:97] set_esp_interface_ip(): Configuring SoftAP static IP: 192.168.4.1, MASK: 255.255.255.0, GW: 192.168.4.1
[   597][V][WiFiGeneric.cpp:143] set_esp_interface_ip(): SoftAP: 192.168.4.1 | Gateway: 192.168.4.1 | DHCP Start: 0.0.0.0 | Netmask: 255.255.255.0
[   608][V][WiFiGeneric.cpp:190] set_esp_interface_ip(): DHCP Server Range: 192.168.4.2 to 192.168.4.12
[  1079][V][WiFiGeneric.cpp:395] _arduino_event_cb(): AP Stopped
[  1080][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 11 - AP_S[ P1
81][V][WiFiGeneric.cpp:392] _arduino_event_cb(): AP Started
[  1087][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 10 - AP_START
Starting captive portal on 192.168.4.1
[  2884][V][WiFiGeneric.cpp:362] _arduino_event_cb(): STA Disconnected: SSID: SpeedTouch12, BSSID: b8:d5:26:d1:78:e1, Reason: 2
[  2885][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[  2892][W][WiFiGeneric.cpp:1057] _eventCallback(): Reason: 2 - AUTH_EXPIRE
[  2899][D][WiFiGeneric.cpp:1077] _eventCallback(): WiFi Reconnect Running
WiFi Disconnected. Reason code=2
WiFi connection dropped, stopping NTP.
[  3383][V][WiFiGeneric.cpp:362] _arduino_event_cb(): STA Disconnected: SSID: SpeedTouch12, BSSID: b8:d5:26:d1:78:e1, Reason: 203
[  3384][D[ Wi3iGe[V]iWicpGen035c cppent3a]lbark():o_eduito Eve: S 5 S STAedI
CONNECTED
[  3391][W][WiFiGeneric.cpp:1057] _eventCallback(): Reason: 203 - ASSOC_FAIL
WiFi Disconnected. Reason code=203
WiFi connection dropped, stopping NTP.
[  3409][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 3 - STA_STOP
Connecting to WiFi.
[ 30514][V][WiFiGeneric.cpp:340] _arduino_event_cb(): STA Started
[ 30514][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 2 - STA_START
[ 30515][V][WiFiGeneric.cpp:97] set_esp_interface_ip(): Configuring Station static IP: 0.0.0.0, MASK: 0.0.0.0, GW: 0.0.0.0
[ 31143][V][WiFiGeneric.cpp:355] _arduino_event_cb(): STA Connected: SSID: SpeedTouch12, BSSID: b8:d5:26:d1:78:e1, Channel: 3, Auth: WPA2_PSK
[ 31145][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 4 - STA_CONNECTED
WiFi Connected.
[ 32174][V][WiFiGeneric.cpp:369] _arduino_event_cb(): STA Got New IP:192.168.1.24
[ 32174][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 7 - STA_GOT_IP
[ 32177][D][WiFiGeneric.cpp:1098] _eventCallback(): STA IP: 192.168.1.24, MASK: 255.255.255.0, GW: 192.168.1.1
WiFi Got IP. localIP=192.168.1.24, hostName=esp32
Got IP address, starting NTP Synchronization
Starting NTP...
[ 33108][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 33108][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 33108][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 33110][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 33115][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 33120][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
New client connected to Event Source: 1 Clients connected
[ 37742][W][AsyncTCP.cpp:999] _poll(): rx timeout 4
Stopping captive portal
Stopping software access point
[ 41056][V][WiFiGeneric.cpp:395] _arduino_event_cb(): AP Stopped
[ 41058][V][WiFiGeneric.cpp:392] _arduino_event_cb(): AP Startedt: 11 - AP_STOP
[ 41064][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 10 - AP_START
[ 41071][V][WiFiGeneric.cpp:395] _arduino_event_cb(): AP Stopped
[ 41077][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 11 - AP_STOP
[ 99875][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
New client connected to Event Source: 1 Clients connected
CORRUPT HEAP: Bad head at 0x3ffdb172. Expected 0xabba1234 got 0x00103ffd

Log 2

WiFi Connected.
[421063][V][WiFiGeneric.cpp:369] _arduino_event_cb(): STA Got Same IP:192.168.1.148
[421063][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 7 - STA_GOT_IP
[421066][D][WiFiGeneric.cpp:1098] _eventCallback(): STA IP: 192.168.1.148, MASK: 255.255.255.0, GW: 192.168.1.1
WiFi Got IP. localIP=192.168.1.148, hostName=esp32.home
Got IP address, starting NTP Synchronization
Starting NTP...
[458360][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[458361][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[458362][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[458369][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
New client connected to Event Source: 1 Clients connected
[461813][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
CORRUPT HEAP: Bad tail at 0x3ffdff68. Expected 0xbaad5678 got 0xbaad569e
CORRUPT HEAP: Bad tail at 0x3ffdf2dc. Expected 0xbaad5678 got 0xbaad569e
[492229][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[495183][W][AsyncTCP.cpp:999] _poll(): rx timeout 4
[495189][W][AsyncTCP.cpp:999] _poll(): rx timeout 4
[495191][W][AsyncTCP.cpp:999] _poll(): rx timeout 4
New client connected to Event Source: 1 Clients connected

[ 45190][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 45190][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 45191][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 45203][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 45675][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 45675][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 45675][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 46158][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[ 48638][W][AsyncTCP.cpp:999] _poll(): rx timeout 4
[ 49638][W][AsyncTCP.cpp:999] _poll(): rx timeout 4
New client connected to Event Source: 3 Clients connected
New client connected to Event Source: 4 Clients connected
New client connected to Event Source: 5 Clients connected
New client connected to Event Source: 6 Clients connected
New client connected to Event Source: 7 Clients connected
CORRUPT HEAP: Bad tail at 0x3ffe17bc. Expected 0xbaad5678 got 0xbaad573d
CORRUPT HEAP: Bad tail at 0x3ffdfff0. Expected 0xbaad5678 got 0xbaad569f
CORRUPT HEAP: Bad tail at 0x3ffe2734. Expected 0xbaad5678 got 0xbaad573f
CORRUPT HEAP: Bad tail at 0x3ffe130c. Expected 0xbaad5678 got 0xbaad573f
CORRUPT HEAP: Bad tail at 0x3ffccf10. Expected 0xbaad5678 got 0xbaad569f
CORRUPT HEAP: Bad tail at 0x3ffdf67c. Expected 0xbaad5678 got 0xbaad5736
CORRUPT HEAP: Bad tail at 0x3ffdffd0. Expected 0xbaad5678 got 0xbaad5736
[4962027][W][AsyncTCP.cpp:976] _poll(): pcb is NULL
[4962028][W][AsyncTCP.cpp:976] _poll(): pcb is NULL

Log 3

[2762402][V][WiFiGeneric.cpp:362] _arduino_event_cb(): STA Disconnected: SSID: SpeedTouch12, BSSID: b8:d5:26:d1:78:e1, Reason: 2
[2762402][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[2762410][W][WiFiGeneric.cpp:1057] _eventCallback(): Reason: 2 - AUTH_EXPIRE
WiFi Disconnected. Reason code=2
WiFi connection dropped, stopping NTP.
[2762429][V][WiFiGeneric.cpp:343] _arduino_event_cb(): STA Stopped
[2762429][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 3 - STA_STOP
ERROR: Too many SSE messages queued for []
[...]
ERROR: Too many SSE messages queued for [�␌�?␟]
ERROR: Too many SSE messages queued for [P
�?␟]
[...]
ERROR: Too many SSE messages queued for []
ERROR: Too many SSE messages queued for [�␂�?␟]
ERROR: Too many SSE messages queued for [�      �?␟]
[...]
ERROR: Too many SSE messages queued for [$�?␟]
ERROR: Too many SSE messages queued for [P
�?␟]
[...]
ERROR: Too many SSE messages queued for [<      �?␟]
[...]
theelims commented 7 months ago

@EvEggelen No need to do more tests. I think I know where the issue is. I make use of ESPAsyncWebServer's Server-Sent Events (esp8266-react doesn't). This feature does not seem to be heavily used and is still quite buggy. It will happen with WebSockets as well. It's the queuing mechanism, which does not release memory properly, at least once you start streaming analog sensor readings through it. I have reviewed that code section many times, but couldn't find an obvious reason why it crashes.
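To make the failure mode concrete, here is a minimal sketch of a capped per-client SSE queue that rejects new messages once full and owns its messages by value, so clearing it on disconnect frees them. It is not ESPAsyncWebServer's code: the class name `SseClientQueue`, the cap, and the point at which `popSent()` would be called are all hypothetical. Storing a copy of the client address also means the error message cannot print bytes from an already-freed client object, which is one possible reading of the garbled addresses in Log 3 (an assumption, not something confirmed in this thread).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <string>
#include <utility>

// Hypothetical per-client outbound queue for Server-Sent Events.
// Messages are owned by value, so popping or clearing the deque frees them;
// nothing has to be deleted by hand when a client goes away.
class SseClientQueue {
 public:
  SseClientQueue(std::string clientAddress, std::size_t maxQueued)
      : clientAddress_(std::move(clientAddress)), maxQueued_(maxQueued) {}

  // Returns false (and drops the message) once the cap is reached,
  // instead of letting the backlog grow until the heap is exhausted.
  bool enqueue(std::string message) {
    if (queue_.size() >= maxQueued_) {
      // The address is our own copy, so this never reads freed memory.
      std::printf("ERROR: Too many SSE messages queued for [%s]\n",
                  clientAddress_.c_str());
      return false;
    }
    queue_.push_back(std::move(message));
    return true;
  }

  // Assumed to be called when the TCP stack reports a completed send.
  bool popSent() {
    if (queue_.empty()) return false;
    queue_.pop_front();
    return true;
  }

  // On disconnect, drop (and free) everything still pending.
  void clear() { queue_.clear(); }
  std::size_t size() const { return queue_.size(); }

 private:
  std::string clientAddress_;
  std::size_t maxQueued_;
  std::deque<std::string> queue_;
};
```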

I've been tracing this bug for a while, but haven't been able to pin it down yet.

EvEggelen commented 7 months ago

Currently I am testing the patch https://github.com/esphome/ESPAsyncWebServer/pull/24. In the comments I found the following remark:

"I have the feeling the whole code is not perfect. I can crash my ESP32 by just reloading the web site 5-6 times quickly. So I think the goal currently is not perfect code... but just a bit better."

So I do not expect miracles, but so far it does seem to be an improvement. I have not seen the error message "ERROR: Too many SSE messages queued for" since. With 2 clients refreshing, I see the number of connected clients go up to 4, but after some time it goes back to 2 again. But the

CORRUPT HEAP: Bad tail at 0x3ffe3ae0. Expected 0xbaad5678 got 0xbaad569d

still pops up :-(. Strangely enough, it does not take down the application.

jetpax commented 7 months ago

I came across a discussion about ESPAsyncWebserver that I found interesting, with code that supposedly fixes its shortcomings.

It was written by Phil Bowles, who alas has since passed away, but it seems solid.

Maybe worth looking at...

EvEggelen commented 7 months ago

@jetpax Thanks for posting this information. I watched all of Phil's videos and got the impression he did quite some work on this. In fact, he reports exactly the same issues we are seeing (instability). If his info is valid, the Async libs currently used in ESP32-sveltekit are so riddled with issues that I would prefer not to use them in a framework that aims to be a stable foundation. I also got the impression that he put a lot of effort into rewriting the whole stack to build a new, stable foundation, since fixing a bug here and there will not do the job in the Async stack of libraries. Unfortunately, if he has passed away (where did you see this?), I wonder whether it is realistic to expect his work to be maintained and matured.

Has anyone ever considered using the ESP-IDF HTTP server (https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/protocols/esp_http_server.html)? I did a quick test and it seems to work. I expect this version to be stable and to get maintenance when needed, so it is potentially more future-proof?

jetpax commented 7 months ago

Yes @theelims, I agree with you regarding the maturity and long-term support of H4 (even though Phil's work seems to have been adopted: https://github.com/philbowles/h4plugins/compare/master...HamzaHajeir:h4plugins:master), and I think the Espressif stack is the way to go, especially as it has been around long enough to mature (IMO Espressif tends to release early, release often). They also support MQTT v5.0 with topic aliases, I believe, which is great for reducing data usage in mobile applications (which I am personally interested in). Can I ask why you opted for the Arduino framework rather than ESP-IDF, as I find the latter more solid in general?

theelims commented 7 months ago

@jetpax @EvEggelen I had a look at H4, and it basically does the queue-length fix. A similar fix is included in my esphome fork as well. But instead of limiting the queue length to a fixed number, H4 uses the free heap as the measure. I'll take a more detailed look at the code differences once I'm actively working on it. It won't be a drop-in replacement, as H4 deprecated WebSocket and JSON support, which are essential.
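The two admission policies described above can be sketched side by side. This is a hypothetical illustration, not code from H4 or the esphome fork: `QueueAdmission`, the 8-message cap, and the 32 KB heap reserve are placeholders. On an ESP32 the free-heap probe could wrap ESP-IDF's `esp_get_free_heap_size()`; here it is injected as a function so the logic runs on a desktop.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical admission policy for an outbound message queue.
struct QueueAdmission {
  std::size_t maxQueued;                       // fixed-length cap (placeholder value)
  std::size_t minFreeHeap;                     // heap reserve to keep (placeholder value)
  std::function<std::size_t()> freeHeapBytes;  // probe for the current free heap

  // Fixed cap: reject once the queue already holds maxQueued messages.
  bool admitByLength(std::size_t queued) const { return queued < maxQueued; }

  // Heap-based: reject if queuing messageBytes would dip below the reserve.
  bool admitByHeap(std::size_t messageBytes) const {
    const std::size_t freeNow = freeHeapBytes();
    return freeNow > minFreeHeap && freeNow - minFreeHeap >= messageBytes;
  }
};
```

A fixed cap is predictable per client but ignores how much memory is actually left; the heap-based check adapts to message size and overall load, at the cost of depending on a global value that other tasks change concurrently.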

That esphome pull request you linked gave me the idea to replace the whole queuing mechanism with a FreeRTOS xQueue to make it thread safe, instead of the proposed mutex. I'll try that in the next few weeks.
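FreeRTOS queues (`xQueueCreate`, `xQueueSend`, `xQueueReceive`) have a fixed capacity, copy items in and out, and let a sender give up after a timeout. Since FreeRTOS isn't available on a desktop, the sketch below mimics those semantics with `std::mutex` and `std::condition_variable` purely to show the intended behavior; the `BoundedQueue` class is hypothetical, and on the target you would call the FreeRTOS API directly. Because FreeRTOS queues copy fixed-size items, variable-length SSE payloads would likely be queued as pointers or fixed-size buffers (an assumption about how such a port might look).

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>

// Desktop stand-in for a FreeRTOS queue: fixed capacity, items copied in
// and out, and a send that gives up after a timeout instead of growing.
template <typename T>
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

  // Analogous to xQueueSend(queue, &item, ticks): wait up to `timeout` for space.
  bool send(const T& item, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!notFull_.wait_for(lock, timeout, [&] { return items_.size() < capacity_; }))
      return false;  // still full after the timeout: caller drops the item
    items_.push_back(item);
    notEmpty_.notify_one();
    return true;
  }

  // Analogous to xQueueReceive(queue, &out, ticks): wait up to `timeout` for an item.
  std::optional<T> receive(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!notEmpty_.wait_for(lock, timeout, [&] { return !items_.empty(); }))
      return std::nullopt;
    T item = items_.front();
    items_.pop_front();
    notFull_.notify_one();
    return item;
  }

 private:
  std::size_t capacity_;
  std::deque<T> items_;
  std::mutex mutex_;
  std::condition_variable notFull_, notEmpty_;
};
```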

Why Arduino instead of ESP-IDF? This project aims to be the basis for a number of open source projects I'm participating in or working on, and Arduino has a much lower barrier for contributors to work on the code. Secondly, I forked the majority of the code from the rjwats/esp8266-react project, which already had about 80% of the work I wanted to do done.

Why ESPAsyncWebServer instead of the native ESP-IDF HTTP server? First and foremost because it was inherited from esp8266-react. However, I had worked with this lib before and would have chosen it anyway. Despite some shortcomings, it has one of the most powerful APIs in that space. After the trouble it gave me, I looked into replacing ESPAsyncWebServer with the ESP-IDF one and concluded that this is beyond my current coding capabilities (I'm not a professional software developer) and would likely take me months to implement. But it is not completely off the table, and I will happily accept any pull request that achieves this. My personal pain is just not big enough (yet) to justify taking this step.

The ESP-IDF libs are quite buggy as well. I failed to implement a WebSocket client endpoint (which I thought was straightforward) because the ESP lib kept crashing whenever I did anything with the payload other than ESP_LOG. I postponed this until the release of the next major Arduino and ESP-IDF versions, which include significant changes to that lib.

Also, the maintainer of ESPAsyncWebServer is now an Espressif employee. He publicly announced that he can officially work on ESPAsyncWebServer after the release of the next major version of the Arduino framework, and stated that he'll base his new lib on ESP-IDF. Maybe time will solve this for everyone.

Many big ESP32 open source projects use ESPAsyncWebServer as stable, production code.

Support for MQTT 5.0: Migrating to the ESP-IDF MQTT client is on my to-do list. According to the docs, it does support MQTT 5.0. However, I postponed this to the upcoming major version for the same reasons as the WebSocket client. My motivation is support for TLS/SSL, especially as the new major versions of these libs support the certificate store, which I already use for OTA. So full SSL support is also something I wish for here. This update should require fewer resources than changing the webserver.

EvEggelen commented 7 months ago

@theelims To evaluate the option, I created a small test application with ESP-IDF running the IDF HTTP server. The content is virtually the same as in ESP32-sveltekit. I changed the script that generates WWWData so that loading it into the HTTP server is light on stack and heap. I also added some dummy pages (e.g. events and features) so the pages show something. The rest of the backend is not there (it is a test application for the web interface).

The good thing is that this setup is really stable: I tried all kinds of things with it, but did not see crashes or other problems. My perception is that loading is also faster. A full refresh over my WiFi, with really bad reception on the ESP32, took 800 ms. Looking in more detail, I could see that these Svelte pages try to open a lot of parallel connections (20+) to the server, and so far it seems that web browsers do not close these connections quickly. This causes issues when connecting from multiple hosts simultaneously, which seems logical, as LWIP on ESP-IDF is configured by default for a maximum of 10 open sockets. After enabling the IDF HTTP server's features for managing these open connections, serving the Svelte content to multiple clients is also stable.
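The thread doesn't name which connection-management feature was enabled. One candidate in ESP-IDF's `httpd_config_t` is `lru_purge_enable`, which, together with `max_open_sockets`, lets the server close the least recently used socket when a new client arrives and all slots are taken. The following is a hypothetical desktop model of that LRU policy (the `SocketTable` class is illustrative), not ESP-IDF's implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>

// Hypothetical model of a server socket table with LRU purging: when all
// slots are taken, the least recently used connection is closed so that a
// new client can still get in.
class SocketTable {
 public:
  explicit SocketTable(std::size_t maxOpen) : maxOpen_(maxOpen) {}

  // Registers a new connection; returns the fd that was purged, if any.
  std::optional<int> accept(int fd) {
    std::optional<int> purged;
    if (order_.size() >= maxOpen_) {
      purged = order_.back();  // least recently used sits at the back
      where_.erase(*purged);
      order_.pop_back();
    }
    order_.push_front(fd);
    where_[fd] = order_.begin();
    return purged;
  }

  // Marks a connection as recently used (a request arrived on it).
  void touch(int fd) {
    auto it = where_.find(fd);
    if (it == where_.end()) return;
    order_.splice(order_.begin(), order_, it->second);  // iterator stays valid
  }

  bool isOpen(int fd) const { return where_.count(fd) != 0; }

 private:
  std::size_t maxOpen_;
  std::list<int> order_;  // front = most recently used
  std::unordered_map<int, std::list<int>::iterator> where_;
};
```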

So far I could not find an SSE feature in the ESP HTTP server. I looked into this a bit, but it seems that the features needed to implement it (keeping a connection open) are only in v5.3-dev (right now we have v5.1, so v5.2 needs to be released first). But WebSockets are there already (going to test them right now). To limit the number of open sockets (strain on memory), I was thinking of using a single WebSocket for both SSE and WebSocket traffic, i.e. one single connection between back-end and front-end. I expect that right now we have at least 2 LWIP sockets open per client (one for SSE and one for the WebSocket).
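A single shared WebSocket needs some way to tell notification events apart from regular data. The sketch below uses a hypothetical `<event>:<payload>` text framing with a small dispatcher (`EventMux` is an illustrative name); the actual message format is not given in this thread, and JSON with an event field would work just as well.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical framing for multiplexing SSE-style notifications and regular
// data over one WebSocket: every text frame is "<event>:<payload>".
// Event names must not contain ':'; payloads may.
class EventMux {
 public:
  using Handler = std::function<void(const std::string&)>;

  void on(const std::string& event, Handler handler) {
    handlers_[event] = std::move(handler);
  }

  static std::string encode(const std::string& event, const std::string& payload) {
    return event + ":" + payload;
  }

  // Splits at the first ':' and routes the payload; returns false for
  // malformed frames or events nobody subscribed to.
  bool dispatch(const std::string& frame) const {
    const auto sep = frame.find(':');
    if (sep == std::string::npos) return false;
    const auto it = handlers_.find(frame.substr(0, sep));
    if (it == handlers_.end()) return false;
    it->second(frame.substr(sep + 1));
    return true;
  }

 private:
  std::map<std::string, Handler> handlers_;
};
```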

Long story short: based on what I am seeing, the ESP HTTP server seems to provide stable and performant functionality. It works on both the ESP-IDF and Arduino frameworks (tested only briefly, though). But as I have very little experience with Svelte (planning to learn), I was hoping I could get some help adapting the front-end to a single-WebSocket solution. Based on what we learn, we can see whether it is possible to move the whole solution to ESP32-sveltekit. Right now I am prioritizing stability over feature set. Would you be able to jump-start me a bit on the Svelte front-end architecture of ESP32-sveltekit?

theelims commented 7 months ago

I think as a first step you can completely ignore SSE. It only adds the WiFi RSSI reading on top. You might get some trash on the debug terminal, but it doesn't impact the core functions. Switching it from SSE to WS is just a few lines of code on the frontend.

However, the whole back-end relies heavily on the ESPAsyncWebServer API for the REST calls; it's not just about serving static content. I expect that's where you'll run into the real trouble.

Let me know how things progress.

theelims commented 7 months ago

@EvEggelen I just stumbled across https://github.com/hoeken/PsychicHTTP, which is a brand-new ESPAsyncWebServer-compatible library based on the ESP-IDF HTTP server. You might give it a try instead of rolling your own.

It does not come with SSE either, but as you suggested that is easily switched over to websocket instead.

EvEggelen commented 7 months ago

@theelims Today I finished my tests. I have written a new WebSocket TypeScript module for the frontend that carries both the notifications (simulated SSE) and the WebSocket communication over a single WebSocket. The interfaces are easy to use in new modules. I also fixed most of the warnings in the front end. The whole system seems to work well.

The backend is implemented from scratch on the IDF framework with the IDF HTTP server. Not all REST interfaces are implemented yet, but the Demo page works (both REST interfaces and WebSockets). I also implemented the System Status page with all the toasts and RSSI. MQTT, NTP, and WiFi control have not been added to the IDF backend so far. But the stability of the whole system is good (in my limited tests).

Now I need to see whether it is possible to move the HTTP server code into a class that is "compatible" with the Arduino ESP32-sveltekit code. But I guess PsychicHttp is close to what I have made or plan to make. So my front-end (which shares a single socket between WebSockets and SSE) is still useful. I expect that my updated WWWData scripts will also help reduce the strain on stack and heap (the old method caused a stack overflow on the IDF framework with the default stack configuration). They also only trigger a rebuild of WWWData when the Svelte source files have changed, which reduces the build time significantly on my machine during back-end development.
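The "only rebuild when the frontend changed" check boils down to comparing modification times. Here is a minimal C++17 `std::filesystem` sketch of that check; the function name `needsRebuild` and the paths are hypothetical, and the project's actual build script is not shown in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Returns true if the generated header is missing or older than any file
// under the frontend source tree, i.e. if it needs to be regenerated.
bool needsRebuild(const fs::path& sourceDir, const fs::path& generatedHeader) {
  if (!fs::exists(generatedHeader)) return true;
  const auto headerTime = fs::last_write_time(generatedHeader);
  for (const auto& entry : fs::recursive_directory_iterator(sourceDir)) {
    // Any source file modified after the header was written triggers a rebuild.
    if (entry.is_regular_file() && entry.last_write_time() > headerTime) return true;
  }
  return false;
}
```

Skipping the generation step when nothing changed is what saves time during back-end-only development; a content hash instead of timestamps would be more robust against clock skew or touched-but-unchanged files, at the cost of reading every file.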

I will look into PsychicHttp and let you know what I find. Thanks for the tip.

theelims commented 7 months ago

> I expect that my updated WWWData scripts will also help reduce the strain on stack and heap (the old method gave a stack overflow on the IDF framework with the default stack configuration). It will also only trigger a rebuild of the WWWData when the Svelte source files are changed. This reduces the build time significantly on my machine during back-end development.

@EvEggelen Are you willing to make a pull request for that updated build script?

EvEggelen commented 6 months ago

@theelims I am willing to try. But this means that I first need to back-port it to the current code on GitHub. I did add the #if for Arduino and IDF, but it's better to test this before committing. I also need to update a few lines in the code to make sure the new prototype of WWWData.h is used (so that the filenames are not first wrapped in a String and then used).
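One possible shape for a generated `WWWData.h` that avoids wrapping file names in `String` is a static table of `const char*` URIs and byte arrays. The struct, field names, and byte contents below are hypothetical placeholders, not the project's actual header:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical entry of a generated file table: plain C strings and byte
// arrays, so nothing has to be copied into a String at boot.
struct EmbeddedFile {
  const char* uri;
  const char* contentType;
  const uint8_t* data;  // e.g. gzip-compressed file contents
  std::size_t length;
};

// Placeholder bytes (just the gzip magic), standing in for real file data.
static const uint8_t kIndexHtml[] = {0x1f, 0x8b, 0x08};
static const uint8_t kAppJs[] = {0x1f, 0x8b, 0x08, 0x00};

static const EmbeddedFile kFiles[] = {
    {"/", "text/html", kIndexHtml, sizeof(kIndexHtml)},
    {"/_app/app.js", "text/javascript", kAppJs, sizeof(kAppJs)},
};

// Linear lookup by URI; returns nullptr if the file is not embedded.
const EmbeddedFile* findFile(const char* uri) {
  for (const auto& f : kFiles) {
    if (std::strcmp(f.uri, uri) == 0) return &f;
  }
  return nullptr;
}
```

A table like this also lets one catch-all handler serve every embedded file, instead of registering a dedicated endpoint per file.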

Which update/change is particularly interesting to you? To make the project stable, in my view we minimally need to replace the HTTP server...

theelims commented 6 months ago

@EvEggelen In particular, I'm interested in the script only being triggered when there was a change in the frontend files.

For the remaining work, I'm torn between just replacing the queuing mechanisms in ESPAsyncWebServer with xQueues or a full-blown re-implementation with PsychicHTTP. I think PsychicHTTP is very promising, just not yet where it needs to be, and my C++-fu is not good enough to bridge that gap.

EvEggelen commented 6 months ago

@theelims Let me try to test and commit it this weekend. Currently I am looking at the possibility of using PsychicHTTP in the project and adding the new modules I created. I see 3 options: 1) create a completely new back-end on IDF, 2) put my HTTP server in the backend + modules, 3) use PsychicHTTP + modules. Let me evaluate the options...

The things I program are typically very timing-critical and hardware-accelerated, so maybe the IDF framework is better for me. But let's see.

theelims commented 6 months ago

@EvEggelen If you're able to port the backend to ESP-IDF in any form, I'll be very happy. Losing SSE should be easily compensated for by keeping a WebSocket alive. With ESP-IDF I would appreciate the possibility of using TLS/SSL, because the security features are a laugh without SSL.

I'll happily do the front end part of this.

EvEggelen commented 6 months ago

@theelims I have never published code publicly on GitHub. I assume you need to give me access rights to create a branch on this project? At least I got an error when pushing code.

theelims commented 6 months ago

@EvEggelen The right way would be that you fork this repository, create the branch there and then make a pull request to push your changes into here.

theelims commented 5 months ago

@EvEggelen How are you progressing with your PsychicHTTP port? ESPAsyncWebServer has reached my pain threshold. Is there anything I can do to support you?

EvEggelen commented 5 months ago

@theelims I feel your pain... In the beginning I could not imagine that these kinds of problems were caused by software that is used so broadly. Anyway, regarding the port: some features were missing from PsychicHTTP (I put in a request for those features). These are now implemented. Others are also moving from ESPAsyncWebServer to PsychicHTTP, and during this process some issues were found and, to my knowledge, fixed.

As PsychicHTTP is not a drop-in replacement, you need to do some work on the code to get it working. I was working on it, but I stopped (as I needed to wait for these new features). Now that more people are porting from ESPAsyncWebServer to PsychicHTTP, it is easier. I am busy at the moment, but could restart the work later.

But honestly, I am still wondering whether it makes sense to move completely to the IDF framework. I have already gained some experience with it, and I must say, the more I use it, the more sense it makes.

theelims commented 5 months ago

Yes, it is gaining momentum. This was long overdue. I found https://github.com/emsesp/EMS-ESP32, which has a branch that already did most of the hard work. I'll take inspiration from @proddy and his progress. He made quite drastic changes to esp8266-react, though, but his HttpEndpoint file is so much more understandable than the original one. I'm confident that it is possible to switch ESP32-SvelteKit to PsychicHTTP without much hassle. But I'll keep the multi-connection architecture for now.

I've started a branch for this and will continue to work on it over the next days.

What benefits do you see in going all-in on the ESP-IDF webserver? I'll convert the MQTT client to ESP-IDF for sure; that's on my to-do list anyway.

proddy commented 5 months ago

I paused my IDF ports. With my optimized versions of AsyncWebServer and AsyncTCP (ESP32 only) and the switch to espMqttClient, I have very low heap memory consumption, and performance is 2x what I was seeing with PsychicHttp. You could try using my libraries as a drop-in replacement and see if that helps with your stability.

theelims commented 5 months ago

@proddy I'm currently trying to get this running, but you've messed around way too much with ArduinoJSON for this to be a drop-in replacement.

Also, do you make use of the Event Server and WebSockets? High traffic on those two is the source of my troubles; the server itself is quite stable. I use a slightly modified version of ESPAsyncWebServer and AsyncTCP from esphome.

proddy commented 5 months ago

@theelims the main branch still uses ArduinoJson 6 on the Async* libs if that is your struggle.

I did not "mess" anything up. I just spent 3 years building further on Rick's excellent framework, which is outdated and has a few bugs here and there, and enhanced it with more features: a scheduler, logging, telnet, Ethernet, etc. I also optimized the web UI, moving from React to Preact and then to Vite/Yarn, cutting its size by 50%. You can pick up some ideas from my code if you want. Async* and esp8266-react are based on ArduinoJson 5, so that's the first thing I would fix, e.g. removing all the reference pointers to JsonObject etc.

I use EventSource, but not WebSockets. The code is there in the WebLogService.

good luck!

theelims commented 5 months ago

@proddy I didn't mean to offend you with the "messing around" statement. I just had 20 minutes and thought I would swap the libraries, hit compile, and see whether it would crash in my project. That wasn't possible because of dozens of compiler errors.

I was under the impression that my system uses ArduinoJSON 6, but I'll double-check. Thanks for the hint. I really enjoyed Rick's work; it's an amazing foundation. And I also like your work on it. I'll certainly draw a lot of inspiration from that as well.

EvEggelen commented 5 months ago

@theelims

regarding the bug.

This is really my bad. When porting my buildscript towards your project I forgot to include my latest fix... Sorry for that. Good that you noticed it.

> What benefits do you see in going all in on ESP-IDF webserver?

So far the stability of my test code is very good. I also expect resources to be used more efficiently when using that API directly. But this is basically a complete rewrite of the back-end. Currently I am experimenting with moving as much as possible into the data, bss, and text sections to see what happens.

As I was also considering HTTPS, the number of concurrent connections to the ESP needs to be reduced. SvelteKit splits the front-end into a lot of small files and loads them in parallel. Do you know a way to reduce the number of parallel loads when using SvelteKit? Is this something you can configure?

@proddy Can you elaborate a bit more on the heap/memory usage of the different implementations? I am really interested in seeing the impact of the different options.

theelims commented 5 months ago

@EvEggelen No problem, this happens. I found it after wondering why it wouldn't rebuild WWWData.h even though I had changed a few files.

Yes, SvelteKit generates an awful lot of files. There is a long-open issue on that topic: https://github.com/sveltejs/kit/issues/3882. However, there doesn't seem to be a viable solution out there. This could become a bottleneck: for this simple demo project I already need 110 dedicated endpoints.

I'm in the middle of the porting process to PsychicHTTP. We will see how it turns out in terms of memory usage.

theelims commented 5 months ago

@EvEggelen I just pushed a commit to https://github.com/theelims/ESP32-sveltekit/tree/psychichttp which has everything including websocket ported over to PsychicHttp. Only did shallow testing so far, but all features appear to work. CORS pre-flight is not implemented yet.

I'll do some stress testing in the coming weeks and see how it copes in my actual projects and whether it is as stable as hoped.

@proddy Thank you so much for your work. Your https_36 branch helped me tremendously making this port.

proddy commented 5 months ago

Glad it helped, @theelims. I'll keep a close eye on your stress testing (and help where I can, just shout), as I didn't immediately see the benefits of switching away from AsyncWS and AsyncTCP, which in my tests use 10KB less heap and give 2x better HTTP response times. I created a small benchmark app and used https://github.com/mcollina/autocannon to stress test and compare, and autocannon-ui to show the results. I'm hoping that when IDF 5.1.2 and Arduino 3 are stable, I can switch to multi-core and balance the threads over the 2 ESP32 cores.

theelims commented 5 months ago

@EvEggelen Just wanted to give you a brief update on my progress. So far PsychicHttp has proved to be more stable, and except for CORS preflight and WS authentication I have everything working. On the plus side, the RAM usage is much more favorable. With the ESPAsync libraries I couldn't load BLE libraries at the same time. With PsychicHttp this is possible, which is really a game changer for me.

@proddy Regarding MQTT, I'm moving away from async-mqtt-client as well. But I wanted more solid support for SSL/TLS, especially the possibility of including the Mozilla root CA cert bundle. So I went down the "Psychic" route: I took the API of async-mqtt-client and backed it with the ESP-IDF MQTT client: https://github.com/theelims/PsychicMqttClient. The repository is still empty, as I need to push the code with examples. It also supports MQTT over WS and does not limit the message size like most other Arduino MQTT clients do. And for convenience I added an onTopic() event, which works like a request endpoint in PsychicHttp: just register a topic (aka subscribe) and the payload callback is invoked every time a message is received on that topic. I had been sorely missing this feature in all other MQTT clients.
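The `onTopic()` idea (subscribe to a topic, get a callback per message) amounts to routing incoming messages through MQTT topic-filter matching, where `+` matches exactly one level and `#` matches the remaining levels. The `TopicRouter` below is a hypothetical sketch of that pattern and not PsychicMqttClient's actual API; it only does local routing (a real client would also send the SUBSCRIBE to the broker) and ignores the special rules for `$`-prefixed topics.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Splits "a/b/c" into {"a", "b", "c"}; empty levels are kept.
static std::vector<std::string> splitLevels(const std::string& s) {
  std::vector<std::string> levels;
  std::size_t start = 0;
  while (true) {
    const std::size_t slash = s.find('/', start);
    levels.push_back(s.substr(start, slash - start));  // substr clamps at npos
    if (slash == std::string::npos) break;
    start = slash + 1;
  }
  return levels;
}

// '+' matches exactly one level; '#' matches this level and everything below.
bool topicMatches(const std::string& filter, const std::string& topic) {
  const auto f = splitLevels(filter), t = splitLevels(topic);
  for (std::size_t i = 0; i < f.size(); ++i) {
    if (f[i] == "#") return true;
    if (i >= t.size()) return false;  // filter has more levels than the topic
    if (f[i] != "+" && f[i] != t[i]) return false;
  }
  return f.size() == t.size();
}

// Hypothetical local router: one callback per registered topic filter.
class TopicRouter {
 public:
  using Callback = std::function<void(const std::string& topic, const std::string& payload)>;

  void onTopic(std::string filter, Callback cb) {
    routes_.emplace_back(std::move(filter), std::move(cb));
  }

  // Invokes every matching callback; returns how many were called.
  int deliver(const std::string& topic, const std::string& payload) const {
    int hits = 0;
    for (const auto& [filter, cb] : routes_) {
      if (topicMatches(filter, topic)) {
        cb(topic, payload);
        ++hits;
      }
    }
    return hits;
  }

 private:
  std::vector<std::pair<std::string, Callback>> routes_;
};
```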

proddy commented 5 months ago

Glad you got it all ported over. I'm curious what kind of response times you'll see with Psychic, as I still can't get it to load faster than my AsyncWS libs. PsychicMqttClient would be awesome if you can get it chunking and streaming over WS; I'll watch out for that. Let me know if I can help in any way.

P.S. I looked at your code: nice and solid. You use an xTask for the loop instead of Arduino's loop(). Was there a reason for this?

EvEggelen commented 5 months ago

@theelims Good to hear that you are making good progress. I am still tinkering with ESP-IDF to see what is possible there. WS authentication is challenging with JWT (if I am correct, the current code has nothing for it). You can put a token in the protocol field, but what you use today is too large for that. I used a token in the protocol field and a cookie (HttpOnly; path=/ws; SameSite=Strict) for the WS connect.

How much free heap do you have now? Currently I have 200+ KB free while running WiFi in both AP and station mode.

theelims commented 5 months ago

@proddy Just polishing the lib with many different examples. I'll let you know once longer MQTT messages over WS work.

Why a separate task and not the Arduino loop()? Separation of concerns. In your project everything is very much entangled and has grown organically. In my use case I want a solid foundation (more like an operating system) for a series of IoT devices, so easily propagating changes to the core from one project to another was a major design consideration. Also, I don't like the main loop; I mostly use it only for testing. In production code each major module runs its own task, and communication happens between them.

@EvEggelen My code uses a search parameter to transport the JWT. This is one of the last missing puzzle pieces for my PsychicHttp port. But JWT is meaningless without SSL, so these go hand in hand. Let me check the heap once my MQTT side hustle is done.
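Carrying the JWT as a search parameter means the server has to pull it out of the request URI before accepting the WebSocket upgrade. A minimal sketch of that extraction follows; the function `queryParam` and the parameter name `access_token` are assumptions for illustration. No percent-decoding is done, which is fine for JWTs because their base64url characters and dot separators never need escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Returns the value of `name` from the URI's query string, an empty string
// for a bare key ("?flag"), or nullopt if the key is absent.
// No percent-decoding is performed.
std::optional<std::string> queryParam(const std::string& uri, const std::string& name) {
  const std::size_t q = uri.find('?');
  if (q == std::string::npos) return std::nullopt;
  std::size_t pos = q + 1;
  while (pos <= uri.size()) {
    const std::size_t amp = uri.find('&', pos);
    const std::size_t end = (amp == std::string::npos) ? uri.size() : amp;
    const std::string pair = uri.substr(pos, end - pos);
    const std::size_t eq = pair.find('=');
    if (pair.substr(0, eq) == name) {
      return eq == std::string::npos ? std::string() : pair.substr(eq + 1);
    }
    if (amp == std::string::npos) break;
    pos = amp + 1;
  }
  return std::nullopt;
}
```

Note that tokens in URLs can end up in server or proxy logs, which is one more reason the thread ties JWT usefulness to SSL.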

theelims commented 5 months ago

@proddy Have a look at https://github.com/theelims/PsychicMqttClient. I have added docs and a lot of examples, and I successfully tested MQTT over WSS (with SSL) and very large messages. Works like a charm.

Next I need to look into publishing them on the platformio registry.

proddy commented 5 months ago

I'll check it out. I also saw IDF finally implemented HTTP_ANY so the HttpEndpoint can be simplified and reduce the number of endpoints (if heap is a problem for your users)
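With a handler that accepts any method, an endpoint that currently registers separate GET and POST handlers can register once and branch on the method. That matters because ESP-IDF's `httpd_config_t::max_uri_handlers` caps how many handlers can be registered. The types below (`Method`, `Response`, `SettingsEndpoint`) are hypothetical stand-ins, not ESP-IDF or PsychicHttp types:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the server library's request/response types.
enum class Method { Get, Post, Put, Delete };

struct Response {
  int status;
  std::string body;
};

// One handler for every method (what an HTTP_ANY-style registration allows)
// instead of one GET and one POST registration per settings endpoint.
class SettingsEndpoint {
 public:
  Response handle(Method method, const std::string& body) {
    switch (method) {
      case Method::Get:
        return {200, state_};
      case Method::Post:
        state_ = body;  // a real endpoint would parse and validate JSON here
        return {200, state_};
      default:
        return {405, "Method Not Allowed"};
    }
  }

 private:
  std::string state_ = "{}";  // serialized settings, kept as a string for brevity
};
```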

theelims commented 5 months ago

@EvEggelen I got WebSocket with JWT authentication going. The JWT is supplied as a search parameter (as with the old code). However, PsychicHttp has some quirks I needed to work around; I filed an issue (https://github.com/hoeken/PsychicHttp/issues/73) on this. I can live with it for now, but it would be nicer if this were consistent with the ESPAsyncWebServer behavior.

theelims commented 5 months ago

> I'll check it out. I also saw IDF finally implemented HTTP_ANY so the HttpEndpoint can be simplified and reduce the number of endpoints (if heap is a problem for your users)

Good to see, however, it will take ages for that to trickle down into the Arduino world.

proddy commented 5 months ago

>> I'll check it out. I also saw IDF finally implemented HTTP_ANY so the HttpEndpoint can be simplified and reduce the number of endpoints (if heap is a problem for your users)

> Good to see, however, it will take ages for that to trickle down into the Arduino world.

True. I use https://github.com/tasmota/platform-espressif32/releases to test out the latest IDFs.

theelims commented 4 months ago

@EvEggelen It is done. Thank you very much for your contributions and discussions.