mathieucarbou / ESPAsyncWebServer

Asynchronous HTTP and WebSocket Server Library for ESP32, ESP8266 and RP2040
https://mathieu.carbou.me/ESPAsyncWebServer/
GNU Lesser General Public License v3.0
74 stars 16 forks

[BUG] ESP8266 crashes if different pages are loaded on high frequency #70

Open lumapu opened 2 months ago

lumapu commented 2 months ago

Please make sure to go through the recommendations before opening a bug report:

https://github.com/mathieucarbou/ESPAsyncWebServer?tab=readme-ov-file#important-recommendations

done, set -D CONFIG_ASYNC_TCP_STACK_SIZE=4096 without any change

Description

I'm the maintainer and developer of AhoyDTU https://github.com/lumapu/ahoy. This project has some configuration pages, which mostly communicate via AJAX. As some users report that the ESP8266 is really unstable with other forks of the AsyncWebserver, I wanted to try this fork.

To reproduce the issue, simply click through the menu 2-3 times in quick succession and the ESP crashes.

Board

ESP8266 Wroom

Ethernet adapter used ?

no

Stack trace

I see two different behaviors:

Trace 1 ``` 0x4022cc39 in std::_Function_handler::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at AsyncEventSource.cpp:? 0x401001e5 in std::function::operator()(void*, AsyncClient*) const at ??:? 0x40229dcc in AsyncClient::_close() at ??:? 0x4022a078 in AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, int) at ??:? 0x40255abc in tcp_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c:542 (discriminator 1) 0x4025aaf9 in ip4_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c:1467 0x402524c1 in mem_malloc at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c:210 0x40100cce in umm_free_core at umm_malloc.cpp:? 0x40277438 in ppRecycleRxPkt at ??:? 0x40251b81 in ethernet_input_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c:188 0x40251994 in git2glue_err at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:118 (inlined by) esp2glue_ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:494 0x4027a5fd in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:365 0x4027a60f in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:373 0x40277067 in ppPeocessRxPktHdr at ??:? 0x4027b8f3 in ets_snprintf at ??:? 0x40105c4d in call_user_start_local at ??:? 0x40105c53 in call_user_start_local at ??:? 0x4010000d in call_user_start at ??:? 0x401000ab in app_entry_redefinable at ??:? 
0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232 0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232 0x40238b2c in esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter::addDomainCacheItem(void const*, bool, unsigned short) at ??:? 0x40101050 in malloc at ??:? 0x4023ff0b in operator new(unsigned int) at ??:? 0x40239725 in esp8266::MDNSImplementation::MDNSResponder::_udpAppendBuffer(unsigned char const*, unsigned int) at ??:? 0x40239a2a in esp8266::MDNSImplementation::MDNSResponder::_writeMDNSRRDomain(esp8266::MDNSImplementation::MDNSResponder::stcMDNS_RRDomain const&, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:? 0x40239be4 in esp8266::MDNSImplementation::MDNSResponder::_writeMDNSServiceDomain(esp8266::MDNSImplementation::MDNSResponder::stcMDNSService const&, bool, bool, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:? 0x40239725 in esp8266::MDNSImplementation::MDNSResponder::_udpAppendBuffer(unsigned char const*, unsigned int) at ??:? 0x40239752 in esp8266::MDNSImplementation::MDNSResponder::_udpAppend8(unsigned char) at ??:? 0x4023986a in esp8266::MDNSImplementation::MDNSResponder::_write8(unsigned char, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:? 0x40239b99 in esp8266::MDNSImplementation::MDNSResponder::_writeMDNSServiceDomain(esp8266::MDNSImplementation::MDNSResponder::stcMDNSService const&, bool, bool, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:? 0x402781bb in pp_attach at ??:? 0x4027820a in pp_attach at ??:? 0x40278316 in pp_attach at ??:? 0x402772cb in ppTxPkt at ??:? 0x4026050f in ieee80211_output_pbuf at ??:? 0x4010618f in wdt_feed at ??:? 
0x40251581 in glue2esp_linkoutput at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:301 0x402517af in new_linkoutput at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:272 0x40251c0e in ethernet_output at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c:312 0x402592a5 in etharp_output_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c:897 0x40100588 in ets_post at ??:? 0x40102ce5 in rcUpdateTxDone at ??:? 0x401028b4 in pp_post at ??:? 0x40100588 in ets_post at ??:? 0x40100588 in ets_post at ??:? 0x40102ce5 in rcUpdateTxDone at ??:? 0x401028b4 in pp_post at ??:? 0x40100588 in ets_post at ??:? 0x40102ce5 in rcUpdateTxDone at ??:? 0x401028b4 in pp_post at ??:? 0x40100588 in ets_post at ??:? 0x4010343f in rcReachRetryLimit at ??:? 0x401028b4 in pp_post at ??:? 0x40105b4b in lmacRxDone at ??:? 0x4010361c in rcReachRetryLimit at ??:? 0x4010343f in rcReachRetryLimit at ??:? 0x401028b4 in pp_post at ??:? 0x4010361c in rcReachRetryLimit at ??:? 0x40103ad6 in wDev_ProcessFiq at ??:? 0x40100588 in ets_post at ??:? 0x401063e5 in ets_timer_disarm at ??:? 0x40100588 in ets_post at ??:? 0x4010101f in free at ??:? 0x4010101c in free at ??:? 0x40228584 in Communication::loop()::{lambda(bool, CommQueue<(unsigned char)100>::queue_s const*)#1}::operator()(bool, CommQueue<(unsigned char)100>::queue_s const*) const at ??:? 0x4021344b in std::queue > >::message_s, std::deque > >::message_s, std::allocator > >::message_s> > >::queue > >::message_s, std::allocator > >::message_s> >, void>() at ??:? 0x40100574 in ets_post at ??:? 0x401067de in system_get_time at ??:? 0x40100588 in ets_post at ??:? 0x40100cce in umm_free_core at umm_malloc.cpp:? 0x4010101c in free at ??:? 0x4021b0aa in std::deque > >::message_s, std::allocator > >::message_s> >::~deque() at ??:? 
0x40100cce in umm_free_core at umm_malloc.cpp:? 0x4010101c in free at ??:? 0x40225966 in PubMqtt > >::loop() at ??:? 0x4020cbae in ah::Scheduler::checkTicker() at ??:? 0x40100169 in std::function::queue_s const*)>::operator()(bool, CommQueue<(unsigned char)100>::queue_s const*) const at ??:? ```
Trace 2 ``` 0x4023ff20 in operator new(unsigned int) at ??:? 0x4022f6d2 in AsyncWebHeader& std::__cxx11::list >::emplace_back(String const&, String const&) at ??:? 0x4023ed55 in String::copy(char const*, unsigned int) at ??:? 0x4022f7a0 in AsyncWebServerResponse::addHeader(String const&, String const&) at ??:? 0x40222ed2 in RestApi > >::onApi(AsyncWebServerRequest*) at ??:? 0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232 0x40248b4d in _printf_i at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c:196 (discriminator 1) 0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232 0x4028e52c in etharp_output at ??:? 0x4028e54c in etharp_output at ??:? 0x4028e564 in etharp_output at ??:? 0x4028e574 in etharp_output at ??:? 0x4028e588 in etharp_output at ??:? 0x4028e5a4 in etharp_output at ??:? 0x4028e5bc in etharp_output at ??:? 0x4028e5cc in etharp_output at ??:? 0x4028e5dc in etharp_output at ??:? 0x4028e5f8 in etharp_output at ??:? 0x4028e620 in etharp_output at ??:? 0x4028e644 in etharp_output at ??:? 0x4028e664 in etharp_output at ??:? 0x4028e664 in etharp_output at ??:? 0x4028e644 in etharp_output at ??:? 0x4028e620 in etharp_output at ??:? 0x4028e5f8 in etharp_output at ??:? 0x4028e5dc in etharp_output at ??:? 0x4028e5cc in etharp_output at ??:? 0x4028e5bc in etharp_output at ??:? 0x4028e5a4 in etharp_output at ??:? 0x4028e588 in etharp_output at ??:? 0x4028e574 in etharp_output at ??:? 0x4028e564 in etharp_output at ??:? 0x4028e54c in etharp_output at ??:? 0x4028e52c in etharp_output at ??:? 0x4024aaf5 in _vsnprintf_r at /workdir/repo/newlib/newlib/libc/stdio/vsnprintf.c:71 (discriminator 4) 0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232 0x40101282 in realloc at ??:? 
0x40248b4d in _printf_i at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c:196 (discriminator 1) 0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232 0x4024d598 in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:179 0x4028d2a6 in etharp_output at ??:? 0x40248c7c in _printf_i at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c:246 0x4024d65c in __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:232 0x4028d2a6 in etharp_output at ??:? 0x4028d2a8 in etharp_output at ??:? 0x4028d2a6 in etharp_output at ??:? 0x4024d859 in _svfprintf_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c:528 0x40101050 in malloc at ??:? 0x40252520 in do_memp_malloc_pool at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/memp.c:255 0x4024aaf5 in _vsnprintf_r at /workdir/repo/newlib/newlib/libc/stdio/vsnprintf.c:71 (discriminator 4) 0x40100cce in umm_free_core at umm_malloc.cpp:? 0x40101282 in realloc at ??:? 0x4023ec7e in String::changeBuffer(unsigned int) at ??:? 0x40100cce in umm_free_core at umm_malloc.cpp:? 0x40100cce in umm_free_core at umm_malloc.cpp:? 0x4023f15a in String::concat(char const*, unsigned int) at ??:? 0x4022ebcc in std::__cxx11::_List_base >::_M_clear() at ??:? 0x4022f924 in AsyncWebServerResponse::_assembleHead(unsigned char) at ??:? 0x402781bb in pp_attach at ??:? 0x402781bb in pp_attach at ??:? 0x40100cce in umm_free_core at umm_malloc.cpp:? 0x40101282 in realloc at ??:? 0x4026050f in ieee80211_output_pbuf at ??:? 0x4022ec44 in void std::vector >::_M_realloc_insert(__gnu_cxx::__normal_iterator > >, char const*&) at ??:? 0x40101050 in malloc at ??:? 0x4023ed55 in String::copy(char const*, unsigned int) at ??:? 0x4022ec54 in void std::vector >::_M_realloc_insert(__gnu_cxx::__normal_iterator > >, char const*&) at ??:? 0x4022edb6 in AsyncWebServerRequest::addInterestingHeader(char const*) at ??:? 
0x4023ed55 in String::copy(char const*, unsigned int) at ??:? 0x4022cc80 in _ZZN21AsyncWebServerRequest28_removeNotInterestingHeadersEvENKUlRK6StringE_clES2_$constprop$0 at WebRequest.cpp:? 0x4022cd4a in AsyncWebServerRequest::_removeNotInterestingHeaders() at ??:? 0x40230534 in AsyncCallbackWebHandler::handleRequest(AsyncWebServerRequest*) at ??:? 0x4022e4af in AsyncWebServerRequest::_parseLine() at ??:? 0x4022e5ea in AsyncWebServerRequest::_onData(void*, unsigned int) at ??:? 0x4010101c in free at ??:? 0x4022a018 in AsyncClient::_recv(std::shared_ptr&, tcp_pcb*, pbuf*, int) at ??:? 0x40100cce in umm_free_core at umm_malloc.cpp:? 0x4022a078 in AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, int) at ??:? 0x40255a65 in tcp_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c:501 (discriminator 1) 0x402524c1 in mem_malloc at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c:210 0x40252520 in do_memp_malloc_pool at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/memp.c:255 0x4025aaf9 in ip4_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c:1467 0x402524c1 in mem_malloc at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c:210 0x40100cce in umm_free_core at umm_malloc.cpp:? 0x40277438 in ppRecycleRxPkt at ??:? 
0x40251b81 in ethernet_input_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c:188 0x40251994 in git2glue_err at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:118 (inlined by) esp2glue_ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:494 0x4027a5fd in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:365 0x4027a60f in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:373 0x40277067 in ppPeocessRxPktHdr at ??:? 0x4027b8f3 in ets_snprintf at ??:? 0x40105c4d in call_user_start_local at ??:? 0x40105c53 in call_user_start_local at ??:? 0x4010000d in call_user_start at ??:? 0x401000ab in app_entry_redefinable at ??:? 0x4026bbfc in cont_ret at cont.S.o:? 0x4026bbad in cont_continue at cont.S.o:? 0x40100588 in ets_post at ??:? 0x401028b4 in pp_post at ??:? 0x40105b4b in lmacRxDone at ??:? 0x4010343f in rcReachRetryLimit at ??:? 0x4010361c in rcReachRetryLimit at ??:? 0x40103ad6 in wDev_ProcessFiq at ??:? 0x401037f8 in wDev_ProcessFiq at ??:? 0x40100588 in ets_post at ??:? 0x40238b2c in esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter::addDomainCacheItem(void const*, bool, unsigned short) at ??:? 0x40101050 in malloc at ??:? 0x4023ff0b in operator new(unsigned int) at ??:? 0x40239725 in esp8266::MDNSImplementation::MDNSResponder::_udpAppendBuffer(unsigned char const*, unsigned int) at ??:? 0x40239a2a in esp8266::MDNSImplementation::MDNSResponder::_writeMDNSRRDomain(esp8266::MDNSImplementation::MDNSResponder::stcMDNS_RRDomain const&, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:? 
0x40239be4 in esp8266::MDNSImplementation::MDNSResponder::_writeMDNSServiceDomain(esp8266::MDNSImplementation::MDNSResponder::stcMDNSService const&, bool, bool, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:? 0x40239725 in esp8266::MDNSImplementation::MDNSResponder::_udpAppendBuffer(unsigned char const*, unsigned int) at ??:? 0x40239752 in esp8266::MDNSImplementation::MDNSResponder::_udpAppend8(unsigned char) at ??:? 0x4023986a in esp8266::MDNSImplementation::MDNSResponder::_write8(unsigned char, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:? 0x40239b99 in esp8266::MDNSImplementation::MDNSResponder::_writeMDNSServiceDomain(esp8266::MDNSImplementation::MDNSResponder::stcMDNSService const&, bool, bool, esp8266::MDNSImplementation::MDNSResponder::stcMDNSSendParameter&) at ??:? 0x402781bb in pp_attach at ??:? 0x4027820a in pp_attach at ??:? 0x40278316 in pp_attach at ??:? 0x402772cb in ppTxPkt at ??:? 0x4026050f in ieee80211_output_pbuf at ??:? 0x4010618f in wdt_feed at ??:? 0x402781bb in pp_attach at ??:? 0x40100588 in ets_post at ??:? 0x401063e5 in ets_timer_disarm at ??:? 0x401028b4 in pp_post at ??:? 0x40103938 in wDev_ProcessFiq at ??:? 0x40251581 in glue2esp_linkoutput at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:301 0x40100588 in ets_post at ??:? 0x401028b4 in pp_post at ??:? ```

Additional notes

I'd like to switch to this AsyncWebserver in future. As my project is multiplatform I already tested with success on ESP32. For now I use the esphome fork, but there the ESP8266 also feels unstable (that is reported by many users).

vortigont commented 2 months ago

This is probably due to a memory shortage and not a bug in the server. I advise monitoring your heap size and fragmentation, or better, switching to ESP32.

mathieucarbou commented 2 months ago

@lumapu : I know the project (using OpenDTU myself). The ESP8266 stack traces are quite ugly compared to the ones PlatformIO produces for ESP32.

Questions :

1) did you measure the heap usage (and free heap) in some of the functions listed above before they allocate ? Like @vortigont said, the second trace feels like a failure to allocate

2) do you have this bug with only this fork, or also with the original one and the one from yubox-node-org (see readme) ?

mathieucarbou commented 2 months ago

@lumapu correct me if I am wrong, but I don't see any usage of this fork in your project's PlatformIO file, nor in your dev branch... Are you opening this issue in the right location? ;-) Ref: https://github.com/lumapu/ahoy/blob/main/src/platformio.ini

lumapu commented 2 months ago

did you measure the heap usage (and free heap) in some of the functions listed above before they allocate ? Like @vortigont said, the second trace feels like a failure to allocate

No, not directly at those points, but I have a function to read it during operation via the API. For ESP8266 there is the field max_free_block, which reads 9136 bytes for your fork and 9672 bytes for the esphome fork, both after a few clicks in the WebUI. The free heap is in the same region: 9600 and 9800.
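
The figures above can be turned into a quick headroom check. This is only a sketch: `HeapSnapshot`, `unusableSharePercent` and `canAllocate` are hypothetical names, the pairing of 9600/9800 with the two forks is assumed from the order in which they are quoted, and the percentage is a simple illustrative ratio, not the metric the ESP8266 core reports. On the device the two inputs would typically come from `ESP.getFreeHeap()` and `ESP.getMaxFreeBlockSize()`.

```cpp
#include <cstdint>

// Snapshot of the two figures quoted above. Plain numbers are used here
// so the sketch compiles on any host, not only on an ESP8266.
struct HeapSnapshot {
  uint32_t freeBytes;     // total free heap
  uint32_t maxFreeBlock;  // largest contiguous free block
};

// Share of the free heap (in percent) that is NOT reachable as one
// contiguous block. Illustrative ratio only; assumed, not the core's formula.
constexpr uint32_t unusableSharePercent(HeapSnapshot s) {
  return (s.freeBytes == 0 || s.maxFreeBlock >= s.freeBytes)
             ? 0u
             : 100u - (s.maxFreeBlock * 100u) / s.freeBytes;
}

// True if one allocation of `bytes` fits into the largest block while
// still leaving `reserveBytes` (a hypothetical safety margin) behind.
constexpr bool canAllocate(HeapSnapshot s, uint32_t bytes, uint32_t reserveBytes) {
  return s.maxFreeBlock >= bytes + reserveBytes;
}

// Values reported in this thread (assumed pairing of free heap and fork).
constexpr HeapSnapshot kThisFork{9600, 9136};
constexpr HeapSnapshot kEsphome{9800, 9672};
```

With these inputs both forks look barely fragmented, so the raw amount of free memory, not fragmentation, would be the first thing to look at.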

do you have this bug with only this fork, or also with the original one and the one from yubox-node-org (see readme) ?

I checked the esphome fork again: no crash was reproducible. Then, seconds later, a newly compiled version with your fork crashed really fast.

correct me if I am wrong, but I don't see any usage of this fork in your project's PlatformIO file, nor in your dev branch... Are you opening this issue in the right location? ;-)

Good point. I need to do some basic tests before delivering new software to the folks. Nothing has been published using your fork yet - I only have a feature branch locally. Yesterday I found your fork and wanted to test it immediately. It works like a charm on ESP32, but on ESP8266 I see some problems.

I really appreciate that you want to maintain the AsyncWebserver, and I read your whole discussion with @egnor. Some months ago I was searching for a better maintained fork, coming from the yubox-node-org one, and found esphome. Since yesterday I know yours and hope that I can use it in the near future.

mathieucarbou commented 2 months ago

@lumapu thank you for these details!

Can you please confirm: you are then using esphome/ESPAsyncTCP-esphome @ 2.0.0 in both your tests and you are then just swapping esphome/ESPAsyncWebServer with mathieucarbou/ESPAsyncWebServer, right ?

The Async lib behind stays the same, but just the ESPAsyncWebServer changes ?

Also, if I look at the traces, you are using SSE, not WebSocket, right ?

I suspect that the difference in heap usage is due to the recent change from @vortigont: the project included a custom-made implementation of a forward linked list, which was replaced with std::list which is bi-directional and allows for constant time additions and removal. So the little heap usage increase is expected.
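
To put a rough number on that expected increase, here is a minimal sketch comparing the per-element link overhead of a singly and a doubly linked node. `SingleNode`, `DoubleNode` and `extraBytes` are hypothetical illustrations, not the layouts used by `std::list` or by the removed custom list, and allocator bookkeeping is ignored.

```cpp
#include <cstddef>

// Hypothetical node layouts, only to illustrate per-element overhead.
template <typename T>
struct SingleNode {  // forward (singly) linked: one link per element
  SingleNode* next;
  T value;
};

template <typename T>
struct DoubleNode {  // bidirectional: two links per element
  DoubleNode* prev;
  DoubleNode* next;
  T value;
};

// Extra bytes a doubly linked list needs for `count` elements of type T,
// compared with the singly linked layout above.
template <typename T>
constexpr std::size_t extraBytes(std::size_t count) {
  return count * (sizeof(DoubleNode<T>) - sizeof(SingleNode<T>));
}

// For pointer-sized payloads the difference is exactly one pointer per node.
static_assert(sizeof(DoubleNode<void*>) - sizeof(SingleNode<void*>) == sizeof(void*),
              "one extra link per node");
```

On a 32-bit target such as the ESP8266 that is 4 bytes per element under these assumptions, which fits the "little heap usage increase" described above.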

Neither of us uses the ESP8266 on a daily basis, so it would help a lot if you had the opportunity to create a minimal reproducible test case in an .ino file we could add to the project.

If the Async lib stays the same, but only the ESPAsyncWebServer fork is swapped, it would be interesting to find the issue indeed.

The only big changes from ESPHome fork regarding SSE are in commits bb4eb89c8e028005ef84f875417d32ca095147e7 and 48968b5be5ffc7dd0b763752e8e7255fbc6c2871 for SSE (@vortigont fyi) - not considering the more common api (request / response / handlers)

vortigont commented 2 months ago

9k of heap is definitely too low to work reliably, even with a single connection. Running on the edge, it is just a matter of time until you hit an out-of-memory issue. If your project is so memory stressed, then I would not target the 8266 at all. Sorry, but this chip is too old to invest considerable time in optimizing the code for such limited conditions. This is just my opinion.

mathieucarbou commented 2 months ago

@lumapu : you would need to measure the free heap just before the allocation requests that are failing (like we see in the stack traces).

@vortigont : what I do not get is why it works with the ESPhome fork. I agree with you that the free memory is too low and this is asking for problems, but the difference between the 2 forks in terms of memory usage is low.

I was wondering if one of the 2 commits could have introduced a side effect we did not think of. Honestly, I do not see any right now, which is why I was asking for your second opinion.

proddy commented 2 months ago

but this chip is too old to invest considerable time in optimizing the code for such limited conditions. This is just my opinion.

I agree, and I had to make the same hard choice on my projects. The ESP8266 is 10 years old now and you can easily swap it for an ESP32 for less than 2 euros, which has a lot more memory, power, cores etc.

vortigont commented 2 months ago

what I do not get is why it works with the ESPhome fork

That's what I mean: there might be something very specific indeed that could be investigated and even probably fixed or optimized, but to do this on 8266 - nah... I have more things to invest time and effort into :) As I see from the traces, it fails on malloc or new with vfprintf around, so the most probable cause is memory constraints indeed, either for heap or for stack. I do not have a working SSE example to test on, never used it actually, and mostly made the changes heuristically. I can try to dig into this a bit, but only if a minimal reproducible example is provided.

mathieucarbou commented 2 months ago

I can try to dig into this a bit, but only if a minimal reproducible example is provided.

I agree: without more effort from @lumapu to pinpoint the issue a bit more and provide a minimal reproducible use case proving an issue in the library, we cannot do anything but suspect a memory constraint, as shown in the stack trace.

@lumapu : you should monitor your free heap at key points where memory is allocated (before these malloc / new / vfprintf calls). CONFIG_ASYNC_TCP_STACK_SIZE is for AsyncTCP, which you are not using since you are on ESP8266. There is no task and stack size to configure.
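
One way to do that kind of monitoring is a small low-watermark recorder that is sampled right before each suspicious allocation. The sketch below is an assumption-laden illustration: `HeapWatermark` is a hypothetical helper, and the probe is injected so the code also runs on a host; on an ESP8266 the probe could be something like `[] { return ESP.getFreeHeap(); }`.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <utility>

// Records the lowest free-heap value seen at instrumented call sites.
class HeapWatermark {
 public:
  // `probe` returns the current free heap in bytes (injected, hypothetical).
  explicit HeapWatermark(std::function<uint32_t()> probe)
      : probe_(std::move(probe)) {}

  // Call right before a suspicious allocation (new, malloc, String growth).
  // `site` must outlive this object, e.g. a string literal.
  void sample(const char* site) {
    const uint32_t now = probe_();
    if (now < lowest_) {
      lowest_ = now;
      site_ = site;
    }
  }

  uint32_t lowest() const { return lowest_; }          // smallest value seen so far
  const char* lowestSite() const { return site_; }     // where it was seen

 private:
  std::function<uint32_t()> probe_;
  uint32_t lowest_ = std::numeric_limits<uint32_t>::max();
  const char* site_ = "";
};
```

Sampling next to the functions named in the traces (for example around `addHeader` or `_assembleHead`) and exposing `lowest()` through an existing JSON endpoint would show how close the heap gets to zero just before a crash.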

mathieucarbou commented 2 months ago

@lumapu : could you please walk me through your project and tell me exactly which API you are calling, and with which kind of data, when it fails ?

I guess it all starts here, but please be more specific. https://github.com/lumapu/ahoy/blob/main/src/web/web.h

I am willing to help more, but the lack of information you give is not helping ;-)

Specifically, what I am searching for, is if a change in method signature regarding PROGMEM usage could have made it so that the content is now not read from flash but loaded into ram.

So I need to know what exactly you are using from the ESPAsync API.

As I understand right now, your html pages are generated with a python script and their type is const uint8_t {}[] PROGMEM right ?

And you are using beginResponse_P to serve them ?

So the method which is called is:

AsyncWebServerResponse *beginResponse_P(int code, const String& contentType, const uint8_t * content, size_t len, AwsTemplateProcessor callback=nullptr);

which is implemented in ESPHome fork and original repo as:

AsyncWebServerResponse * AsyncWebServerRequest::beginResponse_P(int code, const String& contentType, const uint8_t * content, size_t len, AwsTemplateProcessor callback){
  return new AsyncProgmemResponse(code, contentType, content, len, callback);
}

In our repo, this method is deprecated and redirected:

    [[deprecated("Replaced by beginResponse(...)")]]
    AsyncWebServerResponse* beginResponse_P(int code, const String& contentType, const uint8_t* content, size_t len, AwsTemplateProcessor callback = nullptr) {
      return beginResponse(code, contentType, content, len, callback);
    }

and goes to:

AsyncWebServerResponse* AsyncWebServerRequest::beginResponse(int code, const String& contentType, const uint8_t* content, size_t len, AwsTemplateProcessor callback) {
  return new AsyncProgmemResponse(code, contentType, content, len, callback);
}

Can you please have a deeper look at the method signatures used like this example ?

Thanks!

lumapu commented 2 months ago

Can you please confirm: you are then using esphome/ESPAsyncTCP-esphome @ 2.0.0 in both your tests and you are then just swapping esphome/ESPAsyncWebServer with mathieucarbou/ESPAsyncWebServer, right ?

From my understanding this comes with the Webserver; in my platformio.ini there is no extra entry for it. The only other dependency I can see is https://github.com/me-no-dev/ESPAsyncUDP, which is used for NTP.

The Async lib behind stays the same, but just the ESPAsyncWebServer changes ?

I only change line 29 in my platformio.ini, which points to the AsyncWebserver repository.

Also, if I look at the traces, you are using SSE, not WebSocket, right ?

Not completely sure what you mean, so let me describe how it's done in Ahoy: almost all pages are static HTML which loads the data dynamically using AJAX. Only the webconsole is using a websocket.

Neither of us uses the ESP8266 on a daily basis, so it would help a lot if you had the opportunity to create a minimal reproducible test case in an .ino file we could add to the project.

I can try to do so - give me some time - I don't want to waste too much time in ESP8266 (as you also mentioned 😉)

lumapu commented 2 months ago

9k of heap is definitely too low to work reliably, even with a single connection. Running on the edge, it is just a matter of time until you hit an out-of-memory issue. If your project is so memory stressed, then I would not target the 8266 at all. Sorry, but this chip is too old to invest considerable time in optimizing the code for such limited conditions. This is just my opinion.

Full ack. The ESP8266 was the chip I started with, and somehow it is possible to run the most recent Ahoy software on it, but sadly not with this fork. It's not high priority for me, but it would be cool if it were supported anyway. I know that the memory is too low on ESP8266, but this is by design; the chip has no more 😉. Web applications always become really big once they need to be nice.

lumapu commented 2 months ago

@lumapu : you would need to measure the free heap just before the allocation requests that are failing (like we see in the stack traces).

Correct, it's measured and stored until it's transferred to the WebUI via the JSON API.

lumapu commented 2 months ago

@lumapu : could you please walk me through your project and tell me exactly which API you are calling, and with which kind of data, when it fails ?

That's not that easy. I randomly click on different menu items in the WebUI and from time to time it crashes. I can try to record a screen video to describe it better.

I guess it all starts here, but please be more specific. https://github.com/lumapu/ahoy/blob/main/src/web/web.h

I am willing to help more, but the lack of information you give is not helping ;-)

I'm sorry for that - I will help as much as I can. You guys are so fast - I really appreciate it. I was talking about the development branch, which is more than 200 commits apart from main: https://github.com/lumapu/ahoy/tree/development03

Specifically, what I am searching for, is if a change in method signature regarding PROGMEM usage could have made it so that the content is now not read from flash but loaded into ram.

Maybe this line: https://github.com/lumapu/ahoy/blob/83b386deda9a25ed5279b1efb720b52d33859aef/src/web/web.h#L378

So I need to know what exactly you are using from the ESPAsync API.

As I understand right now, your html pages are generated with a python script and their type is const uint8_t {}[] PROGMEM right ?

yes that's correct. The python script is used to do some preprocessor and translation things. Also some generic content like menu and footer are included.

And you are using beginResponse_P to serve them ?

yes, I think so: https://github.com/lumapu/ahoy/blob/83b386deda9a25ed5279b1efb720b52d33859aef/src/web/web.h#L248

So the method which is called is:

AsyncWebServerResponse *beginResponse_P(int code, const String& contentType, const uint8_t * content, size_t len, AwsTemplateProcessor callback=nullptr);

which is implemented in ESPHome fork and original repo as:

AsyncWebServerResponse * AsyncWebServerRequest::beginResponse_P(int code, const String& contentType, const uint8_t * content, size_t len, AwsTemplateProcessor callback){
  return new AsyncProgmemResponse(code, contentType, content, len, callback);
}

In our repo, this method is deprecated and redirected:

    [[deprecated("Replaced by beginResponse(...)")]]
    AsyncWebServerResponse* beginResponse_P(int code, const String& contentType, const uint8_t* content, size_t len, AwsTemplateProcessor callback = nullptr) {
      return beginResponse(code, contentType, content, len, callback);
    }

and goes to:

AsyncWebServerResponse* AsyncWebServerRequest::beginResponse(int code, const String& contentType, const uint8_t* content, size_t len, AwsTemplateProcessor callback) {
  return new AsyncProgmemResponse(code, contentType, content, len, callback);
}

yes, I was notified by the deprecation and renamed the beginResponse_P calls to beginResponse. Maybe I missed something around this change. Do I need to change anything else than the function name?

Thank you for all your efforts, it feels really professional here

mathieucarbou commented 2 months ago

Do I need to change anything else than the function name?

No... Just changing the name is enough. It is the same signature and the same implementation behind it, as explained.

lumapu commented 2 months ago

I started another (private) project using this AsyncWebserver again. This project does not include websockets for now. Even if I request pages on a high frequency no crash was seen so far. I will further try to dig around this to get better information.

The behavior feels the same as described in newer issue:

mathieucarbou commented 2 months ago

@lumapu : the ws implementation in this fork relies on the std::shared_ptr<std::vector<uint8_t>> mechanism from the yubox-node-org fork, which is not in the original repo or the esphome fork... Maybe a lead ?
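
For readers unfamiliar with that mechanism, the sketch below shows the general idea of sharing one payload between several clients through `std::shared_ptr<std::vector<uint8_t>>`. `ClientQueue` and `broadcast` are hypothetical stand-ins, not the library's actual classes, and nothing here establishes that this mechanism is related to the crash.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using SharedBuffer = std::shared_ptr<std::vector<uint8_t>>;

// Hypothetical stand-in for a per-client outgoing queue.
struct ClientQueue {
  std::vector<SharedBuffer> pending;

  // Copies the pointer only; the payload itself is not duplicated.
  void enqueue(const SharedBuffer& buf) { pending.push_back(buf); }

  // Pretend the oldest message was fully sent and drop our reference.
  void sendOne() {
    if (!pending.empty()) pending.erase(pending.begin());
  }
};

// Queues the same payload for every client and returns the shared buffer
// so the caller can observe its reference count.
SharedBuffer broadcast(std::vector<ClientQueue>& clients,
                       const uint8_t* data, std::size_t len) {
  auto buf = std::make_shared<std::vector<uint8_t>>(data, data + len);
  for (auto& c : clients) c.enqueue(buf);
  return buf;
}
```

The payload is allocated once but stays on the heap until the slowest client has released it, which is worth keeping in mind when only about 9 kB of heap is free.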

lumapu commented 2 months ago

@mathieucarbou I have not used the yubox-node-org fork for a long time. Thanks for the hint - I think I can easily switch the Webserver library to yubox-node-org for a test and see if the issue is still there.

As a subscriber of the esphome fork I heard about the following; maybe it is related to my problem, or at least an improvement:

mathieucarbou commented 2 months ago

You are using SSE ?

mathieucarbou commented 1 month ago

@mathieucarbou I have not used the yubox-node-org fork for a long time. Thanks for the hint - I think I can easily switch the Webserver library to yubox-node-org for a test and see if the issue is still there.

As a subscriber of the esphome fork I heard about the following; maybe it is related to my problem, or at least an improvement:

@lumapu I have included this patch in this version => v3.2.3

mathieucarbou commented 1 month ago

Hi @lumapu , In latest version, I fixed an issue in the method overload for ESP8266 (regarding the PGM usage). You were using the methods with const uint8_t* content, not char*, so I guess this fix won't help much, but I wanted to drop a note just in case ;-)
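
The reasoning above rests on how C++ picks between byte-pointer and char-pointer overloads. A minimal sketch, assuming a hypothetical overload pair (`pickOverload` is not one of the library's declarations): `const uint8_t*` and `const char*` do not convert into each other implicitly, so the type of the argument alone decides which path is taken.

```cpp
#include <cstdint>

enum class Path { Bytes, Text };

// Hypothetical overload pair, loosely modelled on the signatures quoted
// in this thread; the real library declarations take more parameters.
inline Path pickOverload(const uint8_t* /*content*/) { return Path::Bytes; }
inline Path pickOverload(const char* /*content*/) { return Path::Text; }

// A generated page stored as bytes (as in the Ahoy build) and a C string.
static const uint8_t kPage[] = {'<', 'p', '>', 'o', 'k'};
static const char kText[] = "ok";
```

Under these assumptions a `const uint8_t[]` page never reaches the `const char*` overload, so a fix on the char-based path would not change how such pages are served.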

mathieucarbou commented 2 weeks ago

Hello,

I've just fixed a bug regarding string usage on ESP8266 (a long-standing bug):

https://github.com/mathieucarbou/ESPAsyncWebServer/releases/tag/v3.3.17

If possible, please let me know if it solves the issue....