mathieucarbou / ESPAsyncWebServer

Asynchronous HTTP and WebSocket Server Library for ESP32, ESP8266 and RP2040
https://mathieu.carbou.me/ESPAsyncWebServer/
GNU Lesser General Public License v3.0
74 stars 16 forks source link

[BUG] ESP8266 Crash with websockets #85

Closed philfifi closed 2 weeks ago

philfifi commented 2 months ago

Description

Bonjour Mathieu,

The ESP8266 is crashing in various ways when using websocket (and some string manipulation).

The crash could be a WDT reset like this one

 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset

Or stack trace


Exception (3):
epc1=0x40233f7a epc2=0x00000000 epc3=0x00000000 excvaddr=0x4002b340 depc=0x00000000

>>>stack>>>

ctx: sys
sp: 3fffeb80 end: 3fffffb0 offset: 0150
3fffecd0:  3ffeed00 40104985 3ffed7b8 00000000  
3fffece0:  00000020 4021950d 3fff0014 00000000  
[…]

The conditions are:

Observations:

Affects:

Board

wemos D1 (esp8266) -> crash esp32-C3 (super mini) -> works

Stack trace It varies, here is an exemple:

0x40233f7a: umm_info at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_info.c:65
0x4021950d: glue2esp_linkoutput at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:301
0x4023415b: umm_fragmentation_metric at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_info.c:247
0x40209f60: EspClass::getHeapFragmentation() at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/Esp-frag.cpp:62
0x40201204: printStackSize() at /home/pluc/developpement/accelero_sonnette/test_just_async_websocket/src/main.cpp:45
0x40204021: AsyncWebSocketClient::text(char const*) at /home/pluc/developpement/accelero_sonnette/test_just_async_websocket/.pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:680
0x40201240: onWsEventEmpty(AsyncWebSocket*, AsyncWebSocketClient*, AwsEventType, void*, unsigned char*, unsigned int) at /home/pluc/developpement/accelero_sonnette/test_just_async_websocket/src/main.cpp:59
0x40210302: std::_Function_handler::_M_invoke(std::_Any_data const&, AsyncWebSocket*&&, AsyncWebSocketClient*&&, AwsEventType&&, void*&&, unsigned char*&&, unsigned int&&) at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/std_function.h:293
0x4021c154: tcp_receive at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c:1158
0x4010015e: std::function::operator()(AsyncWebSocket*, AsyncWebSocketClient*, AwsEventType, void*, unsigned char*, unsigned int) const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/std_function.h:623
0x40202960: AsyncWebSocket::_handleEvent(AsyncWebSocketClient*, AwsEventType, void*, unsigned char*, unsigned int) at /home/pluc/developpement/accelero_sonnette/test_just_async_websocket/.pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:768
0x4020395c: AsyncWebSocketClient::_onData(void*, unsigned int) at /home/pluc/developpement/accelero_sonnette/test_just_async_websocket/.pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:569
0x402039a5: _M_invoke at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/std_function.h:293
0x4010031d: std::function::operator()(void*, AsyncClient*, void*, unsigned int) const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/std_function.h:623
0x4020ed92: std::__shared_ptr::get() const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:1325
0x4020ed92: std::__shared_ptr_access::_M_get() const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:1024
0x4020ed92: std::__shared_ptr_access::operator->() const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:1018
0x4020ed92: AsyncClient::_recv(std::shared_ptr&, tcp_pcb*, pbuf*, int) at /home/pluc/developpement/accelero_sonnette/test_just_async_websocket/.pio/libdeps/esp8266/ESPAsyncTCP-esphome/src/ESPAsyncTCP.cpp:649
0x4021e69a: tcp_output at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_out.c:1362
0x4020edf8: AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, int) at /home/pluc/developpement/accelero_sonnette/test_just_async_websocket/.pio/libdeps/esp8266/ESPAsyncTCP-esphome/src/ESPAsyncTCP.cpp:741
0x4021d7f1: tcp_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c:501
0x4021a449: mem_malloc at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c:210
0x4022274d: ip4_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c:1467
0x40100a35: umm_free_core at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:642
0x40219b11: ethernet_input_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c:188
0x40219924: git2glue_err at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:118
0x40219924: esp2glue_ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:494
0x402427f9: ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:365
0x4024280b: ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:373
0x40100d70: umm_init at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:569
0x401000ab: app_entry_redefinable at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:386
0x401003f4: ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x401003f4: ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x401003f4: ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x401003f4: ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x401003f4: ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x401003f4: ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4020ac98: String::changeBuffer(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:253
0x4020ac6c: String::changeBuffer(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:246
0x4020e07f: run_scheduled_functions() at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/Schedule.cpp:171
0x40100a35: umm_free_core at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:642
0x40100fd9: realloc at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:1254
0x4020ac6c: String::changeBuffer(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:246
0x4020ad43: String::reserve(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:192
0x4020b0f4: String::concat(char const*, unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:369
0x40100d92: free at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:688
0x4020b2ca: String::concat(char) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:385
0x401003f4: ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x40100415: esp_schedule at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:136
0x4020c362: loop_wrapper at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:264

Additional notes

While it affects all the forks tested, I opened the ticket just here, thanks Mathieu for maintaining this fork :-)

The reproducer is very minimal, plenty of heap is available:

 Heap free:34480 frag:6 maxFreeBlock:39664

Yes, I should not use ESP8266 anymore :-) Still, it looks like a software bug, triggered by the string manipulation and the websocket together, hence this report. I didn't managed to pinpoint further, lacking debugger connection with ESP8266 :-(

Reproducer: main.cpp

#include <Arduino.h>
#ifdef ESP8266
#include <ESP8266WiFi.h>
#endif
#ifdef ESP32
#include <WiFi.h>
#endif
#include <ESPAsyncWebServer.h>

void connect_wifi() {
    WiFi.mode(WIFI_STA);
    WiFi.begin("wifi", "wifi");

    Serial.print("Connecting");
    while (WiFi.status() != WL_CONNECTED) {
        delay(500);
        Serial.print(".");
    }
    Serial.println();

    Serial.print("Connected, IP address: ");
    Serial.println(WiFi.localIP());
}

void notFound(AsyncWebServerRequest *request) {
    request->send(404, "text/plain", "Not found");
}

AsyncWebServer server(80);
AsyncWebSocket ws("/ws");

// initial stack
char *stack_start;

void printStackSize() {
    char stack;
    Serial.print(F("stack size "));
    Serial.print(stack_start - &stack);
    Serial.print(F(" | Heap free:"));
    Serial.print(ESP.getFreeHeap());
#ifdef ESP8266
    Serial.print(F(" frag:"));
    Serial.print(ESP.getHeapFragmentation());
    Serial.print(F(" maxFreeBlock:"));
    Serial.print(ESP.getMaxFreeBlockSize());
#endif
    Serial.println();

}

void onWsEventEmpty(AsyncWebSocket * server, AsyncWebSocketClient * client, AwsEventType type, void * arg, uint8_t *data, size_t len) {
  printStackSize();
  // Just answer
  client->text("PONGTEST");
}

void serve_upload(AsyncWebServerRequest *request, const String& filename, size_t index, uint8_t *data, size_t len, bool final) {
  Serial.print("> onUpload ");
  Serial.print("index: "); Serial.print(index);
  Serial.print(" len:"); Serial.print(len);
  Serial.print(" final:"); Serial.print(final);
  Serial.println();
}

void setup() {
      char stack;
    stack_start = &stack;

    Serial.begin(115200);
    Serial.println("\n\n\nStart");
    Serial.printf("stack_start: %p\n", stack_start);

    connect_wifi();

    server.onNotFound(notFound);

    ws.onEvent(onWsEventEmpty);
    server.addHandler(&ws);

    server.on("/upload", HTTP_POST, [](AsyncWebServerRequest *request){
      request->send(200);
    }, serve_upload);

    server.begin();
    Serial.println("Server started");
}

String msg = "";
uint32_t count = 0;
void loop() {

    ws.cleanupClients();

    count += 1;

    // Concatenate some string, and clear it after some time
    static unsigned long millis_string = millis();
    if (millis() - millis_string > 1) {
      millis_string += 100;
      if (count % 100 == 0) {
        // printStackSize();
        // Serial.print(msg);
        msg = String();
      } else {
        msg += 'T';
      }
    }
}

Versions used:

Platform espressif8266 @ 4.2.1 (required: espressif8266 @ 4.2.1)
├── framework-arduinoespressif8266 @ 3.30102.0 (required: platformio/framework-arduinoespressif8266 @ ^3.30102)
├── tool-esptool @ 1.413.0 (required: platformio/tool-esptool @ <2)
├── tool-esptoolpy @ 1.30000.201119 (required: platformio/tool-esptoolpy @ ~1.30000.0)
├── tool-mklittlefs @ 1.203.210628 (required: platformio/tool-mklittlefs @ ~1.203.0)
├── tool-mkspiffs @ 1.200.0 (required: platformio/tool-mkspiffs @ ~1.200.0)
└── toolchain-xtensa @ 2.100300.220621 (required: platformio/toolchain-xtensa @ ~2.100300.220621)

Libraries
└── ESPAsyncWebServer @ 3.1.5 (required: mathieucarbou/ESPAsyncWebServer @ 3.1.5)
│   └── ESPAsyncTCP-esphome @ 2.0.0 (required: esphome/ESPAsyncTCP-esphome @ ^2.0.0)
philfifi commented 2 months ago

Code for websocket client:

#!/usr/bin/env python

import asyncio
import time

from websockets.asyncio.client import connect

async def hello():
    uri = "ws://192.168.14.248:80/ws"

    nb = 0

    async with connect(uri) as websocket:
        while(True):
            msg_trunk = "PINGTEST"
            print("WS send:", msg_trunk);
            await websocket.send(msg_trunk)

            ret = await websocket.recv()
            print(f"WS received", ret)
            nb += 1
            print("nb:", nb)
            time.sleep(0.1)

if __name__ == "__main__":
    asyncio.run(hello())
mathieucarbou commented 2 months ago

Your test case is not valid: you are fragmenting your heap until it exhaust and cannot allocate anymore.

Please work on a minimal reproductible test case only involving websocket, no upload code or stuff like string concatenation.

I wil close this issue as invalid for now.

mathieucarbou commented 2 months ago

Sorry I misunderstood the problem. But I don't see what can be done here, and it's also possible that this library is not at fault. I will leave this issue in triage until someone can prove there is a big in the library, with a valid stack trace we can work on. I am not using esp8266 (only have a D1 mini for testing).

Here the stack shows it crashes when using the heap metrics. So this is not related to the library.

philfifi commented 2 months ago

I didn't explained the String part. In my original code, the msg was used to store messages that exceeds the size of 1 callback. (index != 0). At the end of the message, the string was "cleared" to free the memory. In this minimal test case I reproduced a similar pattern.

From the log of the heap fragmentation, it appears to not fragment, and have > 35k available for the minimal block.

Sure, I can't confirm the lib is at fault here. Still the doubt remains because it crashes when doing websocket transfers, and works fine when doing GET/PUT/POST.

philfifi commented 2 months ago

Here another stack trace, which I admit has little value too :-(

<<<stack<<<

0x4022393b in ip_chksum_pseudo at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/inet_chksum.c:395
0x4021c2b9 in tcp_receive at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c:1295 (discriminator 4)
0x4021d556 in tcp_process at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c:984
 (inlined by) tcp_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c:438
0x4022274d in ip4_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c:1467
0x40100a35 in umm_free_core at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:642
0x40219b11 in ethernet_input_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c:188
0x40219924 in git2glue_err at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:118
 (inlined by) esp2glue_ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:494
0x402427f9 in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:365
0x4024280b in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:373
0x4023f273 in ppPeocessRxPktHdr at ??:?
0x40243adf in ets_snprintf at ??:?
0x401055e5 in call_user_start_local at ??:?
0x401055eb in call_user_start_local at ??:?
0x40100d70 in umm_init at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:569
0x4010000d in call_user_start at ??:?
0x401000ab in app_entry_redefinable at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:386
0x402403c7 in pp_attach at ??:?
0x40240416 in pp_attach at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010226c in pp_post at ??:?
0x40104309 in lmacIsIdle at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x40105503 in lmacRxDone at ??:?
0x4010226c in pp_post at ??:?
0x40105503 in lmacRxDone at ??:?
0x40102df7 in rcReachRetryLimit at ??:?
0x40102fd4 in rcReachRetryLimit at ??:?
0x40102fd4 in rcReachRetryLimit at ??:?
0x4010348e in wDev_ProcessFiq at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010226c in pp_post at ??:?
0x40105503 in lmacRxDone at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010226c in pp_post at ??:?
0x40105503 in lmacRxDone at ??:?
0x4010226c in pp_post at ??:?
0x40102df7 in rcReachRetryLimit at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x40102fd4 in rcReachRetryLimit at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010348e in wDev_ProcessFiq at ??:?
0x401031b0 in wDev_ProcessFiq at ??:?
0x4020b2bb in String::concat(char) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:383
0x40201332 in loop at src/main.cpp:115
0x4020ac6c in String::changeBuffer(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:246 (discriminator 4)
0x4020ad43 in String::reserve(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:192 (discriminator 4)
0x4020b0f4 in String::concat(char const*, unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:369
0x40100d92 in free at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:688
0x4020b2ca in String::concat(char) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:385
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x40100415 in esp_schedule at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:136
0x4020c362 in loop_wrapper() at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:264
philfifi commented 2 months ago

Another one, with some malloc involved and string concatenation:

Could it be a bug in the String library?

0x402403c7 in pp_attach at ??:?
0x40100dc9 in malloc at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:912
0x40227985 in ieee80211_decap at ??:?
0x401005dc in pvPortMalloc at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/heap.cpp:360
0x4021a449 in mem_malloc at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c:210
0x4021ab0e in pbuf_alloc_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/pbuf.c:284
0x4022c004 in sta_input at ??:?
0x402198ec in esp2glue_alloc_for_recv at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:478
0x402427cc in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:354
0x4023f922 in pp_tx_idle_timeout at ??:?
0x4023f273 in ppPeocessRxPktHdr at ??:?
0x40243adf in ets_snprintf at ??:?
0x401055e5 in call_user_start_local at ??:?
0x401055eb in call_user_start_local at ??:?
0x40100d70 in umm_init at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:569
0x4010000d in call_user_start at ??:?
0x401000ab in app_entry_redefinable at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:386
0x402403c7 in pp_attach at ??:?
0x40240416 in pp_attach at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010226c in pp_post at ??:?
0x40104309 in lmacIsIdle at ??:?
0x40105451 in lmacTxFrame at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010269d in rcUpdateTxDone at ??:?
0x40105503 in lmacRxDone at ??:?
0x4010226c in pp_post at ??:?
0x401054eb in lmacTxFrame at ??:?
0x401046c6 in lmacRecycleMPDU at ??:?
0x40102fd4 in rcReachRetryLimit at ??:?
0x40104b37 in lmacRecycleMPDU at ??:?
0x40102fd4 in rcReachRetryLimit at ??:?
0x401045ff in lmacProcessTxSuccess at ??:?
0x401034c8 in wDev_ProcessFiq at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x401031b0 in wDev_ProcessFiq at ??:?
0x4010226c in pp_post at ??:?
0x40105503 in lmacRxDone at ??:?
0x40102df7 in rcReachRetryLimit at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010226c in pp_post at ??:?
0x40105503 in lmacRxDone at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x40105d7d in ets_timer_disarm at ??:?
0x4020b0e3 in String::concat(char const*, unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:364
0x4020b2ca in String::concat(char) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:385
0x40100a35 in umm_free_core at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:642
0x40100fd9 in realloc at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:1254
0x4020ac6c in String::changeBuffer(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:246 (discriminator 4)
0x4020ad43 in String::reserve(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:192 (discriminator 4)
0x4020b0f4 in String::concat(char const*, unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:369
0x40100d92 in free at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:688
0x4020b2ca in String::concat(char) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:385
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x40100415 in esp_schedule at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:136
0x4020c362 in loop_wrapper() at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:264
philfifi commented 2 months ago

Here the stack shows it crashes when using the heap metrics. So this is not related to the library.

Note that completely removing the heap metrics still makes the esp8266 to crash.

philfifi commented 2 months ago

I changed the test case a tiny bit to remove the dependency with string, and just keep malloc:


void * malloc_ptr;
void stress_loop_with_malloc_simple() {
    static uint32_t count = 0;

    if (count % 2 == 0) {
      malloc_ptr = malloc(15);
    } else {
      free(malloc_ptr);
    }

    count += 1;
}

And got a (more) interesting stack trace:

0x40100a35 in umm_free_core at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:642
0x40100d92 in free at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:688
0x4020230a in webSocketSendFrame(AsyncClient*, bool, unsigned char, bool, unsigned char*, unsigned int) at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:112
0x40100dc9 in malloc at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:912
0x4020242c in AsyncWebSocketMessage::send(AsyncClient*) at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:258 (discriminator 6)
0x40202b58 in AsyncWebSocketClient::_runQueue() at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:371
0x4020bf17 in operator new(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/abi.cpp:41
0x40233e9e in umm_info at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_info.c:46
0x40203e16 in AsyncWebSocketClient::_queueMessage(std::shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char, bool) at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:443
0x40203e40 in std::__shared_count<(__gnu_cxx::_Lock_policy)0>::~__shared_count() at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:732
 (inlined by) std::__shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> >, (__gnu_cxx::_Lock_policy)0>::~__shared_ptr() at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:1183
 (inlined by) std::shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> > >::~shared_ptr() at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr.h:121
 (inlined by) AsyncWebSocketClient::text(std::shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> > >) at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:667
0x40203f54 in std::__shared_count<(__gnu_cxx::_Lock_policy)0>::~__shared_count() at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:732
 (inlined by) std::__shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> >, (__gnu_cxx::_Lock_policy)0>::~__shared_ptr() at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:1183
 (inlined by) std::shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> > >::~shared_ptr() at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr.h:121
 (inlined by) AsyncWebSocketClient::text(unsigned char const*, unsigned int) at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:671
0x4020a700 in Print::println() at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/Print.cpp:182
0x40203f8c in AsyncWebSocketClient::text(char const*, unsigned int) at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:676
0x40203fb5 in AsyncWebSocketClient::text(char const*) at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:680
0x40201248 in onWsEventEmpty(AsyncWebSocket*, AsyncWebSocketClient*, AwsEventType, void*, unsigned char*, unsigned int) at src/main.cpp:58
0x40210296 in std::_Function_handler<void (AsyncWebSocket*, AsyncWebSocketClient*, AwsEventType, void*, unsigned char*, unsigned int), void (*)(AsyncWebSocket*, AsyncWebSocketClient*, AwsEventType, void*, unsigned char*, unsigned int)>::_M_invoke(std::_Any_data const&, AsyncWebSocket*&&, AsyncWebSocketClient*&&, AwsEventType&&, void*&&, unsigned char*&&, unsigned int&&) at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/std_function.h:293
0x4010015e in std::function<void (AsyncWebSocket*, AsyncWebSocketClient*, AwsEventType, void*, unsigned char*, unsigned int)>::operator()(AsyncWebSocket*, AsyncWebSocketClient*, AwsEventType, void*, unsigned char*, unsigned int) const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/std_function.h:623
0x40100a35 in umm_free_core at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:642
0x40100d92 in free at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:688
0x402028f0 in AsyncWebSocket::_handleEvent(AsyncWebSocketClient*, AwsEventType, void*, unsigned char*, unsigned int) at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:768
0x402038f0 in AsyncWebSocketClient::_onData(void*, unsigned int) at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:569
0x4022b759 in scan_parse_beacon at ??:?
0x40203939 in std::_Function_handler<void (void*, AsyncClient*, void*, unsigned int), AsyncWebSocketClient::AsyncWebSocketClient(AsyncWebServerRequest*, AsyncWebSocket*)::{lambda(void*, AsyncClient*, void*, unsigned int)#5}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&, std::_Any_data const&, unsigned int&&) at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/std_function.h:293
0x4010031d in std::function<void (void*, AsyncClient*, void*, unsigned int)>::operator()(void*, AsyncClient*, void*, unsigned int) const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/std_function.h:623
0x4020ed26 in std::__shared_ptr<ACErrorTracker, (__gnu_cxx::_Lock_policy)0>::get() const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:1325
 (inlined by) std::__shared_ptr_access<ACErrorTracker, (__gnu_cxx::_Lock_policy)0, false, false>::_M_get() const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:1024
 (inlined by) std::__shared_ptr_access<ACErrorTracker, (__gnu_cxx::_Lock_policy)0, false, false>::operator->() const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:1018
 (inlined by) AsyncClient::_recv(std::shared_ptr<ACErrorTracker>&, tcp_pcb*, pbuf*, int) at .pio/libdeps/esp8266/ESPAsyncTCP-esphome/src/ESPAsyncTCP.cpp:649
0x4022c3ef in sta_input at ??:?
0x4020ed8c in AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, int) at .pio/libdeps/esp8266/ESPAsyncTCP-esphome/src/ESPAsyncTCP.cpp:741
0x4021d785 in tcp_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c:501 (discriminator 1)
0x40248eb8 in etharp_output at ??:?
0x401005dc in pvPortMalloc at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/heap.cpp:360
0x402226e1 in ip4_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c:1467
0x40100a35 in umm_free_core at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:642
0x40219aa5 in ethernet_input_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c:188
0x402198b8 in git2glue_err at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:118
 (inlined by) esp2glue_ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:494
0x4024278d in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:365
0x4024279f in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:373
0x4023f207 in ppPeocessRxPktHdr at ??:?
0x40243a73 in ets_snprintf at ??:?
0x401055e5 in call_user_start_local at ??:?
0x401055eb in call_user_start_local at ??:?
0x40100d70 in umm_init at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:569
0x4010000d in call_user_start at ??:?
0x401000ab in app_entry_redefinable at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:386
0x4024035b in pp_attach at ??:?
0x402403aa in pp_attach at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010226c in pp_post at ??:?
0x4023f46b in ppTxPkt at ??:?
0x402290a7 in ieee80211_send_probereq at ??:?
0x4022aee3 in scan_remove_probe_ssid at ??:?
0x4022aab4 in scan_start at ??:?
0x4022d120 in cnx_start_handoff_cb at ??:?
0x40105ec1 in ets_timer_arm_new at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4022d120 in cnx_start_handoff_cb at ??:?
0x40105d7d in ets_timer_disarm at ??:?
0x4022ca24 in chm_start_op at ??:?
0x4020bf11 in operator new(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/abi.cpp:40
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x40210300 in std::_Function_handler<void (AsyncWebServerRequest*, String const&, unsigned int, unsigned char*, unsigned int, bool), void (*)(AsyncWebServerRequest*, String const&, unsigned int, unsigned char*, unsigned int, bool)>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/std_function.h:271
0x4010226c in pp_post at ??:?
0x40105503 in lmacRxDone at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010269d in rcUpdateTxDone at ??:?
0x4010226c in pp_post at ??:?
0x40105503 in lmacRxDone at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x40102df7 in rcReachRetryLimit at ??:?
0x4010226c in pp_post at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010226c in pp_post at ??:?
0x40105503 in lmacRxDone at ??:?
0x40102df7 in rcReachRetryLimit at ??:?
0x40102fd4 in rcReachRetryLimit at ??:?
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010348e in wDev_ProcessFiq at ??:?
0x40105d7d in ets_timer_disarm at ??:?
0x401031b0 in wDev_ProcessFiq at ??:?
0x40101192 in cont_wrapper at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/cont.S:80
0x40101190 in cont_wrapper at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/cont.S:79
0x4020c2bc in loop_wrapper() at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:250
0x4020a700 in Print::println() at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/Print.cpp:182
0x4020a724 in Print::println(char const*) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/Print.cpp:194
0x402016a9 in setup at src/main.cpp:90
0x40100a35 in umm_free_core at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:642
0x40100dc9 in malloc at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:912
0x401003f4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x40100415 in esp_schedule at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:136
0x4020c2f6 in loop_wrapper() at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:264

Which shows l 112 of AsyncWebSocket.cpp:

  if (client->add((const char*)buf, headLen) != headLen) {
    // os_printf("error adding %lu header bytes\n", headLen);
    free(buf);
    // Serial.println("SF 4");
    return 0;
  }
  free(buf);
philfifi commented 2 months ago

Another trace mentioning websocket differently


0x40100cfe in free at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:688
0x40201dfe in std::_Sp_counted_ptr_inplace<std::vector<unsigned char, std::allocator<unsigned char> >, std::allocator<std::vector<unsigned char, std::allocator<unsigned char> > >, (__gnu_cxx::_Lock_policy)0>::_M_dispose() at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:561
0x402029ce in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)0>::_M_release() at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:316
 (inlined by) std::__shared_count<(__gnu_cxx::_Lock_policy)0>::~__shared_count() at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:733
 (inlined by) std::__shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> >, (__gnu_cxx::_Lock_policy)0>::~__shared_ptr() at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:1183
 (inlined by) std::shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> > >::~shared_ptr() at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr.h:121
 (inlined by) AsyncWebSocketMessage::~AsyncWebSocketMessage() at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.h:130
 (inlined by) void __gnu_cxx::new_allocator<AsyncWebSocketMessage>::destroy<AsyncWebSocketMessage>(AsyncWebSocketMessage*) at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/ext/new_allocator.h:156
 (inlined by) void std::allocator_traits<std::allocator<AsyncWebSocketMessage> >::destroy<AsyncWebSocketMessage>(std::allocator<AsyncWebSocketMessage>&, AsyncWebSocketMessage*) at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/alloc_traits.h:531
 (inlined by) std::deque<AsyncWebSocketMessage, std::allocator<AsyncWebSocketMessage> >::pop_front() at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/stl_deque.h:1535
 (inlined by) AsyncWebSocketClient::_clearQueue() at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:308
0x40202c00 in AsyncWebSocketClient::_onAck(unsigned int, unsigned int) at .pio/libdeps/esp8266/ESPAsyncWebServer/src/AsyncWebSocket.cpp:339
0x401031b0 in wDev_ProcessFiq at ??:?
0x4021a324 in mem_free at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c:237
0x4022bdc2 in sta_input at ??:?
0x40202c29 in std::_Function_handler<void (void*, AsyncClient*, unsigned int, unsigned int), AsyncWebSocketClient::AsyncWebSocketClient(AsyncWebServerRequest*, AsyncWebSocket*)::{lambda(void*, AsyncClient*, unsigned int, unsigned int)#2}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&, unsigned int&&, AsyncClient*&&) at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/std_function.h:293
0x401002c9 in std::function<void (void*, AsyncClient*, unsigned int, unsigned int)>::operator()(void*, AsyncClient*, unsigned int, unsigned int) const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/std_function.h:623
0x4020eac6 in std::__shared_ptr<ACErrorTracker, (__gnu_cxx::_Lock_policy)0>::get() const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:1325
 (inlined by) std::__shared_ptr_access<ACErrorTracker, (__gnu_cxx::_Lock_policy)0, false, false>::_M_get() const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:1024
 (inlined by) std::__shared_ptr_access<ACErrorTracker, (__gnu_cxx::_Lock_policy)0, false, false>::operator->() const at /home/pluc/.platformio/packages/toolchain-xtensa/xtensa-lx106-elf/include/c++/10.3.0/bits/shared_ptr_base.h:1018
 (inlined by) AsyncClient::_sent(std::shared_ptr<ACErrorTracker>&, tcp_pcb*, unsigned short) at .pio/libdeps/esp8266/ESPAsyncTCP-esphome/src/ESPAsyncTCP.cpp:557
0x4020eb08 in AsyncClient::_s_sent(void*, tcp_pcb*, unsigned short) at .pio/libdeps/esp8266/ESPAsyncTCP-esphome/src/ESPAsyncTCP.cpp:756
0x4021d668 in tcp_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c:469 (discriminator 1)
0x40222601 in ip4_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c:1467
0x401009a1 in umm_free_core at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:642
0x402199c5 in ethernet_input_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c:188
0x402197d8 in git2glue_err at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:118
 (inlined by) esp2glue_ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c:494
0x40242279 in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:365
0x4024228b in ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c:373
0x4023ecf3 in ppPeocessRxPktHdr at ??:?
0x4024355f in ets_snprintf at ??:?
0x401055e5 in call_user_start_local at ??:?
0x401055eb in call_user_start_local at ??:?
0x40100cdc in umm_init at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:569
0x4010000d in call_user_start at ??:?
0x401000ab in app_entry_redefinable at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:386
0x4023fe47 in pp_attach at ??:?
0x4023fe96 in pp_attach at ??:?
0x401003c4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010226c in pp_post at ??:?
0x4023ef57 in ppTxPkt at ??:?
0x40228fc7 in ieee80211_send_probereq at ??:?
0x4022ae03 in scan_remove_probe_ssid at ??:?
0x4022a9d4 in scan_start at ??:?
0x4022d040 in cnx_start_handoff_cb at ??:?
0x40105ec1 in ets_timer_arm_new at ??:?
0x4022c9cf in chm_start_op at ??:?
0x401003c4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4022c944 in chm_start_op at ??:?
0x40105d7d in ets_timer_disarm at ??:?
0x4022a998 in scan_start at ??:?
0x401003c4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010269d in rcUpdateTxDone at ??:?
0x4010226c in pp_post at ??:?
0x40105503 in lmacRxDone at ??:?
0x401003c4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x4010269d in rcUpdateTxDone at ??:?
0x4010226c in pp_post at ??:?
0x40105503 in lmacRxDone at ??:?
0x401003c4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x40102df7 in rcReachRetryLimit at ??:?
0x4010226c in pp_post at ??:?
0x40102fd4 in rcReachRetryLimit at ??:?
0x4010226c in pp_post at ??:?
0x4010348e in wDev_ProcessFiq at ??:?
0x401031b0 in wDev_ProcessFiq at ??:?
0x401003c4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x40100d38 in malloc at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:912
0x40100d35 in malloc at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:912
0x4020a640 in Print::println() at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/Print.cpp:182
0x4020a664 in Print::println(char const*) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/Print.cpp:194
0x40201625 in setup at src/main.cpp:91
0x401009a1 in umm_free_core at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:642
0x40100d35 in malloc at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:912
0x401003c4 in ets_post at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:238
0x401003e5 in esp_schedule at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:136
0x4020c11a in loop_wrapper() at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_main.cpp:264
mathieucarbou commented 2 months ago

it crashes when doing websocket transfers,

No it crashes (first stack trace) when calling printStackTrace() from the websocket handler

0x40201204: printStackSize() at /home/pluc/developpement/accelero_sonnette/test_just_async_websocket/src/main.cpp:45

There is no tcp transfer involved at the time of the crash.

I will have a look at the other traces...

mathieucarbou commented 2 months ago

Here another stack trace, which I admit has little value too :-(

This one says it crashed during the string concat from main loop. So unrelated to this lib.

0x4020b2bb in String::concat(char) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:383
0x40201332 in loop at src/main.cpp:115
0x4020ac6c in String::changeBuffer(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:246 (discriminator 4)
0x4020ad43 in String::reserve(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:192 (discriminator 4)
0x40
mathieucarbou commented 2 months ago

Could it be a bug in the String library?

I would suspect your memory instead.

The String library is quite (should be) stable

0x4020b0e3 in String::concat(char const*, unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:364
0x4020b2ca in String::concat(char) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:385
0x40100a35 in umm_free_core at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:642
0x40100fd9 in realloc at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:1254
0x4020ac6c in String::changeBuffer(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:246 (discriminator 4)
0x4020ad43 in String::reserve(unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:192 (discriminator 4)
0x4020b0f4 in String::concat(char const*, unsigned int) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:369
0x40100d92 in free at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/umm_malloc/umm_malloc.cpp:688
0x4020b2ca in String::concat(char) at /home/pluc/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/WString.cpp:385
mathieucarbou commented 2 months ago

I changed the test case a tiny bit to remove the dependency with string, and just keep malloc:

Here you are not testing the lib, you are just stressing the malloc api.

I think you might be able to reproduce the same without the lib... Imagine you have a callback which is executed based on network event or gpio events, doing some free and malloc, and at the same time a loop doing teh same thing without any delay or more cpu work.

I just think you are stressing the esp memory allocator...

And because you see String or ESpASyncWS traces in the stack, you think it might come from there.

mathieucarbou commented 2 months ago

Another trace mentioning websocket differently

Well, it is also mentioning Arduino libraries, it is also mentionning String, etc etc.

It does not mean the bug is in one of the call stack. It crashes when freeing the memory, several times.

philfifi commented 2 months ago

Thanks for the 2nd pair of eyes :-) I was pointing the lib (websocket in particular) because that is the only remaining thing apart the malloc/free in the main loop ;-) (and I got the first crashes in my project when I added ws connections). All is working fine when only using (and hammering in the same conditions) GET/POST requests.

It is definitely something fishy with the memory (as per the various traces and WDT reset sometimes). Could that be the memory allocator of ESP8266 that is not completely interrupt friendly?

mathieucarbou commented 2 months ago

I don't know... Or the board that is old and corrupted ? I really don't know at this point.

philfifi commented 2 months ago

Or the board that is old and corrupted

3 different boards (with ESP8266) from different epochs fails the same way :-(

mathieucarbou commented 2 weeks ago

Hello,

I've just fixed a bug regarding string usages for ESP8266 (long time bug):

https://github.com/mathieucarbou/ESPAsyncWebServer/releases/tag/v3.3.16

If possible, please let me know if it solves the issue....

philfifi commented 2 weeks ago

Hello,

https://github.com/mathieucarbou/ESPAsyncWebServer/releases/tag/v3.3.16

Something fishy happens with this specific version :-( #132

mathieucarbou commented 2 weeks ago

@philfifi : do you still have websocket issues with 3.3.15 ?

philfifi commented 2 weeks ago

Yes, 3.3.15 suffers the same issue :-(

mathieucarbou commented 2 weeks ago

I added you test case in the examples here: https://github.com/mathieucarbou/ESPAsyncWebServer/blob/main/examples/Issue85/Issue85.ino

I am using AP mode instead.

flashed on a d1_mini, then using websocat ws://192.168.4.1/ws to connect to the websocket, I see PONGTEST.

image

I do not have any crash and the loop is running since 5 min now.

[env:esp8266]
platform = espressif8266
board = d1_mini
lib_deps = 
  bblanchon/ArduinoJson @ 7.2.0
  esphome/ESPAsyncTCP-esphome @ 2.0.0

Note that your python code is maybe sending too many websocket events (each 100 ms right ?) : on ESP32 I would say yes, but on an ESP8266, just forget about that! I did some perf test with another app once (WebSerial) and the limit on ESP8266 was 1 message per 500ms to not exhaust the message queue. But WebSerial is a bigger app than this use case.

mathieucarbou commented 2 weeks ago

Another test: sending a PING ws message each 500 ms:

image

Works fine.

Also trying each 100 ms:

> while true; do echo "PING"; sleep 0.1; done | websocat ws://192.168.4.1/ws

And it works dine too.

image

The numbers stay consistent.

philfifi commented 2 weeks ago

Hummm ... Can't reproduce it either with your testcase (with thousands of pings). I just reproduced with "my" original test case (in 37 pings), with the exact same board, and exact same python code. Let me investigate.

philfifi commented 2 weeks ago

I made it crash :-) Some notes:

It /could/ be explained by the code reading some uninitialised RAM, making the behavior different from reset. (but nothing looks like this in the code :-( )

philfifi commented 2 weeks ago

the limit on ESP8266 was 1 message per 500ms to not exhaust the message queue

Which message queue?

mathieucarbou commented 2 weeks ago

Which message queue?

When you send websocket messages, they go in a queue, which is dequeued depending on the speed of the board / network. This queue can be set to a different value (see the header).

mathieucarbou commented 2 weeks ago
  • It ran hundreds of ping fine separated by 500ms. It crashed on the 3rd ping separated by 100ms.
  • "sometimes" it don't crash for thousands of pings (at 100ms)
  • after a hard reset, it can crash in less than 100 pings.

that's too much for an esp8266. You are just doing a DoS on your board ;-) you cannot expect to have with ESPAsyncWebserver on a 1 core esp8266, a speed that is as fast as what a desktop browser can do with a recent multicore CPU and obviously i don't know how much more memory.

mathieucarbou commented 2 weeks ago

I will close this issue. ESP8266 / ESP32 boards have hardware / physical limitations, ESP8266 even more and that's not the kind of chip to use if one needs more rates per sec on websocket and more concurrency also.

philfifi commented 2 weeks ago
  • It ran hundreds of ping fine separated by 500ms. It crashed on the 3rd ping separated by 100ms.
  • "sometimes" it don't crash for thousands of pings (at 100ms)
  • after a hard reset, it can crash in less than 100 pings.

that's too much for an esp8266. You are just doing a DoS on your board ;-) you cannot expect to have with ESPAsyncWebserver on a 1 core esp8266, a speed that is as fast as what a desktop browser can do with a recent multicore CPU and obviously i don't know how much more memory.

I think we misunderstood each other here, I am not DoS'ing the board. A WS request is sent every 100ms. The board replies after ~1ms. Which leaves ~99ms for the board to do nothing between the requests. The (useful) payload is 8 bytes, we are talking about a bandwidth of 80 bytes per seconds. The board can do a couple order of magnitude more.

I see the queue WS_MAX_QUEUED_MESSAGES is defined to 8 for esp8266. Once, the board crashed after the 3rd ping (which means in 300ms), the queue is not even half-full here.

Moreover, the queue seems guarded by messages like "Too many messages queued: closing connection". I am fine if messages are dropped when the queue is full. But I am not fine with a crash :-(

philfifi commented 2 weeks ago

By the way, it just crashes with 0.5s between WS requests:

0.009 WS send: PINGTEST
0.012 WS received PONGTEST
nb: 1
0.512 WS send: PINGTEST
0.512 WS received PONGTEST
nb: 2
1.012 WS send: PINGTEST
1.012 WS received PONGTEST
nb: 3
1.512 WS send: PINGTEST
1.512 WS received PONGTEST
nb: 4
2.013 WS send: PINGTEST
2.013 WS received PONGTEST
nb: 5
2.513 WS send: PINGTEST
2.513 WS received PONGTEST
nb: 6
3.013 WS send: PINGTEST
3.013 WS received PONGTEST
nb: 7
3.514 WS send: PINGTEST
3.514 WS received PONGTEST
nb: 8
4.014 WS send: PINGTEST
4.014 WS received PONGTEST
nb: 9
4.514 WS send: PINGTEST
4.514 WS received PONGTEST
nb: 10
5.014 WS send: PINGTEST

The float value at the beginning of the line is the time (in s) of the message since the start of program.

In this exemple, no PONGTEST was received, the esp8266 crashed in between.

mathieucarbou commented 2 weeks ago

And you run exactly the same sample as the one I uploaded, with your modifications ?

Can you test in AP mode ?

Right now that's the only difference I see...

mathieucarbou commented 2 weeks ago

that's too much for an esp8266. You are just doing a DoS on your board ;-) you cannot expect to have with ESPAsyncWebserver on a 1 core esp8266, a speed that is as fast as what a desktop browser can do with a recent multicore CPU and obviously i don't know how much more memory.

I think we misunderstood each other here, I am not DoS'ing the board. A WS request is sent every 100ms. The board replies after ~1ms. Which leaves ~99ms for the board to do nothing between the requests. The (useful) payload is 8 bytes, we are talking about a bandwidth of 80 bytes per seconds. The board can do a couple order of magnitude more.

We don't seem to have the same results... For example, if I run with sleep 0.1 (code above) to send a WS message to the esp each 100ms, actually it was slowing down and I was only at 2-3 messages per second. So the reply will never exhaust the queue.

It would if we were testing sending messages fast from the loop (which is what I was talking about since ether beging - sorry).

But here this is from websocket to the esp (which replies).

Sorry! Forget what I said then about that!

philfifi commented 2 weeks ago

This crash was obtained with the exact file from Issue85/ in the master branch. Compiled with platformio

PACKAGES: 
 - framework-arduinoespressif8266 @ 3.30102.0 (3.1.2) 
 - tool-esptool @ 1.413.0 (4.13) 
 - tool-esptoolpy @ 1.30000.201119 (3.0.0) 
 - tool-mklittlefs @ 1.203.210628 (2.3) 
 - tool-mkspiffs @ 1.200.0 (2.0) 
 - toolchain-xtensa @ 2.100300.220621 (10.3.0)
Dependency Graph
|-- ESPAsyncWebServer @ 3.3.17
|-- ESP8266WiFi @ 1.0

platformio.ini:

[env:d1_mini]
platform = espressif8266
board = d1_mini
framework = arduino

lib_deps = 
 mathieucarbou/ESPAsyncWebServer@^3.3.17

Using AP mode, my PC connected directly to this AP. firmware.bin.gz

philfifi commented 2 weeks ago

We don't seem to have the same results... For example, if I run with sleep 0.1 (code above) to send a WS message to the esp each 100ms, actually it was slowing down and I was only at 2-3 messages per second. So the reply will never exhaust the queue.

Hum, that is an interesting difference. In all my experiments, the WS response was always very fast (~1 millisecond), and I never saw a slowdown.

mathieucarbou commented 2 weeks ago

This crash was obtained with the exact file from Issue85/ in the master branch. Compiled with platformio

PACKAGES: 
 - framework-arduinoespressif8266 @ 3.30102.0 (3.1.2) 
 - tool-esptool @ 1.413.0 (4.13) 
 - tool-esptoolpy @ 1.30000.201119 (3.0.0) 
 - tool-mklittlefs @ 1.203.210628 (2.3) 
 - tool-mkspiffs @ 1.200.0 (2.0) 
 - toolchain-xtensa @ 2.100300.220621 (10.3.0)
Dependency Graph
|-- ESPAsyncWebServer @ 3.3.17
|-- ESP8266WiFi @ 1.0

I have:

PLATFORM: Espressif 8266 (4.2.1) > WeMos D1 R2 and mini
HARDWARE: ESP8266 80MHz, 80KB RAM, 4MB Flash
PACKAGES: 
 - framework-arduinoespressif8266 @ 3.30102.0 (3.1.2) 
 - tool-esptool @ 1.413.0 (4.13) 
 - tool-esptoolpy @ 1.30000.201119 (3.0.0) 
 - toolchain-xtensa @ 2.100300.220621 (10.3.0)
Converting Issue85.ino
LDF: Library Dependency Finder -> https://bit.ly/configure-pio-ldf
LDF Modes: Finder ~ chain, Compatibility ~ soft
Found 48 compatible libraries
Scanning dependencies...
Dependency Graph
|-- ArduinoJson @ 7.2.0
|-- ESPAsyncTCP-esphome @ 2.0.0
|-- ESP8266WiFi @ 1.0
|-- src
Building in release mode
Compiling .pio/build/esp8266/src/Issue85.ino.cpp.o
Retrieving maximum program size .pio/build/esp8266/firmware.elf
Checking size .pio/build/esp8266/firmware.elf
Advanced Memory Usage is available via "PlatformIO Home > Project Inspect"
RAM:   [====      ]  36.8% (used 30180 bytes from 81920 bytes)
Flash: [===       ]  29.5% (used 308361 bytes from 1044464 bytes)

So same thing.... lol!

platformio.ini:

[env:d1_mini]
platform = espressif8266
board = d1_mini
framework = arduino

lib_deps = 
 mathieucarbou/ESPAsyncWebServer@^3.3.17

same!

and what is strange is that you are able to reproduce also with other boards right ? I only have the d1 mini at home...