me-no-dev / ESPAsyncWebServer

Async Web Server for ESP8266 and ESP32
3.7k stars 1.21k forks source link

High load websockets causing crashes + SOLUTION #1334

Open hoeken opened 1 year ago

hoeken commented 1 year ago

I'm writing a server that handles websocket connections using ESPAsyncWebserver and ArduinoJSON. As part of that, I was testing to see how many messages it could handle and I was just generally having things crash a lot. Often, I would get a message like this:

++++ Error when running speed test and sending ping messages.  Looks like async_tcp just got too busy.

E (19871) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (19871) task_wdt:  - async_tcp (CPU 0/1)
E (19871) task_wdt: Tasks currently running:
E (19871) task_wdt: CPU 0: IDLE
E (19871) task_wdt: CPU 1: IDLE
E (19871) task_wdt: Aborting.

abort() was called at PC 0x40106b70 on core 0

After digging around online, I found this good suggestion on Stackoverflow.

Based on that, I implemented something in my code sort of like below. After that, the crashes are pretty much gone unless I try to push it past 150 messages per second. Its more than happy to chug along at 100 messages per second.

When you do a lot of work in the callback, you run the risk of a watchdog timer reset. If you change the callback to basically just take the message and mark it for processing, then that callback becomes very fast. You then process the message and respond in your main loop which can take as long as you like.

I haven't seen anything in the documentation, so I thought I'd share this with you all as it might save you some headache.

As a side note, running the ESPAsyncWebserver + websockets + ArduinoJSON + serving static html+js+css through SPIFFS is such a killer combo. You can build really nice, responsive websites. I absolutely cannot believe that this thing is running on a little microcontroller. So good. If you want to see how I'm doing it, check out Yarrboard.

CircularBuffer<WebsocketRequest*, YB_RECEIVE_BUFFER_COUNT> wsRequests;

void loop()
{
  //sometimes websocket clients die badly.
  ws.cleanupClients();

  //process our websockets outside the callback.
  if (!wsRequests.isEmpty())
    handleWebsocketMessageLoop(wsRequests.shift());
}

void handleWebSocketMessage(void *arg, uint8_t *data, size_t len, AsyncWebSocketClient *client)
{
  AwsFrameInfo *info = (AwsFrameInfo *)arg;
  if (info->final && info->index == 0 && info->len == len && info->opcode == WS_TEXT)
  {
    if (!wsRequests.isFull())
    {
      WebsocketRequest* wr = new WebsocketRequest;
      wr->client_id = client->id();
      strlcpy(wr->buffer, (char*)data, YB_RECEIVE_BUFFER_LENGTH); 
      wsRequests.push(wr);
    }
  }
}

void handleWebsocketMessageLoop(WebsocketRequest* request)
{
  char jsonBuffer[MAX_JSON_LENGTH];
  StaticJsonDocument<2048> output;
  StaticJsonDocument<1024> input;

  DeserializationError err = deserializeJson(input, request->buffer);

  String mycmd = input["cmd"] | "???";

  if (err)
    generateErrorJSON(output, error);
  else
    handleReceivedJSON(input, output, YBP_MODE_WEBSOCKET, request->client_id);

  if (output.size())
  {
    serializeJson(output, jsonBuffer);

    if (ws.availableForWrite(request->client_id))
      ws.text(request->client_id, jsonBuffer);
    else
      Serial.println("[socket] client full");
  }

  delete request;
}
Belleson commented 1 year ago

That's an interesting solution, I like it, thanks for posting! I agree with your comment about how a really nice app is possible with the combination of libraries you mention.

My web socket app does not have a high message rate, just some knobs and buttons on a browser for human interaction. At first I had similar watchdog timer crashes and learned (after days of debug) that doing almost anything in the message handler causes the watchdog to bite.

My solution is maybe not so fast as yours but maybe more flexible. The message handler triggers a timer Ticker (with 1millisec delay) that calls a function and passes a JSON argument. The function asynchronously processes the arg, then calls a client notifier to socket.textAll to update all clients. It requires some care to prevent successive timer calls from short-circuiting previous calls but it's not possible to hang the main loop.

It has not rebooted itself in nearly a year of adding code. Here is a sample message handler, where the incoming message has an action and optional value

Ticker socketActionTicker;
void handleSocketMsg(void *arg, uint8_t *data, size_t len)
{
    AwsFrameInfo *info = (AwsFrameInfo *)arg;
    if (info->final && info->index == 0 && info->len == len && info->opcode == WS_TEXT)
    {
        const uint8_t size = JSON_OBJECT_SIZE(8);
        StaticJsonDocument<size> json;
        DeserializationError err = deserializeJson(json, data);
        if (err)
        {
#if DEBUG_OUTPUT
            Serial.printf("\ndeserializeJson() failed with code %s", err.c_str());
            void * dataStr[256];
            memset(dataStr, 0, 256);
            memcpy(dataStr, data, 255);
            Serial.printf("\nSocket message from client:\n%s\n", dataStr);
            serializeJsonPretty(json, Serial);
#endif
            return;
        }

#if DEBUG_OUTPUT
        Serial.println("\nSocket message from client:\n");
        serializeJsonPretty(json, Serial);
#endif

        const char *action = json["action"];
        if (strcmp(action, "setInput") == 0)
        {
            uint_fast8_t newIn = json["value"];
            if (newIn != input)
                socketActionTicker.once_ms(1, setInput, newIn);
        }
        else if (strcmp(action, "setPwrDn") == 0)
        {
            socketActionTicker.once_ms(1, powerDn);
        }
        else if (strcmp(action, "setPwrUp") == 0)
        {
            socketActionTicker.once_ms(1, powerUp);
        }
        else if (strcmp(action, "getInput") == 0)
        {
            socketActionTicker.once_ms(1, sendInput);
        }
        else if (strcmp(action, "getPwrState") == 0)
        {
            uint_fast8_t powerState = digitalRead(POWR); // state is read from hardware
            socketActionTicker.once_ms(1, sendPwrSts);

        }
    }
}

//  and a client notification
void notifyClients(JsonDocument& jsonDoc)
{
#if MONITOR_STATUS
    Serial.printf("\nENTERING notifyClients(jsonDoc)");
#endif

#if DEBUG_OUTPUT
    Serial.printf("\n");
    serializeJsonPretty(jsonDoc, Serial);
#endif
    char buffer[512];
    size_t len = serializeJson(jsonDoc, buffer);
    socket.textAll(buffer, len);
}
hoeken commented 11 months ago

So, quick update here.

At high message rates, it was still occasionally crashing (like every 12h or so, at 100 messages/sec)

I changed to this fork (https://github.com/nthd/ESPAsyncWebServer) that has some stability fixes applied and that helped quite a bit. It still crashes occasionally under higher load, but not nearly as much. Under light load I haven't seen any crashes in a while.

ninja- commented 10 months ago

see #1363 to see the code that causes mentioned crash

ninja- commented 10 months ago

looks like the fork also suffers from this bug

ninja- commented 10 months ago

after closer look, the fork will crash but after much higer load as you said...it might be related to active number of connections. maybe in between calls ws.cleanupClients which I do only once per second. unfortunately, platformio esp exception decoder does not seem to be able to decode the crash even with debug build

ninja- commented 10 months ago

I believe this will be fixed after applying this set of patches https://github.com/Aircoookie/WLED/issues/3382#issuecomment-1751363805, which I plan to do soon on top of nthd fork

ninja- commented 10 months ago

I've done some more digging, which might be helpful to someone. After getting the exception decoder working, abort during stress tests was always coming down to lack of memory so allocation crash.

There doesn't seem to be any leak when connection is open and websockets data is streamed, but there is a per-connection leak after connection is closed. I am not able to trace it, and I haven't found any information on it in in any of the forks.

So a simple while true; do curl esp32/anyurl; sleep 0.2; done will always result in an out-of-memory crash

I will be looking for more luck with Espressif native websockets server....

hoeken commented 10 months ago

@ninja- funny you say that.... I've been working on a new project that is a wrapper in the style of ESPAsyncWebServer but uses the native ESP-IDF server calls under the hood. literally just wrote most of it in the last week but its functional and almost feature complete.

Here's the link: https://github.com/hoeken/PsychicHTTP/

ninja- commented 10 months ago

@hoeken that's some really nice code - looks like we've been duplicating effort of porting the websockets demo code to c++, a quite slow and painful process :)

hoeken commented 10 months ago

@ninja- thanks man. yeah the esp-idf library is great, but its not exactly user friendly. it does make a fantastic base for a proper HTTP server though.

I don't know where you are in your process, but maybe we can collaborate. I just have a few things I'd like to sort out (uploads) before I make a v1.0.0 release of my library. My personal project Yarrboard has now been running on it for a couple days flawlessly, and I also did a load test on the example code which went well . There are definitely some performance gains to be made, but its definitely acceptable for now, and more importantly rock solid.

ninja- commented 10 months ago

I've got caught up in a couple issues today:

Sooo only now I got it to compile and upload with espressif 1.5.2, so I am looking forward to testing the stability with WiFI, BLE and WebSockets enabled...

hoeken commented 10 months ago

@ninja- have you seen the new v3.0 beta of the official esp-idf/arduino framework? looks like its using latest v5.1.x branch of esp-idf. https://blog.espressif.com/announcing-the-arduino-esp32-core-version-3-0-0-3bf9f24e20d4

What platform are you using? Currently I'm only targeting Arduino, but it would be cool to have pure esp-idf support as well. This is my setup in platformio:

platform = espressif32 board = esp32dev framework = arduino

If you want, lets move this thread to the psychichttp github issues tracker

ninja- commented 10 months ago

I do yes, but it's not compatible with platformio without using a fork.

platform = https://github.com/tasmota/platform-espressif32.git#Arduino/IDF5
framework =
    arduino
    espidf
hoeken commented 10 months ago

If anyone is following this thread, PsychicHttp v1.0 has been released. Feel free to give it a try and let me know your feedback. https://github.com/hoeken/PsychicHTTP/