me-no-dev / ESPAsyncWebServer

Async Web Server for ESP8266 and ESP32

Task watchdog got triggered #686

Closed williamesp2015 closed 4 years ago

williamesp2015 commented 4 years ago

I'm using a combination of AsyncWebServer + WebSocket + captive portal on an ESP32, with the Arduino framework on PlatformIO. With a laptop connected to the AP, I send a command that starts a series of functions, and when they finish I get this watchdog error:

E (89151) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (89151) task_wdt: - async_tcp (CPU 1)
E (89151) task_wdt: Tasks currently running:
E (89151) task_wdt: CPU 0: IDLE0
E (89151) task_wdt: CPU 1: IDLE1

IoTThinks commented 4 years ago

Break your functions into small pieces.

Don't let any single function hold the MCU for that long at a time.
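One way to read this advice is to have the callback only record what needs doing and let the main loop work through it one small piece at a time. The sketch below is a minimal illustration in portable C++, not code from ESPAsyncWebServer: StepQueue, enqueue() and runOne() are hypothetical names, and the mutex is there on the assumption that callbacks and the loop run in different tasks.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical helper (not part of ESPAsyncWebServer): queues small units of
// work so that a request or WebSocket callback can return immediately.
class StepQueue {
public:
    // Called from the callback: only records the work, does not run it.
    void enqueue(std::function<void()> step) {
        assert(step && "a step must be callable");
        std::lock_guard<std::mutex> lock(mutex_);
        steps_.push_back(std::move(step));
    }

    // Called from the main loop: runs at most one step, then returns.
    // Returns true if a step was executed.
    bool runOne() {
        std::function<void()> step;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (steps_.empty()) return false;
            step = std::move(steps_.front());
            steps_.pop_front();
        }
        step();  // executed outside the lock so enqueue() is never held up
        return true;
    }

    // Number of steps that have been queued but not yet run.
    std::size_t pending() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return steps_.size();
    }

private:
    mutable std::mutex mutex_;
    std::deque<std::function<void()>> steps_;
};
```

In a sketch, the WebSocket or request callback would call enqueue() once per piece of the long-running job, and loop() would call runOne() once per iteration, so no single call occupies a task for long.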

stale[bot] commented 4 years ago

[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

justoke commented 4 years ago

I have experienced the same issue:

E (226998) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (226998) task_wdt: - async_tcp (CPU 0/1)
E (226998) task_wdt: Tasks currently running:
E (226998) task_wdt: CPU 0: IDLE0
E (226998) task_wdt: CPU 1: IDLE1
E (226998) task_wdt: Aborting.
abort() was called at PC 0x400e9693 on core 0

Backtrace: 0x4008c8b0:0x3ffbe170 0x4008cae1:0x3ffbe190 0x400e9693:0x3ffbe1b0 0x40084bed:0x3ffbe1d0 0x401766e7:0x3ffbc790 0x400eaa7e:0x3ffbc7b0 0x4008a7dd:0x3ffbc7d0 0x40088ff9:0x3ffbc7f0

I will try reducing the work load in the functions called in the web socket receive event functions to see if this can be resolved.

stale[bot] commented 4 years ago

[STALE_CLR] This issue has been removed from the stale queue. Please ensure activity to keep it open in the future.

iafilius commented 4 years ago

Hi, a few ideas:

For debugging purposes it might be a good idea to disable the WDT (for async_tcp) and then try to find out where it spends all its time (a hardware debugger might be extremely useful). Start by measuring the duration of your calls, if you wait synchronously...

In the recent past I had numerous issues in AsyncTCP and ESPAsyncWebServer, so make sure you use the latest upstream versions. You may want to look at a few issues I fixed in the async events (SSE) code: flooding the queue and/or allowing unlimited re-entrance of _runqueue(), causing all kinds of race conditions.

For debugging purposes you might also force the use of only a single core of your ESP32, to avoid the most basic SMP-related race conditions.

Hope that helps. I'm not using WebSockets myself, only the async events (SSE). Regards

justoke commented 4 years ago

Thank you for the advice. I'm not familiar with the library beyond using it in general. How is the WDT disabled: is this part of the API, or does it require modifying the library code within the project? I do have a hardware debugger. As is often the case with such issues, they don't always occur, and replicating them is never straightforward.

iafilius commented 4 years ago

Hi @justoke ,

If you look into AsyncTCP.h you'll see:

//If core is not defined, then we are running in Arduino or PIO
#ifndef CONFIG_ASYNC_TCP_RUNNING_CORE
#define CONFIG_ASYNC_TCP_RUNNING_CORE -1 //any available core
#define CONFIG_ASYNC_TCP_USE_WDT 1 //if enabled, adds between 33us and 200us per event
#endif

Define CONFIG_ASYNC_TCP_USE_WDT yourself to be 0 at the top of your code.

That doesn't make the underlying error go away, but it at least removes this safety net. Then you will notice the real issue and can debug/investigate. Forcing the AsyncTCP code onto the same CPU your web server is running on might hide (a lot of) SMP-related issues.
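As far as I can tell, the guard quoted above keys on CONFIG_ASYNC_TCP_RUNNING_CORE only, so an override of CONFIG_ASYNC_TCP_USE_WDT probably has to come together with a definition of CONFIG_ASYNC_TCP_RUNNING_CORE. Both probably also have to be visible when the library itself is compiled (for example as -D compiler flags), not just in your own source file. The sketch below is a plain C++ illustration of that preprocessor logic, not library code: the two leading #define lines stand in for compiler flags, the core value 1 is an arbitrary choice, and asyncTcpWatchdogEnabled() is a hypothetical helper.

```cpp
// Assumption: these two lines stand in for compiler flags such as
//   -DCONFIG_ASYNC_TCP_RUNNING_CORE=1 -DCONFIG_ASYNC_TCP_USE_WDT=0
#define CONFIG_ASYNC_TCP_RUNNING_CORE 1
#define CONFIG_ASYNC_TCP_USE_WDT 0

// Copy of the guard quoted from the header. Because the core is already
// defined, the whole block is skipped and the defaults are not applied.
#ifndef CONFIG_ASYNC_TCP_RUNNING_CORE
#define CONFIG_ASYNC_TCP_RUNNING_CORE -1 // any available core
#define CONFIG_ASYNC_TCP_USE_WDT 1       // watchdog enabled by default
#endif

// Hypothetical helper: reports the value the library code would see.
constexpr bool asyncTcpWatchdogEnabled() {
    return CONFIG_ASYNC_TCP_USE_WDT != 0;
}

// Fails to compile if the override did not survive the guard.
static_assert(!asyncTcpWatchdogEnabled(), "watchdog override was not applied");
```

If only CONFIG_ASYNC_TCP_USE_WDT were defined beforehand, the guard would still be entered and the macro redefined to 1, which is why both names are set here.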

Also, you really should decode the backtrace. With a watchdog timeout it may not be that useful, but it might be. Decode this:

Backtrace: 0x4008c8b0:0x3ffbe170 0x4008cae1:0x3ffbe190 0x400e9693:0x3ffbe1b0 0x40084bed:0x3ffbe1d0 0x401766e7:0x3ffbc790 0x400eaa7e:0x3ffbc7b0 0x4008a7dd:0x3ffbc7d0 0x40088ff9:0x3ffbc7f0

Without this decoded, no one is going to be able to help you, because there is too little information.

But decoding is only useful as long as you haven't recompiled your project. There are multiple ways of doing the decode: in the Arduino IDE (add-on), via the CLI using the Espressif tools yourself, or in PlatformIO using a monitor plugin. Curious about your progress.
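For the do-it-yourself CLI route, the first step is pulling the program-counter addresses out of the backtrace line so they can be handed to a symbol lookup tool such as the toolchain's addr2line, together with the ELF file from the same build. Here is a minimal sketch of that extraction in plain C++; extractBacktracePCs() is a hypothetical helper, and it assumes the "PC:SP" pair format seen in the logs in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Extracts the program-counter half of every "PC:SP" pair in an ESP32
// "Backtrace:" line. Tokens that are not address pairs (the label itself,
// a trailing "|<-CORRUPTED" marker) are skipped.
std::vector<std::string> extractBacktracePCs(const std::string& line) {
    const std::string label = "Backtrace:";
    std::vector<std::string> pcs;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        // The label may be glued to the first pair, so strip it if present.
        if (token.compare(0, label.size(), label) == 0) {
            token.erase(0, label.size());
        }
        if (token.compare(0, 2, "0x") != 0) continue;  // not an address
        const std::size_t colon = token.find(':');
        if (colon == std::string::npos) continue;      // not a PC:SP pair
        assert(colon >= 2);  // holds because the token starts with "0x"
        pcs.push_back(token.substr(0, colon));
    }
    return pcs;
}
```

Feeding it the backtrace above yields eight addresses, starting with 0x4008c8b0 and ending with 0x40088ff9.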

Regards,

Arjan

stale[bot] commented 4 years ago

[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions.

Connect-and-Exchange commented 3 years ago

Same problem here. This should not be closed, as the functions I'm running have a loop time of <2 ms end-to-end. The API calls are also small. This is a recurring problem in the project.

utoConfigWifi is starting up...
Wifi init commpleted.
Init() LogWriter...
LogWriter initializing...
LogWriter init is completed.
Init() Data structs...
Init() OtaFlash...
mDNS responder started
Init() FileManager...
Init() Charger Portal...
Init() MqttServer...
Init() WebUpdate...
Init() Firmware update...
Init() charger simulator...
Init() Command processor...
Init() worker class...
Init() App Watchdog...
Setup completed...
E (188508) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (188508) task_wdt: - async_tcp (CPU 0/1)
E (188508) task_wdt: Tasks currently running:
E (188508) task_wdt: CPU 0: IDLE0
E (188508) task_wdt: CPU 1: IDLE1
E (188508) task_wdt: Aborting.
abort() was called at PC 0x40173f6c on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x40093eb8:0x3ffbfce0 0x40094131:0x3ffbfd00 0x40173f6c:0x3ffbfd20 0x40090895:0x3ffbfd40 0x4019b617:0x3ffcaab0 0x40175913:0x3ffcaad0 0x400968f1:0x3ffcaaf0 0x40095136:0x3ffcab10

== Back trace ==

Error 0x40173f6c task_wdt_isr
0x40093eb8 invoke_abort
0x40094131 abort
0x40090895 _xt_lowint1
0x4019b617 esp_pm_impl_waiti
0x40175913 esp_vApplicationIdleHook
0x400968f1 prvIdleTask :355 (discriminator 1)
0x40095136 vPortTaskWrapper : ?? ??:0

Connect-and-Exchange commented 2 years ago

I purchased a hardware debugger, as I traced this issue back to requests for a non-existent URL. When the 404 handler gets triggered, the system crashes. No clue why...

Snippet of the code, which is very short. The #ifdef block is compiled out here; it is only enabled in debug builds:


server->onNotFound(notFound);

void MgmtPortalClass::notFound(AsyncWebServerRequest* request)
{
#ifdef DEBUG_MgmtPortal
    LogWriter.ConsoleOutputLn(errorMessage + " http://" + request->host() + request->url());
#endif

    request->send(404, "text/plain", "You are doing it WRONG!");
}

E (262383) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (262383) task_wdt: - async_tcp (CPU 1)
E (262383) task_wdt: Tasks currently running:
E (262383) task_wdt: CPU 0: IDLE
E (262383) task_wdt: CPU 1: IDLE
E (262383) task_wdt: Aborting.

abort() was called at PC 0x4011ea1c on core 0

Backtrace:0x40083e7d:0x3ffbec9c |<-CORRUPTED

johnnytolengo commented 2 years ago

Hi all, the same problem here, anyone solved this "Task watchdog got triggered" issue?

SinglWolf commented 2 years ago

Increase the size of the _async_queue in AsyncTCP.cpp

_async_queue = xQueueCreate(32, sizeof(lwip_event_packet_t *));

from 32 to 64 or even higher if you can spare the space. It helped me.

45gfg9 commented 2 years ago

I've been annoyed by the async_tcp task triggering the TWDT for many days. My ESP32-side code just streams a file, like this:


  char pathbuf[64];
  snprintf(pathbuf, 64, "/image/%ld.jpg", id);
  if (!SD.exists(pathbuf)) {
    log_d("file %s does not exist", pathbuf);
    request->send(404);
  } else {
    log_d("streaming file %s", pathbuf);
    request->send(SD, pathbuf, mimeTable[jpg].mimeType);
  }

But my webpage requests six image files in parallel. Yes, I could make my webpages request less frequently, but IMO, for an "Async Web Server" library, this is more of a library issue.

Edit: Increasing _async_queue size really helps.

huster-songtao commented 7 months ago

_async_queue = xQueueCreate(32, sizeof(lwip_event_packet_t *));

It did not help me; the code still does not work well. The task watchdog has been getting triggered for many days for me too.

[WiFi] Stop scanNetworks.

[WiFi] Connect To WiFi with new credentials.
[WiFi] Save new Credientials
[WiFi] Current Hostname : Wi-Fi Portal ECDA3BBF5CB0
[WiFi] Connecting to Huawei.....
E (64739) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (64739) task_wdt: - async_tcp (CPU 0)
E (64739) task_wdt: Tasks currently running:
E (64739) task_wdt: CPU 0: loopTask
E (64739) task_wdt: Aborting.

abort() was called at PC 0x420210cb on core 0
Core 0 register dump:
MEPC : 0x40381dc6 RA : 0x40386e34 SP : 0x3fc932f0 GP : 0x3fc8e400
TP : 0x3fc7622c T0 : 0x37363534 T1 : 0x7271706f T2 : 0x33323130
S0/FP : 0x3fc9331c S1 : 0x3fc9331c A0 : 0x3fc93328 A1 : 0x3fc9330a
A2 : 0x00000000 A3 : 0x3fc93355 A4 : 0x00000001 A5 : 0x3fc97000
A6 : 0x7a797877 A7 : 0x76757473 S2 : 0x3fc97624 S3 : 0x7fffffff
S4 : 0x3fc8f7ec S5 : 0x3fc8f7dc S6 : 0x3fc8f7e4 S7 : 0x3fc8f7dc
S8 : 0x00000000 S9 : 0x00000000 S10 : 0x00000000 S11 : 0x00000000
T3 : 0x6e6d6c6b T4 : 0x6a696867 T5 : 0x66656463 T6 : 0x62613938
MSTATUS : 0x00001801 MTVEC : 0x40380001 MCAUSE : 0x00000007 MTVAL : 0x00000000
MHARTID : 0x00000000

Stack memory: 3fc932f0: 0x00000000 0x00000000 0x3fc93308 0x4038c454 0x3fc8f7e4 0x3fc8f7dc 0x3fc80030 0x3fc90a28 3fc93310: 0x3fc9331c 0x3fc90a44 0x3fc93308 0x32303234 0x62633031 0x00000000 0x726f6261 0x20292874 3fc93330: 0x20736177 0x6c6c6163 0x61206465 0x43502074 0x34783020 0x31323032 0x20626330 0x63206e6f 3fc93350: 0x2065726f 0x00000030 0x3c0c0000 0x40ef2484 0x600c2000 0x3c0ce000 0x3fc97000 0x420210ce 3fc93370: 0x00000008 0x00000001 0x0000001d 0x0000fce3 0x3fce0000 0x3fce0000 0x3fca28b8 0x00000000 3fc93390: 0x00000000 0x00000000 0x00000000 0x00000001 0x00001881 0x80000007 0x00000000 0x403801ee 3fc933b0: 0x00000001 0x3fc933b8 0xffffffff 0x3fc9cd7c 0x3fc9cd7c 0x00000001 0x3fc933cc 0xffffffff 3fc933d0: 0x3fc9f580 0x3fc9f580 0x00000000 0x3fc933e0 0xffffffff 0x3fc933e0 0x3fc933e0 0x00000000 3fc933f0: 0x3fc933f4 0xffffffff 0x3fc933f4 0x3fc933f4 0x00000000 0x3fc93408 0xffffffff 0x3fc93408 3fc93410: 0x3fc93408 0x00000000 0x3fc9341c 0xffffffff 0x3fc9341c 0x3fc9341c 0x00000000 0x3fc93430 3fc93430: 0xffffffff 0x3fc93430 0x3fc93430 0x00000000 0x3fc93444 0xffffffff 0x3fc93444 0x3fc93444 3fc93450: 0x00000000 0x3fc93458 0xffffffff 0x3fc93458 0x3fc93458 0x00000000 0x3fc9346c 0xffffffff 3fc93470: 0x3fc9346c 0x3fc9346c 0x00000000 0x3fc93480 0xffffffff 0x3fc93480 0x3fc93480 0x00000000 3fc93490: 0x3fc93494 0xffffffff 0x3fc93494 0x3fc93494 0x00000000 0x3fc934a8 0xffffffff 0x3fc934a8 3fc934b0: 0x3fc934a8 0x00000000 0x3fc934bc 0xffffffff 0x3fc934bc 0x3fc934bc 0x00000000 0x3fc934d0 3fc934d0: 0xffffffff 0x3fc934d0 0x3fc934d0 0x00000000 0x3fc934e4 0xffffffff 0x3fc934e4 0x3fc934e4 3fc934f0: 0x00000000 0x3fc934f8 0xffffffff 0x3fc934f8 0x3fc934f8 0x00000000 0x3fc9350c 0xffffffff 3fc93510: 0x3fc9350c 0x3fc9350c 0x00000000 0x3fc93520 0xffffffff 0x3fc93520 0x3fc93520 0x00000000 3fc93530: 0x3fc93534 0xffffffff 0x3fc93534 0x3fc93534 0x00000000 0x3fc93548 0xffffffff 0x3fc93548 3fc93550: 0x3fc93548 0x00000000 0x3fc9355c 0xffffffff 0x3fc9355c 0x3fc9355c 0x00000000 0x3fc93570 3fc93570: 0xffffffff 
0x3fc93570 0x3fc93570 0x00000000 0x3fc93584 0xffffffff 0x3fc93584 0x3fc93584 3fc93590: 0x00000000 0x3fc93598 0xffffffff 0x3fc93598 0x3fc93598 0x00000002 0x3fc935ac 0xffffffff 3fc935b0: 0x3fc9c0bc 0x3fcae7d8 0x00000000 0x3fc935c0 0xffffffff 0x3fc935c0 0x3fc935c0 0x00000000 3fc935d0: 0x3fc935d4 0xffffffff 0x3fc935d4 0x3fc935d4 0x00000004 0x3fc935e8 0xffffffff 0x3fca1914 3fc935f0: 0x3fca49b0 0x00000000 0x3fc935fc 0xffffffff 0x3fc935fc 0x3fc935fc 0x00000000 0x3fc93608 3fc93610: 0x00000000 0x00000000 0x00000000 0x3fc93620 0xffffffff 0x3fc93620 0x3fc93620 0x00000000 3fc93630: 0x3fc93634 0xffffffff 0x3fc93634 0x3fc93634 0x00000001 0x00000001 0x00000000 0x0001ffff 3fc93650: 0x00000000 0xb33fffff 0x00000000 0x00000000 0x3fc9365c 0x00000000 0x00000000 0x00000000 3fc93670: 0x3fc93674 0xffffffff 0x3fc93674 0x3fc93674 0x00000000 0x3fc93688 0xffffffff 0x3fc93688 3fc93690: 0x3fc93688 0x00000001 0x00000001 0x00000000 0x0001ffff 0x00000000 0xb33fffff 0x00000000 3fc936b0: 0x00000009 0x3fc98c84 0x3fc98cec 0x3fc98d54 0x00000000 0x00000000 0x00000001 0x00000000 3fc936d0: 0x00000000 0x00000000 0x420a7240 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000

ELF file SHA256: 3f9ec2be28daff50

Rebooting...
ESP-ROM:esp32c3-api1-20210207
Build:Feb 7 2021
rst:0x3 (RTC_SW_SYS_RST),boot:0xf (SPI_FAST_FLASH_BOOT)
Saved PC:0x4038199e
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fcd5810,len:0x438
load:0x403cc710,len:0x90c
load:0x403ce710,len:0x25f4
entry 0x403cc710
Serial begin...
Board: XIAO_ESP32C3

mediocre9 commented 4 months ago

any solution or workaround to this problem?

mediocre9 commented 3 months ago

I fixed my problem. I was using a delay function, which the documentation says we should not use. I was also using the deprecated firebase-esp32 library, which is synchronous.

For example, in my case firebase.listen() was blocking code that I was calling in one of my routes, and it caused a watchdog error because it waited for a cloud response. I removed all delay functions from my code and moved the blocking code, like firebase.listen(), into its own task pinned to CPU 1. Here is my code:

TaskHandle_t firebaseTask = NULL;

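void firebaseListenerTask(void *parameter)
{
    // A FreeRTOS task function must never return, so poll in an endless loop.
    for (;;)
    {
        WebServer::firebase.listen();
        vTaskDelay(pdMS_TO_TICKS(10000)); // FreeRTOS delay: suspends only this task so others can run
    }
}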

void WebServer::start()
{
    server.begin();

    xTaskCreatePinnedToCore(
        firebaseListenerTask,
        "firebase-listener-task",
        10000,
        NULL,
        1,
        &firebaseTask,
        1);
} 

void WebServer::setupRoutes()
{
    server.on("/", HTTP_GET, handlerGET); // don't use delay or any blocking code in handlers
    server.on("/some-route", HTTP_POST, handlerPOST); 
}

You can learn more about FreeRTOS and watchdog timers from the links below, which I found while looking for solutions: Watchdog, FreeRTOS

45gfg9 commented 3 months ago

mediocre9 wrote: "I fixed my problem. I was using a delay function, which the documentation says we should not use. I was also using the deprecated firebase-esp32 library, which is synchronous."

No, the delay function is indeed implemented using vTaskDelay, see esp32-hal-misc.c. This is because the Arduino setup / loop itself is implemented as a FreeRTOS task (loopTaskHandle), so delay shouldn't be the problem.

alkonosst commented 3 months ago

Increase the size of the _async_queue in AsyncTCP.cpp

_async_queue = xQueueCreate(32, sizeof(lwip_event_packet_t *));

from 32 to 64 or even higher if you can spare the space. It helped me.

Worked for me, thanks! In my code I do a lot of redirections for a captive portal, with lazy loading and so on. This worked fine, together with raising the watchdog timeout:

if (esp_task_wdt_init(2 * 60, true) != ESP_OK) {
    ESP_LOGE(tag, "Failed to set watchdog timeout");
}