u-blox / ubxlib

Portable C libraries which provide APIs to build applications with u-blox products and services. Delivered as add-on to existing microcontroller and RTOS SDKs.
Apache License 2.0

***ERROR*** A stack overflow in task eventTask has been detected. #230

Open alexmaron81 opened 2 months ago

alexmaron81 commented 2 months ago

Hi Rob, I get a stack overflow, but not always. What is the best way to proceed?

As a reminder, I'm using an ESP32.

RobMeades commented 2 months ago

Hi Alex: searching the code for eventTask, that would seem to be the UART callback task:

https://github.com/u-blox/ubxlib/blob/ffa9636be40b22cd5fca48902b6c4320787bc00e/port/platform/esp-idf/src/u_port_uart.c#L514-L519

The usual way to check this would be to hack the code above to add a fixed amount, say 2048 bytes, to the stack requested for that task in the function call (i.e. stackSizeBytes + 2048) and then monitor the "low watermark" of that stack by calling uPortUartEventStackMinFree() periodically from your application. If you see it go below 2048 bytes then you have found the problem case, since without the extra 2048 bytes the stack would have overflowed at that point. It is quite likely to be some particular combination of events that causes a sudden jump in stack usage, rather than a slow reduction; at least that would be my guess.
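The bookkeeping for that periodic check could be sketched as below. This is only a sketch: `stackMonitor_t`, `stackMonitorInit()`, `stackMonitorUpdate()` and `STACK_MARGIN_BYTES` are hypothetical names made up for illustration, not part of ubxlib, and it assumes that a negative reading from uPortUartEventStackMinFree() indicates an error rather than a byte count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

// The extra bytes hacked in to the task's stack for the experiment
// (the 2048 bytes suggested above).
#define STACK_MARGIN_BYTES 2048

// Hypothetical helper, not part of ubxlib: remembers the lowest
// "minimum free" reading seen so far.
typedef struct {
    int32_t lowestSeenBytes; // INT32_MAX until the first valid reading
    bool marginBreached;     // true once a reading fell below the margin
} stackMonitor_t;

static void stackMonitorInit(stackMonitor_t *pMonitor)
{
    assert(pMonitor != NULL);
    pMonitor->lowestSeenBytes = INT32_MAX;
    pMonitor->marginBreached = false;
}

// Feed in one reading; returns true if it is a new low watermark.
// Negative readings are assumed to be error codes and are ignored.
static bool stackMonitorUpdate(stackMonitor_t *pMonitor, int32_t minFreeBytes)
{
    assert(pMonitor != NULL);
    if ((minFreeBytes < 0) || (minFreeBytes >= pMonitor->lowestSeenBytes)) {
        return false;
    }
    pMonitor->lowestSeenBytes = minFreeBytes;
    if (minFreeBytes < STACK_MARGIN_BYTES) {
        // Without the added margin the stack would have overflowed here
        pMonitor->marginBreached = true;
    }
    printf("New stack low watermark: %ld bytes free%s\n", (long) minFreeBytes,
           pMonitor->marginBreached ? " (below the added margin!)" : "");
    return true;
}
```

On the target, the application would call something like `stackMonitorUpdate(&monitor, uPortUartEventStackMinFree(uartHandle));` from a periodic task or loop. Printing only when a new low is reached keeps the log short and makes a sudden jump in stack usage easy to line up with whatever else was being logged at that moment.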

Once you have the problem case, you can decide whether it is something that can be fixed somehow or whether it is just expected operation, in which case the stack size would need to be increased somewhat at the point that the UART event task is created:

https://github.com/u-blox/ubxlib/blob/ffa9636be40b22cd5fca48902b6c4320787bc00e/common/at_client/src/u_at_client.c#L2839-L2843

This you could do by overriding U_AT_CLIENT_URC_TASK_STACK_SIZE_BYTES at build time.
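For anyone unfamiliar with how a build-time override of this kind works, here is a generic sketch of the usual pattern. The macro name `EXAMPLE_TASK_STACK_SIZE_BYTES`, its default of 2048 and the function `exampleTaskStackSizeBytes()` are hypothetical, chosen for illustration only; the sketch assumes that U_AT_CLIENT_URC_TASK_STACK_SIZE_BYTES is guarded by `#ifndef` in the same way in the ubxlib headers.

```c
#include <stddef.h>

// The default only applies if the build has not already defined the
// macro, e.g. with -DEXAMPLE_TASK_STACK_SIZE_BYTES=4096 on the compiler
// command line.  Name and value here are hypothetical.
#ifndef EXAMPLE_TASK_STACK_SIZE_BYTES
# define EXAMPLE_TASK_STACK_SIZE_BYTES 2048
#endif

// Returns the stack size, in bytes, that a task created by this code
// would be given.
static size_t exampleTaskStackSizeBytes(void)
{
    return EXAMPLE_TASK_STACK_SIZE_BYTES;
}
```

Compiled as-is the function returns the default; compiled with the `-D` flag it returns the overridden value, with no change to the source. How the define is passed to the compiler depends on the build system in use.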

alexmaron81 commented 2 months ago

uPortUartEventStackMinFree expects a handle. Where can I get this?

RobMeades commented 2 months ago

Ah, yes, sorry, you can get it with something like the following (this code not compiled by me, just to point you in the right direction):


// Assuming that your device handle is in devHandle

uAtClientHandle_t atHandle = NULL;
uAtClientStreamHandle_t stream = U_AT_CLIENT_STREAM_HANDLE_DEFAULTS;

if (uCellAtClientHandleGet(devHandle, &atHandle) == 0) {
    uAtClientStreamGetExt(atHandle, &stream);
    if (stream.type == U_AT_CLIENT_STREAM_TYPE_UART) {
        // The UART handle of the AT client for the cellular device is now in stream.handle.int32
        printf("Calling uPortUartEventStackMinFree() on UART handle %d gives %d bytes.\n",
               stream.handle.int32,
               uPortUartEventStackMinFree(stream.handle.int32));
    } else {
        printf("AT client handle %p has stream handle type %d (expected %d)!\n",
               atHandle, stream.type, U_AT_CLIENT_STREAM_TYPE_UART);
    }
} else {
    printf("Unable to get AT client handle for device %p!\n", devHandle);
}

alexmaron81 commented 2 months ago

several times at the beginning: Calling uPortUartEventStackMinFree() on UART handle 1 gives 2424 bytes.

and finally several times: Calling uPortUartEventStackMinFree() on UART handle 1 gives 2184 bytes.

RobMeades commented 2 months ago

Getting a bit low but always above 2048, so you haven't reached a stack overflow case yet I think.

alexmaron81 commented 1 month ago

The problem does not always occur. I keep restarting the process to try to catch the overflow.

So far I have also seen the following value: Calling uPortUartEventStackMinFree() on UART handle 1 gives 2088 bytes.

How does it even come about that the value is different?

RobMeades commented 1 month ago

> How does it even come about that the value is different?

Timing and different paths through the code will generally cause stack usage to differ between runs. But like I say, I suspect your issue is not so much a grey-scale thing as something that suddenly goes off the rails, a bug of some form.

So you've never seen the stack of the UART event queue task go below 2048 bytes minimum free with the 2048 bytes hacked-in?

RobMeades commented 1 month ago

One thing you could potentially try is setting CONFIG_COMPILER_STACK_CHECK_MODE to something other than CONFIG_COMPILER_STACK_CHECK_MODE_NONE (the default). This tells the compiler to add stack guards on a per-function basis: a guard value is set up on entry to a function and checked on exit, so a stack breach is caught close to where it happens.

The issue with this is that it will change the run-time behaviour, 'cos of all the checking going on, and that may have other effects, but if you haven't managed to trap the problem yet and you think it should have occurred by now then it may be worth a try.
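As a sketch of what that might look like, assuming the project's configuration is kept in an sdkconfig.defaults file (it can equally be set through menuconfig), the fragment below picks one of the non-default modes. ESP-IDF also offers NORM and ALL variants, which differ in how many functions get a guard.

```
# Enable compiler stack guards; replaces the default
# CONFIG_COMPILER_STACK_CHECK_MODE_NONE
CONFIG_COMPILER_STACK_CHECK_MODE_STRONG=y
```

The more functions that are guarded, the larger the effect on code size and timing, so it may be worth starting with a lighter mode if the timing change makes the problem disappear.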

RobMeades commented 1 month ago

Hi @alexmaron81: how did this go in the end?

alexmaron81 commented 1 month ago

Hi Rob, I still couldn't find the error.