u-blox / ubxlib

Portable C libraries which provide APIs to build applications with u-blox products and services. Delivered as add-on to existing microcontroller and RTOS SDKs.
Apache License 2.0
310 stars, 94 forks

ESP32: uCellNetConnectStart - stack overflow in task atCallbacks has been detected #267

Open · andagent opened this issue 3 months ago

andagent commented 3 months ago

Hi,

I am facing an issue with uCellNetConnectStart when using ESP-IDF on an ESP32-S3: calling it ends in a stack overflow in the atCallbacks task. My code looks like this:

static const uDeviceCfg_t gDeviceCfg = {
        .version = 0,
        .deviceType = U_DEVICE_TYPE_CELL,
        .deviceCfg = {
            .cfgCell = {
                .version = 0,
                .moduleType = U_CELL_MODULE_TYPE_SARA_R422, // Initialize moduleType member
                .pSimPinCode = 0, /* SIM pin */
                .pinEnablePower = U_CFG_APP_PIN_CELL_ENABLE_POWER,
                .pinPwrOn = U_CFG_APP_PIN_CELL_PWR_ON,
                .pinVInt = U_CFG_APP_PIN_CELL_VINT,
                .pinDtrPowerSaving = U_CFG_APP_PIN_CELL_DTR
            },
        },
        .transportType = U_DEVICE_TRANSPORT_TYPE_UART,
        .transportCfg = {
            .cfgUart = {
                .version = 0,
                .uart = U_CFG_APP_CELL_UART,
                .baudRate = U_CELL_UART_BAUD_RATE,
                .pinTxd = U_CFG_APP_PIN_CELL_TXD,
                .pinRxd = U_CFG_APP_PIN_CELL_RXD,
                .pinCts = U_CFG_APP_PIN_CELL_CTS,
                .pinRts = U_CFG_APP_PIN_CELL_RTS,
                #ifdef U_CFG_APP_UART_PREFIX
                    .pPrefix = U_PORT_STRINGIFY_QUOTED(U_CFG_APP_UART_PREFIX) // Relevant for Linux only
                #else
                    .pPrefix = NULL
                #endif
            },
        },
        .pCfgName = 0
    };
    // NETWORK configuration for cellular
    static const uNetworkCfgCell_t gNetworkCfg = {
        .version = 0,
        .type = U_NETWORK_TYPE_CELL,
        .pApn = "iot.1nce.net",
        .timeoutSeconds = 2400, /* Connection timeout in seconds */
        .pKeepGoingCallback = 0,
        .pUsername = NULL,
        .pPassword = NULL,
        .authenticationMode = 1,
        .pMccMnc = 0
    };
    static const uNetworkType_t gNetType = U_NETWORK_TYPE_CELL;

    // Init the variables
    uDeviceHandle_t device = NULL;
    int32_t returnCode;

    // Initialization code here
    uPortInit();
    uDeviceInit();

    // Open the device
    returnCode = uDeviceOpen(&gDeviceCfg, &device);
    ESP_LOGI(TAG, "Opened device with return code %ld.", returnCode);

    if (returnCode == 0) {
        ESP_LOGI(TAG, "Bringing up the network..");

        // Disconnect from the network
        uCellNetDisconnect(device, NULL);

        check_and_set_rat_ranks(device, my_rat_rang);
        // To enable only one RAT:
        // uCellCfgSetRat(device, U_CELL_NET_RAT_NB1); // Only this RAT will be enabled - great for testing!

        // Bring up the network
        // if (uNetworkInterfaceUp(device, gNetType, &gNetworkCfg) != 0) {
        //     ESP_LOGW(TAG, "Unable to bring up the device!\n");
        // }

        if (uCellNetConnectStart(device, gNetworkCfg.pApn, gNetworkCfg.pUsername, gNetworkCfg.pPassword) != 0) {
            ESP_LOGW(TAG, "Unable to bring up the device!\n");
        }

        // Read current RAT
        *currentRat = uCellNetGetActiveRat(device);
        uCellNetGetOperatorStr(device, apn_string, U_CELL_NET_MAX_OPERATOR_LENGTH_BYTES);
        ESP_LOGI(TAG, "Cellular is up!");

        return device;
    } else return NULL;

The terminal log shows this:

+CGREG: 2
-1: Search
-1: Search

+CEREG: 2,,,,,,,,
-1: Search
AT+CEREG?

+CEREG: 4,2

OK
-1: Search
AT+CREG?

+CREG: 2,2

OK
-1: Search
AT+CGREG?

+CGREG: 2,4

OK
-1: OoC

+CREG: 5,"2775","00075921",9

9: RegR
U_CELL_NET: Activating context

+CEREG: 5,"2775","75921",9,,,,
9: RegR
AT+CGACT=1,1

OK
AT+CGACT?

+CGACT: 1,1

OK
AT+UPSD=0,0,0

***ERROR*** A stack overflow in task atCallbacks has been detected.

Could you please help and let me know whether I am missing something, or whether there is an issue between the library and ESP-IDF?

My stack sizes in sdkconfig:

CONFIG_ESP_SYSTEM_EVENT_QUEUE_SIZE=32
CONFIG_ESP_SYSTEM_EVENT_TASK_STACK_SIZE=2304
CONFIG_ESP_MAIN_TASK_STACK_SIZE=6114
CONFIG_ESP_MINIMAL_SHARED_STACK_SIZE=2048
m-abubakar commented 3 months ago

Hi @andagent,

uCellNetConnectStart is invoked automatically inside uNetworkInterfaceUp, so you don't need to call it yourself: just set asyncConnect to true in gNetworkCfg and call uNetworkInterfaceUp.

Please note that in asynchronous mode the registration status is reported through a callback, which you can register with the uCellNetSetRegistrationStatusCallback API. Registration can take some time, so it will not be reported immediately.
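As described above, in asynchronous mode registration completes later and is reported through the callback, so code that runs straight after starting the connection (for example, reading the active RAT) may run while the module is still searching. Below is a minimal sketch, in plain C11 so it compiles without ubxlib, of a callback that only sets a flag which the application task then polls. The enum and all demo* names are hypothetical stand-ins: in real code the callback would be the one registered with uCellNetSetRegistrationStatusCallback(), it would test ubxlib's U_CELL_NET_STATUS_REGISTERED_HOME / U_CELL_NET_STATUS_REGISTERED_ROAMING values, and the sleep hook could wrap FreeRTOS vTaskDelay().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the status passed to the registration callback;
 * real code would use ubxlib's uCellNetStatus_t instead. */
typedef enum {
    DEMO_STATUS_NOT_REGISTERED,
    DEMO_STATUS_SEARCHING,
    DEMO_STATUS_REGISTERED_HOME,
    DEMO_STATUS_REGISTERED_ROAMING
} demoStatus_t;

/* Written by the callback, read by the application task. */
static atomic_bool gRegistered = false;

/* Keep the callback tiny: record the state and return. No logging and no
 * large locals, since it runs on the library's own callback task. */
static void demoRegistrationCallback(demoStatus_t status, void *pContext)
{
    (void) pContext;
    bool registered = (status == DEMO_STATUS_REGISTERED_HOME) ||
                      (status == DEMO_STATUS_REGISTERED_ROAMING);
    atomic_store(&gRegistered, registered);
}

/* Poll the flag up to maxPolls times, calling the (hypothetical) sleep hook
 * between polls; returns true once registration has been reported. */
static bool demoWaitRegistered(int maxPolls, void (*sleepFn)(void))
{
    for (int i = 0; i < maxPolls; i++) {
        if (atomic_load(&gRegistered)) {
            return true;
        }
        if (sleepFn != NULL) {
            sleepFn();
        }
    }
    return atomic_load(&gRegistered);
}
```

With this split, the application would only read the active RAT or operator string after demoWaitRegistered() returns true, instead of immediately after the connection is started.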

Alternatively, you can use the synchronous (blocking) connect mode: in that case, leave the asyncConnect field unset and simply call uNetworkInterfaceUp.

andagent commented 3 months ago

Hi @m-abubakar,

Sadly, after trying this out, I am still getting the same error.

My callback function looks like this:

void networkRegistrationCallback(uCellNetRegDomain_t domain, uCellNetStatus_t status, void *pContext)
{
    if (status == U_CELL_NET_STATUS_REGISTERED_HOME || status == U_CELL_NET_STATUS_REGISTERED_ROAMING)
    {
        ESP_LOGI(TAG, "Network registration status changed to %d. Registered!", status);
    }
    else
    {
        ESP_LOGW(TAG, "Network registration status changed to %d. Not yet registered!", status);
    }
}

The modem init looks like this; I added the asyncConnect = 1 parameter:

static const uDeviceCfg_t gDeviceCfg = {
        .version = 0,
        .deviceType = U_DEVICE_TYPE_CELL,
        .deviceCfg = {
            .cfgCell = {
                .version = 0,
                .moduleType = U_CELL_MODULE_TYPE_SARA_R422, // Initialize moduleType member
                .pSimPinCode = 0,                           /* SIM pin */
                .pinEnablePower = U_CFG_APP_PIN_CELL_ENABLE_POWER,
                .pinPwrOn = U_CFG_APP_PIN_CELL_PWR_ON,
                .pinVInt = U_CFG_APP_PIN_CELL_VINT,
                .pinDtrPowerSaving = U_CFG_APP_PIN_CELL_DTR},
        },
        .transportType = U_DEVICE_TRANSPORT_TYPE_UART,
        .transportCfg = {
            .cfgUart = {
                .version = 0,
                .uart = U_CFG_APP_CELL_UART,
                .baudRate = U_CELL_UART_BAUD_RATE,
                .pinTxd = U_CFG_APP_PIN_CELL_TXD,
                .pinRxd = U_CFG_APP_PIN_CELL_RXD,
                .pinCts = U_CFG_APP_PIN_CELL_CTS,
                .pinRts = U_CFG_APP_PIN_CELL_RTS,
#ifdef U_CFG_APP_UART_PREFIX
                .pPrefix = U_PORT_STRINGIFY_QUOTED(U_CFG_APP_UART_PREFIX) // Relevant for Linux only
#else
                .pPrefix = NULL
#endif
            },
        },
        .pCfgName = 0};
    // NETWORK configuration for cellular
    static const uNetworkCfgCell_t gNetworkCfg = {
        .version = 0,
        .type = U_NETWORK_TYPE_CELL,
        .pApn = "iot.1nce.net",
        .timeoutSeconds = 2400, /* Connection timeout in seconds */
        .pKeepGoingCallback = 0,
        .pUsername = NULL,
        .pPassword = NULL,
        .authenticationMode = 1,
        .asyncConnect = 1,
        .pMccMnc = 0};
    static const uNetworkType_t gNetType = U_NETWORK_TYPE_CELL;

    // Init the variables
    uDeviceHandle_t device = NULL;
    int32_t returnCode;

    // Initialization code here
    uPortInit();
    uDeviceInit();

    // Open the device
    returnCode = uDeviceOpen(&gDeviceCfg, &device);
    ESP_LOGI(TAG, "Opened device with return code %ld.", returnCode);

    if (returnCode == 0)
    {
        ESP_LOGI(TAG, "Bringing up the network..");

        // Disconnect from the network
        uCellNetDisconnect(device, NULL);

        check_and_set_rat_ranks(device, my_rat_rang);
        // To enable only one RAT:
        // uCellCfgSetRat(device, U_CELL_NET_RAT_NB1); // Only this RAT will be enabled - great for testing!

        // Register async network registration status callback
        uCellNetSetRegistrationStatusCallback(device, networkRegistrationCallback, NULL);

        // Bring up the network
        if (uNetworkInterfaceUp(device, gNetType, &gNetworkCfg) != 0) {
            ESP_LOGW(TAG, "Unable to bring up the device!\n");
        }

        // if (uCellNetConnectStart(device, gNetworkCfg.pApn, gNetworkCfg.pUsername, gNetworkCfg.pPassword) != 0) {
        //     ESP_LOGW(TAG, "Unable to bring up the device!\n");
        // }

        // Read current RAT
        *currentRat = uCellNetGetActiveRat(device);
        uCellNetGetOperatorStr(device, apn_string, U_CELL_NET_MAX_OPERATOR_LENGTH_BYTES);
        ESP_LOGI(TAG, "Cellular is up!");

        return device;
    }
    else
        return NULL;

Console output:

U_CELL_NET: setting automatic network selection mode...
AT+COPS?

+COPS: 0

OK
AT+CFUN=1

OK
AT+CREG?

+CREG: 2,0

OK
-1: NReg
W (9391) MODEM: Network registration status changed to 1. Not yet registered!
AT+CGREG?

+CGREG: 2,4

OK
-1: OoC
W (9731) MODEM: Network registration status changed to 5. Not yet registered!
AT+CEREG?

+CEREG: 4,0

OK
-1: NReg
W (10071) MODEM: Network registration status changed to 1. Not yet registered!
AT+CREG?

+CREG: 2,0

OK
-1: NReg
W (10411) MODEM: Network registration status changed to 1. Not yet registered!
AT+CGREG?

+CGREG: 2,4

OK
-1: OoC
W (10751) MODEM: Network registration status changed to 5. Not yet registered!
AT+CEREG?

+CEREG: 4,0

OK
-1: NReg
W (11091) MODEM: Network registration status changed to 1. Not yet registered!
AT+CREG?

+CREG: 2,0

OK
-1: NReg
W (11431) MODEM: Network registration status changed to 1. Not yet registered!
AT+CGREG?

+CGREG: 2,4

OK

+CREG: 2

+CGREG: 2
-1: OoC
W (11771) MODEM: Network registration status changed to 5. Not yet registered!
-1: Search
W (11791) MODEM: Network registration status changed to 3. Not yet registered!
-1: Search
W (11791) MODEM: Network registration status changed to 3. Not yet registered!

+CEREG: 2,,,,,,,,
-1: Search
W (11811) MODEM: Network registration status changed to 3. Not yet registered!
AT+CEREG?

+CEREG: 4,2

OK
-1: Search
W (12131) MODEM: Network registration status changed to 3. Not yet registered!
AT+CREG?

+CREG: 2,2

OK
-1: Search
W (12471) MODEM: Network registration status changed to 3. Not yet registered!

+CREG: 5,"2775","0003EF20",9

9: RegR
U_CELL_NET: Activating context

+CEREG: 5,"2775","3EF20",9,,,,
9: RegR
AT+CGACT=1,1

OK
AT+CGACT?

+CGACT: 1,1

OK
AT+UPSD=0,0,0

***ERROR*** A stack overflow in task atCallbacks has been detected.

I also tried increasing the main task stack to 7168 bytes, but it had no effect.

michaelboeding commented 2 months ago

Hey @andagent, I think you are doing some kind of processing in the callback that is causing it to overflow. In my experience, you should always hand callbacks from ubxlib off to your own task context immediately, so that the internal tasks do not overflow. If you do any processing in the callback without passing it off to your own task, it will most likely overflow.
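A minimal sketch of that hand-off, assuming a single producer (the ubxlib callback) and a single consumer (your own task). On ESP-IDF the natural tool would be a FreeRTOS queue (xQueueSend() in the callback, xQueueReceive() in your task); to keep the example self-contained it uses a portable C11 single-producer/single-consumer ring buffer instead. demoEvent_t, demoCallback() and the other demo* names are hypothetical, the callback signature is simplified, and pContext plays the role of the last argument passed to uCellNetSetRegistrationStatusCallback().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event copied out of a ubxlib callback. */
typedef struct {
    int32_t status;
} demoEvent_t;

#define DEMO_QUEUE_LENGTH 8U /* must be a power of two */

/* Single-producer/single-consumer ring buffer. */
typedef struct {
    demoEvent_t items[DEMO_QUEUE_LENGTH];
    atomic_size_t head; /* next slot to write, advanced only by the producer */
    atomic_size_t tail; /* next slot to read, advanced only by the consumer */
} demoQueue_t;

static demoQueue_t gQueue; /* static storage, so zero-initialized */

/* Producer side: O(1), no allocation, tiny stack. Returns false when full. */
static bool demoQueuePush(demoQueue_t *pQueue, demoEvent_t event)
{
    size_t head = atomic_load_explicit(&pQueue->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&pQueue->tail, memory_order_acquire);
    if (head - tail == DEMO_QUEUE_LENGTH) {
        return false; /* full: drop rather than block the callback task */
    }
    pQueue->items[head % DEMO_QUEUE_LENGTH] = event;
    atomic_store_explicit(&pQueue->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: called from the application's own task. */
static bool demoQueuePop(demoQueue_t *pQueue, demoEvent_t *pEvent)
{
    size_t tail = atomic_load_explicit(&pQueue->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&pQueue->head, memory_order_acquire);
    if (head == tail) {
        return false; /* empty */
    }
    *pEvent = pQueue->items[tail % DEMO_QUEUE_LENGTH];
    atomic_store_explicit(&pQueue->tail, tail + 1, memory_order_release);
    return true;
}

/* The callback only copies the status into the queue; logging and any other
 * processing happen later, in the task that calls demoQueuePop(). */
static void demoCallback(int32_t status, void *pContext)
{
    demoEvent_t event = { .status = status };
    (void) demoQueuePush((demoQueue_t *) pContext, event);
}
```

The callback never logs, formats strings or declares large locals, so the work it adds to the callback task's stack stays small; everything else moves to a task whose stack size the application controls.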

andagent commented 2 months ago

Hi, my callback is essentially empty: I tried just logging, and also leaving it completely empty. In both cases it crashes with this error.


(Sent by email in reply to Michael Boeding's comment of Sunday, August 4, 2024.)

michaelboeding commented 2 months ago

Could you share some of your code? I ran into this a long time ago and fixed it by passing the work off to another task through a queue.

andagent commented 2 months ago

Of course, please see it posted here. It includes the callback setup, the configuration, and the init process:

https://github.com/u-blox/ubxlib/issues/267#issuecomment-2262742936

andagent commented 2 months ago

Just an update: after manually changing U_AT_CLIENT_CALLBACK_TASK_STACK_SIZE_BYTES in the library's u_at_client code, raising the callback task stack size from 2560 to 2560 * 2, it started to work. I know it is not an ideal solution, but it works, so I guess there is an issue in ubxlib that needs to be fixed.
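Rather than editing the library source, a compile-time override may work, assuming u_at_client.h wraps its default in an #ifndef guard (many ubxlib configuration macros appear to be guarded this way, but check the header in your version before relying on it). The sketch below simulates that pattern in a single file; in a real ESP-IDF project the define would have to reach the compilation of the ubxlib component itself, for example as a compiler flag, not only your application sources. The value 2560 * 2 is the one reported above.

```c
/* In a real build this would come from the build system as a compiler flag,
 * e.g. -DU_AT_CLIENT_CALLBACK_TASK_STACK_SIZE_BYTES=5120, applied to the
 * ubxlib sources; it is defined here only to keep the sketch self-contained. */
#define U_AT_CLIENT_CALLBACK_TASK_STACK_SIZE_BYTES (2560 * 2)

/* Stand-in for the library header. ASSUMPTION: the default is wrapped in an
 * #ifndef guard like this one, so an earlier definition takes precedence. */
#ifndef U_AT_CLIENT_CALLBACK_TASK_STACK_SIZE_BYTES
# define U_AT_CLIENT_CALLBACK_TASK_STACK_SIZE_BYTES 2560
#endif

/* Fails the build if the override did not take effect. */
_Static_assert(U_AT_CLIENT_CALLBACK_TASK_STACK_SIZE_BYTES == 5120,
               "callback task stack override not applied");
```

If the header in your ubxlib version has no such guard, the macro would be redefined instead and the compiler should warn, which is a quick way to tell whether this approach applies.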

michaelboeding commented 2 months ago

I wonder if it has something to do with your task priorities, etc. Maybe the task handling the AT callbacks is receiving a lot of callbacks but can't service them because another task is blocking it, which could cause the task handling your AT commands to overflow? Just an idea.

m-abubakar commented 2 months ago

Hi @andagent, I’m glad to hear that you’ve found a temporary workaround. I appreciate @michaelboeding's insightful observation; it could be a task prioritization issue. Rest assured, we are actively looking into the matter and working on a solution. We’ll provide an update and a fix as soon as possible.

Thank you.

valcarcexyz commented 2 months ago

I can reproduce the same error. Increasing the stack size seems to solve the issue for a while, but it ends up raising to

valcarcexyz commented 2 months ago

Solved: in my case I had a stack initialization in the callback. Removing it and passing the data in through the callback parameters seems to solve the issue.
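A minimal sketch of that pattern: keep any sizeable state in statically allocated storage and give the callback a pointer to it through the context argument (the last parameter of uCellNetSetRegistrationStatusCallback(), which the code above sets to NULL), so the callback itself declares almost nothing on the callback task's stack. demoAppContext_t, gAppContext and demoStatusCallback() are hypothetical names, and the callback signature is simplified to (status, pContext).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application state, allocated once with static storage
 * instead of being built on the callback task's stack at every call. */
typedef struct {
    int32_t lastStatus;
    uint32_t changeCount;
    char scratch[256]; /* large buffer: lives here, not on the callback stack */
} demoAppContext_t;

static demoAppContext_t gAppContext;

/* The callback only dereferences the pointer it was registered with and
 * updates a couple of fields; it declares no large locals. */
static void demoStatusCallback(int32_t status, void *pContext)
{
    demoAppContext_t *pApp = (demoAppContext_t *) pContext;
    if (pApp == NULL) {
        return; /* registered without a context: nothing to update */
    }
    if (status != pApp->lastStatus) {
        pApp->lastStatus = status;
        pApp->changeCount++;
    }
}
```

In the code above, the registration would then pass &gAppContext instead of NULL as the last argument. If another task also reads gAppContext, those accesses would still need synchronization (for example, the queue or atomics approach sketched earlier in the thread).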