u-blox / ubxlib

Portable C libraries which provide APIs to build applications with u-blox products and services. Delivered as add-on to existing microcontroller and RTOS SDKs.
Apache License 2.0

SARA-R5 Power On and Enable expected hardware connections #234

Closed ws998116 closed 1 month ago

ws998116 commented 1 month ago

Hi, I have a custom board with an ESP32 driving the SARA-R5 and I would like to design the hardware so that ubxlib can control power to the modem correctly. My current connections to PWR_ON and RESET_N are based off of the XPLR-IOT-1 design.

image

.cfgCell = {.moduleType = U_CELL_MODULE_TYPE_SARA_R5,
            .pSimPinCode = NULL,
            .pinEnablePower = 17,
            .pinPwrOn = 18,
            .pinVInt = -1,
            .pinDtrPowerSaving = -1}

My issue is that the modem is not properly powered on with these connections.

U_CELL: initialising with enable power pin 17 (0x11) (where 1 is on), PWR_ON pin 18 (0x12) (and is toggled from 1 to 0) and VInt pin not connected.
I (1418) gpio: GPIO[18]| InputEn: 0| OutputEn: 1| OpenDrain: 1| Pullup: 1| Pulldown: 0| Intr:0
I (1428) gpio: GPIO[17]| InputEn: 1| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
AT
AT
AT
AT
AT
U_CELL_PWR: powering on.
AT
AT
AT
AT
AT
AT

Now, if I set the Enable pin to -1, then the modem will power up once, but then after a software reset it does not come back up.

Do you see any problems with my setup? Thank you!

RobMeades commented 1 month ago

Hi, and thanks for posting.

I'm no HW guy but, looking at the ubxlib examples repo for XPLR-IOT, the PWR_ON pin is configured as inverted. The PWR_ON pin of a cellular module needs to be pulled low for longer than a certain time (1 second in the SARA-R5 case) to power the module on whereas, in the case of the u-blox XPLR-IOT HW at least, the MCU pin needs to be driven high for longer than 1 second in order to achieve this. Inversion is invoked by ORing the pin value with U_CELL_PIN_INVERTED (which sets the top bit of the int32_t).

Could that be your issue? I'd guess that Q403/Q404 would end up inverting the sense of the logic?
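To make the effect of the inversion flag concrete, here is a minimal sketch of how a driver might interpret such a pin value. It is not ubxlib code: `SKETCH_PIN_INVERTED`, `sketchPinNumber()` and `sketchAssertLevel()` are hypothetical names, and the bit used for the flag is a placeholder; in an application, use the `U_CELL_PIN_INVERTED` macro from ubxlib rather than any literal value.

```c
#include <stdint.h>

/* Placeholder flag for this sketch only; the real flag is ubxlib's
 * U_CELL_PIN_INVERTED and its actual value is not assumed here. */
#define SKETCH_PIN_INVERTED ((int32_t) 0x4000)

/* Strip the flag to recover the GPIO number the MCU actually drives. */
int32_t sketchPinNumber(int32_t pinCfg)
{
    return pinCfg & ~SKETCH_PIN_INVERTED;
}

/* Level the MCU pin must be driven to in order to assert PWR_ON:
 * 0 (low) when the MCU pin is wired straight to the module,
 * 1 (high) when a transistor stage in between inverts the signal. */
int sketchAssertLevel(int32_t pinCfg)
{
    return (pinCfg & SKETCH_PIN_INVERTED) ? 1 : 0;
}
```

With a plain pin value of 18 the sketch asserts PWR_ON by driving low, which matches the "toggled from 1 to 0" wording in the log above; with the flag ORed in, the same GPIO is driven high instead.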

ws998116 commented 1 month ago

Thanks for the quick response @RobMeades, I think you're on to something 🧐.

I first tried inverting pinPwrOn, but couldn't get a response. Then, I disabled pinEnablePower (= -1) and it's working much better!

.cfgCell = {.moduleType = U_CELL_MODULE_TYPE_SARA_R5,
            .pSimPinCode = NULL,
            .pinEnablePower = -1,
            .pinPwrOn = 18 | U_CELL_PIN_INVERTED,
            .pinVInt = -1,
            .pinDtrPowerSaving = -1}

It seems that there is still one issue, and I can live with it, but I'd love to fix it if possible. When the device software resets, the modem doesn't come on the first time, but when it resets again, then it comes back. Here's a log from that first reset. It looks like the modem is still on from before but then gets turned off when trying to initialize. The random bytes coming through are from the M8 chip (I'm working with a SARA-R510M8S and using both cell and GNSS). These bytes only show up after a software reset like this. It's interesting because the modem doesn't respond to the AT, but it is still reporting from the previous session.

Log

```
U_CELL: initialising with enable power pin not connected, PWR_ON pin 18 (0x12) (and is toggled from 0 to 1) and VInt pin not connected.
I (1428) gpio: GPIO[18]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
[f9][11][ff][c9][b5]b[01][07]\[00]@B[0f][00][df][07][12][00][10]([f0][ff][ff][ff][ff][00][00][00][00][00][00][04][00][00][00][00][00][00][00][00][00][00][00][00][00][98][bd][ff][ff][ff][ff][ff][ff][00]F[85][df][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00] N[00][00][80][a8][12][01][0f]'[00][00][e0]J#[00][00][00][00][00][00][00][00][00]DW[95][f9]
AT AT AT AT
[f9][11][ff][c9][b5]b[01][07]\[00](F[0f][00][df][07][12][00][10])[f0][ff][ff][ff][ff][00][00][00][00][00][00][04][00][00][00][00][00][00][00][00][00][00][00][00][00][98][bd][ff][ff][ff][ff][ff][ff][00]F[85][df][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00] N[00][00][80][a8][12][01][0f]'[00][00][e0]J#[00][00][00][00][00][00][00][00][00]1u[95][f9]
AT
[f9][11][ff][c9][b5]b[01][07]\[00][10]J[0f][00][df][07][12][00][10]*[f0][ff][ff][ff][ff][00][00][00][00][00][00][04][00][00][00][00][00][00][00][00][00][00][00][00][00][98][bd][ff][ff][ff][ff][ff][ff][00]F[85][df][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00] N[00][00][80][a8][12][01][0f]'[00][00][e0]J#[00][00][00][00][00][00][00][00][00][1e][93][95][f9]
[f9][11][ff][c9][b5]b[01][07]\[00][f8]M[0f][00][df][07][12][00][10]+[f0][ff][ff][ff][ff][00][00][00][00][00][00][04][00][00][00][00][00][00][00][00][00][00][00][00][00][98][bd][ff][ff][ff][ff][ff][ff][00]G[85][df][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00] N[00][00][80][a8][12][01][0f]'[00][00][e0]J#[00][00][00][00][00][00][00][00][00][0b][85][95][f9]
[f9][11][ff][c9][b5]b[01][07]\[00][e0]Q[0f][00][df][07][12][00][10],[f0][ff][ff][ff][ff][00][00][00][00][00][00][04][00][00][00][00][00][00][00][00][00][00][00][00][00][98][bd][ff][ff][ff][ff][ff][ff][00]G[85][df][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00] N[00][00][80][a8][12][01][0f]'[00][00][e0]J#[00][00][00][00][00][00][00][00][00][f8][a3][95][f9]
U_CELL_PWR: powering on.
[f9][11][ff][c9][b5]b[01][07]\[00][c8]U[0f][00][df][07][12][00][10]-[f0][ff][ff][ff][ff][00][00][00][00][00][00][04][00][00][00][00][00][00][00][00][00][00][00][00][00][98][bd][ff][ff][ff][ff][ff][ff][00]G[85][df][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00] N[00][00][80][a8][12][01][0f]'[00][00][e0]J#[00][00][00][00][00][00][00][00][00][e5][c1][95][f9]
[f9][05][ff][19] +CREG: 0 Q[f9]
[f9][11][ff][c9][b5]b[01][07]\[00][b0]Y[0f][00][df][07][12][00][10].[f0][ff][ff][ff][ff][00][00][00][00][00][00][04][00][00][00][00][00][00][00][00][00][00][00][00][00][98][bd][ff][ff][ff][ff][ff][ff][00]H[85][df][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00] N[00][00][80][a8][12][01][0f]'[00][00][e0]J#[00][00][00][00][00][00][00][00][00][d3][0e][95][f9]
[f9][05][ff][1b] +CEREG: 0 [b2][f9]
[f9][05][ff][1b] +CGREG: 0 [b2][f9]
[f9][05][ff]' +UUMQTTC: 0,101 [9f][f9]
[f9][05][ff][1d] +UUPSDD: 0 V[f9]
AT AT AT AT AT
[f9][11][ff][c9][b5]b[01][07]\[00][98]][0f][00][df][07][12][00][10]/[f0][ff][ff][ff][ff][00][00][00][00][00][00][04][00][00][00][00][00][00][00][00][00][00][00][00][00][98][bd][ff][ff][ff][ff][ff][ff][00]H[85][df][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00] N[00][00][80][a8][12][01][0f]'[00][00][e0]J#[00][00][00][00][00][00][00][00][00][c0],[95][f9]
AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT
U_CELL_PWR: powering on, module is alive.
ATE0
AT AT AT AT
ATE0
I (72718) cellular: Opened device with return code -257.
Unable to bring up the device!
Done.
```

RobMeades commented 1 month ago

Excellent, definitely progress.

Does your HW have a pinEnablePower, i.e. a pin that controls the state of a switch in the VCC supply to SARA-R5? It is relatively rare for a design to incorporate such a thing but, if you have one, it may simply be a case of getting the sense of that pin the correct way up also. That pin is assumed to be high for "VCC is supplied to SARA-R5", low for "VCC is not supplied to SARA-R5" and the ORing with U_CELL_PIN_INVERTED has the same effect in the API there also, i.e. if the sense of your switch is low for "VCC is supplied to SARA-R5" and high for "VCC is not supplied to SARA-R5" then you should OR your pinEnablePower with U_CELL_PIN_INVERTED for it to work correctly.

You say "when the device software resets": by that I guess you mean the MCU? It looks that way from your log since the SARA-R5 module is still as it was, configured in CMUX mode by ubxlib in order to fetch data from the GNSS device, while the ubxlib code on the MCU is completely unaware of this (i.e. has been reset) and is trying to send AT commands to a SARA-R5 in the normal way, i.e. without CMUX; the two are out of synch.

You need to be sure, somehow, that the two are coordinated: if the MCU is to be reset then uNetworkInterfaceDown() and uDeviceClose() must be called to take the SARA-R5 down in an organised way so that the module is returned to a quiescent state. That or, when the MCU is reset, the module must also be reset, but I would warn against resetting a module in that way: in the extreme it can lead to flash corruption; I have seen this happen only twice in four years of running constant testing of ubxlib code on many, many, SARA-R5 modules, but it can happen and hence our SARA-R5 integration manual advises against it.

ws998116 commented 1 month ago

Thanks for the diagnosis. I think you're spot on.

For the pinEnablePower, I had originally thought that this was meant to control the RESET_N pin on the SARA-R5. It sounds like it is supposed to be a switch controlling VCC to the modem, which I don't have so I'm leaving it set to -1 for now.

You are correct about the "software resets". I am talking about the MCU resetting. I've now tested with the following function to attempt a more graceful reset for the modem.

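void restart_system(void)
{
    // Disconnect and free the MQTT client before taking the networks down
    uMqttClientClose(pContext);
    // Take the GNSS interface down first, then the cellular interface
    uNetworkInterfaceDown(devHandle, U_NETWORK_TYPE_GNSS);
    uNetworkInterfaceDown(devHandle, U_NETWORK_TYPE_CELL);
    // Close the device; true asks for the module to be powered off
    uDeviceClose(devHandle, true);
    // Short pause before resetting the MCU
    vTaskDelay(pdMS_TO_TICKS(100));
    esp_restart();
}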

Now, tell me if there's something wrong here, but when I call this function after all networks are up and running, I get the following assertion.

Assertion

```
+UUMQTTC: 9,1
AT
OK
U_CELL_MQTT: trying to disconnect...
AT+UMQTTC=0
OK
U_CELL_MQTT: waiting for response for up to 120 second(s)...
+UUMQTTC: 0,1
U_CELL_MQTT: disconnected after 1 second(s).
AT+UGPS?
+UGPS: 1,15,99
OK
AT+UGPS=0
OK
U_GNSS: sent command b5 62 06 01 08 00 01 07 00 00 00 00 00 00 17 dc.
U_GNSS: sent command b5 62 06 01 08 00 01 07 00 00 00 00 00 00 17 dc.
U_GNSS: sent command b5 62 06 01 08 00 01 07 00 00 00 00 00 00 17 dc.
AT+UPSV=1,1300
OK
assert failed: 0x401eca55
0x401eca55: xTaskPriorityDisinherit at C:/Espressif/frameworks/esp-idf-v5.2.1/components/freertos/FreeRTOS-Kernel/tasks.c:5073 (discriminator 1)
```

So I believe it's only executing uNetworkInterfaceDown(devHandle, U_NETWORK_TYPE_GNSS);, then crashing and resetting. After the reset though, the modem powers up correctly the first time. So again, progress, but something's not quite right. I've included most of the stack trace below. From what I can tell, the assertion is originating from FreeRTOS, but I believe it's due to something in ubxlib. It's definitely possible that something else in my firmware is causing the crash, so let me know what you think.

Stack Trace

```
0x400812e2: panic_abort at C:/Espressif/frameworks/esp-idf-v5.2.1/components/esp_system/panic.c:472
0x40093149: esp_system_abort at C:/Espressif/frameworks/esp-idf-v5.2.1/components/esp_system/port/esp_system_chip.c:93
0x40098b8f: __assert_func at C:/Espressif/frameworks/esp-idf-v5.2.1/components/newlib/assert.c:40
0x401eca4d: xTaskPriorityDisinherit at C:/Espressif/frameworks/esp-idf-v5.2.1/components/freertos/FreeRTOS-Kernel/tasks.c:5073 (discriminator 1)
0x400937d1: prvCopyDataToQueue at C:/Espressif/frameworks/esp-idf-v5.2.1/components/freertos/FreeRTOS-Kernel/queue.c:2469
0x401eae65: xQueueGenericSend at C:/Espressif/frameworks/esp-idf-v5.2.1/components/freertos/FreeRTOS-Kernel/queue.c:964
0x4011e865: uPortMutexUnlock at C:/Users/Wyatt/Repos/ubxlib/port/platform/esp-idf/src/u_port_os.c:448
0x4011f3dc: unlockNoDataCheck at C:/Users/Wyatt/Repos/ubxlib/common/at_client/src/u_at_client.c:2506
0x40121032: uAtClientUnlock at C:/Users/Wyatt/Repos/ubxlib/common/at_client/src/u_at_client.c:3322
0x4012a9ec: uCellMuxPrivateDisable at C:/Users/Wyatt/Repos/ubxlib/cell/src/u_cell_mux.c:1903
0x4012afbf: uCellMuxDisable at C:/Users/Wyatt/Repos/ubxlib/cell/src/u_cell_mux.c:2117
0x4011defe: setGnssNetworkContext at C:/Users/Wyatt/Repos/ubxlib/common/network/src/u_network_private_gnss.c:119
0x4011e00d: uNetworkPrivateChangeStateGnss at C:/Users/Wyatt/Repos/ubxlib/common/network/src/u_network_private_gnss.c:344
0x4011a385: networkInterfaceChangeState at C:/Users/Wyatt/Repos/ubxlib/common/network/src/u_network.c:103
0x4011a598: uNetworkInterfaceDown at C:/Users/Wyatt/Repos/ubxlib/common/network/src/u_network.c:279
0x400db27a: restart_system at ...
0x400d9fcc: jsonMessageHandler at ...
0x400db0d4: mqttRxCellular_task at ...
```

RobMeades commented 1 month ago

Interesting. The assert seems to be line 5073:

https://github.com/espressif/esp-idf/blob/a322e6bdad4b6675d4597fb2722eea2851ba88cb/components/freertos/FreeRTOS-Kernel/tasks.c#L5069-L5073

...which I believe is complaining that a mutex is being unlocked by a task that does not hold it. From your call-tree, this seems to be occurring when uNetworkInterfaceDown(devHandle, U_NETWORK_TYPE_GNSS) is called; taking the GNSS network interface down when the GNSS device is inside a cellular module will switch out of CMUX mode, which is what we see in the call-tree.

A little earlier, when you called uNetworkInterfaceUp() for U_NETWORK_TYPE_GNSS on the cellular device, the CMUX was brought up and a new AT client was brought into existence to be run on the CMUXed AT channel. All of the configuration of the existing AT client was copied into the new AT client and the previous AT client was left there and left locked (i.e. uAtClientLock() was left called on it) so that we notice if something decides to use it by mistake. When CMUX is taken down the process is reversed and the old AT client is unlocked to become active again, which is what is happening in your call-trace.

So the implication is that, somehow, a different task is calling uNetworkInterfaceDown() to the one that called uNetworkInterfaceUp(). Is that a possible scenario in your application? If so it is not necessarily a wrong thing to do, just not something we had anticipated. Would need to think of the correct fix.
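For readers unfamiliar with why the unlocking task matters, here is a toy model of an owner-checked mutex. It is purely illustrative and is neither the FreeRTOS nor the ubxlib implementation: `ModelMutex`, `modelLock()` and `modelUnlock()` are hypothetical names, and task identity is reduced to an integer.

```c
#include <stdbool.h>

/* Toy model of an owner-checked mutex; NOT the FreeRTOS implementation. */
typedef struct {
    bool locked;
    int ownerTaskId;   /* meaningful only while locked is true */
} ModelMutex;

/* Lock succeeds only if the mutex is free; the caller becomes the owner. */
bool modelLock(ModelMutex *pMutex, int taskId)
{
    if (pMutex->locked) {
        return false;
    }
    pMutex->locked = true;
    pMutex->ownerTaskId = taskId;
    return true;
}

/* Unlock succeeds only for the task that took the lock. This model just
 * returns false for any other caller; the log in this thread shows the
 * real system asserting and resetting in that situation. */
bool modelUnlock(ModelMutex *pMutex, int taskId)
{
    if (!pMutex->locked || (pMutex->ownerTaskId != taskId)) {
        return false;
    }
    pMutex->locked = false;
    return true;
}
```

In the terms of this thread, the task that called uNetworkInterfaceUp() plays the role of the locker and mqttRxCellular_task plays the role of the other task attempting the unlock.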

ws998116 commented 1 month ago

Once again, I believe you are spot on. In my case, uNetworkInterfaceUp() is called from an init function in main, which then creates a task (mqttRxCellular_task) to watch for incoming messages after the network is up. To trigger a reboot, I send a message which is received by mqttRxCellular_task where my restart_system() function is called.

I think I could probably change this to work more like the MQTT client example where the entire init function is its own task and handles the messages there. If you think that's better practice, I'll make the switch.
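One way to keep the bring-up and take-down calls in the same task without restructuring everything is for the message-handling task to only request the restart and for the task that brought the network up to act on it. A minimal sketch of that hand-off, assuming a C11 compiler with `<stdatomic.h>`; `requestRestart()` and `takeRestartRequest()` are hypothetical names, not ubxlib or ESP-IDF APIs, and a FreeRTOS queue or task notification could serve the same purpose.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set by any task, consumed by the task that owns the network. */
static atomic_bool gRestartRequested = false;

/* Called from the message-receiving task: only records the request,
 * it does not touch the network or the device. */
void requestRestart(void)
{
    atomic_store(&gRestartRequested, true);
}

/* Polled from the task that brought the network up: returns true
 * exactly once per request, clearing the flag as it does so. */
bool takeRestartRequest(void)
{
    return atomic_exchange(&gRestartRequested, false);
}
```

In this arrangement mqttRxCellular_task would call `requestRestart()` on receiving the reboot message, and the task that called uNetworkInterfaceUp() would check `takeRestartRequest()` in its loop and call restart_system() itself when it returns true.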

RobMeades commented 1 month ago

> If you think that's better practice, I'll make the switch.

Not necessarily better practice but on the other hand, if you are able to do so, it would be a fix under your control :-).

I will have a think about the problem; thing is that, should something accidentally call the wrong AT client, I'm not sure what would happen so I quite like the idea of leaving it locked. On the other hand there is no reason for the ubxlib code to place a restriction on where uNetworkInterfaceUp() or uNetworkInterfaceDown() are called from. In fact, the ubxlib code hasn't, FreeRTOS has, but I guess that's because the way we are using the mutex in this particular scenario stretches FreeRTOS's expectations of how a mutex will be used to breaking point.

For now I will add a note to the descriptions of uNetworkInterfaceUp() and uNetworkInterfaceDown() pointing out the problem and I'll raise an internal ticket to see if we can't find a way of fixing it properly.

ws998116 commented 1 month ago

Sounds good. I should be able to get switched over, and once I test that I will close this issue. Thank you again for all your help! 😁

philwareublox commented 1 month ago

Just to note, we have a Cellular Tracker application written with UBXLIB here: https://github.com/u-blox/ubxlib_evk_applications/tree/main

With this application we start multiple threads to look after the cellular connection, the MQTT messaging and the other application tasks. All application 'tasks' use a queue to communicate, and the main() function just starts those tasks up.

It might give you some hints on how to use multiple tasks/threads for the various cellular controls you need.

You can compile this for Linux or Windows and it runs against an EVK.

ws998116 commented 1 month ago

Thanks for the tip! Is the repo private? I'm getting a 404.

philwareublox commented 1 month ago

Sorry - I thought this was public, changed now to public.