u-blox / ubxlib

Portable C libraries which provide APIs to build applications with u-blox products and services. Delivered as add-on to existing microcontroller and RTOS SDKs.
Apache License 2.0
287 stars 82 forks source link

uGnssPosGet sometimes fails with U_ERROR_COMMON_NOT_INITIALISED #237

Closed KjartanOli closed 1 month ago

KjartanOli commented 1 month ago

I'm using ESP-IDF 5.2 with a TTGO LORA32 board connected to a ublox ZED-F9P via I2C. I call the following init function at the start of app_main to initialise the GNSS module and then proceed to simply call uGnssPosGet in a loop. Sometimes it simply works, others it returns -2 (U_ERROR_COMMON_NOT_INITIALISED). When it fails pressing the reset button on the TTLooking at the code of uGnssPosGet it appears to me to only return U_ERROR_COMMON_NOT_INITIALISED when gUGnssPrivateMutex == NULL GO usually makes it work again. This is is the part that really confuses me: if something is wrong with my code, why does rebooting the device make it work, if nothing is wrong with my code why does it only work some of the time?

auto init() -> std::int32_t
    {
        const uDeviceCfg_t gDeviceCfg = {
            .deviceType = U_DEVICE_TYPE_GNSS,
            .deviceCfg = {
                .cfgGnss = {
                    .moduleType = U_GNSS_MODULE_TYPE_M9,
                    .pinEnablePower = -1,
                }
            },
            .transportType = U_DEVICE_TRANSPORT_TYPE_I2C,
            .transportCfg = {
                .cfgI2c = {
                    .i2c = 0,
                    .pinSda = 21,
                    .pinScl = 22,
                    .alreadyOpen = false
                }
            }
        };
        auto status = uPortInit();
        ESP_LOGI(TAG.data(), "Port initialised: %ld", status);
        status = uPortI2cInit(); // You only need this if an I2C interface is used
        ESP_LOGI(TAG.data(), "I2C Port initialised: %ld", status);
        status = uDeviceInit();
        ESP_LOGI(TAG.data(), "Device initialised: %ld", status);
        status = uDeviceOpen(&gDeviceCfg, &gnssHandle);
        ESP_LOGI(TAG.data(), "Device opened: %ld", status);
        status = uPortI2cOpen(0, 21, 22, true);
        ESP_LOGI(TAG.data(), "Port opened: %ld", status);
        return status;
    }

Just in case its related: uPortI2cOpen seems to return -5, but that seems to happen regardless of whether uGnssPosGet is working or failing.

RobMeades commented 1 month ago

Hi, and sorry you're having trouble with this. I have to dash away so can't reply properly right now but one thing to get you moving: you don't need to call uPortI2cOpen(); uDeviceOpen() does all of that for you based on the gDeviceCfg you have passed to it. uPortI2cOpen() is likely returning -5 because that I2C port is already open. That shouldn't cause the problem you are seeing, will see if I can figure out what's happening with that later.

RobMeades commented 1 month ago

Looking at the code of uGnssPosGet() it appears to me to only return U_ERROR_COMMON_NOT_INITIALISED when gUGnssPrivateMutex == NULL

Indeed: when the error occurs does it persist, i.e. is the only route to recovery through resetting the MCU? Could you share the code where you call uGnssPosGet() in a loop?

KjartanOli commented 1 month ago

Yes the error persists until the MCU is rebooted. The loop looks like this:

  for(std::size_t i = 0;; ++i) {
        ESP_LOGI(TAG.data(), "Acquiring position");
        radio.transmit(reinterpret_cast<std::uint8_t*>(Frame(id,
                                                                                                                 base,
                                                                                                                 Protocol::reserved,
                                                                                                                 "Acquiring position").serialise().data()),
                                     Frame::serialised_size);
        auto clock = stopwatch();
        // auto pos = position{};
        // auto status = ublox_get_pos(&pos.latitude, &pos.longitude, &pos.altitude, &pos.radius);
        auto pos = ublox::get_position();
        auto elapsed = clock.elapsed();
        if (pos)
            {
                auto buffer = std::array<char, Frame::payload_size>{};
                auto result = std::format_to_n(buffer.begin(), buffer.size(),
                                                                             "Round {}:\nI am here: {:.3},{:.3}\nTime {}\n",
                                                                             i,
                                                                             ((double) pos->latitude) / 10000000,
                                                                             ((double) pos->longitude) / 10000000,
                                                                             elapsed
                                                                             );
                auto frame = Frame(id,
                                                     base,
                                                     Protocol::reserved,
                                                     std::span<const std::byte>(reinterpret_cast<const std::byte*>(buffer.data()),
                                                                                                            result.size
                                                                                                            )
                                                     );
                std::format_to_n(buffer.begin(), buffer.size(),
                                             "src: {}\ndest: {}\npayload: {}",
                                             frame.source(),
                                             frame.destination(),
                                             std::string_view(reinterpret_cast<const char*>(frame.payload().data()), frame.payload().size())
                                             );
                ESP_LOGI(TAG.data(), "%s", buffer.data());
                state = radio.transmit(reinterpret_cast<std::uint8_t*>(frame.serialise().data()), Frame::serialised_size);
                if(state == RADIOLIB_ERR_NONE) {
                    // the packet was successfully transmitted
                    ESP_LOGI(TAG.data(), "success!");

                } else {
                    ESP_LOGI(TAG.data(), "failed, code %d\n", state);
                }
            }
        else
            {
                char buffer[200];
                sprintf(buffer, "failed to get position: %ld", pos.error());
                ESP_LOGI(TAG.data(), "%s", buffer);
                state = radio.transmit(reinterpret_cast<std::uint8_t*>(Frame(id,
                                                                                                                                         base,
                                                                                                                                         Protocol::reserved,
                                                                                                                                         buffer).serialise().data()),
                                                             Frame::serialised_size);
                if(state == RADIOLIB_ERR_NONE) {
                    // the packet was successfully transmitted
                    ESP_LOGI(TAG.data(), "success!");
                } else {
                    ESP_LOGI(TAG.data(), "failed, code %d\n", state);
                }
                fflush(stdout);
            }

        ESP_LOGI(TAG.data(), "sleeping");
        std::this_thread::sleep_for(5s);
  }

with the call to uGnssPosGet inside ublox::get_pos:

namespace ublox 
{
    struct position
    {
        std::int32_t latitude;
        std::int32_t longitude;
        std::int32_t altitude;
        std::int32_t radius;
    }
    auto get_position() -> std::expected<position, status_type>
    {
        auto pos = position{};
        auto status = uGnssPosGet(gnssHandle, &pos.latitude, &pos.longitude, &pos.altitude, &pos.radius, NULL, NULL, NULL, NULL);
        if (status != 0)
            return std::unexpected{status};
        return pos;
    }
};
KjartanOli commented 1 month ago

Everything after the call to ubxlib::get_pos is concerned only with transmitting and logging the location so it shouldn't be relevant

for(std::size_t i = 0;; ++i) {
    ESP_LOGI(TAG.data(), "Acquiring position");
    auto pos = ublox::get_position();
    if (pos)
        {
            auto buffer = std::array<char, Frame::payload_size>{};
            auto result = std::format_to_n(buffer.begin(), buffer.size(),
                                                                         "Round {}:\nI am here: {:.3},{:.3}\nTime {}\n",
                                                                         i,
                                                                         ((double) pos->latitude) / 10000000,
                                                                         ((double) pos->longitude) / 10000000,
                                                                         elapsed
                                                                         );
            ESP_LOGI(TAG.data(), "%s", buffer.data());
        }
    else
        {
            char buffer[200];
            sprintf(buffer, "failed to get position: %ld", pos.error());
            ESP_LOGI(TAG.data(), "%s", buffer);
        }

    ESP_LOGI(TAG.data(), "sleeping");
    std::this_thread::sleep_for(5s);
}

This should be equivalent for this problem.

RobMeades commented 1 month ago

All seems straightforward enough. One thing you might try, if you haven't already, is leaving just the initialisation and the call to get_position() in place but not calling any of the code that processes the results, see if the problem remains, just to "divide and conquer".

KjartanOli commented 1 month ago

I've reduced the loop to:

for(std::size_t i = 0;; ++i) {
        ESP_LOGI(TAG.data(), "Acquiring position");
        auto pos = ublox::get_position();
        if (!pos)
            {
                ESP_LOGI(TAG.data(), "failed to get position: %ld", pos.error());
            }
        else
            {
                ESP_LOGI(TAG.data(), "Got position");
            }

        ESP_LOGI(TAG.data(), "sleeping");
        std::this_thread::sleep_for(5s);
    }

I then did the following:

  1. Flash the new firmware (no issue)
  2. Unplug and re-plug the MCU (issue appears)
  3. Press the reset button on the MCU (issue disappears)
  4. Unplug and replug the MCU (issue reappears)
  5. Press the reset button on the MCU (issue disappears)
mazgch commented 1 month ago

If you are using i2c, and just interrupt the transfer with unplugging power make the i2c bus is released when you restart next time, this can be done by resetting the gnss, or forcing few sck cycles at start.

Maybe you can measure with a logic analyser or read with goios the state of the i2c lines in the locked-up case.

RobMeades commented 1 month ago

@KjartanOli: your description is clear but, understanding why it does what it does, less so. There should be no difference between an MCU reset and resetting the MCU, IYSWIM. Zero-initialised, or statically initialized, memory should be set up in both of those scenarios, whether the MCU was reset via a reset pin or the MCU was started afresh from being powered off and on again.

The code does the same stuff either way.

The failure you see, which suggests that gUGnssPrivateMutex is NULL, would only be systematically introduced if GNSS were deinitialized, which there is no evidence of. That leaves us with something writing zeroes to memory in that area, which is equally unlikely.

Hmph.

RobMeades commented 1 month ago

One more question for my clarification: when the issue appears, how many times around the loop does it take for it to appear?

Something you might try: look in the .map file for where the gUGnssPrivateMutex is being placed in memory and see what is either side of it. Find those variables in the code and hack some checks in which print out their values, to see if they remain as expected while gUGnssPrivateMutex becomes NULL. That, or add some guard variables before and after gUGnssPrivateMutex in the code, verify in the .map file that they really are being placed either side of it in memory, then initialise those variables to known values and extern them from your application so that you can check them.

KjartanOli commented 1 month ago

As far as I can tell the issue appears immediately. The code either works every time or not at all.

Here is the section around gUGnssPrivateMutex in the .map file if it tells you anything. I'll try hacking something with the variables around it.

 .bss.gDeviceI2c
                0x3ffb2b44       0x50 esp-idf/ubxlib/libubxlib.a(u_device_private.c.obj)
 .bss.gI2c      0x3ffb2b94       0x30 esp-idf/ubxlib/libubxlib.a(u_device_private.c.obj)
 .bss.gHeapAllocCount
                0x3ffb2bc4        0x4 esp-idf/ubxlib/libubxlib.a(u_port_heap.c.obj)
 .bss.gUGnssPrivateMutex
                0x3ffb2bc8        0x4 esp-idf/ubxlib/libubxlib.a(u_gnss_private.c.obj)
                0x3ffb2bc8                gUGnssPrivateMutex
 .bss.gpUGnssPrivateInstanceList
                0x3ffb2bcc        0x4 esp-idf/ubxlib/libubxlib.a(u_gnss_private.c.obj)
                0x3ffb2bcc                gpUGnssPrivateInstanceList
 .bss.gStdoutCounter
                0x3ffb2bd0        0x4 esp-idf/ubxlib/libubxlib.a(u_port_debug.c.obj)
                0x3ffb2bd0                gStdoutCounter
KjartanOli commented 1 month ago

Not sure if this is a new development, or if I somehow just never noticed it before, but it looks like when the issue appears uDeviceOpen returns -8 (U_ERROR_COMMON_PLATFORM).

RobMeades commented 1 month ago

Ah, that would cause the error you are seeing and could relate to the issue that @mazgch suggested might be in play: if uDeviceOpen() is unable to talk to the GNSS device at initialisation, e.g. because of an I2C bus problem, uGnssPosGet() would indeed return U_ERROR_COMMON_NOT_INITIALISED. A log of the debug output from ubxlib would confirm this: logging should be on by default (compiled in if U_CFG_ENABLE_LOGGING is 1 and can be run-time switched off and on with uPortLogOff()/uPortLogOn(), will arrive on the normal ESP-IDF terminal output).

Do you have a spare MCU pin you could connect to the RESET_N line of the GNSS chip and you could toggle low and then high again to get the I2C bus into the correct state at start of day?

mazgch commented 1 month ago

if you detect a blocked I2C bus by reading in GPIO mode. either SDA or SCL is held low by a attached the peripherial, isc transfer returns a bus error.

One way to resolve a blocked I2C bus is to force a series of 10 clocked cycles in GPIO mode:

for (int i = 0; i < 10; i ++) { Set SCL input / resistor will pull SCL high Wait 50 us Set SCL output low Wait 50 us }

// Send Stop Set both SCL / SDA low Wait 100 us Set SCL / SDA input / resistor will pull SCL high

Restart normal i2c transations

KjartanOli commented 1 month ago

Ah, that would cause the error you are seeing and could relate to the issue that @mazgch suggested might be in play: if uDeviceOpen() is unable to talk to the GNSS device at initialisation, e.g. because of an I2C bus problem, uGnssPosGet() would indeed return U_ERROR_COMMON_NOT_INITIALISED. A log of the debug output from ubxlib would confirm this: logging should be on by default (compiled in if U_CFG_ENABLE_LOGGING is 1 and can be run-time switched off and on with uPortLogOff()/uPortLogOn(), will arrive on the normal ESP-IDF terminal output).

Here is what I believe is the relevant part of the log.

I (397) main_task: Started on CPU0
I (407) main_task: Calling app_main()
I (407) ublox.cpp: Port initialised: 0
I (407) ublox.cpp: I2C Port initialised: 0
I (407) ublox.cpp: Device initialised: 0
U_GNSS: initialising with ENABLE_POWER pin not connected, transport type I2C.
U_GNSS: sent command b5 62 0a 06 00 00 10 3a.
I (10607) ublox.cpp: Device opened: -8
I (10607) ublox.cpp: GNSS driver initialised: 0

and the corresponding log when the code works:

I (397) main_task: Started on CPU0
I (407) main_task: Calling app_main()
I (407) ublox.cpp: Port initialised: 0
I (407) ublox.cpp: I2C Port initialised: 0
I (407) ublox.cpp: Device initialised: 0
U_GNSS: initialising with ENABLE_POWER pin not connected, transport type I2C.
U_GNSS: sent command b5 62 0a 06 00 00 10 3a.
U_GNSS: decoded UBX response 0x0a 0x06: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [body 120 byte(s)].
U_GNSS: sent command b5 62 06 04 04 00 00 00 09 00 17 76.
U_GNSS: sent command b5 62 0a 06 00 00 10 3a.
U_GNSS: decoded UBX response 0x0a 0x06: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [body 120 byte(s)].
I (2617) ublox.cpp: Device opened: 0
I (2627) ublox.cpp: GNSS driver initialised: 0

If I'm reading this correctly the device never responds to the initial command.

Do you have a spare MCU pin you could connect to the RESET_N line of the GNSS chip and you could toggle low and then high again to get the I2C bus into the correct state at start of day?

I do have spare MCP pins, I'll just need to connect one of them. I'm meeting with the more hardware focused people on my team tomorrow to look at this with a logic analyzer like @mazgch suggested. We'll probably connect the RESET_N line at the same time.

RobMeades commented 1 month ago

If I'm reading this correctly the device never responds to the initial command.

Indeed, that is a correct interpretation. Let us know how it goes after you have discussed with the HW-focused types.

KjartanOli commented 1 month ago

I've just come back from meeting the HW people. We attached an oscilloscope to the SDA line and could see the initial command from the MCU on the wire, it received no response. However, we did some testing and found that introducing a 2 second delay (std::this_thread::sleep_for(2s)) before calling ublox::init fixes the issue. Could the ZED-F9P just be slow to start?

RobMeades commented 1 month ago

If ubxlib is aware that the GNSS device needs to be powered-up before use (i.e. if the value of pinEnablePower in the uDeviceCfgGnss_t structure is non-negative), rather than the GNSS device always being powered, then, after ubxlib has set pinEnablePower to the correct state, it will wait for U_GNSS_POWER_UP_TIME_SECONDS (i.e. 2 seconds) before it tries to talk to the GNSS device.

So yes, if the whole board, with the GNSS device on it is being powered up and the MCU on that board is able to talk to the GNSS device in less than 2 seconds from power-up it would be wise to add such a delay.

KjartanOli commented 1 month ago

Connecting the RST pin and setting the appropriate value for pinEnablePower sems to be working perfectly. Thank you for the help.