RAWX data gaps in RINEX file (logging issue)

derekpickell commented 3 years ago

Subject of the issue

Using the example data logging RAWX/SFRBX (both versions w/ and w/out callbacks), and without changing the Navigational Frequency (1Hz), the processed RINEX file contains data gaps of random intervals (sometimes 3+ seconds). This is also reflected in the serial monitor when the numRAWX counter occasionally hangs up on a single count for several iterations.

I cut the I2C bridges on both the MicroMod and the peripheral without any change in performance. The buffers do not appear to be overflowing, and disabling SFRBX messages to reduce the message load did not improve output consistency either.

Setup

MicroMod Data Logging Carrier Board; MicroMod Artemis Processor; SparkFun GPS-RTK-SMA Breakout - ZED-F9P; UBLOX L1/L2 Antenna; formatted SanDisk 4GB microSD card

Configuration

Attempted using DataLoggingExample3_RXM_SFRBX_and_RAWX and version without callbacks. (Only change was CS needed to be changed to pin 41). Hooked up exactly per instructions with Qwiic connector.

Processed raw .ubx data using RTKLIB and UNAVCO teqc, both outputting RINEX files that come with GNSS data gaps.
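For anyone wanting to quantify the gaps rather than spot them by eye, here is a minimal sketch of a gap check over a list of epoch times. It assumes the epoch timestamps have already been extracted from the RINEX or .ubx file as seconds; the names `EpochGap` and `findGaps` are hypothetical and are not part of RTKLIB, teqc, or the SparkFun library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One detected gap: which epoch arrived late and how far it was from the previous one.
struct EpochGap {
    std::size_t index;  // index of the later epoch in the input list
    double spacing;     // seconds between this epoch and the previous one
};

// Return every pair of consecutive epochs whose spacing exceeds
// nominalInterval by more than tolerance (all values in seconds).
std::vector<EpochGap> findGaps(const std::vector<double>& epochs,
                               double nominalInterval,
                               double tolerance) {
    assert(nominalInterval > 0.0 && tolerance >= 0.0);
    std::vector<EpochGap> gaps;
    for (std::size_t i = 1; i < epochs.size(); ++i) {
        const double spacing = epochs[i] - epochs[i - 1];
        if (spacing > nominalInterval + tolerance) {
            gaps.push_back({i, spacing});
        }
    }
    return gaps;
}
```

For 1Hz logging, calling `findGaps(epochs, 1.0, 0.5)` would flag any spacing longer than 1.5 seconds, which covers the 2 and 3+ second gaps described above.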

As a sanity check, I used u-center to log raw .ubx data, and the sampling interval is consistent. Could this be an issue with the SD card? I see no indication from the buffers that it is getting hung up enough to drop data. I also tried slowing setMeasurementRate() to 10-second intervals, and the output still has data gaps. Unsure what other root cause it could be at this point... Appreciate the help! First time posting something here.

Snippet from RINEX file: Note in red how the subsequent GNSS data timestamp indicates it arrived 2 seconds after the first.

Serial monitor:

EDIT:

After enabling debug messages, the following message appears during hiccups: "insufficient space available! Data will be lost!". This is surprising to me because the fileBufferSize doesn't approach full. Could the write to SD be hanging up the rest of the program?

And checksum failed (lower level debugging enabled):
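For context on what a "checksum failed" message implies: every UBX frame ends with two checksum bytes computed with an 8-bit Fletcher algorithm over the class, ID, length, and payload bytes, so a single corrupted byte on the I2C bus is enough to reject a whole frame. The sketch below is a standalone illustration of that check, not the library's own implementation; the names `UbxChecksum`, `ubxChecksum`, and `ubxFrameIsValid` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

struct UbxChecksum {
    std::uint8_t ckA;
    std::uint8_t ckB;
};

// 8-bit Fletcher checksum. data must point at the class byte and len must
// cover class, ID, the two length bytes, and the payload.
UbxChecksum ubxChecksum(const std::uint8_t* data, std::size_t len) {
    std::uint8_t a = 0;
    std::uint8_t b = 0;
    for (std::size_t i = 0; i < len; ++i) {
        a = static_cast<std::uint8_t>(a + data[i]);
        b = static_cast<std::uint8_t>(b + a);
    }
    return {a, b};
}

// Check a complete frame: sync chars 0xB5 0x62, class, ID,
// little-endian payload length, payload, CK_A, CK_B.
bool ubxFrameIsValid(const std::uint8_t* frame, std::size_t frameLen) {
    if (frameLen < 8) return false;  // shortest frame has an empty payload
    if (frame[0] != 0xB5 || frame[1] != 0x62) return false;
    const std::size_t payloadLen = static_cast<std::size_t>(frame[4]) |
                                   (static_cast<std::size_t>(frame[5]) << 8);
    if (frameLen != payloadLen + 8) return false;
    const UbxChecksum c = ubxChecksum(frame + 2, payloadLen + 4);
    return c.ckA == frame[frameLen - 2] && c.ckB == frame[frameLen - 1];
}
```

A frame that fails this check has to be discarded in full, which is consistent with a bus error showing up as a missing RAWX epoch rather than as bad data.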

PaulZC commented 3 years ago

Hi @derekpickell ,

Thanks for reporting this.

Are you using v2.1.1 of the SparkFun Apollo3 (Artemis) Board package? If so, there is a 'feature' in there which makes I2C communication with the u-blox modules problematic. Please try installing v2.1.0 of Apollo3 via the Board Manager and give it another try. (A fix is coming in v2.2.0.)

If you still see gaps, let me know and I can show you how to disable the internal I2C pull-ups inside the Artemis. That can help reduce I2C bus errors too.

Just for information, you will see multiple SFRBX messages for each RAWX message. The module generates one RAWX message per navigation cycle (1Hz), but will generate multiple SFRBX messages depending on how many satellites are being tracked.

Best wishes, Paul

derekpickell commented 3 years ago

Hi @PaulZC,

I've been running on v2.1.0, so haven't had any luck there. So I guess the next thing to try is disabling the internal I2C pull-ups in the Artemis. Let me know the best way to go about this. I assume this has to do with Wire.setPullups(0); but when I move this statement outside of the #if defined(), I get the following error: 'class arduino::MbedI2C' has no member named 'setPullups'.

Appreciate the help!

PaulZC commented 3 years ago

Hi @derekpickell ,

(Apologies. I'm horribly jet-lagged so I hope the following makes sense...)

setPullups disappeared in v2.0 of Apollo3, when we moved to Mbed OS. You can still set the Artemis pull-ups but you need to do it manually. You are using the MicroMod Artemis Processor Board so the code below should work (jet-lag permitting!). Please give setQwiicPullups(0) a try and let me know if you still see checksum errors in the debug messages. Thanks!

Best wishes, Paul

void setQwiicPullups(uint32_t qwiicBusPullUps)
{
  //Change SCL and SDA pull-ups manually using pin_config
  am_hal_gpio_pincfg_t sclPinCfg = g_AM_BSP_GPIO_IOM4_SCL; // The MicroMod Artemis PB uses IOM4 for I2C communication
  am_hal_gpio_pincfg_t sdaPinCfg = g_AM_BSP_GPIO_IOM4_SDA; // Ditto

  if (qwiicBusPullUps == 0)
  {
    sclPinCfg.ePullup = AM_HAL_GPIO_PIN_PULLUP_NONE; // No pull-ups
    sdaPinCfg.ePullup = AM_HAL_GPIO_PIN_PULLUP_NONE;
  }
  else if (qwiicBusPullUps == 1)
  {
    sclPinCfg.ePullup = AM_HAL_GPIO_PIN_PULLUP_1_5K; // Use 1K5 pull-ups
    sdaPinCfg.ePullup = AM_HAL_GPIO_PIN_PULLUP_1_5K;
  }
  else if (qwiicBusPullUps == 6)
  {
    sclPinCfg.ePullup = AM_HAL_GPIO_PIN_PULLUP_6K; // Use 6K pull-ups
    sdaPinCfg.ePullup = AM_HAL_GPIO_PIN_PULLUP_6K;
  }
  else if (qwiicBusPullUps == 12)
  {
    sclPinCfg.ePullup = AM_HAL_GPIO_PIN_PULLUP_12K; // Use 12K pull-ups
    sdaPinCfg.ePullup = AM_HAL_GPIO_PIN_PULLUP_12K;
  }
  else
  {
    sclPinCfg.ePullup = AM_HAL_GPIO_PIN_PULLUP_24K; // Use 24K pull-ups
    sdaPinCfg.ePullup = AM_HAL_GPIO_PIN_PULLUP_24K;
  }

  pin_config(PinName(39), sclPinCfg); // MicroMod Artemis PB uses Pin/Pad 39 for SCL
  pin_config(PinName(40), sdaPinCfg);  // MicroMod Artemis PB uses Pin/Pad 40 for SDA
}

Adapted from:

https://github.com/sparkfun/OpenLog_Artemis/blob/5c10915f0a93f4334c200ecfe226d0db837d4110/Firmware/OpenLog_Artemis/OpenLog_Artemis.ino#L848-L882

PaulZC commented 3 years ago

Hi @derekpickell ,

I have updated the data logging examples so they will run correctly on both Apollo3 v1 and v2. I have included fixes for both the chip select pin and the pull-ups. Please update to v2.0.16 of the library when you have the chance.

I'm going to close this issue as I believe 2.0.16 will correct the gaps you were seeing, but please re-open if you see any more errors.

Best wishes, Paul

derekpickell commented 3 years ago

Hi @PaulZC

Thank you again for all the help. I've been running with your suggested code update for the last 24 hours, and there are no more "Checksum Failed" errors; 1Hz RAWX is nearly perfect. However, the following still occurs, albeit very infrequently, and still results in the occasional data gap:

"processUBX: buffer overrun detected! activePacketBuffer: 3 maximum_payload_size: 2064 process: memory is already allocated for payloadAuto! Deleting.."

I've updated to v2.0.16 just now and am still seeing this particular problem pop up every 30 minutes of logging or so. Not sure if it's related to I2C communication or not, but I'm looking into this issue now.

PaulZC commented 3 years ago

Hi @derekpickell ,

Ah. Now that's interesting... It looks like I might be not allocating enough memory for your RAWX messages. The piece of code which does that is:

https://github.com/sparkfun/SparkFun_u-blox_GNSS_Arduino_Library/blob/068f6f1b77cd6aaeffb8725d6fb21f05b21344a4/src/u-blox_structs.h#L1107-L1108

I guessed that the maximum number of blocks in a single RAWX message would be 64. But the error you are seeing suggests it is actually higher than that. If you are up for a little detective work, please edit src/u-blox_structs.h and increase the 64 to perhaps 80. That should make the error you are seeing go away?
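As a quick cross-check of the numbers involved: a UBX-RXM-RAWX payload is a 16-byte header followed by one 32-byte block per measurement, so a 64-block limit gives exactly the maximum_payload_size of 2064 shown in the debug output. The sketch below just does that arithmetic; the constant and function names are hypothetical and are not taken from u-blox_structs.h.

```cpp
#include <cstdint>

// UBX-RXM-RAWX layout: fixed header plus one repeated block per measurement.
constexpr std::uint16_t kRawxHeaderBytes = 16;
constexpr std::uint16_t kRawxBlockBytes = 32;

// Payload size in bytes for a RAWX message carrying numMeas blocks.
constexpr std::uint16_t rawxPayloadBytes(std::uint16_t numMeas) {
    return static_cast<std::uint16_t>(kRawxHeaderBytes + kRawxBlockBytes * numMeas);
}

// True if a message with numMeas blocks fits a buffer sized for maxBlocks.
constexpr bool rawxFits(std::uint16_t numMeas, std::uint16_t maxBlocks) {
    return rawxPayloadBytes(numMeas) <= rawxPayloadBytes(maxBlocks);
}

// 64 blocks reproduces the value reported in the debug message.
static_assert(rawxPayloadBytes(64) == 2064, "16 + 32 * 64");
```

With the limit raised to 80 the buffer grows to 2576 bytes, so any message with more than 64 but no more than 80 measurements would then fit.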

Sincere thanks! Paul

derekpickell commented 3 years ago

@PaulZC,

I've been logging now with a modified src/u-blox_structs.h (UBX_RXM_RAWX_MAX_BLOCKS set to 80) for the last two hours and so far no issues. Given the infrequency of the overrun, I'll continue logging for the rest of the day to ensure consistency and report back here.

derekpickell commented 3 years ago

Hi @PaulZC,

After logging for the last two days, I only got two instances of oversized measurements. Digging a little deeper, it seems like others have encountered numMeas in the 80's for block size. I believe the underlying chip has 92 channels, so using that as a proxy for numMeas, bumping the value up to this should be a conservative upper bound (it's unlikely all 92 channels are used to track satellites).

Interestingly, I also encountered two instances where the sample spacing was reduced from 1s to 0.979 seconds when processing the binary data. For my purposes, this is close enough to the targeted 1Hz RAWX rate, but scratching my head over this so thought I should mention this here...

PaulZC commented 3 years ago

Hi @derekpickell ,

Sincere thanks for digging into this!

There are a couple of conflicting numbers in the u-blox documentation. The ZED-F9P product summary says that the module uses a "184-channel u-blox F9 engine", whereas the UBX-CFG-GNSS setting in u-center suggests that the maximum number of tracking channels is 60. I went with 64 originally just to provide a little headroom above 60 and because I personally had never seen the number of RAWX blocks go above the mid-50's. Anyway, that's by the by. I'm quite happy to go with 92, especially if others have seen the blocks reach the 80's.

I will merge your pull request shortly and update the library version to match.

Thanks again, Paul

PaulZC commented 3 years ago

Regarding the 0.979 second sample spacing, was that two single instances? Or two 'groups' of instances?

If it was two single instances, I suspect that was probably due to the module refining its lock on GNSS time? I don't think that's anything to worry about, but thank you for mentioning it.

PaulZC commented 3 years ago

Closed by v2.0.17

adamgarbo commented 3 years ago

Hi folks,

Very interested to just discover this issue. Glad to see it was able to be resolved.

@PaulZC this sounds like it's only an Mbed OS issue but to confirm, can this affect v1.2.3 of the Apollo3 Core? I have approximately 50 days (300+ hours) of RAWX/SFRBX data collected from two systems deployed this summer but I haven't looked that closely at whether there were gaps in the RINEX files.

Cheers, Adam

PaulZC commented 3 years ago

Hi Adam (@adamgarbo ),

This was really two issues in one. The first was pull-up related but fixing that revealed the second - that having UBX_RXM_RAWX_MAX_BLOCKS set to 64 was causing occasional loss of data too. The pull-up issue was specific to Apollo3 v2, but the second could affect all platforms. You would only see the data loss if the module was reporting RAWX data for more than 64 individual satellite signals. If you were logging dual-band data from all constellations with a good view of the sky, then, yes, you may see this in your data. The data loss would appear as gaps in your data, missing RAWX messages, not 'bad' data. And it would only apply if you were using "auto" RAWX ( setAutoRXMRAWX with or without callbacks), or if you were using polling but had packetCfgPayloadSize set to around 2KBytes.

I was trying to get my head around whether this might be responsible for issue #48. If you have time, maybe you could repeat your data logging test with v2.0.17 of the library? Running it on Apollo3 v1.2.3 is perfectly OK. It would be really interesting to see if any of those 'gaps' disappear.

Sincere thanks, Paul

adamgarbo commented 3 years ago

Hi Paul,

Thanks for the clarification! While my systems do have setAutoRXMRAWX set as True, I've only enabled the GPS and GLONASS constellations. A quick glance at the RINEX files reveals that there are approximately 17-20 individual satellites being recorded at this particular site. I don't know if this corresponds directly to the number of channels being used, but I've still got a prototype system with me that I could use to try and set up a test with (all other systems are deployed in the Arctic).

Cheers, Adam

PaulZC commented 3 years ago

Hi Adam,

If my understanding is correct, the module generates RAWX data for each individual satellite signal being received. With dual-band, you get two RAWX blocks per satellite. So, with dual-frequency GPS+GLONASS you should be fine. I don't think you'll get anywhere near the 64 limit. I don't have a perfect view of the sky, but I've not seen the number here in the UK go any higher than the mid 50's for dual-band using all constellations. The number will be higher for areas that have SBAS / QZSS etc.
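The estimate above can be written out as a small check. This is only a rough upper bound that assumes every tracked satellite delivers a block on each band, which will not always hold; the function names are hypothetical.

```cpp
#include <cstdint>

// Upper-bound estimate: one RAWX block per satellite signal received.
constexpr std::uint32_t estimateRawxBlocks(std::uint32_t satellites,
                                           std::uint32_t signalsPerSatellite) {
    return satellites * signalsPerSatellite;
}

// True if the estimated block count stays within the configured limit.
constexpr bool withinBlockLimit(std::uint32_t blocks, std::uint32_t maxBlocks) {
    return blocks <= maxBlocks;
}
```

For the 17-20 satellites mentioned earlier, dual-band reception gives at most 40 blocks, comfortably under the old limit of 64.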

All the best, Paul

sparkfun / SparkFun_u-blox_GNSS_Arduino_Library