Closed Stefal closed 1 year ago
new test with the firmware v3.0: The test was short, but I didn't have multiple ubx files. However there are some holes in the log
Idem on RTK Facet with firmware 3.2
@Stefal - We found a fixed a particularly bad bug on May 18th that was causing issues with corrupt log file names and unexpected resets. This bug was introduce on Feb 28 so v3.1 should have been good, while v3.2 and v3.3 have this bug.
Can you share some of the sections of the log where the RTK Fix is lost? Can you tell if the break is due to a system reset or does the system simply lose RTK Fix and then re-gain it?
Same with a RTK Express with 3.2 firmware. Tracks are divided in numerous different files. Sometime, I can observe device reboot. I lost several tracks: while RTK was active and used for Mapillary, I can't find any file on the SD card.
Foot walk with RTK Express recording NMEA+UBX at 1 Hz.
Can you share some of the sections of the log where the RTK Fix is lost? Can you tell if the break is due to a system reset or does the system simply lose RTK Fix and then re-gain it?
I will wait for a new release with the various fixes you have mentionned, and redo some tests. I think that parts of my problems are related to the 5hz gnss frequency that the bluetooth stream can't handle. I forgot that log to file and bluetooth settings are the same, contrary to my old logger in which the stream are independant (1 or 2hz for bluetooth, and 5hz or more with many ubx messages for log to SD card).
Test with the 3.4RC (RTK Firmware: v3.4-May 25 2023) :
Result:
I will share the logs with you in private (email)
I've replicated the issue. An overnight run produced ~30 logs. The unit is closing old logs and opening new ones because no new data is being received for more than 5 seconds:
I haven't yet figured out why the ESP32 thinks the ZED is not outputting data. TBD!
Serial bytes are all being read correctly even when the file size is not increasing.
This would seem to indicate the validation parser is not passing data on to the SD logging routine. I would expect the parser to throw a few sentences out across multiple hours. The fact it's throwing out multiple seconds worth of data is wrong.
tldr; The release candidate should have some major improvements. Please load v3.5RC when possible and report back.
As usual, thanks for reporting this and sorry for the data corruptions!
To replicate this issue:
Above: This is how the system works. The ESP32 takes in NMEA from the ZED transmits that NMEA to the phone over BT. Everything works fine, there are no NMEA or UBX sentence corruptions (ie, no data is dropped).
Above: If we start transmitting RTCM/NTRIP corrections from the phone, back to the ESP32, and back to the ZED, we begin to see bytes dropped (from the ZED-ESP NMEA communication).
Above: Interestingly, if we suppress the transmission of serial from the ESP32 to the ZED (commenting out lines in the Task.ino btReadTask code), there is no data corruption. It's therefore presumed that the BT stack is not interfering with the ESP32 UART2 (it's not an interrupt issue).
Above: Additionally, I verified the ZED was correctly outputting serial at all times. It is. The ESP32 is dropping characters out of the UART.
Things we’ve tried:
Nothing above decreased the number of corrupted sentences coming from the ZED when the ESP32 is writing data.
What is especially telling is the reduction of the UART speed. If we are slowing down the UART speed, all interrupts and tasks have a lot more time to operate before data is lost, yet we still lose data.
I believe there is a very low level issue with the esp32-idf that causes the UART FIFO to misbehave when we are simultaneously reading and writing to it. I am not able to figure it out at this time. However, there is a simple fix:
Above: We can push the RTCM over I2C rather than UART, thus avoiding a serial2.write() while we are doing a serial2.read().
Originally I was seeing corruptions every ~10s or so. With the fix in place, I have not seen a corrupted message at 10Hz, with 10 messages turned on, in 15 minutes.
There is still work to do:
Thanks Nathan,
This RC is uploaded on my express. I will do a quick test tomorrow evening.
Tests results with the 3.5RC, during about 1,5 hour :
At least 90% of what you see in the screenshot should have been green (RTK Fix). (5hz and 4Hz, NMEA + NAV-PVT and NAV-HPPOSLLH)
Next tests : default settings with "automotive" dynamic model. At least 90% of what you see should have been green. Blue is dgps, orange is 3D and yellow is for Float.
The conclusion : it's a failure.
Good points :
It looks like the F9P doesn't always received the rtcm data stream.
Could you swap RTCM and UBX/NMEA ?
Following your investigations, I made new tests.
RTK Express Firmware 3.0
Dynamic : automotive UBX+NMEA at 4 Hz
Rover static on balcony Android smartphone connected to caster with wifi+FTTH. Start up 1 hour recording Shut down 4 files I first checked integrity with UBX_Integrity_Checker (related to https://github.com/sparkfun/SparkFun_RTK_Firmware/issues/345):
-No connection : Integrity OK. -BT connection with SW_Maps, NMEA reading, no RCTM sent: Integrity OK. -BT connection with Lefebure, NMEA + RCTM : 15 resyncs -BT connection with SW_Maps, NMEA + RCTM : 24 resyncs
For the last two files, I also checked the age of differentials:
It is still unstable, like in: https://github.com/sparkfun/SparkFun_RTK_Firmware/issues/359
I was able to replicate Stefal's report: RTK lock would be lost after a brief amount of time (less than 10 minutes).
@PaulZC pointed out the flaw in my solution. We can't push single bytes to the ZED over I2C. Additionally, we were not servicing the Bluetooth buffer before characters were dropped leading to 'RX Full! Discarding X bytes' errors (BT RX buffer is 512 bytes, I2C comm was blocking). A buffer has been implemented. I've had a solid RTK lock for the last hour.
Sorry for the hassle! Please give v3.5 RC June 19th a try.
Tests with the v3.5 RC June 19th:
I don't see any improvement.
@nseidle As you have good results with the latest RC, I had some concerns about my hardware. So I powered up back my old logger. I drove on the same road, with the same antenna and using the same CORS.
Here is the result, more than 96% of RTK Fix
How did you feed the RTCM back to the ZED without a Bluetooth connection?
How did you feed the RTCM back to the ZED without a Bluetooth connection?
I don't understand. I always had a bluetooth connection.
Edit : Ho ! Got it !
My logger is a small fork of the one from @PaulZC. I have added a bluetooth module connected on the ZED UART2
Ah, ok, thanks! I was just looking at the photos and was confused.
Can you walk me through the steps you use to process the ubx files through rtklib?
Rtklib (only rtkplot) is only used to display the trace and get % stats for single/float/fix:
I can open the same pos file in josm, the desktop app to edit OpenStreetMap. I like this way because I use it everyday for Osm contribution, and for all the aerial image sources available :
Excellent! Thank you. I captured a route in my car but did not have the extra msgs turned on. I'll capture another tomorrow.
This morning I spent some times with U-Center on other ubx messages, like MON-MSGPP, MON-COMMS, RXM-COR, RXM-RTCM. I noticed one thing : With MON-MSGPP, if I compare the rtcm message count during 1 minute on my old logger and on the RTK Express, I got:
I'm too busy to go deeper, but it looks like an interesting point.
Thank you Stefal! This is very helpful.
Below is a ~10 minute drive with a baseline of 11.5km (7 miles) using an RTK Express, running v3.5 (release version), and a u-blox magnetic mount antenna. Seems fine to me.
Additionally, your comments about RTCM msg/min had me go looking.
Above is a 5 minute capture via u-center RXM-RTCM. While the ZED should have received 336 messages, a few of the sentence types (1124 in particular) are getting corrupted via I2C (~10% worst case). I would not expect a loss of 10% of a particular message to cause the ZED to drop to RTK Float. I've seen the ZED maintain RTK fix for nearly 30 seconds without RTCM so a lost message, very periodically should not be causing your issue. The mystery continues!
I cannot yet explain why you're dropping so much RTCM via Bluetooth but we'll keep working on it. I think there's a few more spots I can improve.
The base station you've used seems to not send many msm messages. It could explain why we got so different results.
Here are the results with my own base station and its default settings, after a 5 minutes capture. RTCM messages: 1004,1005(10),1006,1008(10),1012,1019,1020,1033(10),1042,1045,1046,1077,1087,1097,1107,1127,1230
RTK Express with 3.5 release
My old logger
1129 vs 2710 messages, that's a big difference !
If you want to test, you can use my own base (mount point EMONIE on caster.centipede.fr:2101)
Ah, well that is a lot! That would explain it. The Bluetooth connection is getting overwhelmed.
I think I've got a solution for you. I just pushed v3.6RC. Please install and then use the debug menu to increase the SPP RX buffer to 2048 (shown below):
I was able to get ~5 minutes of RTCM from your mount point without loosing anything.
As a bit of history (and notes to myself): Prior to Aug 17, 2022 the firmware had a larger SPP RX buffer of 2048 bytes. So since version v2.4, the firmware has been using a smaller 512 byte Bluetooth SPP RX buffer. If you've been using this mount point for the last year, I suspect this Bluetooth overrun issue has been causing problems for you for some time. My apologies. I did not know there were mount points out there with this much data being transmitted.
I made new tests. On balcony with centipede base 10 km away. Dynamic : vehicle. 4 Hz. 30 mn acquisition. Firmware 3.2 Integrity check showed 6 resync.
Firmware 3.6RC Integrity check: no error.
So, the two problems that I mentioned earlier are gone.
I think I've got a solution for you. I just pushed v3.6RC. Please install and then use the debug menu to increase the SPP RX buffer to 2048 (shown below):
This my result with a 48km log. It is much better ! Thanks !
I'll do some other tests later, maybe next week.
New test with a 100 + 100km trip. Everything seem right:
Excellent! Thanks for testing. I'll close this once we do a full release.
Thanks for working through this issue with us. I'm glad we got it fixed! This has been added to the latest v3.6 release.
Subject of the issue
We are using some RTK Express to log data when we walk or drive. Instead of one file for each trip, we have a lot of files, drops inside the files, and too much float or single fix.
Your workbench
Steps to reproduce
Expected behavior
Actual behavior