projecthorus / sondehub-tracker

🎈 Frontend for SondeHub Radiosonde Tracking
https://v2.sondehub.org
MIT License

Uploaded telemetry seems to be discarded #277

Closed Eshco93 closed 1 year ago

Eshco93 commented 1 year ago

Hello,

I'm currently in the process of developing an extension for dxlAPRS based radiosonde receiver stations that will allow uploading telemetry to SondeHub.

Uploading the station position is already working fine - I'm able to see my station on the map. But uploading the received radiosonde telemetry data doesn't seem to work that well. My uploaded telemetry data should be in compliance with the SondeHub Telemetry Format and I also do get a response with a status code 200 - which should mean that the telemetry data was successfully saved into the database.

However, when I'm watching the data of a nearby radiosonde (that is also received and uploaded by my station) on SondeHub, I don't see that any of the displayed frames originates from my receiver station. It seems like the telemetry data uploaded by my station is somehow discarded.

I wonder, might you be able to tell me why my uploaded telemetry data is discarded? Then I would be able to fix the issue. My callsign is "SIMON2-14".

Otherwise, is there any way that I can investigate the issue myself?

Thank you and greetings from Germany, Simon

darksidelemm commented 1 year ago

If there are lots of stations receiving a sonde, it can sometimes be difficult to see your particular callsign show up for a frame of telemetry.

I'd suggest using pysondehub ( https://github.com/projecthorus/pysondehub/ ) to look at the raw stream of telemetry for a particular sonde. e.g. once installed:

sondehub --serial "T1234567"

This will output lines of JSON, which can be filtered using grep for your callsign.
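The same filtering can be done in Python instead of grep. This is only a minimal sketch: it assumes every line printed by `sondehub` is a single JSON object and that the uploader's callsign is stored in an `uploader_callsign` field, as in the SondeHub telemetry format. The function name is made up for illustration.

```python
import json


def frames_from_uploader(lines, callsign):
    """Yield decoded telemetry dicts whose uploader matches `callsign`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            frame = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON line
        # Assumed field name: "uploader_callsign"
        if isinstance(frame, dict) and frame.get("uploader_callsign") == callsign:
            yield frame
```

Feeding it `sys.stdin` while piping the `sondehub` output into the script would give the frames for one station only; an empty result for a sonde you know you received suggests the data never passed the checks.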

Interesting to see dxlAPRS uploading to SondeHub! I thought that was something that would never happen...

Eshco93 commented 1 year ago

Thank you for your quick response. I hope that I can sort out my current issues and dxlAPRS uploading to SondeHub will soon be possible. And I hope that a lot of dxlAPRS users are interested in using my extension. That would be awesome :-)

You are right, once a sonde has reached a decent altitude, it will be received by a lot of stations and it might be unlikely that my callsign will show up on SondeHub. But the sonde that I was watching is launched about 60 km away from my station. With my receiver station I'm able to receive this sonde already at a very low altitude, around 500 meters or so. At this altitude, there are only 2-3 other stations that are also able to receive the sonde. I think at that point, I should clearly be able to see my callsign show up on SondeHub. That's why I think my telemetry data somehow gets discarded.

Thanks for your suggestion of using pysondehub. I'll test that. Does that show the raw data before you are running your validity checks? And is there any way to check whether a packet was deemed valid or invalid?

Oh, and are there any rules for the "software_name" and "software_version"? Like only certain softwares are allowed to upload to SondeHub? I just named my extension "SondeHubUploader", assuming that any name would be allowed.

darksidelemm commented 1 year ago

The software_name and software_version fields are free-text. I would suggest that since it's an extension to dxlAPRS, that you include 'dxlAPRS' in the name. I'd also include the dxlAPRS version in the version number somehow.

pysondehub (which just listens to the same websockets feed that the tracker uses) only shows data that has passed all validity checks. If your data has not passed the checks, then you should get a non-200 response, e.g. 201 or 202. There should also be text in the response indicating what the issue is.

Eshco93 commented 1 year ago

Yes, the name "SondeHubUploader" is only temporary. Something I quickly came up with during development since I had to give the thing some kind of name. The final name will surely somehow include "dxlAPRS". But adding the version might be difficult since dxlAPRS is more like a set of independent tools, rather than a single piece of software. I'll think about that later.

I'll take a look at the raw data and see if that helps. But I don't think I'll have time for that until next weekend. I'll get back to you once I have some news on the issue. Thanks!

TheSkorm commented 1 year ago

I'm not seeing any data come through for SondeHubUploader. Are you able to provide a log of the request being made and the response from the server? That way we can try submitting the data and see if there is any issue.

Eshco93 commented 1 year ago

Here is a log for you. It shows the following things (in this order):

  1. Startup message of my extension, which shows a few of the config parameters of the extension (callsign, user position, ...). This shouldn't be that relevant for you.
  2. Log of the station position upload to SondeHub (which works just fine, like I said)
  3. Debug message of my extension, that confirms the station upload was successful, because status code 200 was returned ("Station Upload: Upload successful")
  4. Debug message that shows the content of a new telemetry package that was received by dxlAPRS
  5. The JSON telemetry data in SondeHub telemetry format that was assembled from the previously received telemetry package from dxlAPRS (in plain text, before gzip compression for uploading)
  6. Log of the telemetry data upload to SondeHub (telemetry data from 5. now compressed)
  7. Debug message of my extension, that confirms the telemetry upload was successful, because status code 200 was returned ("Sonde Upload: Upload successful")

I guess point 5 and 6 should be what you are interested in. Please let me know if this log doesn't help or if you need something different. I'll do my best to provide everything in order to get this problem sorted out :-)

Greetings, Simon

log.txt

darksidelemm commented 1 year ago

Can you try putting the telemetry object into an array? e.g. [{telemetry}]

We prefer stations to upload multiple telemetry packets at once (auto_rx uploads every 15 seconds, with whatever it has received in those last 15 seconds), as this helps with the compression.

Eshco93 commented 1 year ago

Sure I can. I've done that now. By default my extension uploads every 30 seconds, but for testing purposes I've set the upload rate to 5 seconds for now. Right now I'm just uploading single telemetry packages, but I'll change the implementation and make sure that all telemetry packages that were received in the time interval are uploaded at once.

TheSkorm commented 1 year ago

Looks like we don't have a check for when a non array of data is sent and it'll return 200 with a null body even though no data was processed. We'll get this looked into fixing.

The same data in an array will result in a 200 with "^v^ telm logged" message.
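To make the difference concrete, here is a rough sketch of how an uploader could build the request body so that it is always a JSON array, then gzip it as described earlier in the thread. The helper name is invented for illustration, and the sketch deliberately stops before the HTTP request itself.

```python
import gzip
import json


def build_upload_body(frames):
    """Return a gzip-compressed JSON array for one or more telemetry dicts."""
    if isinstance(frames, dict):
        frames = [frames]  # a bare object must be wrapped in a list
    body = json.dumps(list(frames)).encode("utf-8")
    return gzip.compress(body)
```

Collecting every frame received during the upload interval into one list and passing that list here also gives the batching behaviour mentioned above.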

LukePrior commented 1 year ago

The data seems to be getting uploaded now from looking at ElasticSearch (every 3rd packet it seems).

LukePrior commented 1 year ago

The data doesn't seem to align great but I'm presuming that's just due to differences of dxlAPRS. I've attached a JSON snippet of uploaded data from various programs below.

sondedata.txt

darksidelemm commented 1 year ago

I don't know how well dxlAPRS has been kept up to date with the more recent improvements to RS41 PTU decoding. There appear to be multiple forks of it too, some closed source, and I don't know if the ones that are on github are the ones corresponding to this data.

I know the RSSI value from dxlAPRS will not reflect anything approaching reality, as it's going to be coming out of an uncalibrated RTLSDR. We currently assume that a RSSI value is in dBm, as that's what we get out of the rdz_ttgo_sonde units, and it's not too far removed from reality.

darksidelemm commented 1 year ago

OK, we are seeing a few problems with the telemetry coming from your uploader.

Here's a comparison, from a single frame in the data Luke posted:

| Field | dxlAPRS | radiosonde_auto_rx | rdz_ttgo_sonde |
|---|---|---|---|
| Frame Number | 6197 | 6197 | 6197 |
| Datetime | 2023-01-21T00:02:41.266692Z | 2023-01-21T00:02:57.001000Z | 2023-01-21T00:02:57.000Z |
| Latitude | 53.132778 | 53.12635 | 53.12635 |
| Longitude | 7.550556 | 7.55038 | 7.55038 |
| Altitude | 24039.58 | 24039.5609 | 24039.56055 |
| Vel-V | 4.9 | 4.97849 | 4.79849 |
| Vel-H | 58.1 | 57.35497 | 57.35497 |
| Heading | 79 | 78.11584 | 78.11584 |
| Temperature | -63.4 | -63.4 | -63.4 |
| Humidity | 6.2 | 1.9 | 2 |
| Pressure | 25.7 | 25.76 | 25.76 |

Time

For RS41s we've standardised on the convention of using GPS time, as this is what's being sent by the RS41. The leap second value is not sent in the telemetry, so we don't apply that offset. I see in the dxlAPRS code there is a hard-coded leap second offset of 18 seconds. We elected not to do this, to avoid issues if/when another leap second is added.

All the data on Sondehub is organised via timestamp. We need this to line up else there will be all sorts of weirdness on the map.

For the rest of the data, I'm comparing based on frame number to ensure we're comparing the same frame.

Position

I'm not sure why the dxlAPRS reported position is so different from auto_rx and rdz_ttgo_sonde. The difference works out to about 700m, which is substantial. I know that auto_rx and rdz use the same code to convert from ECEF coordinates to lat/lon/altitude, so dxlAPRS must be doing something different.

The dxlAPRS code is here: https://github.com/oe5hpm/dxlAPRS/blob/master/src/sondemod.c#L3063

The auto_rx / rdz code is here: https://github.com/projecthorus/radiosonde_auto_rx/blob/master/demod/mod/rs41mod.c#L949

Vel-V / Vel-H / Heading

The differences here are likely related to the differences in lat/lon, as the conversion from the X/Y/Z data provided by the RS41 to horizontal/vertical velocity and heading requires the latitude and longitudes. These differences are less important than the difference in lat/lon.

Humidity / Pressure

These are likely due to the use of different numbers of correction coefficients. I'm not sure how many coefficients the different codebases use. These differences are less important.

So... the differences in time and position really need to be resolved before we accept dxlAPRS data into the SondeHub database. I'm extremely surprised the position difference is so significant, and that this hasn't been noticed previously.

TheSkorm commented 1 year ago

Some graphics demonstrating the issue. Maybe a floating point precision issue?

Eshco93 commented 1 year ago

First of all: Thank you all for investing so much time and effort into investigating these issues. That's awesome! And now to the current problems...

RSSI

So far I haven't really looked into the RSSI value that dxlAPRS provides. I saw that dxlAPRS is able to provide an RSSI value and that SondeHub accepts RSSI values as well. That's why I included this value. I didn't check whether the provided RSSI value is actually meaningful. Since the provided value doesn't seem to make any sense at all, I'll just stop including it in the telemetry packages that will be uploaded to SondeHub. RSSI isn't that important anyway.

Time

I wasn't aware that there is a hard-coded leap second offset in dxlAPRS. I didn't really check that. I just assumed that the transmitted time would be the GPS time of the radiosonde, without any sort of offset correction. But that's not a big issue. I'll just add an extra 18 seconds to even out the offset that is added by dxlAPRS. By the way, I think you made a little mistake in the datetime row in your comparison table. The datetime listed for dxlAPRS (2023-01-21T00:02:41.266692Z) is not the datetime that I uploaded with frame 6197. In fact that's the "time_received" from that package. The datetime from that package is 2023-01-21T00:02:39.000000Z. So there we have the offset of exactly 18 seconds :-)
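As a small illustration of that correction, the sketch below adds the 18 seconds back to the datetime from frame 6197. The constant and function names are made up here, and the hard-coded value is exactly the assumption discussed in this thread.

```python
from datetime import datetime, timedelta

# Offset applied by dxlAPRS according to this thread; hard-coded as an assumption.
DXLAPRS_LEAP_SECONDS = 18


def utc_to_gps_time(utc_dt, leap_seconds=DXLAPRS_LEAP_SECONDS):
    """Undo the leap second correction to get back to GPS time."""
    return utc_dt + timedelta(seconds=leap_seconds)


reported = datetime(2023, 1, 21, 0, 2, 39)
print(utc_to_gps_time(reported).isoformat())  # 2023-01-21T00:02:57
```

The result lines up with the 00:02:57 reported by the other two programs for the same frame.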

Position

Luckily that's not due to a difference in the position calculation of dxlAPRS. That would have been a very annoying problem. That's just a bug in my extension. dxlAPRS provides the position in degrees and minutes. And for uploading to SondeHub I'm obviously converting that into degrees. Unfortunately, I mistakenly thought that dxlAPRS would provide the position in degrees, minutes and seconds. I mistook the decimal places of the minutes for the seconds. That also explains why the position error is close to zero every once in a while and then gradually builds up (as can be seen in the graphics that TheSkorm provided). I will fix that.
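The effect of that mix-up can be shown with a short sketch. The raw value of 53 degrees and 7.58 minutes is a reconstruction, not something taken from the log, but it happens to reproduce the latitude 53.132778 from the comparison table when read the wrong way. Function names are illustrative only.

```python
def dm_to_degrees(degrees, minutes):
    """Degrees and decimal minutes to decimal degrees (the correct reading)."""
    return degrees + minutes / 60.0


def dms_to_degrees(degrees, minutes, seconds):
    """Degrees, minutes and seconds to decimal degrees."""
    return degrees + minutes / 60.0 + seconds / 3600.0


correct = dm_to_degrees(53, 7.58)      # 7.58 minutes
mistaken = dms_to_degrees(53, 7, 58)   # ".58" misread as 58 seconds
print(round(correct, 6), round(mistaken, 6))
```

For a fractional minute part f, the misreading adds f/36 instead of f/60 degrees, so the error is f/90 degrees. It vanishes whenever the minutes are a whole number and grows until the next whole minute, which fits the sawtooth pattern in the graphics.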

Vel-V / Vel-H / Heading

Vel-V and Heading are exactly what I get from dxlAPRS. Vel-H is provided in knots and I'm converting that to m/s with two decimal places. So maybe I can add some precision to the Vel-H value. The same applies to the altitude that is provided in feet and converted to meters with two decimal places. But other than that, this is the best I can do. But I hope that these values are good enough for you, since the differences are relatively small and since there is no easy way to improve accuracy (without messing with the dxlAPRS code).
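For reference, a minimal sketch of the two unit conversions mentioned here. The conversion factors are standard definitions; the sample inputs of 29944 ft and 39 knots are back-calculated guesses, not values taken from dxlAPRS output.

```python
FEET_TO_METRES = 0.3048        # exact by definition
KNOTS_TO_MS = 1852.0 / 3600.0  # one nautical mile (1852 m) per hour


def feet_to_metres(feet):
    """Convert an altitude in feet to metres without rounding."""
    return feet * FEET_TO_METRES


def knots_to_ms(knots):
    """Convert a speed in knots to metres per second without rounding."""
    return knots * KNOTS_TO_MS


print(feet_to_metres(29944), knots_to_ms(39))
```

Keeping the full floating point result and rounding only when formatting the upload avoids losing precision in the conversion itself, although it cannot recover resolution that the source value never had.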

Humidity / Pressure

I might have a look at how these values are calculated and which correction coefficients are used. Maybe I can adjust the calculation so that the values provided by my extension match with those from auto_rx/rdz_ttgo_sonde. But for now I hope that these differences are also acceptable.

I'll get back to you as soon as those issues with the time and the position are fixed. Again, thanks a lot!

mwheeler-ep commented 1 year ago

I would probably avoid hard coding in 18 seconds in case that ever changes - however maybe it won't be a big issue in the future https://www.theregister.com/2022/11/22/leap_seconds_discontinued/

Eshco93 commented 1 year ago

You are right, hardcoding stuff like that usually isn't the most elegant and consistent way. But I don't think I want the number of leap seconds to be explicitly user configurable. I don't want any user to mess around with that setting. Also, since my extension is Python-based, hardcoding the leap seconds doesn't mean that much. With Python being an interpreted language, this value can still be changed with almost zero effort.

rs1729 commented 1 year ago

The problem is that even if dxlAPRS and your extension are updated when another leap second is added, you don't know whether the user keeps dxlAPRS and/or your extension up-to-date. But if there won't be any more leap seconds, that would be good news.

Eshco93 commented 1 year ago

Yes, I'm aware of that. In my previous comment I just assumed that this was obvious and therefore just focused on whether to hardcode this or make it an explicit and documented user setting.

Unfortunately there is no easy way to generally avoid this issue. The only way that I can think of would be some kind of leap second estimation that is executed directly after running the extension and before uploading anything to SondeHub. It would do something like the following:

  1. Wait until a frame is received (any frame, doesn't really matter)
  2. Try to pull telemetry data for the same framenumber from SondeHub DB (uploaded by some other user)
  3. Calculate the difference between both datetimes
  4. Set the number of leap seconds to the difference calculated in step 3
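Steps 3 and 4 could be sketched roughly as follows. Fetching the reference frame from SondeHub is left out entirely, the function name is invented, and the plausibility window of 0 to 60 seconds is an arbitrary assumption.

```python
from datetime import datetime


def estimate_leap_offset(own_dt, reference_dt, max_offset=60):
    """Return reference minus own time in whole seconds, or None if implausible."""
    offset = round((reference_dt - own_dt).total_seconds())
    if not 0 <= offset <= max_offset:
        return None  # arbitrary sanity window, see lead-in
    return offset


own = datetime(2023, 1, 21, 0, 2, 39)
reference = datetime(2023, 1, 21, 0, 2, 57, 1000)
print(estimate_leap_offset(own, reference))  # 18
```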

But I'm a bit skeptical about that. This seems like a very complex solution to a problem that may never occur. And this solution might also turn out to be error-prone. So in summary: I'm not a big fan of doing that.

But I will definitely mention/explain this issue in the documentation for my extension.

darksidelemm commented 1 year ago

Which dxlAPRS version is the one you're interfacing with? Is it the one on github here: https://github.com/oe5hpm/dxlAPRS or some other fork?

Eshco93 commented 1 year ago

Yes, that's the one.

Oh, and by the way... I fixed the issues and it's working now :-)

2023-01-22_1_00

LukePrior commented 1 year ago

That seems to have improved it but the location still appears to be up to ~20m away from that reported by other programs.

https://grafana.v2.sondehub.org/d/ph5PpcoVz/compare-software?orgId=1&from=1674341106819&to=1674347361428&var-Serial=U1460256


| Field | dxlAPRS | rdz_ttgo_sonde | radiosonde_auto_rx |
|---|---|---|---|
| Frame Number | 7922 | 7922 | 7922 |
| Datetime | 2023-01-22T00:31:41.000000Z | 2023-01-22T00:31:41.000Z | 2023-01-22T00:31:41.001000Z |
| Latitude | 53.46767 | 53.46783 | 53.46783 |
| Longitude | 7.6995 | 7.69954 | 7.69954 |
| Altitude | 9126.9312 | 9126.88574 | 9126.88533 |
| Vel-V | -9 | -9.09952 | -9.09952 |
| Vel-H | 20.06332 | 19.90263 | 19.90263 |
| Heading | 247 | 246.74159 | 246.74159 |
| Temperature | -57.6 | -57.6 | -57.6 |
| Humidity | 44.6 | 37.7 | 37.6 |
| Pressure | 297.6 | 297.64 | 297.64 |

Some other things I've noticed: you are only sending every third frame, and the program name should probably be updated to include dxlAPRS.

darksidelemm commented 1 year ago

@Eshco93 what is the precision of the lat/lon you are getting out of dxlAPRS? We are still seeing some position differences, that I suspect might be due to the precision of what you are getting from dxlAPRS. Is there no way to get the full precision information?

Eshco93 commented 1 year ago

@LukePrior That looks far better :-) The fact that I don't upload every frame is due to my settings for dxlAPRS. dxlAPRS does also have a sending interval, which is set to 3 seconds in my current configuration. In fact dxlAPRS has multiple sending intervals for different altitude ranges. For example you could choose a smaller sending interval for a descending radiosonde that is quite close to the ground (to get better precision of the landing position). Like I previously said, the name "SondeHubUploader" is only temporary whilst the extension is still in development. I'll choose a better name that also includes dxlAPRS before releasing the extension. By the way...any suggestions for a good name? :-D

@darksidelemm From dxlAPRS I'm getting the position in degrees and minutes. Those minutes are with 2 decimal places. Does that explain the small position differences that you are still seeing? I'm not quite sure whether it's possible to increase the precision I'm getting from dxlAPRS. I suspect it's not, because the APRS formatted frames that I'm getting from dxlAPRS are standardized. Here is the documentation for the APRS format:

http://www.aprs.org/doc/APRS101.PDF

Information about time and position formats can be found on page 22 ff.

What is your opinion on those position deviations? Are they small enough that you can accept them, if it's not possible to increase the precision?

darksidelemm commented 1 year ago

I'm well aware of the APRS format, and its limitations. I don't like it for this application for these kinds of reasons.

I'd recommend trying to get higher resolution data out of dxlAPRS somehow.

Eshco93 commented 1 year ago

Okay, I'll take a look and check whether that's possible.

Eshco93 commented 1 year ago

Good news, I think I found a solution for getting the full precision from dxlAPRS. The solution is the APRS precision and datum option. This option is included in the APRS packages that I'm getting from dxlAPRS. I'm currently just not evaluating this. By doing that, I will get two more digits for the minutes - so 4 fractional digits for the minutes in total. That means full precision :-)

Once again, I will get back to you when I've implemented this.

Eshco93 commented 1 year ago

I've now enhanced my position calculation, using the APRS precision and datum option. For a quick test I ran the position data of U1460256, frame no. 7922 through my position calculation (the frame that @LukePrior posted in his last comparison table). I'm now getting the exact same position as rdz_ttgo_sonde and radiosonde_auto_rx. So it seems like that's working properly.
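For anyone curious how the extra digits come about, here is a sketch based on my reading of the APRS 1.2 datum/precision addendum (the `!DAO!` extension); please verify it against the specification before relying on it. It assumes the three characters between the exclamation marks are a datum letter followed by one character each for latitude and longitude, with an upper-case datum letter meaning plain digits and a lower-case one meaning base-91 characters. Function names are illustrative.

```python
def dao_minute_offsets(dao):
    """Return (lat_minutes, lon_minutes) to add for a 3-character DAO body.

    `dao` is the text between the two '!' marks, e.g. 'W53' or 'wNN'.
    """
    if len(dao) != 3:
        raise ValueError("DAO body must be exactly 3 characters")
    datum, lat_c, lon_c = dao
    if datum.isupper() and lat_c.isdigit() and lon_c.isdigit():
        # Human-readable form: one extra digit, i.e. thousandths of a minute
        return int(lat_c) * 0.001, int(lon_c) * 0.001
    if datum.islower():
        # Base-91 form: one hundredth of a minute divided into 91 steps
        return (ord(lat_c) - 33) / 91.0 * 0.01, (ord(lon_c) - 33) / 91.0 * 0.01
    raise ValueError("unrecognised DAO body")


def refine(degrees, minutes, extra_minutes):
    """Add the DAO offset to the two-decimal minutes and convert to degrees."""
    return degrees + (minutes + extra_minutes) / 60.0
```

The offsets are always positive, so they only make sense if the two-decimal minutes in the base position were truncated rather than rounded.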

rs1729 commented 1 year ago

Watching U2330433 on Sondehub, I see SIMON2-14's position frozen at 23418m, only the max altitude is updating.

U2330433_20230124_00Z

EDIT: Time looks suspicious, 00:00:18 GPS, i.e. 00:00:00 UTC would be the next frame!?

darksidelemm commented 1 year ago

Sigh. This is what happens when a position is uploaded with an incorrect time (in the future).

In this case, there have been a bunch of positions uploaded with a datetime one day into the future (the 25th). If you have a look at https://api.v2.sondehub.org/sonde/U233043 you can see all these positions at the end.

This needs to be fixed.

rs1729 commented 1 year ago

Probably a roll-over issue, the dxlAPRS time in UTC went 23:59:59 -> 00:00:00 I guess.

darksidelemm commented 1 year ago

If this data is being gathered from an APRS packet, then I'm wondering if date information is even included in packet at all.

Eshco93 commented 1 year ago

No, the date information is not included in the APRS package. Therefore I have to add the date information myself. I'm doing it the same way auto_rx does it for those IMET radiosondes that don't even seem to transmit the date information.

I'm adding those leap seconds in the same function where I add the date. Unfortunately, I added those leap seconds before checking if I have to add or subtract a day from the current date in order to get the correct date regarding the sonde time. That led to this rollover issue that you were seeing, where an additional day was added.

It should be fixed now.
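The ordering problem can be illustrated with a sketch like the one below. It is one possible way to attach a date to a time-of-day, not necessarily what auto_rx or the extension actually does: the date is chosen so that the sonde time lands closest to the receiver's current UTC time, and only then is the leap second offset added. All names are made up.

```python
from datetime import datetime, time, timedelta


def attach_date(sonde_time, now_utc):
    """Pick yesterday, today or tomorrow so the result is closest to now_utc."""
    candidates = [
        datetime.combine(now_utc.date() + timedelta(days=d), sonde_time)
        for d in (-1, 0, 1)
    ]
    return min(candidates, key=lambda c: abs(c - now_utc))


def sonde_datetime(sonde_time, now_utc, leap_seconds=18):
    """Resolve the date first, then apply the offset so it can carry over midnight."""
    return attach_date(sonde_time, now_utc) + timedelta(seconds=leap_seconds)


now = datetime(2023, 1, 24, 0, 0, 5)
print(sonde_datetime(time(23, 59, 50), now).isoformat())  # 2023-01-24T00:00:08
```

Because the offset is added to a full datetime, the carry across midnight is handled by `timedelta` and no extra day can sneak in.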

darksidelemm commented 1 year ago

... on a related note, what are your plans for supporting other radiosonde types? If you can get the complete callsign out of dxlAPRS (not just the APRS-formatted callsign), then supporting DFM, M10 and M20 should be possible, but I know that dxlAPRS's method for calculating iMet callsigns is different to mine.

Eshco93 commented 1 year ago

Ultimately, I'd like to support all radiosonde types that are currently supported by both dxlAPRS and SondeHub. As far as I know this comes down to the following list:

Before starting the development of my extension, I took a quick look at the quirks of some of these radiosondes (in particular the first 5 on the list) in order to anticipate potential problems that could arise during development.

Therefore I already know that the IMET radiosondes don't transmit a serial and that you somehow have to define a unique serial in order to have a means of identification. I'm also aware that dxlAPRS and auto_rx are doing that differently and are therefore generating different serials. But since your serial generation is based on the time, framenumber and frequency, I'm able to reproduce your serial generation using the data that dxlAPRS provides. I already wrote a function that will do this and I think I also ran some test data through it in order to verify that it outputs the correct serials.

Another important thing I remember is that DFMs and M10 don't transmit a framenumber and therefore you have to come up with some unique identifier for each frame. If I remember that correctly, yours is based on the GPS epoch. But I'll look into that later.

I just started with the RS41, since it's the most used radiosonde type in my area. And therefore it's the easiest for me to test. Each day, my station is receiving several of those. I'm also regularly receiving an IMET radiosonde from Beauvechain (Belgium). DFMs are also sometimes received by my station. The German military is using those sometimes. I still have to come up with some means to test the other types, once I've implemented them.

rs1729 commented 1 year ago

Don't know if the APRS time data is always UTC, but for the decoders used in auto_rx you can look at the sources for the JSON output for each decoder, if it is GPS time or UTC: "ref_datetime": "GPS"/"UTC". M10 could do both, it has the leap seconds in the GPS data, the JSON output is actually UTC. You will find the generation of the frame number based on date/time for DFM and M10/M20 also there.

rs1729 commented 1 year ago

iMet-4 "EA34B81A" from Belgium last night showed two different IDs from uploader "SIMON2-14" not matching the "regular" ID:

imet4_20230131_00Z

However, if I put e.g. the first json frame of "8C034AB6" into the imet_unique_id() function, I get the same hash as in the "regular" ID:

2023-01-30T22:01:50Z403.000 MHzSONDE
IMET-EA34B81A

i.e. power on 2023-01-30T22:01:50Z and frequency seems to be ok.

@Eshco93 can you check how you generate the input string and the hash?

Then your altitude values are not integers, e.g. "alt": 19900.0872, however the iMet-4 data has integer altitude https://github.com/rs1729/RS/blob/master/imet/imet1rs-binary.pdf How do you calculate the altitude? Is there a conversion back and forth?

Eshco93 commented 1 year ago

To be honest, I already wanted to disable the upload of telemetry data for IMET radiosondes for now. I haven't had time yet to examine the IMET data closely in order to see if it's actually being generated correctly all the time. And I didn't really want to upload any more erroneous or inaccurate data to the database when it could be avoided. I'll do that now and re-enable upload once I've fixed the bugs.

The altitude that I'm getting from dxlAPRS is in feet. I'm converting that to meters with 4 decimal places (currently for all radiosondes). I didn't know that IMET radiosondes only provide an integer altitude. I will round the calculated height in meters for IMET radiosondes to an integer from now on. That should give the same altitude that you are getting from auto_rx/rdz_ttgo_sonde.

Eshco93 commented 1 year ago

I found the problem. For some reason I've used an old version of the imet_unique_id() function. I have no idea how that happened. :raised_eyebrow:

Therefore I was missing two important things that were causing this issue:

  1. The SONDE at the end of the input string
  2. Rounding the frequency to the nearest 100 kHz

The frequency was drifting by ~1kHz and therefore I sometimes got 403.000 MHz and sometimes 403.001 MHz. That was the reason for the two different serials:

2023-01-30T22:01:50Z403.000 MHz
IMET-8C034AB6
2023-01-30T22:01:50Z403.001 MHz
IMET-C064E624
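A sketch of why the rounding matters is shown below. It follows what is described in this thread (power-on time derived from the frame number, frequency rounded to the nearest 100 kHz, "SONDE" appended, then hashed), but it is not a copy of imet_unique_id(): the use of SHA-256, taking the last 8 hex characters, the one-frame-per-second assumption and the sample frame time and number are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timedelta


def imet_id_sketch(frame_dt, frame_number, freq_mhz, suffix="SONDE"):
    """Return (input string, ID) for an iMet sonde; see assumptions in the lead-in."""
    # Assumes one frame per second since power-on
    power_on = frame_dt - timedelta(seconds=frame_number)
    # Round to the nearest 100 kHz so a ~1 kHz drift cannot change the ID
    rounded = round(freq_mhz * 10.0) / 10.0
    text = power_on.strftime("%Y-%m-%dT%H:%M:%SZ") + "%.3f MHz" % rounded + suffix
    # Assumed hash and truncation
    digest = hashlib.sha256(text.encode("ascii")).hexdigest().upper()
    return text, "IMET-" + digest[-8:]


frame_dt = datetime(2023, 1, 30, 22, 30, 0)  # hypothetical frame time
print(imet_id_sketch(frame_dt, 1690, 403.000))
print(imet_id_sketch(frame_dt, 1690, 403.001))
```

With the rounding in place, both calls build the same input string and therefore the same ID, whereas formatting the raw frequency would produce two different strings.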

Oh, and another unrelated thing. The original issue that we discussed here has long since been resolved. So perhaps I could close this issue and we take all further discussion to a more appropriate place? Just a suggestion. If keeping the discussion here is ok for you, that's also fine. As you wish.