Closed: petersilva closed this issue 2 years ago.
An extension with additional geo information and type should not be a problem from my point of view. The message size does not increase significantly (in the default case), and if we comply with a general standard we have a clear advantage. However, we should perhaps set a maximum message size, so that we avoid problems in the long run and have a basis for rejecting messages that are too large. An extension to include geo information may not be widely used at the moment, but it offers potential for further development. "Wrapping" our content values in "properties", to be closer to the GeoJSON format, is also not much of a change:
{ "type": "Feature", "geometry": { "type": "Point", "coordinates": [102.0, 0.5] }, "properties": { "prop0": "value0" } }
Here too, I think a limit on the total size is important, because GeoJSON does not appear to set any limit on additional properties.
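A minimal sketch of the "wrapping" described above, combined with the total-size check. It assumes a flat message dict whose fields all move under "properties", and a purely hypothetical limit `MAX_MESSAGE_BYTES = 8192`; no actual limit has been agreed in this thread.

```python
import json

# Hypothetical cap for illustration only; no maximum has been agreed.
MAX_MESSAGE_BYTES = 8192


def wrap_as_feature(message, geometry=None):
    """Move every field of a flat message under GeoJSON "properties"."""
    return {
        "type": "Feature",
        "geometry": geometry,  # RFC 7946 allows a null geometry
        "properties": dict(message),
    }


def encoded_size(doc):
    """Size in bytes of the compact UTF-8 JSON encoding of doc."""
    text = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def check_size(doc, limit=MAX_MESSAGE_BYTES):
    """Return the encoded size, or raise ValueError if it exceeds the limit."""
    size = encoded_size(doc)
    if size > limit:
        raise ValueError(f"message is {size} bytes, limit is {limit}")
    return size


if __name__ == "__main__":
    feature = wrap_as_feature({"prop0": "value0"},
                              {"type": "Point", "coordinates": [102.0, 0.5]})
    print(json.dumps(feature))  # same structure as the example above
    print(check_size(feature))
```

A receiver could apply the same check before parsing further, which gives the "basis to reject" mentioned above.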
A download link or a service URL is core content. However, I think that in the WMO world the repository systems and their structures will probably differ and will not necessarily follow the topic structure, so a complete value in one field, or differently named fields that have to be concatenated, seems to be OK. The original idea that only the baseUrl has to be adjusted to each node's own values does not seem practicable in the diverse environment of WMO nodes. Since a split into baseUrl and relPath is currently used in the pilot, the question is: what reasons speak against keeping the two fields, which are merged to form the link?
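If the two fields are kept, a receiver can always merge them into one download link. A minimal sketch, assuming the link is formed by joining the two with exactly one "/" (the pilot's exact rule is not stated here). Note that `urllib.parse.urljoin` is not a safe substitute, because it treats the last segment of a baseUrl without a trailing slash as a file name and replaces it.

```python
from urllib.parse import urljoin


def make_link(base_url: str, rel_path: str) -> str:
    """Merge baseUrl and relPath into one link, with exactly one '/' between them."""
    return base_url.rstrip("/") + "/" + rel_path.lstrip("/")


BASE_URL = "https://localhost/data/20190120"
REL_PATH = "WIS/CA/CMC/UpperAir/04/UANT01_CWAO_200445___15103.txt"

print(make_link(BASE_URL, REL_PATH))
# https://localhost/data/20190120/WIS/CA/CMC/UpperAir/04/UANT01_CWAO_200445___15103.txt

# urljoin resolves REL_PATH relative to BASE_URL, so "20190120" is dropped:
print(urljoin(BASE_URL, REL_PATH))
# https://localhost/data/WIS/CA/CMC/UpperAir/04/UANT01_CWAO_200445___15103.txt
```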
However, an id or a unique filename value is very helpful for troubleshooting and quality checks. We should be able to tell items of content apart unambiguously. If someone asks whether content x has arrived, it must be possible to check quickly whether it is there. As storage structures could differ, there must be some kind of id that makes this possible.
The unique filename (as represented by relPath, potentially renamed) is either:
The two are equivalent.
If only a complete URL is provided, the baseUrl cannot be extracted from it, so any use case that needs baseUrl is broken.
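To make the point concrete: a full URL can be cut into a baseUrl and a relPath at any path boundary, and nothing in the URL says which cut the publisher intended. A minimal sketch; query strings and fragments are ignored for simplicity.

```python
from urllib.parse import urlsplit


def candidate_splits(url: str):
    """Every (baseUrl, relPath) pair that would rebuild `url` by joining with '/'."""
    parts = urlsplit(url)  # query and fragment are ignored in this sketch
    origin = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]
    splits = []
    # Each boundary between path segments is an equally valid cut.
    for i in range(len(segments)):
        base = origin + "".join("/" + s for s in segments[:i])
        rel = "/".join(segments[i:])
        splits.append((base, rel))
    return splits


url = "https://localhost/data/20190120/WIS/CA/CMC/UpperAir/04/UANT01_CWAO_200445___15103.txt"
for base, rel in candidate_splits(url):
    print(base, "|", rel)
```

All eight printed pairs rebuild the same URL, so a receiver that needs the publisher's baseUrl (e.g. to replicate the file tree) has no way to choose among them.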
In summary: the ET-AT suggestion rejects and/or discards all work on the message format by TT-protocols since its inception, and makes conceptual and semantic changes whose effects are unknown, necessitating a return to the design phase and re-validation of all existing work. There are many issues raised:
All of these can be done, but they require investigation and validation, and will take some time to work through. A minimum-impact alternative, #81, which just adds fields to the current payload, is far more straightforward and could be done immediately.
Yay, I almost thought we would get bored.
@antje-s Agreed, just moving the fields around is not a problem... it's actually minor and already implemented: the feed from hpfx has v03 (TT-protocols original-flavour messages) and v04 (GeoJSON-ized ones) available now. Just pick v04 as the topicPrefix and you have GeoJSON goodness at your fingertips. GeoJSON isn't the issue; it's the wholesale replacement of fields by other fields with different semantics, and the banning of file names (which basically forces WAF to use the message id as the file name), because they don't want paths on WAF to be representative... So I guess we should be generating random trees with ids in them as the base case.
@petersilva I do not have access to https://wmo-teams.atlassian.net/wiki/spaces/WIS2/pages/322568193/2022-04-04+ET-W2AT+Meeting. This is a groundbreaking change, and I do not understand the reasons. I already see significant drawbacks. Who proposed this, and why? Could we have a teleconference about it?
There was a meeting in Geneva (I wasn't there), and the group decided this was a good idea. The notes are from there... I think the content of this issue accurately reflects their request. In subsequent ET-AT meetings I tried to explain the implications, but people seem fixated on "WAF is ugly", so everything else I say seems to be noise to them. They claim breakage... it was N vs. 1... so... whatever...
I'm implementing it for folks to see, and they can judge their recommendations based on the result. I don't support these changes at all (note: GeoJSON is fine... it's the rest), but I'd rather implement than argue... It's a test feed... no harm done.
People will gradually rediscover that we originally had an optimal, minimal solution... or we'll end up with some baroque Rube Goldberg thing, where they keep adding things because it doesn't work, because they don't understand the semantics that were implicitly taken care of by the original... whatever... out of my control. I think people already consider me difficult, so I would prefer that others speak up.
As you can see, I'm piling up the issues, and we will certainly start talking about them at the next teleconference. With this new development, the committee has a lot more work to do.
@petersilva The notes I cannot access (but I have sent a request, so maybe I will get access) ;-) In parallel, I was chatting with Tom Kralidis. He pointed out that https://geojson.org/schema/Feature.json does allow "null" geometry. So
{
"pubTime" : "20190120T045018.314854383",
"baseUrl" : "https://localhost/data/20190120",
"integrity": {"method": "sha512", "value": "A2KNxvks...S8qfSCw=="},
"relPath" : "WIS/CA/CMC/UpperAir/04/UANT01_CWAO_200445___15103.txt",
"size": 457,
"content" : { "encoding": "utf-8", "value": "encoded bytes from the file" },
"retPath" : "4Pubsub/92c557ef-d28e-4713-91af-2e2e7be6f8ab",
"type" : "Feature",
"geometry": null,
"properties": null
}
is valid GeoJSON. That said, almost any JSON can be trivially converted into valid GeoJSON, but that does not make things interoperable. To draw an analogy: it is almost like saying "your software produces XML, my software consumes XML, so we are interoperable". For some data, CSV is much better than XML, because CSV is much more limited and hints that the data is tabular. Similarly, using our primitive "WMO mesh message-schema" makes things clearer than using a much more generic format. We can refer to https://geojson.org/schema/Geometry.json, as I did in issue081 (I will probably add one more alternative there), but we should not market it as GeoJSON, because that will just confuse everybody.
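The point about how little GeoJSON validity guarantees can be shown with a minimal structural check, loosely following RFC 7946 section 3.2: a Feature needs "type": "Feature", a "geometry" member that is an object or null, and a "properties" member that is an object or null. This is a sketch, not a full validator; geometry contents and foreign members are not inspected.

```python
import json


def is_minimal_feature(doc) -> bool:
    """True if doc has the three members RFC 7946 requires of a Feature."""
    if not isinstance(doc, dict) or doc.get("type") != "Feature":
        return False
    if "geometry" not in doc or "properties" not in doc:
        return False
    geometry_ok = doc["geometry"] is None or isinstance(doc["geometry"], dict)
    properties_ok = doc["properties"] is None or isinstance(doc["properties"], dict)
    return geometry_ok and properties_ok


# The message above, abbreviated; the TT-protocols fields play no role in the check.
message = json.loads("""
{
  "pubTime": "20190120T045018.314854383",
  "baseUrl": "https://localhost/data/20190120",
  "relPath": "WIS/CA/CMC/UpperAir/04/UANT01_CWAO_200445___15103.txt",
  "type": "Feature",
  "geometry": null,
  "properties": null
}
""")
print(is_minimal_feature(message))
# A document sharing nothing with the message but "type" passes just as well:
print(is_minimal_feature({"type": "Feature", "geometry": None, "properties": None}))
```

Both calls print True, which is exactly why "it is GeoJSON" says almost nothing about what a consumer can expect from the message.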
To be clear, I think using GeoJSON as the format for MQP messages is fundamentally wrong: GeoJSON is a data format, and most people will reasonably interpret it as meaning that one should post GeoJSON data as MQP messages. One will then always have to have a second conversation about sizes and feature limitations, or "profiling"... and at that point, why bother calling it GeoJSON at all?
@petersilva I switched the client to v04 as a test, and it works.
I was mistaken, by the way: I believe the ET-AT proposal is against using the id as the file name as well, so some other random name must be generated. The Canadian feed implements random file names as well (names different from the message ids).
I have commented on some of the referenced issues where I had the impression that my view might be helpful. Here I would like to comment on the high-level topics:
Do we want the data notifications in WIS2 to be valid GeoJSON documents? I do not think so. The format is very generic: it has only 3 required properties, and at least two of them can be null, so two GeoJSON documents can differ in all but one property. At the same time, in certain domains GeoJSON is used for actual data, and specialists from those domains may get confused and assume that WIS2 notification messages will contain specific things (either values or formats of values) that they are used to from their specific GeoJSON sub-specifications (e.g., compare "time" and "display_time" in http://agora.ex.nii.ac.jp/digital-typhoon/geojson/wnp/201601.en.json with our format).
Does it make sense to use "GeoJSON Geometry" in the data notifications? Yes, but.
I have added some examples to https://github.com/wmo-im/GTStoWIS2/tree/issue078/message_format
message-schema.json and message-schema.yaml are another iteration of the GeoJSON-based schema.
message-schema-full.json is the same thing but bundled with the GeoJSON stuff, so you can use it standalone.
message-example.json is an example of a message/notification that corresponds to the situation when relPath != retPath.
message-example2.json is an example of a message/notification that corresponds to the situation when relPath == retPath, so it is close to the typical WAF use case.
I had trouble expressing the nullability of the geometry in a way that would be accepted by https://www.jsonschemavalidator.net. I will check this later on. Moreover, I had issues with "date-time". That is a known problem: this type in JSON Schema means the RFC 3339 format, which is a somewhat lame subset of ISO 8601 (not only is it longer, but it does not allow higher precision than milliseconds). In my opinion ISO standards are more, er, standard than RFCs, but for now I have used the RFC format.
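Converting the compact pubTime used in the current messages (e.g. "20190120T045018.314854383", from the example earlier in this thread) into an RFC 3339 "date-time" string can be sketched as below. It assumes pubTime is always UTC, hence the appended "Z" (whether that suffix is needed is the open question in #87). For what it's worth, the RFC 3339 grammar (`time-secfrac = "." 1*DIGIT`) does not itself cap the number of fractional digits, so any millisecond limit likely comes from a particular validator; this sketch keeps all nine digits.

```python
import re

# Compact UTC pubTime, e.g. 20190120T045018.314854383 (fraction optional).
_COMPACT = re.compile(r"^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})(?:\.(\d+))?$")


def to_rfc3339(pub_time: str) -> str:
    """Rewrite a compact pubTime as an RFC 3339 date-time string.

    String slicing is used instead of datetime, because datetime keeps only
    microseconds and would silently drop the last three digits here.
    The trailing "Z" assumes the compact form is always UTC.
    """
    m = _COMPACT.match(pub_time)
    if m is None:
        raise ValueError(f"unexpected pubTime format: {pub_time!r}")
    year, month, day, hour, minute, second, frac = m.groups()
    out = f"{year}-{month}-{day}T{hour}:{minute}:{second}"
    if frac:
        out += "." + frac
    return out + "Z"


print(to_rfc3339("20190120T045018.314854383"))
# 2019-01-20T04:50:18.314854383Z
```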
In example 2, you use a templated URL value, hierarchy, from properties:
Subsequent discussions with ET-AT are reflected here: https://wmoomm.sharepoint.com/:b:/s/wmocpdb/Ech78Pb1k_9Jvde0uB-Ir0MB4giLGIaMBjelE0sETNhYRw?e=0ehXK5
Brief summary of updates:
As this changes the proposal substantially, further discussion of multiple different proposals at different points in this issue would likely be hard to follow. We will close this issue in favour of #90, to discuss only the revised proposal.
Well, as per committee consensus, I asked ET-AT for advice on geographic and temporal extent, and they replied by asking us to use the GeoJSON format outright, with a heavy rework of payload elements that changes the semantics significantly.
https://wmo-teams.atlassian.net/wiki/spaces/WIS2/pages/322568193/2022-04-04+ET-W2AT+Meeting
The ET-AT suggestion rejects and/or discards all work on the message format by TT-protocols since its inception, and makes conceptual and semantic changes whose effects are unknown, necessitating a return to the design phase and re-validation of all existing work. There are many issues raised:
#83 pertinence of WAF as a model for TT-protocols work.
#80 opened for adding "topic" to the payload.
#84 for more comprehensive means of addressing comments about Integrity.
#82 requirement for baseURL.
#79 add a version tag to the payload.
#85 meaning / specification of id in GeoJSON records.
#86 replace retPath by links from STAC.
#87 is the Z timezone specification really needed, or can it be implicit?
All of these can be done, but they require investigation and validation, and will take some time to work through. A minimum-impact alternative, #81, which just adds fields to the current payload, is far more straightforward and could be done immediately.
We should add a link to Jeremy's slides; for now, I'm just going from memory of what was presented:
changes:
there is an opaque id field (sort of replacing relPath). It was explained in the presentation as a UUID-style hash. Looking at RFC 7946, however, it is clear that it is meant as a relative identifier within the GeoJSON record, so the usage envisioned by the proposal is a departure from what the specification intended. #85
obviously uses geometry... from GeoJSON.
there is a "topic" field in the message (under "properties"): #80
so a data topic is, um... the relative path in a WAF from the baseURL, minus the file name.
relPath is replaced by a combination of topic (hierarchy dot-separated? which is odd, since in MQTT the separator is /) (#80, #86, #85)
the balance of the fields moved into properties.
baseUrl is gone, so the ability to replicate file trees is broken (#82).
the version tag is in the message rather than in the topic. It was mentioned that once the message has been received, we no longer have the version (#79).
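The relationship described in these items (the topic is relPath minus the file name) can be sketched as a pair of conversions. Assumptions, all hypothetical: the topic hierarchy is dot-separated (the open question above), directory names contain no dots, and the file name serves as the id. The proposal itself describes id as an opaque, UUID-style value, in which case relPath could not be rebuilt this way.

```python
from typing import Tuple


def split_rel_path(rel_path: str, sep: str = ".") -> Tuple[str, str]:
    """Split relPath into (topic, id): directories become the topic, the file name the id.

    Assumes directory names contain no `sep` characters, so the split is reversible.
    """
    directory, _, filename = rel_path.rpartition("/")
    topic = directory.replace("/", sep)
    return topic, filename


def join_rel_path(topic: str, ident: str, sep: str = ".") -> str:
    """Inverse of split_rel_path: rebuild relPath from topic and id."""
    return topic.replace(sep, "/") + "/" + ident


rel = "WIS/CA/CMC/UpperAir/04/UANT01_CWAO_200445___15103.txt"
topic, ident = split_rel_path(rel)
print(topic)  # WIS.CA.CMC.UpperAir.04
print(ident)  # UANT01_CWAO_200445___15103.txt
print(join_rel_path(topic, ident) == rel)  # True
```

Under these assumptions the new fields carry the same information as relPath, only under different names.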
good:
observations:
relPath is bad, but relPath minus the file name == topic... topic is good... and if the id is implemented as per the spec, then it becomes a relative identifier... so relPath is identical to the combination of topic+id... we likely could accommodate what is essentially a name change... but given that it isn't the entire topic, it should likely be a relative topic, RelTopic, because otherwise it will be extremely confusing for people to tell the difference between the real topic and the one in the message.
Having a filename is bad, but having an id is good... it's the same thing. How is one bad and the other good? If the hierarchy is bad, then why is topic introduced?
Looking at RFC 7946, GeoJSON allows for "Foreign Members" (https://www.rfc-editor.org/rfc/rfc7946.html#page-15), which can be placed at the top level. So the existing TT-Protocols format, once "type" and "geometry" (and a "properties" member, which may be null) are added, is already GeoJSON. There is no need to change anything else for GeoJSON compliance. "id" and "version" in the proposal are examples of foreign members, and foreign members such as "relPath", "integrity", etc. are entirely permitted by the standard.
It is not clear why the rest of the reformatting is necessary: software that does not understand the messages will still not understand them regardless of their placement in "properties", and software that parses GeoJSON will still understand the existing payload, insofar as it will already be GeoJSON, to the same degree that STAC is. I suppose someone will argue that such GeoJSON is "unusual"... I wonder if, in this case, being unusual is an advantage, in that it is unlikely to be mistaken for ordinary GeoJSON.
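A sketch of the minimal-impact route described here: leave the existing fields at the top level, where RFC 7946 treats unknown members as foreign members, and add only the three members a Feature requires ("type", "geometry", "properties"). The field values come from the example message earlier in the thread; the Point coordinates are made-up placeholders, and the helper name `add_geojson_members` is hypothetical.

```python
import copy


def add_geojson_members(message: dict, geometry=None) -> dict:
    """Return a copy of a flat TT-protocols message turned into a GeoJSON Feature.

    Existing fields stay at the top level (foreign members in RFC 7946 terms);
    only the members a Feature requires are added.
    """
    feature = copy.deepcopy(message)
    feature["type"] = "Feature"
    feature["geometry"] = geometry          # null is allowed for unlocated features
    feature.setdefault("properties", None)  # required member; its value may be null
    return feature


v03 = {
    "pubTime": "20190120T045018.314854383",
    "baseUrl": "https://localhost/data/20190120",
    "relPath": "WIS/CA/CMC/UpperAir/04/UANT01_CWAO_200445___15103.txt",
    "size": 457,
}
# Placeholder coordinates, for illustration only.
feature = add_geojson_members(v03, {"type": "Point", "coordinates": [-73.5, 45.5]})
print(sorted(feature))
```

Existing consumers that ignore unknown keys keep reading baseUrl and relPath exactly where they were.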
Worries/Dangers:
All the robust testing has been done with the file replication use case; all the validation benefits of using a format that is already heavily used (for announcements) are lost.