Closed blchoy closed 4 years ago
A deeper investigation revealed that this is not as simple as I thought. In fact, iwxxm.sch is version-specific and can only be used with the IWXXM version it is designed for (e.g. Version 3.0). Then how can it be used to validate a collective containing multiple versions of an IWXXM report? It seems to me that we need a separate schematron file to validate collectives. [Updated: Probably not. Looking at iwxxm.sch again, it will check only those parts in COLLECT 1.2 and IWXXM 3.0, because the rule contexts are constrained to their respective namespaces. Validating the parts in other versions of IWXXM can be as simple as running the iwxxm.sch of each of those versions. As a result, AEROTHAI's suggested changes can still be applied.]
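To make the namespace point concrete, here is a minimal sketch that lists which report namespaces occur in a collective, which in turn tells you which versions of iwxxm.sch would have something to check. The namespace URIs and the `collect:meteorologicalInformation` wrapper name are assumptions written from memory, and `report_namespaces` is a hypothetical helper, not part of any IWXXM tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed namespace URI for the COLLECT schema; adjust to your schema set.
COLLECT_NS = "http://def.wmo.int/collect/2014"

# Hypothetical two-report collective mixing two IWXXM namespaces.
SAMPLE = """<collect:MeteorologicalBulletin xmlns:collect="http://def.wmo.int/collect/2014">
  <collect:meteorologicalInformation>
    <iwxxm:METAR xmlns:iwxxm="http://icao.int/iwxxm/3.0"/>
  </collect:meteorologicalInformation>
  <collect:meteorologicalInformation>
    <iwxxm:METAR xmlns:iwxxm="http://icao.int/iwxxm/2.1"/>
  </collect:meteorologicalInformation>
</collect:MeteorologicalBulletin>"""


def report_namespaces(collective_xml: str) -> Counter:
    """Count the reports in a collective per namespace URI."""
    root = ET.fromstring(collective_xml)
    counts: Counter = Counter()
    for wrapper in root.findall(f"{{{COLLECT_NS}}}meteorologicalInformation"):
        for report in wrapper:
            # ElementTree tags have the form "{namespace}localname".
            tag = report.tag
            ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
            counts[ns] += 1
    return counts


if __name__ == "__main__":
    for ns, n in report_namespaces(SAMPLE).items():
        print(ns, n)
```

A collective is "mixed" in the sense discussed here when this counter has more than one key.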
Aren't member states supposed to validate their IWXXM messages with the appropriate version of iwxxm.sch before sending?
If a schematron file can't do different versions of IWXXM in a collective, then a RODB will have to "disassemble" those collectives which have different IWXXM messages in them and validate them separately.
Not really clear to me as to how these mixed version collectives come about and how many times these messages have to be validated.
It all started when people were looking into the need for RODBs to aggregate IWXXM reports of METAR or TAF that are in different IWXXM versions. There are situations where the schemas of some report types do not change from one version to another (e.g. when the new version only introduces a new report type). As Annex 3 does not mandate the version of IWXXM to be used with an amendment, any compliant schema version can be used, hence the possible occurrence of mixed version collectives.
As we are telling people that IWXXM 3.0 is the only version that is Amendment 78 compliant, there should be no mixed version collectives for the time being. The next version is likely to be one that adds only the new WAFC SIGWX objects, and we should be able to handle this when it arrives.
After doing some tests, the following summarizes what we could do with regard to this issue:
In a separate discussion with @marqh, we may do nothing right now as we are not encouraging people to use versions prior to IWXXM 3.0 and the need to implement support for multi-IWXXM version collectives can therefore be delayed until IWXXM 3.1.
Views please?
Hi all,
we had quite a long discussion in IBL about this topic. The intention to collect different versions of IWXXM reports is related to the question of whether all IWXXM users are able to update their software in time, or even whether they need to update it. It is obvious that updating to the latest IWXXM 3.0 version is not necessary when I am using, for example, only METARs which are the same as in IWXXM 2.1.1 (let's suppose so). But if I don't update my system to IWXXM 3.0 then I will not understand these reports. Moreover, I will not be able to validate them because I don't have the latest validation rules. In other words, if we expect several IWXXM versions to be used in the real world in parallel, then we expect that not all institutions are able to update their systems in time. Otherwise they would use the latest version.
So back to the original problem. There are three things we can do:
I know I went beyond our original topic, but I think now is the right time.
It seems to us that collecting reports in one large XML brings quite a few complications, but not many obvious benefits. We understand it is now likely too late to change anything, but we are wondering if collecting reports as multiple XML files in a ZIP archive would be an easier solution (just as, for example, a Microsoft Word DOCX file is a ZIP collection of multiple XML documents).
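As an illustration of the ZIP idea only (nothing here is an agreed packaging format), the following sketch packs individual report documents into an in-memory archive and reads one back without touching the others. The member file names and the helper names `pack_reports` and `read_report` are hypothetical.

```python
import io
import zipfile
from typing import Dict


def pack_reports(reports: Dict[str, str]) -> bytes:
    """Write each report as its own XML member of a ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, xml_text in reports.items():
            # writestr() encodes str data as UTF-8.
            zf.writestr(name, xml_text)
    return buf.getvalue()


def read_report(archive: bytes, name: str) -> str:
    """Read a single member; the other members are not decompressed."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name).decode("utf-8")


if __name__ == "__main__":
    archive = pack_reports({
        "metar-VHHH.xml": '<METAR xmlns="http://icao.int/iwxxm/3.0"/>',
        "metar-LZIB.xml": '<METAR xmlns="http://icao.int/iwxxm/2.1"/>',
    })
    print(read_report(archive, "metar-LZIB.xml"))
```

Each member is a complete document, so it can be validated against the schematron rules of its own IWXXM version.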
Hi @borisburger,
I think there is still some value at this point in having XML collectives of one or multiple IWXXM reports (equivalent to a TAC bulletin), as one can extract content from them directly without any pre-processing. Of course, from a software engineer's perspective, one can always create a filter which reads the ZIP file and exposes the content in exactly the same way as the XML collectives. My personal preference is to do less right now, as the need for collectives will eventually be gone with the introduction of SWIM (and WMO is also thinking of phasing out the WMO Abbreviated Header Line). I agree that if collectives are used much longer than we expect, the problem will grow over time (i.e. the combined schematron rule file that validates a collective with multiple versions of IWXXM reports will grow with the number of IWXXM versions) and ZIP may be a better solution.
While we mentioned that collectives will become non-essential in SWIM, we have never analysed how publish/subscribe and WFS can remove the need for them. Maybe you could shed some light on this so that we could make some proposals for the MET-SWIM plan?
Regards, Choy
Hi @blchoy, in WFS, when a user requests data for a list of aerodromes, or for a geographic area defined by latitude-longitude bounds, the WFS response will encode multiple "reports" in a wfs:FeatureCollection, which is a sequence of GML members. There is a more fundamental difference between GTS and WFS collections though:
one can extract content from them directly without any pre-processing
In the traditional GTS world the collections will inevitably contain reports you do not necessarily care about. A real-world use case is that users want to see data for one particular aerodrome (or several specific aerodromes). If it is "buried" in a collection of 50 unrelated METARs, the software needs to extract the data for the aerodrome the user wants from the large XML. Loading large XML DOMs into typical parsers is slow and memory consuming. That is why I do not understand you saying "without any pre-processing".
From a software engineer's perspective it is more efficient to split IWXXM collections into individual per-report documents when storing them in any sort of "database". And once you split the collection, you can also validate each such sub-document separately, removing the need for a combined iwxxm.sch that would accumulate all IWXXM versions over time.
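For what it is worth, the extraction step does not have to build a full DOM. The sketch below streams through a collective with `iterparse` and returns the first report mentioning a given aerodrome, discarding each non-matching report as it goes. It assumes reports sit at the third element level (bulletin, wrapper, report) and that the aerodrome can be found in some descendant element whose local name is `designator`; both are simplifications for illustration, and `find_report` is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET
from typing import Optional


def find_report(collective: bytes, icao: str) -> Optional[str]:
    """Return the first report whose designator equals `icao`, else None."""
    depth = 0
    stream = io.BytesIO(collective)
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        # "end" event: elem is complete and sits at the current depth.
        if depth == 3:  # bulletin > wrapper > report (assumed layout)
            for node in elem.iter():
                local = node.tag.rsplit("}", 1)[-1]
                if local == "designator" and (node.text or "").strip() == icao:
                    return ET.tostring(elem, encoding="unicode").strip()
            # Not the report we want: drop its content to bound memory use.
            elem.clear()
        depth -= 1
    return None
```

This still reads the file sequentially, so it addresses memory use rather than the cost of scanning reports nobody asked for.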
Regards, Boris
I am not sure how WFS or publish/subscribe can remove the need for traditional GTS-like collections. Each has its strengths and weaknesses:
I think there is no silver bullet here. Maybe publish/subscribe will replace traditional store/forward when WIS matures. Web services are great for specific use cases, but for widespread adoption more standardisation and maturing is due. In the transition period it is probably best to work with all the currently used approaches.
Hi @borisburger, thanks for your comments. There are teams working fast on the implementation of pub/sub mechanisms and I am going to bring this discussion to their attention. I agree that there is no silver bullet. However, one of the principles of WIS 2.0 is that we want to modify GTS in a way that collections are not needed, but I think that complex work on the catalogue needs to be done to be able to provide effective access to data streams. Granularity is always a problem of balance: where do we stop in detailing the data?
Thanks @borisburger for your views. I am sure there is still a lot of design work to do to move forward to SWIM, especially when different stakeholders have different use cases and expectations (e.g. information producers may want to package their data for dissemination, database owners may want to scrutinize data and make it servable with the least effort, and end users may want to have easy access to the information they need). I am trying to address these with the new SIGWX objects for WAFCs' SIGWX chart and I will definitely involve the team when things are more mature.
But going back to our original issue, could @efucile shed some light on my suggested moves?
First of all I have to remind everyone of the current status of IWXXM 3.0. We have submitted the new version for consultation to the national focal points (NFP) on codes matters. They have until 16 August to send comments and requests for changes (none received yet). The only way to make a change now is to have the problem officially reported by an NFP, as the ball is in their court now. After that no other opportunity will be open until the next fast track in November for approval in May 2020. If an NFP reports such a complex problem, this will force us to respond and delay the implementation date, which is now 7 December 2019. This will mostly affect the space weather component, which is the new part and which we are required to have operational by the end of the year.
I think we should avoid this path. If you don't agree please comment on this otherwise I assume that we are not making any change for now.
Going into the specifics of the problem: thanks to @blchoy and @jkorosi for your proposals. We need to decide the way forward. Here are my considerations to facilitate the decision.
With this in mind we could adopt solution 1 from @jkorosi, but I doubt that everyone will fully comply. My experience is that a mandated rule like "don't mix different versions of IWXXM in a collection" will be disregarded in many situations and open the field to a number of incidents. In the end your software will need to take into consideration the case of a collection that "accidentally" contains different versions of IWXXM. Therefore I don't think we have any option left other than 2 from @jkorosi.
Conclusion. I think we should clearly communicate to the users that collections are a communication artefact and validation of messages in a collection has to be performed individually after splitting the collection into single messages. I think that, given our limited resources, we should spend our energy on thinking about how to get rid of collections rather than on improving them.
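A minimal sketch of "split first, then validate individually" could look like the following. It only shows the splitting and the dispatch by namespace; the validators are passed in as plain callables because how the version-specific iwxxm.sch is actually run depends on the schematron processor in use. The COLLECT namespace URI and the function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# Assumed namespace URI for the COLLECT schema.
COLLECT_NS = "http://def.wmo.int/collect/2014"


def split_collective(collective_xml: str) -> List[Tuple[str, str]]:
    """Return (namespace, standalone XML) for every report in a collective."""
    root = ET.fromstring(collective_xml)
    parts = []
    for wrapper in root.findall(f"{{{COLLECT_NS}}}meteorologicalInformation"):
        for report in wrapper:
            tag = report.tag
            ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
            # tostring() re-declares the namespaces the report uses, so the
            # result parses on its own; strip() drops trailing whitespace.
            parts.append((ns, ET.tostring(report, encoding="unicode").strip()))
    return parts


def validate_each(
    collective_xml: str,
    validators: Dict[str, Callable[[str], bool]],
) -> List[Tuple[str, str]]:
    """Run the validator registered for each report's namespace."""
    results = []
    for ns, doc in split_collective(collective_xml):
        validator = validators.get(ns)
        if validator is None:
            status = "no validator"
        else:
            status = "ok" if validator(doc) else "failed"
        results.append((ns, status))
    return results
```

In use, `validators` would map each supported IWXXM namespace to a function that runs that version's schema and schematron checks on a single report; a namespace with no entry is reported rather than silently skipped.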
I think we should clearly communicate to the users that collections are a communication artefact and validation of messages in a collection has to be performed individually after splitting the collection into single messages.
I concur with this conclusion. As we are not asking people to run iwxxm.sch on collectives, the case of COLLECT.MB1 in iwxxm.sch complaining about different versions of IWXXM reports in a collective should not arise. In the next version of IWXXM (e.g. version 3.1) we can remove the redundant COLLECT.MB1 from iwxxm.sch.
At TT-AvXML-8 the team decided that this is a useful restriction for IWXXM 3.0 and shall now be considered for IWXXM 3.1
While we will confirm WG-MIE's understanding that the only IWXXM version to be used since Nov 2019 will be 3.0.0, we may want to consider telling them what will happen when we start to have reports with no change in structure in IWXXM 3.1, regardless of whether we keep COLLECT.MB1 to check the integrity of a collective. The following are the technical details:
This was fixed in PR #200.
Worapong Jirojkul of AEROTHAI has the following comment with regard to rule COLLECT.MB1 in iwxxm.sch:
I think this is a reasonable suggestion. However, we may only want to amend the rule in iwxxm.sch but not the one in the original collect.sch, as aggregating feature instances with the same element name but different namespaces (which correspond to different IWXXM versions) may be unique to IWXXM.