wmo-im / BUFR4

BUFR edition 4

Regulate the use of WIGOS Station Identifiers #50

Closed efucile closed 1 year ago

efucile commented 4 years ago

Summary and purpose

The letter to Members CircularLetter_2017-10-30-OBS-WIS-DRMM-DRC-WIGOS-ID_en.pdf introduced the use of WIGOS Station Identifiers (WSI) and provided some guidelines. The WIGOS implementation plan requires a phased implementation of WSI in BUFR data. For WSIs to be implemented successfully, the Manual on Codes needs standard practices stating where the use of the WSI sequence is mandatory and where it is optional.

Action proposed

Draft WIGOS Station Identifiers regulations to be added to the Manual on Codes Vol. I.2 with the following content:

  1. Which sequence shall be used to represent WSI, and how
  2. For which sequences the use of WSI is mandatory or optional
  3. How to migrate to WSI: timing, announcements, ...

The draft amendments to the Manual on Codes Vol. I.2 must be submitted to the INFCOM session in November and therefore have to be ready by the end of September.

efucile commented 4 years ago

Building on the suggestion of @SibylleK I added two notes to WI sequence and some clarification in the "element description" column. @SimonElliottEUM and @jitsukoh can you please have a look and make suggestions? WSI_TABLE_D_306_I2_2019_en.docx

efucile commented 4 years ago

@wmo-im/tdcf some more clarification. The idea behind my proposal is that the WIGOS Identifier is a new schema for identification of stations or other platforms. It will also be applied to satellite platforms one day. We should propose that the WI sequence precede the pre-existing identifier sequences. All of them in principle. We should ask ET-WT for which identifiers this is required, making clear that we have a long list of identifiers already in use. This will make ET-WT aware of the breadth of the issue, which is not only related to the B/C regulations. Without input from ET-WT we cannot know which identifiers are going to be replaced by WI. Comments are welcome. We need to make some progress on this.

jbathegit commented 4 years ago

@wmo-im/tt-tdcf I think it is not so simple as just adding Element Description notes to Class 1 of Table D. Many of these sequences are included as subsequences within sequences of other Classes. For example, 3-01-001 is referenced within other sequences in Classes 7, 9, 16 and even 35, and not always as the first element of the sequence. Similarly, 3-01-031 appears in other sequences in Classes 7, 9 and 21. So would we need similar notes for all of those sequences as well, reminding that 3-01-150 now needs to be included preceding the entire sequence? As @efucile notes, the scope of this issue could quickly become difficult to manage, due to the cascading nature of Table D sequences and the fact that many Class 1 sequences are used as building blocks for sequences in other classes.
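To get a feel for how far the cascade reaches, one could search the machine-readable Table D for every sequence whose expansion includes a given descriptor, directly or through nested subsequences. The following is only a minimal sketch: the function name and the toy table are assumptions made for illustration, and the "toy_*" entries are not real Table D sequences.

```python
from typing import Dict, FrozenSet, List


def sequences_containing(table_d: Dict[str, List[str]], target: str) -> List[str]:
    """Return every sequence whose full expansion includes `target`."""

    def contains(descriptor: str, seen: FrozenSet[str]) -> bool:
        if descriptor in seen:  # guard against accidental cycles in the input
            return False
        # Descriptors without an entry (e.g. Table B elements) expand to nothing.
        members = table_d.get(descriptor, [])
        return any(
            m == target or contains(m, seen | {descriptor}) for m in members
        )

    return sorted(s for s in table_d if contains(s, frozenset()))


# Toy table (assumption): the "toy_*" sequences are invented for this example.
TOY_TABLE_D = {
    "301001": ["001001", "001002"],
    "toy_A": ["301001", "004001"],  # uses 301001 directly
    "toy_B": ["008021", "toy_A"],   # uses it indirectly, and not as first element
    "toy_C": ["004001", "004002"],  # does not use it
}

print(sequences_containing(TOY_TABLE_D, "301001"))  # ['toy_A', 'toy_B']
```

Run against the real csv or xml tables, a search like this would list all the sequences that would need a note if the note-based approach were applied consistently.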

jbathegit commented 4 years ago

Now that I think of it, another problem with just adding Element Descriptor notes is that these are often missed or difficult to easily incorporate, especially by automated parsers which process csv or xml versions of the tables.

As a side note, if we had proper versioning of Table D descriptors, one easy solution could be to just replace all of the occurrences of 3-01-001 with 3-01-150, wherever they occur within other sequences of Classes 7, 9, etc. in the next version of Table D ;-)

efucile commented 4 years ago

Thanks @jbathegit you are right and I am not keen on notes and comments in the tables. I also agree with the effectiveness of the "proper" versioning. We would be able to introduce 3-01-150 from the next version without any note. I wonder if this is feasible. It is a clean solution. Very tempting and absolutely within the rules! I know someone is going to shout that not all the decoders implement strict versioning of the tables.

jbathegit commented 4 years ago

What might be an even bigger issue is that not all encoders (especially ones used by commercial vendors and installed within COTS observing systems) follow the rules either.

SimonElliottEUM commented 4 years ago

I note that, with an eye on the satellite data (which of course dominates global data exchange in terms of volume), @efucile has inserted a 3-01-150 WSI before sequences containing 0-01-007 (Satellite identifier). This is a reasonable approach but needs to be proposed for 3-01-125 for completeness. Also, there are many satellite sequences in Table D (e.g. in classes 10, 12, and 40) starting with 0-01-007, and these will also need a similar approach. Like @jbathegit and @efucile, I think the element description column in Table D is of limited value except to human readers. We need strong notes in Table B to explain that 0-01-007 should always be preceded by 3-01-150. From a logical model point of view, as long as it is there at least once, it doesn't matter if the descriptor comes in multiple embedded sequences, provided it has the same value.

efucile commented 4 years ago

Dear @jbathegit, you are right: the risk associated with such a solution is too high. While I was writing this answer, @SimonElliottEUM's comment arrived, and I see that you both think that notes and comments are not strong enough to enforce the change. I have an idea on which I would like @wmo-im/tdcf to comment. We could take a more radical approach and make new WIGOS categories out of the existing ones, cloning current sequences with the addition of 3-01-150. We could do it gradually, starting with 07 and 09 (not even all the sequences in those categories), and create 47 and 49 as WIGOS sequences. In this way, we have a controlled means of rolling out WIGOS sequences, and satellite data could also be addressed in this way. This approach would require coordination with other teams on SC-ON and changes to B/C regulations. It has the advantage that it gives a clear indication of how to migrate without resorting to notes, and it does not bring the risk of relying on the version number. I think that we should bring this to the INFCOM session because it is not a typical fast-track change, and also because we want to get a wide consensus and advertise the change widely. I think that there is significant work involved, but it can be done by the end of November or beginning of December as required for an INFCOM document.

jbathegit commented 4 years ago

Dear @efucile, yes this might work, but how would we enforce an eventual migration to the new sequences? Perhaps you were envisioning such a directive within the B/C regulations? Bottom line, if there's no incentive or impetus to change, then most if not all encoders will just continue using the existing sequences, and then we wouldn't make any real progress towards the use of WIGOS identifiers.

As a side note, if we went with this sort of approach, we wouldn't be able to use Class 49 in Table D, as that's a local class (only Classes 0 through 47 are standard).

Rather than proliferating a whole new set of sequences, part of me still likes the idea of changing the existing sequences in a new version of Table D. In my earlier comments, I wasn't necessarily saying this couldn't be done, but rather just trying to point out some of the issues and concerns that would likely be raised by the user community. But that doesn't necessarily mean it's not the right thing to do, and users could still continue to use the old sequences by just continuing to use older versions of the tables, and then migrate to the newer versions whenever they're ready. So it could still be an orderly migration, and this approach also provides some added incentive in the fact that any new tables version number applies to all of the tables, meaning Tables A, B, CodeFlag, etc., and not just Table D. So the incentive is that if users ever want to make use of any other future additions (e.g. elements, sequences, code/flag entries) to any of these other tables, then they'll only be able to do so by fully migrating to the newer version numbers, which in turn means they'll also need to migrate to the updated Table D sequences containing the WIGOS identifiers. Sort of a "half carrot, half stick" approach ;-)

Of course I'd like to hear comments or other suggestions from the rest of @wmo-im/tt-tdcf. I know we've discussed/argued the concept of table version numbers for many years, but to me this seems a clear-cut case where we could really take advantage of this inherent feature of BUFR.
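As a rough illustration of what the versioned approach would ask of decoders, the sketch below selects the Table D expansion from the master table version carried in the message. Everything in it is hypothetical: the version numbers 1 and 2, the sequence "3XXYYY" and its expansions are placeholders, not real table content.

```python
from typing import Dict, List

# Hypothetical tables (assumption): version 2 differs from version 1 only in
# that the invented sequence "3XXYYY" now starts with 301150.
TABLE_D_BY_VERSION: Dict[int, Dict[str, List[str]]] = {
    1: {"3XXYYY": ["301001", "004001"]},
    2: {"3XXYYY": ["301150", "301001", "004001"]},
}


def expand(descriptor: str, master_table_version: int) -> List[str]:
    """Expand a sequence using exactly the table version the message declares."""
    try:
        table_d = TABLE_D_BY_VERSION[master_table_version]
    except KeyError:
        raise ValueError(
            f"no tables loaded for version {master_table_version}"
        ) from None
    return list(table_d[descriptor])


print(expand("3XXYYY", 1))  # ['301001', '004001']
print(expand("3XXYYY", 2))  # ['301150', '301001', '004001']
```

A decoder that ignores the declared version and always uses its latest table would expand a version 1 message with the version 2 layout and misread every value after the first descriptor, which is the risk raised earlier in this thread.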

efucile commented 4 years ago

Hello @wmo-im/tt-tdcf can you please comment on this thread. We have three proposed solutions at the moment

  1. Note in sequence 3-01-150 and comments in the relevant identification sequences that are going to be preceded by 3-01-150.
  2. Add sequence 3-01-150 to existing sequences in the next version of the tables.
  3. Create new sequences by cloning existing ones with the additional 3-01-150 and assigning a new sequence number in a new category.

All the solutions have pros and cons that you can read in the posts above. Your opinion is very important and we need to decide as soon as possible to make sure that we can draft the amendments in time for the INFCOM session.

SibylleK commented 4 years ago

Dear @efucile, thank you very much for the summary, which makes it easier for me to comment.

In my view solution 1 is the only practicable option, although I do not think that all the comments in other sequences except 301150 are necessary. It should be sufficient to have a note or comment for 301150 which says, that this sequence should be placed before BUFR/CREX sequences if not already included.

Regarding solution 2, as I do not so far know of any application software handling the content of BUFR messages that considers the BUFR master table version number, I will always argue against changing Table D sequences :-)

Regarding solution 3, I suppose it is a little bit overdone and it would rather hinder the migration to WIGOS identifier.

If I consider the issue of the migration to WIGOS Station Identifiers, the encoding in BUFR is the least of the problems and could be done with existing BUFR table entries. The main question will be whether the data users, including NWP centres, are ready to use the WIGOS Identifier. At the moment this does not seem to be the case, and solving this problem is the real challenge. Therefore, anything that would distract from this and make the process even more complicated by adding unnecessary BUFR issues should be avoided. That's why I am voting for solution 1.

marianmajan-ibl commented 4 years ago

Dear Enrico, thank you for sorting the options out and Jeff, Simon for very valuable input.

I would vote for option 1 too. That is an approach that has already been adopted for "standard" data and I think it was quite practical, for users as well as for producers. The main problem is that the information about a station in a report (WSI as well as the traditional WMO station identifier IIiii) is duplicated; however, that is a minor problem, since decoders can simply ignore the old, traditional station identifier or mark reports where these two don't match as incorrect.
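The consistency check mentioned here could look roughly like the sketch below. It rests on an assumption that is not established at this point in the thread: that a WSI whose issuer is 20000 or 20001 carries the five-digit IIiii as its local identifier. The function name and the example identifiers are made up for illustration.

```python
from typing import Optional


def identifiers_agree(wsi: str, block: int, station: int) -> Optional[bool]:
    """Compare a WSI string with a traditional block/station number.

    Returns True or False when the two can be compared, and None when the
    WSI issuer is not 20000/20001, in which case no comparison is attempted.
    """
    parts = wsi.split("-", 3)
    if len(parts) != 4:
        raise ValueError(f"not a four-part WSI: {wsi!r}")
    _series, issuer, _issue_number, local_id = parts
    if issuer not in ("20000", "20001"):  # assumption: only these embed IIiii
        return None
    return local_id == f"{block:02d}{station:03d}"


print(identifiers_agree("0-20000-0-03772", 3, 772))  # True
print(identifiers_agree("0-20000-0-03772", 3, 776))  # False
print(identifiers_agree("0-76-0-EXAMPLE01", 3, 772))  # None
```

A decoder could flag the False case as an inconsistent report and simply ignore the traditional identifier in the None case.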

We have discussed the issues with solution 2 many times in the past. A decision was made not to change sequences between different versions. As a voice of data producers: please, keep it that way for this BUFR edition.

I think the solution 3 is good as well, however, it requires quite a lot of work which is (from my point of view) not really necessary as the option 1 will work too.

marijanacrepulja commented 4 years ago

Hello @wmo-im/tt-tdcf, I would support solution 1 for the time being, taking into account the pros and cons and bearing in mind the readiness of users to use WIGOS identifiers. Solution 3 is a good alternative. I think we will come to that gradually while introducing new sequences.

sergioh-pessoal commented 4 years ago

I think this is a complex discussion. Note that at the moment we have in Table D only 6 templates that include the WSI (3-07-092, 3-07-103, 3-08-018, 3-09-056, 3-09-57, 3-11-02).

This is too little.

On the other hand, there are many data from surface stations that do not have the WMO block and station number (3-01-001) but are being reported in BUFR, several of them using expanded templates.

The use of 3-01-150 would be very useful in these cases.

There are other examples, like the template 307091, which doesn't include 3-01-150, and 307092, which does.

What Brazil is doing to encode with 307091 is to add 301150 before this sequence.

For that reason, I stay with option 1. It is apparently the most natural option.
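The practice described above, prepending 301150 unless the template already contains it, can be sketched as follows. This is only an illustration: the function name is an assumption, and the optional table passed in is a truncated placeholder, not the real Table D expansion of 307092.

```python
from typing import Dict, List, Optional, Tuple

WSI_SEQUENCE = "301150"


def with_wsi(
    descriptors: List[str], table_d: Optional[Dict[str, List[str]]] = None
) -> List[str]:
    """Return the descriptor list with 301150 prepended if not already included."""
    table = table_d or {}

    def includes_wsi(descriptor: str, seen: Tuple[str, ...] = ()) -> bool:
        if descriptor == WSI_SEQUENCE:
            return True
        if descriptor in seen:  # guard against cycles in a malformed table
            return False
        return any(
            includes_wsi(member, seen + (descriptor,))
            for member in table.get(descriptor, [])
        )

    if any(includes_wsi(d) for d in descriptors):
        return list(descriptors)
    return [WSI_SEQUENCE, *descriptors]


# Placeholder table (assumption): only records that 307092 contains 301150.
TRUNCATED_TABLE_D = {"307092": ["301150"]}

print(with_wsi(["307091"]))                     # ['301150', '307091']
print(with_wsi(["307092"], TRUNCATED_TABLE_D))  # ['307092']
```

Checking nested expansions as well as the top-level list avoids encoding the WSI twice when a template such as 307092 already carries it.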

jitsukoh commented 4 years ago

Thank you @efucile for the summary and thank you for all comments.

I vote for option 1 as well, and I am of the same opinion with Sibylle, i.e. it is sufficient to have a note or comment for 301150 which says, that this sequence should be placed before BUFR/CREX sequences if not already included.

On top of all the reasons that were already stated by other members, my point is that enforcing the reporting of WSI is not our team's role. We are only responsible for advising how to report WSI in BUFR/CREX.

More fundamentally, I have to say it is questionable whether it is a good approach to enforce or even encourage the reporting of WSI for stations that have a traditional 5-digit station identifier. WSI was introduced to allow identifiers to be allocated to more stations than the traditional 5-digit station identifiers can cover, and ultimately to obtain more observations from new stations/platforms. To this end, reporting WSI in traditional reports from stations that have a traditional 5-digit station identifier does not add any value. On the contrary, a poor implementation would cause a real disaster in NWP systems. I dare to make an analogy with the BUFR upper-air reports converted from TEMP, which are causing only trouble without any added value. We should face up to reality and learn lessons from this painful experience.

There might be a counterargument that WSI is beneficial even for traditional stations to improve information management such as by exploiting OSCAR/Surface. The very simple solution for that is to make a rule stipulating that all the reported traditional station identifiers "IIiii" should be considered as being reported with WSI "0-2000s-0-IIiii" (s=0 for stations with sub-index number=0 and s=1 for stations with sub-index number=1). The beauty of this solution is that it doesn't impact any systems and it is also applicable to remaining TAC reports that we cannot afford discarding.
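The proposed rule is simple enough to express in a few lines. The sketch below follows the "0-2000s-0-IIiii" pattern exactly as written above; the function names are assumptions, and the station number used in the example is arbitrary.

```python
from typing import Tuple


def wsi_from_traditional(iiiii: str, sub_index: int = 0) -> str:
    """Build the implied WSI "0-2000s-0-IIiii" for a traditional identifier."""
    if len(iiiii) != 5 or not (iiiii.isascii() and iiiii.isdigit()):
        raise ValueError(f"expected a 5-digit IIiii, got {iiiii!r}")
    if sub_index not in (0, 1):
        raise ValueError("sub-index must be 0 or 1")
    return f"0-2000{sub_index}-0-{iiiii}"


def split_wsi(wsi: str) -> Tuple[str, str, str, str]:
    """Split a WSI into series, issuer, issue number and local identifier."""
    parts = wsi.split("-", 3)
    if len(parts) != 4:
        raise ValueError(f"not a four-part WSI: {wsi!r}")
    return parts[0], parts[1], parts[2], parts[3]


print(wsi_from_traditional("03772"))               # 0-20000-0-03772
print(wsi_from_traditional("03772", sub_index=1))  # 0-20001-0-03772
print(split_wsi("0-20000-0-03772"))                # ('0', '20000', '0', '03772')
```

Because the mapping is a pure function of IIiii and the sub-index, a receiving system could derive the WSI on ingest without any change to the reports themselves, which is the point of the proposal.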

richardweedon commented 4 years ago

Enrico, whilst I appreciate that options 2 & 3 would seem to be the logical approach, I have to take a pragmatic stance on the matter and endorse option 1. My primary concern at this point is to limit the impact of any changes on our current systems.

Regards

Richard

efucile commented 4 years ago

Dear @wmo-im/tt-tdcf thank you for your answers and your clear argumentation. I believe that we have considered all the possible options from many aspects and I see that there is consensus on choosing to add a note on 3-01-150 and comments on the identification sequences as proposed in my first post with the attached document WSI_TABLE_D_306_I2_2019_en.docx

I think that we need now to refine the proposal. We need to answer the following questions

  1. Do we need at this stage to add 3-01-150 to satellite sequences?
  2. Do we need to write a clarification in B/C regulations?
  3. Are we going to mandate the use of 3-01-150 from the next tables version? How do we manage the transition?

efucile commented 4 years ago

Dear @jitsukoh I have to say that I 100% agree with your arguments regarding the possible negative impact of adding WSI to existing stations. However, I discovered that reality is more complicated and if we don't do something to manage the transition we will get into a chaotic situation soon.

  1. New stations can only be assigned WSI, not block/station Number. The old numbering system has been retired. Do we need to mandate WSI only for new stations? This will force systems to handle two different identifier sets.
  2. Many stations are using a block/station number which was not in Vol. A and is not in the "0-2000s-0-IIiii" WSI set. This is a considerable number of stations to which we cannot assign "0-2000s-0-IIiii"; they will have to be considered as new stations with a country code. This limits the utility of this schema.
  3. There is a request from some countries to replace the "0-2000s-0-IIiii" WSI code with country code and their own numbering. This will break completely the connection between current data with block/station number and OSCAR/records.

I am afraid that we are in a situation in which the introduction of WSI was thought of only at the OSCAR level and not for data exchange, and this will have negative consequences if we don't think of an appropriate way of dealing with the transition and the exceptions it can generate.

jitsukoh commented 4 years ago

@efucile thank you for the proposal. I am still of the opinion that the comment for 3 01 150, saying that this should precede sequences if not already included, would be enough. And the Manual on Codes is not the place to say something like (13). A careful and thorough discussion with ET-OM on whether and how we should impose this kind of intervention is necessary before drafting this.

I understand the importance of WSI, and totally agree with you that we will get into a chaotic situation soon, if we do not manage this in a right way. And exactly for this reason, I should argue that we need to do things in the right order. "Do whatever we can do when we can" approach is not the approach we should take to handle this highly complex transition process.

I fully appreciate the situation you stated in the previous post; then, what we need to start with is having a reasonable consensus among WMO Members and satellite operators about how to map traditional identifiers to WSI through mapping exercise on paper, and next reaching the consensus between producers (all WMO Members for traditional reports and satellite operators for satellite products) and users, especially NWP centers about the mapping. Mandating reporting WSI in BUFR/CREX is the next step, not before.

  1. Do we need at this stage to add 3-01-150 to satellite sequences? > We need to consult with satellite operators and NWP centers, if they are ready for this. @SimonElliottEUM do you think this discussion can be done in the CGMS code group? And I think we need to establish a mechanism to communicate on this with NWP centers, possibly through SC-ESMP.
  2. Do we need to write a clarification in B/C regulations? > We could do in the "general features" of B/C regulations, but if we should do now is another question.
  3. Are we going to mandate the use of 3-01-150 from the next tables version? > I think there are a lot to do before this stage, though it will be necessary at some point.

I would suggest that SC-IMT should have a comprehensive discussion from a broader perspective, rather than focusing on how to change Manual on Codes.

SimonElliottEUM commented 4 years ago

Concerning the need at this stage to add 3-01-150 to satellite sequences, I could address this at CGMS via its Task Force. But this will probably take a while, as I expect it will need discussion in session (in May/June 2021).

efucile commented 4 years ago

Summary of the discussion

The Team decided to add a note in sequence 3-01-150 and comments in the relevant identification sequences that are going to be preceded by 3-01-150. For the moment, satellite identification sequences are excluded; they will be considered after consultation with CGMS. @wmo-im/tt-tdcf please review the text for the note proposed in the document WSI_TABLE_D_306_I2_2019_en.docx

SibylleK commented 4 years ago

Dear @efucile, here is my review of the document: WSI_TABLE_D_306_I2_2019_en.docx. As I do not think that the "shall be preceded by 3-01-150 WIGOS Identifier" in the element description column will work, these entries are withdrawn. In addition, I suppose it is far too early for the wording "shall", as it would mean all existing BUFR messages would become "not conformant with the Manual on Codes".

efucile commented 4 years ago

Dear @SibylleK thank you for your review. I think that this is not enough as we need to mandate the use of WSI for new stations given that TSI cannot be used. We need to find a good wording to avoid forcing existing stations to change their data and allow new stations to use WSI.

SimonElliottEUM commented 4 years ago

@efucile Concerning the imposition of 3-01-150 before the satellite sequence, 3-01-041, 3-01-043, etc, if this is decided to be necessary, then I would recommend a suitable text with Common Code Table C-5 (which gives the values to be used for 0-01-007). It would be a practical place to define the use of 3-01-150 in the case of satellite data

efucile commented 4 years ago

@SimonElliottEUM at the moment it is not urgent to introduce WSI for satellites, as the schema and its application are not ready yet. I think we should focus on surface data.

jitsukoh commented 3 years ago

@wmo-im/tt-tdcf This is the proposal to add a new note to the WSI sequence (3 01 150) in Table D, based on Sibylle's proposal and made small editorial changes (deleted the new acronym WI etc.). Proposed note is: (12) To encode the WIGOS Identifier in a BUFR message, the sequence 3 01 150 should be placed before BUFR/CREX sequences if not already included. The elements in the sequence shall be set to missing if they are not defined. Other Identification following the WIGOS Identifier shall have the same values as before the introduction of WIGOS Identifier or be set to missing if they are not known or defined.

I would say this change can be done through FT2021-1 and we can report INFCOM-1 in February about what we have done, but I am open to other options. I feel other changes, such as WSI for satellites and B/C regulations, will follow at the later stage, considering the readiness of both data producers and users.

efucile commented 3 years ago

I don't think we should do this. This addition is only guidance ("should") and doesn't add anything to the existing circular letter. If we are not sure of what we want to mandate, we should wait until we are convinced. There is also another reason why we cannot do this: we have to mandate the use of WSI for all the stations that are not registered with 20000/20001, because for those stations the only valid identifier is WSI (actually not only for those stations). It is wrong to propose as optional the use of the only valid identifier. I am not comfortable presenting a document to INFCOM in which we say that WSI is optional.

efucile commented 3 years ago

During the teleconference on 10 December the Team decided that it is not ready to produce regulations on WSI, and this discussion will be continued next year. For the moment, the planned document for INFCOM is withdrawn.

amilan17 commented 1 year ago

https://github.com/wmo-im/CCT/wiki/20.to.22.September.2023 notes:
The team briefly discussed the status of WSI at their agencies and noted that this effort has more priority than the migration from TAC. The team also noted that there are approaches documented in WMO publications for the management of WSIs and there is no need to keep this issue open.