Enumerated types and open standards

rwshuka commented 11 years ago

First of all, I apologize for my lack of participation in the standards development process to date. I had scheduling conflicts for the first couple of meetings, then I was out for a couple of weeks due to personal issues, and I've been playing catch-up for the past week. Thank you for allowing Kaylen to stand in for me in my absence.

One of the conversations I appear to have missed along the way was the discussion of whether to develop an open or a closed standard. That ship may already have sailed, so this may be a moot point, but I wanted to at least put our position out there on the chance that it is not too late for consideration.

Specifically I want to discuss the issue of enumerated values ("restriction" definitions in the xsd). GlobalVetLINK is in favor of a more open standard with companion documentation that enumerates "preferred" values.

My assumption is that our desire would be to define a standard that would be as widely adopted in the industry as possible. Thus we want to define a standard that doesn't discourage parties from implementing it. Our experience has been that, while it places more of a burden on us as a company to validate the data that we receive, openness encourages adoption of a standard.

Our experience has been that a closed standard discourages adoption in the following ways:

1. There are cases where we are interested in a sender's data even if it isn't 100% compliant with a standard. There may be some subset of information that we need, and while the whole document may not be 100% compliant, it may have all or most of the information we need or are interested in for the given scenario. We would much rather receive the document and have access to the data we want than turn it away because it isn't 100% compliant and have nothing. In such a scenario we will need to define a different standard just for that scenario, thus discouraging adoption of the committee's standard.
2. There will be cases where "UNK" or "OTH" isn't sufficient. If the animal in question is a shark or a snake, that information is useful to us. If the sender has that information but has no way to send it to us, we will again need to define a standard just for that scenario, again discouraging adoption of the committee's standard.
3. There are parties that we need to exchange information with whose development departments are less sophisticated than our own. We've found that a closed standard makes it more difficult for us to assist these parties with their development. Schema validators generally return cryptic messages, which has required us to get more intimately involved in their development efforts than would otherwise be necessary. While a more open standard requires us to do more work initially in terms of message validation, it allows us to provide better feedback to the sender, which helps them solve their own development problems. This means there is less need for us to get involved and solve their problems for them. In short, it's more work for us up front, but it pays off in the long run.
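To make the third point concrete, here is a minimal Python sketch of receiver-side validation against a "preferred" list. It is only an illustration: the species codes other than "UNK" and "OTH", the function name `check_species`, and the message wording are all assumptions, not part of any agreed standard.

```python
# Hypothetical preferred values; the real list would live in the
# companion documentation, not in the schema itself.
PREFERRED_SPECIES = {"BOV", "EQU", "POR", "OTH", "UNK"}

def check_species(value):
    """Return (accepted, message) instead of rejecting the whole document."""
    if value in PREFERRED_SPECIES:
        return True, "ok"
    if value.strip():
        # Open standard: keep the data, but tell the sender in plain
        # language what we would have preferred to receive.
        return True, (
            f"species '{value}' is not a preferred value; "
            f"preferred values are {sorted(PREFERRED_SPECIES)}"
        )
    return False, "species is empty; please send a value"

print(check_species("shark"))
```

The point of the sketch is that "shark" is accepted and the sender still gets a readable hint, where a schema validator would typically reject the document with a generic enumeration error.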

I believe I understand the reasons behind the desire for a closed standard, but our practice has been to find other ways to address those issues. I assume that one of the main objectives of having a closed standard is to control the quality of the data. Another way to address this goal would be to provide a "test suite", or possibly even a test web site, where parties could upload files to determine their level of compliance. An example of this would be the Java programming language. In essence, Java is defined by an open specification rather than by any single implementation. Several vendors provide implementations of that specification, and there are programs available that test and report on the level of compliance of each implementation.
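As a rough idea of what such a test suite might report, the Python sketch below runs a set of checks against one document and returns a compliance level instead of a single pass/fail. The field names, the check names, and the species codes are hypothetical and only there to make the example runnable.

```python
def compliance_report(document, checks):
    """Run every check and report which passed, rather than stopping at the first failure."""
    results = {name: bool(check(document)) for name, check in checks.items()}
    passed = sum(results.values())
    level = passed / len(results) if results else 1.0
    return level, results

# Hypothetical checks and field names, for illustration only.
CHECKS = {
    "has_species": lambda doc: bool(doc.get("species")),
    "has_breed": lambda doc: bool(doc.get("breed")),
    "species_is_preferred": lambda doc: doc.get("species") in {"BOV", "EQU", "POR"},
}

level, results = compliance_report({"species": "shark", "breed": ""}, CHECKS)
print(f"compliance: {level:.0%}", results)
```

A sender uploading this document would see that one of three checks passed and exactly which ones failed, which is the kind of feedback that lets them fix their own output.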

Once again, I'm sorry for this late entry into the discussion, and if we are already beyond the point of considering a more open standard then I apologize.

mkm1879 commented 11 years ago

The most convincing argument here is, I believe, the one about the enumerated value lists. I believe there is room to compromise there and still have a closed standard. I'll have to look at how we would deal with that. The most obvious approach would be to enumerate only purely structural elements where no other value would make any sense. That is the direction Michael is pushing us on Species/Breed. Other standards handle this by allowing an alternate code or the original text, either instead of or in addition to a value from the standard list.
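One possible shape for "standard value plus original text" is sketched below in Python. The class name `CodedValue`, the field names, and the species codes other than "UNK" and "OTH" are assumptions made for the example; other standards model this in their own ways.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical standard value list.
STANDARD_SPECIES = {"BOV", "EQU", "POR", "OTH", "UNK"}

@dataclass(frozen=True)
class CodedValue:
    code: str                            # must come from the standard list
    original_text: Optional[str] = None  # what the sender actually recorded

    def __post_init__(self):
        # The closed part of the standard: the code itself is constrained.
        if self.code not in STANDARD_SPECIES:
            raise ValueError(f"{self.code!r} is not in the standard value list")

# The code stays within the closed list; the detail travels alongside it.
shark = CodedValue(code="OTH", original_text="shark")
print(shark)
```

System logic can rely on `code` always being a known value, while a receiver that cares about sharks can still read `original_text`.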

As to the less sophisticated business partners, I know this is a challenge but I believe it makes more sense to help them through dealing with the closed standard because then they are set. If we have N different variations on an open standard, we really have only partly solved the O(N^2) problem.
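For anyone unfamiliar with the O(N^2) remark, one reading of it is the count of pairwise agreements: if every pair of parties has to settle its own format, the number of formats grows with the number of pairs, whereas a single shared standard needs one implementation per party. The small Python sketch below just does that arithmetic; the function name is made up for the example.

```python
def pairwise_agreements(n):
    """Number of distinct partner pairs among n parties: n * (n - 1) / 2."""
    return n * (n - 1) // 2

for n in (5, 10, 50):
    print(n, "parties:", pairwise_agreements(n),
          "pairwise formats vs", n, "implementations of one standard")
```

With 10 parties that is 45 pairwise formats against 10 implementations. If an open standard ends up with several incompatible variations, each pair of variations is again something two partners have to reconcile between themselves.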

mmcgrath commented 11 years ago

We got very bogged down in this important issue during our discussions in Denver. As a first step towards resolution, I want to state that I do not believe this has to be "one size fits all" -- for example, we might decide that Species will be an enumerated list but Breed is a free-text field.

Taking Species and Breed separately:

Species

I see these options:

1. Enumerated simple type where the list of valid entries comes from a 'trusted' third party and is simply copied into our schema.
2. External reference to an external XML Schema that provides the same constraints as in option 1.
3. We make up our own list and encode the entries as enumerations.
4. We make Species a free text field.

I would vote for option 1.

Breed

I see these options:

1. Enumerated simple type where the list of valid entries comes from a 'trusted' third party and is simply copied into our schema.
2. External reference to an external XML Schema that provides the same constraints as in option 1.
3. We make up our own list and encode the entries as enumerations.
4. We make Breed a free text field.

I vote for option 4 for Breed, as I don't think it is "important enough" to enough people to go the enumerations route. I would, however, propose we maintain a mapping dictionary for common breeds and species to establish an "opt-in" approach to coding this data. In other words, we can say that "AN" => Angus without enumerating it. This still allows the originator of an XML message to provide anything they want when recording details of an exotic or unusual breed.
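A minimal Python sketch of the opt-in mapping dictionary might look like this. Only the "AN" => Angus entry comes from the discussion above; the dictionary name, the function name, and the sample free-text value are assumptions for illustration.

```python
# Opt-in mapping dictionary: senders may use a known code, but are not
# required to. Only "AN" => Angus is taken from the discussion.
BREED_CODES = {"AN": "Angus"}

def describe_breed(value):
    """Expand a known breed code; pass any other free text through unchanged."""
    return BREED_CODES.get(value, value)

print(describe_breed("AN"))            # known code
print(describe_breed("Watusi cross"))  # free text is kept as sent
```

Because the schema itself would leave Breed as free text, an unknown value is never a validation error; the dictionary only adds meaning where the sender chose to use a code.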

This is a major issue and we need to see input from all parties. Thanks to @rwshuka and @mkm1879, who have already weighed in with detailed thoughts.

mkm1879 commented 11 years ago

I have thought from the beginning that this would eventually prove to be the hardest part of this standard. It is a very important decision.

Michael's suggestion fits more or less with an approach I've floated over the last year or so: that taxonomy be split into two fields, "True structured taxonomy" (what we are calling "species" here) and "Additional taxonomy detail" or something like that. For this to work right, everything needed to make system logic work has to be in the first field. For ADT this might be problematic unless we have a "species" that includes all the dairy cattle breeds and another with all the non-dairy breeds.

This is NOT an easy design decision. Everyone please think deeply about this.

jconlon commented 11 years ago

Take a look at this for extending enumerations: http://www.ibm.com/developerworks/library/x-extenum/

jconlon commented 11 years ago

A suggestion: take a look at solution 3 in the above-referenced document.

Instead of using a regex that marks extended values with an 'x:' prefix, we could use a regex that specifies a URI. (Note: xsd:anyURI could not be used here, because it is not a string.)

URIs themselves are scheme-based; you could even say they are simple, modeled data types. This way any two or more partners can extend our enumerations with unique partner-defined URIs (by restricting the regex). The resulting instance documents will validate against our base schema, yet the same instance document may or may not validate against a partner's extended schema.
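The idea can be mimicked outside XSD with a few lines of Python: a value is valid if it is in the base enumeration or matches an extension pattern, and a partner narrows that pattern to its own URIs. The species codes, both regexes, and the example.org URIs are assumptions for illustration; a real schema would express the same union with xsd:enumeration and xsd:pattern facets.

```python
import re

# Hypothetical base enumeration.
BASE_SPECIES = {"BOV", "EQU", "POR"}

# Base schema: any http(s) URI is accepted as an extension value.
BASE_EXTENSION = re.compile(r"https?://\S+")

# Partner schema: the same idea, restricted to the partner's own URIs
# (hypothetical namespace).
PARTNER_EXTENSION = re.compile(r"https?://example\.org/species/[A-Za-z]+")

def is_valid(value, extension_pattern):
    """Valid if the value is enumerated or fully matches the extension regex."""
    return value in BASE_SPECIES or extension_pattern.fullmatch(value) is not None

own = "http://example.org/species/shark"
foreign = "http://other.example/species/shark"
print(is_valid(own, BASE_EXTENSION), is_valid(own, PARTNER_EXTENSION))
print(is_valid(foreign, BASE_EXTENSION), is_valid(foreign, PARTNER_EXTENSION))
```

Both URIs pass the base check, but only the partner's own URI passes the restricted check, which matches the behaviour described above: valid against the base schema, and possibly invalid against a partner's extended one.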

mmcgrath commented 8 years ago

This has been stagnant for a number of years and will be closed on 2 Sept unless objections are raised. Any specific amendments should be opened as separate issues.

tracefirst / usaha_committee

Enumerated types and open standards #22
