unitsml / schemas

UnitsML schemas

Unrefined new version for further development #11

Closed y4r9 closed 5 months ago

y4r9 commented 1 year ago

Preface: This is only a preliminary proposal for further discussion, and all constructive criticism and comments are welcome. Since I have not been part of any of the previous technical committees or taken part in the discussions, I am unfamiliar with the reasons for certain design decisions. Therefore, some of my suggested changes may conflict with previous design decisions.

To avoid confusion with a published standard, a temporary header was added stating that this is not a standard, nor does it conform to one.

The new major version 2.0 was chosen since it introduces breaking changes. To leverage the new assert, alternative and XPath 2.0 capabilities, the XML Schema version was updated to 1.1.

New features (short summary; TODO: write documentation @y4r9):

  1. A RegulatoryBody element was added which provides the basis for domain attributes in all set types and in the Unit, CountedItem, Quantity, Dimension, Prefix and the new Constant elements. Reasoning: By using the domain attribute, an element can be associated with a regulatory body such as an NMI. A uniqueness constraint combined with the keyref functionality allows this association to be validated.
  2. In contrast to the previous version, the set elements can now occur an unbounded number of times. Reasoning: This should enable grouping of the contained elements in logical sets such as by domain, by version, by type of unit etc. Additionally, the contents of a set can be constrained to only allow elements with the domain of the set element. TODO: the inheritance of the set domain attribute does not seem to be available to the assert statements.
  3. Added an additional mechanism by which RegulatoryBody, Unit, CountedItem, Quantity, Dimension, Prefix and Constant can be referenced within an instance document using composite keys. This is further restricted by applying uniqueness constraints and requiring the histories of these elements to be strictly linear with each element being derived from an explicit ancestor or constituting the initial version of the element. Timestamps were added to supplement the versioning of elements. References using a URL are still permitted to allow for some backward compatibility requiring only minor changes.
  4. Because the RootUnits, as defined by the enums, neither have a version history nor an explicit definition (same goes for the prefix enum), the derivation of units was extended beyond the concept of root units by allowing references to normal Unit elements. A Unit element without specified root units or references can implicitly be regarded as a base or root unit.
  5. To enable validation of the symbol elements, a subtype for each content type was added. TODO: Most types are not yet validated; the exceptions are ASCII, MathML and D-SI. Added symbol types for the proposed D-SI standard (doi: 10.5281/zenodo.3522631) and UCUM (see: https://ucum.org/). TODO: must test for uniqueness with respect to the language or remove xml:lang from D-SI and UCUM since these are language independent.
  6. Added specialized code list elements for IEC CDD (see: https://cdd.iec.ch/). TODO: add more specialized code list elements for other common standards.
  7. Added ConstantSet and Constant elements to allow for the definition of real and imaginary scalar constants (with an unlimited number of elements to accommodate quaternions etc.). A Constant may reference Unit or CountedItem but not Quantity, because constants such as pi, e, the electron charge and others may be used for multiple quantities. Furthermore, an exact attribute was added to the Constant element to allow for constants without error and uncertainty. Many constants, however, are irrational and thus cannot be represented exactly in decimal notation. Numeric precision was not taken into account, because an arbitrary math library is expected to be used. TODO: In this first step uncertainties cannot yet be associated with the Constant. One would need to add elements for symmetric, asymmetric and empirical (percentile) uncertainties and their respective distributions. Possibly add constant vectors, tensors etc. The simplistic implementation presented here should be discussed to further improve the applicability. Reasoning: Constants and their associated uncertainties need to be versioned since they may change over time. This affects the reproducibility of calculations as well as the uncertainty of the derived results.
  8. Digital signatures can now be added to the UnitsML element in order to provide trustworthy lists/databases of the contents. Reasoning: If an NMI or another standardization organization publishes reference lists/databases for use in digital calibration certificates etc., a digital signature may be required by statutory or regulatory requirements.
  9. Added a scale attribute enum to Quantity with choices of ratio, interval, ordinal and cardinal. TODO: Asserts still have to be written to enforce the restrictions.
  10. Added some basic test cases for UnitSet and Unit only. TODO: expand test set.
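
As an illustration of points 1 and 3, the checks that the keyref and linear-history constraints are meant to express can be sketched outside of XSD. The following is only a minimal Python sketch under stated assumptions: the attribute names `id`, `domain`, `version` and `derivedFrom`, and the sample instance document, are hypothetical and are not taken from the proposed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document. The attribute names used here
# (id, domain, version, derivedFrom) are assumptions for illustration only.
SAMPLE = """
<UnitsML>
  <RegulatoryBody id="NMI-A"/>
  <UnitSet domain="NMI-A">
    <Unit id="metre" version="1" domain="NMI-A"/>
    <Unit id="metre" version="2" derivedFrom="1" domain="NMI-A"/>
    <Unit id="foot" version="1" domain="NMI-B"/>
  </UnitSet>
</UnitsML>
"""


def dangling_domains(root):
    """Return the domain values that match no RegulatoryBody id.

    This mimics what a key/keyref pair would reject during validation.
    """
    bodies = {rb.get("id") for rb in root.iter("RegulatoryBody")}
    used = {el.get("domain") for el in root.iter() if el.get("domain") is not None}
    return sorted(used - bodies)


def has_linear_history(versions):
    """Check that (version, derived_from) pairs form one unbranched chain.

    Exactly one entry may be the initial version (derived_from is None);
    every other entry must derive from the previous link of the chain.
    """
    child_of = {}
    roots = []
    for version, parent in versions:
        if parent is None:
            roots.append(version)
        elif parent in child_of:
            return False  # two versions derived from the same ancestor
        else:
            child_of[parent] = version
    if len(roots) != 1:
        return False
    # Walk the chain from the initial version and count what is reachable.
    visited = set()
    current = roots[0]
    while current is not None and current not in visited:
        visited.add(current)
        current = child_of.get(current)
    return len(visited) == len(versions)


root = ET.fromstring(SAMPLE.strip())
print(dangling_domains(root))
print(has_linear_history([("1", None), ("2", "1")]))
```

In the sample, the `foot` unit points at a domain with no matching RegulatoryBody, which is the kind of dangling reference the keyref is intended to catch, while the two `metre` versions form a valid linear history.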

y4r9 commented 9 months ago

Sorry to be pushy, but are there any comments?

opoudjis commented 9 months ago

@y4r9 I'm sorry we have not had any response to date, and I have been trying to attract others' attention for a little while now. I'm not a subject matter expert, so I'm not the person to address this.

This is a large amount of stuff to go through, and reviewing an entire slab of XSD against an entire previous slab of XSD is going to make people balk (and it has done). We would be much more prepared to review this if we had something much more like a diff to work from.

For my part, the changes all sound reasonable, although the two concerns I have are that this work should integrate with the work being done in https://github.com/unitsml/unitsdb , rather than reinvent it (point 5), and that integration with external identifiers be open-ended, rather than locked in to specific identifier schemas (point 6).

The way forward is going to need the leads of this activity to meet with you and work out a way forward; that will involve us (Calconnect) and NIST. We also do need a sense of who you are and who you are representing; you clearly know what you're talking about, but "y4r9" doesn't tell us much about your organisational affiliation :-)

y4r9 commented 9 months ago

@opoudjis thank you for your response. I completely understand that my pull request is rather extensive and surely there are better ways to communicate the changes. However, I did try to split the changes for “features” into separate commits, each having its own diff, but since some of the features rely on previous changes, I thought that one large pull request bundles the separate commits better than a larger number of pull requests.

A prior architectural discussion would clearly be beneficial, guiding later development efforts and allowing test cases to be constructed more efficiently, but unfortunately I could not find a relevant place to discuss such design decisions. To allow for changes in the design, I did check the box to allow maintainers to change the pull request retrospectively.

With regard to points 5 and 6, I agree that cross references and reference types should be easy to add. It would be possible to add an “any” field which does not restrict the type of data used to link to another data source. The reasoning was to be able to validate as much as possible using the XSD without the need for external programmatic validation efforts, but this would be one of the architectural decisions.

I would be happy to attend a meeting where my personal affiliation and other matters can be discussed. Just to let you know, I neither represent a large company involved in the field of metrology nor would I claim to be an expert, considering that members of NIST are involved in the TC. Please let me know what would be a convenient way for you to meet.