nvs-vocabs / P06

A controlled vocabulary for units of measurement

NTR: Request for P06 Practical salinity unit #30

Open nvsvocabs opened 3 years ago

nvsvocabs commented 3 years ago

Required

Term name (PrefLabel)

Practical salinity unit

Definition

One psu equals one gram salt (Na+Cl-) per kilogram of seawater

Optional

Sources/references

https://fr.wikipedia.org/wiki/Practical_salinity_unit

Synonym or acronym (AltLabel)

psu

roy-lowry commented 3 years ago

This is an issue that has come up many times in both BODC and CF. TEOS-10 strongly recommended that the units for practical salinity should be 'dimensionless' and not 'PSU'. Frank Millero had been particularly vocal on this issue for years - see for example:

Millero, F.J. 1993. What is PSU? Oceanography 6(3):67. Available online at: http://www.coastalwiki.org/wiki/Salinity

Consequently, PSU was not included in P06 to ensure adherence to physical oceanography best practice.

dr-shorthair commented 3 years ago

Clearly Frank Millero understands what he is talking about. However, you have raised the matter of what P06 is for. Is it (a) a carefully governed set of units that 'adhere to physical oceanography best practice' and thus define the limits of what is acceptable, or (b) a census of units of measurement that are in use in practice?

If it is the latter, then PSU should clearly be included, since it appears fairly widely in actual datasets.

Do the P06 units have enough information to be used for automated re-scaling and transformations? They don't look like it on the public interface at least. In which case I don't see how they can realistically claim to be rigorous enough to play role (a).

roy-lowry commented 3 years ago

P06 started life as the units controlled vocabulary for the data managed by BODC. PSU was excluded to ensure that no practical salinity data handled by BODC could be tagged with the unit 'PSU' - i.e. to ensure BODC conformed to best practice. This enforcement was carried across to SeaDataNet as a management decision and adopted by the CF community. So there is a large community in the oceanographic domain that do see P06 as fulfilling role (a). For background, Frank's objections are because practical salinity is a conductivity ratio and therefore dimensionless.
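
To illustrate the 'conductivity ratio' point, here is a minimal sketch of the PSS-78 salinity polynomial restricted to 15 °C and atmospheric pressure, where the temperature correction term vanishes. The function name is made up for illustration, and the full scale (temperature and pressure corrections, validity range of roughly 2 to 42) is deliberately left out. The input is a ratio of two conductivities, so nothing in the calculation carries a unit.

```python
# PSS-78 polynomial coefficients in powers of sqrt(R); they sum to 35.0000.
PSS78_A = (0.0080, -0.1692, 25.3851, 14.0941, -7.0261, 2.7081)


def practical_salinity_at_15c(conductivity_ratio: float) -> float:
    """Practical salinity from a conductivity ratio measured at 15 degC, 1 atm.

    Simplified illustration only: the temperature-dependent correction of
    PSS-78 is zero at 15 degC, so it is omitted here.
    """
    if conductivity_ratio < 0:
        raise ValueError("conductivity ratio must be non-negative")
    root = conductivity_ratio ** 0.5
    # S_P = a0 + a1*R^(1/2) + a2*R + a3*R^(3/2) + a4*R^2 + a5*R^(5/2)
    return sum(a * root ** i for i, a in enumerate(PSS78_A))


if __name__ == "__main__":
    # A ratio of exactly 1 corresponds to a practical salinity of 35 by construction.
    print(round(practical_salinity_at_15c(1.0), 4))
```

The result is a pure number on a defined scale, which is why 'dimensionless' rather than 'PSU' is the recommended entry for the units field.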

roy-lowry commented 3 years ago

Having thought about it during the day I feel a brief history lesson for the benefit of @dr-shorthair is in order.....

Once upon a time (the 1980s to be more precise) IODE developed an oceanographic data exchange format called GF3. This included significant amounts of metadata - information on platforms, stations, geographic areas plus of course parameters (what was measured). During the first half of the 1980s there was a governance authority known as IODE GETADE (group of experts on technical aspects of data exchange) chaired by Meirion Jones (former BODC Director). They put a huge amount of effort into developing the GF3 code tables - a set of lists of valid entries for many of the fields in the GF3 data model. Included in these were the GF3 parameter codes (forerunner of P01) and GF3 unit codes (forerunner of P06). These were what we now refer to as 'controlled vocabularies' that may be defined as 'a set of values that may be used to populate a designated field in a data or metadata model'. Each collection in NVS is very much a controlled vocabulary.

Towards the end of the 80s the USA pulled the plug on UNESCO finances, which filtered down catastrophically onto the activities of GETADE. Consequently, these controlled vocabularies ceased to be governed. The result during the 1990s was mayhem. Dozens of organisations took copies of the final GF3 code tables and manipulated local copies to suit local needs - adding terms, deleting terms and even tweaking term definitions.

Around 2000 the Sea-Search project was working to set up pan-European metadata catalogues that included many of the fields from GF3 metadata. This exposed the sorry state of the controlled vocabularies, initially producing whole sets of semantically incompatible records. At a meeting in Brest it was decided that the code table mess needed to be sorted out. First stage was to re-establish governance. Initially, this was done by giving individuals responsibility for one or more controlled vocabularies. I drew the short straw, getting 'parameters' which included a set of 'what was measured' lists of differing granularity (P08, P03, P02 and P01) plus units (P06). I realised that in addition to sorting out content we were going to need tools to ensure that there was a master copy of each vocabulary that could be maintained to a high standard and made accessible to all. If this wasn't done people would simply continue to use local copies (I once likened this to Darwin's finches on the Galapagos).

To cut a long story short, NVS was the technical solution. I think the most important thing to realise about NVS is that it is made up of collections of concepts. Each collection is a controlled vocabulary - a set of concepts that may be used to populate a designated field in data or metadata models. Many data management systems use NVS in this way to actively constrain the data held within them. For example, BODC's data schemas are bound by NVS collections through referential integrity constraints and SeaDataNet data files need to get past a checking tool called Octopus that performs a host of semantic checks using live calls to NVS services.
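
As a rough illustration of what 'actively constrain' means in practice, the sketch below rejects any record whose units field is not a member of the controlled vocabulary. The labels in `ALLOWED_UNITS` and the record layout are hypothetical stand-ins, not actual P06 content, and a real checker such as Octopus would fetch the collection from NVS rather than use a hard-coded set.

```python
# Hypothetical stand-in for a units collection; NOT the real P06 term list.
ALLOWED_UNITS = frozenset({"dimensionless", "degrees Celsius", "decibars"})


def find_invalid_units(records, allowed=ALLOWED_UNITS):
    """Return (index, units) pairs for records whose units are not in the vocabulary."""
    return [
        (index, record.get("units"))
        for index, record in enumerate(records)
        if record.get("units") not in allowed
    ]


if __name__ == "__main__":
    sample = [
        {"parameter": "practical salinity", "units": "dimensionless"},
        {"parameter": "practical salinity", "units": "PSU"},
        {"parameter": "temperature", "units": "degrees Celsius"},
    ]
    for index, units in find_invalid_units(sample):
        print(f"record {index}: units {units!r} not in controlled vocabulary")
```

With a check like this in the ingestion path, a term that is absent from the collection simply cannot enter the data, which is the enforcement mechanism at stake in this request.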

Around 2010 I realised that the one-man governance empire I'd set up was far from healthy. Over time I'd inherited governance responsibility for virtually every SeaDataNet controlled vocabulary. Something had to be done, particularly as retirement loomed within a decade. I tried engaging bodies like the SeaDataNet Technical Task Team as a GETADE replacement and started up SeaVoX as a joint vocabulary governance authority between SeaDataNet and IODE. I wouldn't call these roaring successes as more often than not requests for discussion hit a wall of silence. These are being superseded by the GitHub vocabulary fora being set up by Gwen and Alexandra that seem to work a lot better.

Vocabulary governance is a thankless task. In life we learn by experience: get it wrong, learn lessons then do better next time. With controlled vocabularies if you get it wrong you're stuck with it. Add a term during a momentary lapse of reason and you can't delete it without breaking somebody's data/system. Deprecation and versioning provide corrective tools to some extent, but whilst they can be implemented server-side there's no guarantee that clients will be aware of them. Adding PSU to P06 may seem trivial. However, it means that for the first time in decades producers of data files in BODC and SeaDataNet will be able to include 'PSU' in the units field as they are no longer blocked by automated constraints. In my experience, if somebody can do something they will.

You mention P06 as the basis of automated re-scaling and transformations. It has certainly been thought about - many a beer has been consumed with Andrew Wolf discussing the idea of an RDF version of UDUNITS. This never came to be. The closest we have come is through the ODV software which can be configured to do a limited set of data conversions based on P06 unit codes in the data.

So, there it is. Hope you're still awake!!

dr-shorthair commented 3 years ago

Thanks for the history Roy, which fills in details within a general story that I was already aware of.

I have two responses, one about the scope of NVS, and the other about dimensionless 'units'.

On scope: NVS has influence beyond the original BODC and even SeaDataNet. I have been an active advocate for NVS as one of the best governed controlled vocabulary services, and also responsive to requests (which are now handled transparently thanks to the GitHub issue tracker). That means that it is being used outside the original network. This is great, as interoperability improves when more data providers use the same CVs. However, it can lead to tricky situations where legacy systems have been using definitions (such as PSU) that are not approved for NVS. It would be a huge shame to send them away because one or two definitions cannot be found and will not be added to NVS. Perhaps a separate collection could be added for 'non-recommended variants' to enable greater adoption?

On dimensionless 'units': if I understand the argument on PSU, it is that strictly there is no such thing as PSU. It is a conductivity-ratio (dimensionless) scaled to match grams-per-kilogram{NaCl in water}. I note that P06 already has a few scaled dimensionless 'units', such as percent, ppth, ppm, ppb, and lots of mass-per-mass, volume-per-volume and amountOfSubstance-per-amountOfSubstance variants, with different scale factors. These are found useful, even though it could be argued that they are not strictly needed. So I don't understand why another one should be excluded in principle. And what 'unit' should be used for conductivity-ratio? - presumably PPTH. In which case could there be an annotation, alt-label, or alias to lead people there when they are looking for PSU?

As you note, the formal expressivity of P06 entries is limited to text descriptions and rather generic SKOS relations. 90% of the P06 members now have a sameAs link to the corresponding QUDT representations. The latter use a proper units ontology, which supports on-the-fly transformations between units that have the same dimensionality. P06 could be re-conceived as a 'community profile' (subset) of the QUDT units catalogue.
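
To make the 'same dimensionality' idea concrete, here is a minimal sketch of the kind of information a units ontology carries and a text-only vocabulary entry does not: a dimension vector and a scale factor per unit. The `Unit` class, the `UNITS` table and the `convert` function are hypothetical and are not the QUDT data model or API; they only show why a conversion between two scaled dimensionless units is mechanical once those two properties are recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    label: str
    dimension: tuple  # exponents of (mass, length, time); all zero means dimensionless
    factor: float     # multiplier that takes a value to the unscaled base of its dimension


# Hypothetical mini-catalogue for illustration only.
UNITS = {
    "percent": Unit("percent", (0, 0, 0), 1e-2),
    "ppth": Unit("parts per thousand", (0, 0, 0), 1e-3),
    "ppm": Unit("parts per million", (0, 0, 0), 1e-6),
    "ppb": Unit("parts per billion", (0, 0, 0), 1e-9),
    "metre": Unit("metre", (0, 1, 0), 1.0),
}


def convert(value: float, source: str, target: str) -> float:
    """Rescale a value between two units, refusing mismatched dimensions."""
    src, dst = UNITS[source], UNITS[target]
    if src.dimension != dst.dimension:
        raise ValueError(f"cannot convert {source} to {target}: dimensions differ")
    return value * src.factor / dst.factor


if __name__ == "__main__":
    print(convert(3.5, "percent", "ppth"))
```

Note that a dimension check alone cannot tell a mass fraction from a conductivity ratio, since both are dimensionless. That distinction has to come from the definition of the quantity rather than from the unit.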

roy-lowry commented 3 years ago

The great advantage of being a grandparent versus a parent is that when the going gets tough the problem can be delegated. Apologies @gwemon, but with this decision you're the parent!

gwemon commented 3 years ago

Thanks @roy-lowry and @dr-shorthair for the interesting exchange. It is a tricky one. I do think that it is our responsibility to avoid endorsing terms that are misused just because that usage has become common, but I guess there comes a point when it might become unavoidable. Have we reached that point for PSU, i.e. has PSU become such common usage that not having it in our unit vocab becomes a hindrance? I'll think it through, consult and ponder the pros and cons. Roy, I replaced the link in your comment at the top because it was not resolving. I found the Millero (1993) text on the CoastalWiki.
For the record, in his 2010 article Millero says: "There were quite a few committee discussions about how salinity should be defined. Physicists on the panel thought of salinity as 3.5 x 10–3 or 0.035. Because most of the oceanographers considered average seawater as having a salinity of 35.0‰ or ppt, we fought to keep the average salinity as 35.000. Because the salinity is a conductivity ratio or fraction, the compromise was to define practical salinity as SP = 35.000 without the ‰ symbol. This compromise was probably a mistake. We should have kept the symbol (or added g kg –1) and thus avoided the incorrect use of the practical salinity unit (PSU) to define salinity in terms of a unit (Millero, 1993)."

roy-lowry commented 3 years ago

If Brian King is still around then having a discussion with him might provide useful guidance on whether we've reached the common usage point with PSU.

gwemon commented 3 years ago

@matdon17 did a quick scan on the issue as handled by UDUNITS and CF standard and here is what comes out:

Issue in support of it not being in UDUNITS from 2014 https://github.com/Unidata/UDUNITS-2/issues/27

This from CF standard suggesting it is not used (search for 'psu'), but a note saying things might have changed: http://cfconventions.org/faq.html

And the UDUNITS unit database doesn't seem to have any, see: https://www.unidata.ucar.edu/software/udunits/udunits-2.2.28/udunits2.html#Database

However, someone appears to have created a 'modified database' for the CF standard! https://ncas-cms.github.io/cfunits/cfunits.Units.html

We'll continue to gather views on this...