kaiiam / UO_revamp

Are #6

Closed kaiiam closed 3 years ago

kaiiam commented 3 years ago

@dr-shorthair @HajoRijgersberg @jamesaoverton

I'm curious about the best way to proceed with the unit are and its metric derivatives. According to SI-Brochure-9.pdf, hectare itself shows up in Table 8, "Non-SI units accepted for use with the SI Units".

According to the hectare wiki page the decare and are "are not officially "accepted for use", they are still used in some contexts."

In addition to both QUDT and OM having hectare and are, QUDT has Decare and OM has centiare.

Currently this script is configured to parse are as a standard prefix-crossable metric unit (behaving like metre etc). I wonder if this is correct. Perhaps we shouldn't be doing that? I was able to find mentions online of the following crosses but not any others.

kiloare
hectare
decare
deciare
centiare
milliare

Currently the script generates the following (with exceptions for the decare and hectare labels). Perhaps these extra ones are wrong and we shouldn't allow for this? Or if they are valid, maybe we just need to figure out the proper names?

image

@jamesaoverton as a separate point, dar (deciare) was the only crossing pattern in the entire set of metric units crossed with prefixes that didn't get picked up by our current parser. It might be a mistake on my part in how I used the parser, where I wrote an exception for when the PREFIX is da (see line 208 of the script). The current workaround is to have an exception for dar in the parser, but if we decide that we shouldn't allow all metric combos of are, then perhaps we won't have this problem and can remove are from the metric-prefix-crossable list?
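To illustrate how the da / d overlap can trip up a prefix parser, here is a minimal sketch. It is not the parser from the script; the prefix and base-unit tables and the function name split_prefixed_code are hypothetical. The idea is to accept a split only when the remainder is a known base unit, so that dar resolves to d + ar rather than da + r.

```python
# Hypothetical subsets of prefix symbols and base-unit codes, for illustration only.
PREFIXES = {"k": "kilo", "h": "hecto", "da": "deca", "d": "deci", "c": "centi", "m": "milli"}
BASE_UNITS = {"ar": "are", "m": "metre", "g": "gram"}


def split_prefixed_code(code):
    """Return every (prefix, base) split where both halves are known symbols."""
    splits = []
    for prefix in PREFIXES:
        remainder = code[len(prefix):]
        # "da" also matches the start of "dar", but "r" is not a base unit,
        # so that split is rejected and only d + ar survives.
        if code.startswith(prefix) and remainder in BASE_UNITS:
            splits.append((prefix, remainder))
    return splits


print(split_prefixed_code("dar"))   # deciare
print(split_prefixed_code("daar"))  # decare
print(split_prefixed_code("har"))   # hectare
```

Checking the remainder against the base-unit table is one way to avoid a special case for dar; whether it fits the existing script is an open question.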

dr-shorthair commented 3 years ago

I had not delved into are. It appears to be an archaic surveyor's measure. These days it is typically only encountered in the compound hectare. As you note, neither are nor hectare are SI units, though 'hectare' is accepted for use, for pragmatic reasons.

Officially the SI prefixes should only be used to modify SI units, though the SI rules are more fierce than common usage. But I'm not sure it is worth putting too much effort in here.

You may have noticed Guenther Schadow's rant about kg in the UCUM spec - i.e. that it is kinda mad to have a prefixed-unit as one of the base units, and I think UCUM treats gram as the base unit internally. Rule makers sometimes have to bow to world realities.

kaiiam commented 3 years ago

Also, as a follow-up to @HajoRijgersberg's comment in https://github.com/kaiiam/UO_revamp/issues/2 that "The symbol 'ar' is not SI": there are only a few exceptions where UCUM deviates from the SI labels, this being the only one with ttl (or NC name) appropriate strings (both ar and a are valid NC names). Other codes, e.g., plane angle minute with SI code ′ and UCUM code ', need to be changed regardless to make appropriate NC names (for IRIs).
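As a rough way to check which codes would pass as NC names, the sketch below uses a simplified, ASCII-only approximation of the XML NCName production. The real production also admits many non-ASCII letters, so this is stricter than the standard; the function name is_ascii_ncname is made up for this example.

```python
import re

# Simplified, ASCII-only approximation of NCName:
# a letter or underscore, then letters, digits, '.', '-' or '_'.
# The full XML production also allows many non-ASCII characters.
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(text):
    """True if text is a valid NCName under the ASCII-only approximation."""
    return ASCII_NCNAME.fullmatch(text) is not None


for candidate in ["ar", "a", "unit_har", "'", "m/s"]:
    print(candidate, is_ascii_ncname(candidate))
```

Under this approximation ar, a and unit_har pass, while the UCUM minute code ' and a slash-containing code such as m/s do not.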

I'm currently handling this issue by having separate SI code and UCUM code annotation properties, along with an NC name appropriate IRI, e.g., unit_har. Please let me know @HajoRijgersberg if something like this is sufficient to address your concerns.

image

kaiiam commented 3 years ago

@dr-shorthair

Officially the SI prefixes should only be used to modify SI units, though the SI rules are more fierce than common usage.

Yes, and people still use millicurie. I think we should still enable that type of unit to be crossed with the metric prefixes, but not give it an SI code annotation property and not call it a metric unit.

I think UCUM treats gram as the base unit

The way I'm handling it is trying to do the best of both: programmatically I'm treating gram as the base, but definition-wise I'm treating kilogram as the base (as defined in the SI brochure).

Currently I have:

image

and

image

As the exceptions, but the rest of the gram terms refer to gram as the base, e.g.

image

Is this reasonable? This was another thing I was hoping to get input on and change if needed.

dr-shorthair commented 3 years ago

That looks good @kaiiam - well done. You are clearly learning diplomacy ;-)

HajoRijgersberg commented 3 years ago

Hi guys, good work and good discussion! My contribution:

kaiiam commented 3 years ago

Thanks @HajoRijgersberg for the feedback! Here are some quick responses; we can pick up more in our next meeting on the 26th.

You can treat are as a standard prefix-crossable metric unit

OK, will do, but should we change any other names? E.g. exare instead of exaare? I don't know if there's a precedent either way.

Apart from SI we also support other systems of units, so SI prefixes should indeed be allowed with e.g. curie

Yes, I think we should have another category in the parser which includes units that can be crossed with metric prefixes but aren't SI: curie, inch? ... I could use some help getting a list of these.

Kilogram as base unit indeed is crazy, but officially in the SI it is. So nanogram should not be defined as 10^-9 gram, but as 10^-12 kilogram....

Would love to get a consensus opinion on this one; I don't have a strong opinion either way. I presume it'd be good to be as SI-conformant as possible. @dr-shorthair and @jamesaoverton should we define all the prefixed-gram terms (e.g., nanogram) in reference to kilogram instead of gram then?

NC name appropriate IRI's, such as unit_har to my opinion aren't [a good idea]. I think we should use the full official names ...

Interesting points. I think @HajoRijgersberg and @jamesaoverton aren't in agreement on this one; let's discuss this in our next meeting.

HajoRijgersberg commented 3 years ago

Just a quick answer, I'll come back to the other questions tonight. As to kg as base unit: now that I think about it a little longer, I think this is only relevant with regard to the definition of derived units (based on kg). I would think, for prefixed units, you can base them on gram. I did the same thing in OM.

kaiiam commented 3 years ago

I would think, for prefixed units, you can base them on gram. I did the same thing in OM.

That's how we currently have it. If people feel there is a need, we can change it by writing exceptions for all gram-based units in the label-producing function so that they are defined based on kg instead of g.

HajoRijgersberg commented 3 years ago

No, I definitely think that referring to gram in prefixed units is Ok. In derived units however, such as newton (= kg m s-2), definitely to SI base unit kg should be referred. But indeed, looking forward to opinions of others.
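The two options discussed above differ only by a fixed shift of three powers of ten. A minimal sketch, assuming a small hypothetical prefix table and made-up function names, with exact arithmetic via fractions.Fraction:

```python
from fractions import Fraction

# Powers of ten for a few SI prefixes (subset, for illustration only).
PREFIX_EXPONENTS = {"kilo": 3, "milli": -3, "micro": -6, "nano": -9}


def factor_relative_to_gram(prefix):
    """Size of a prefixed gram unit expressed in grams."""
    return Fraction(10) ** PREFIX_EXPONENTS[prefix]


def factor_relative_to_kilogram(prefix):
    """Size of the same unit expressed in kilograms.

    The kilogram is 10^3 gram, so every exponent shifts by -3.
    """
    return Fraction(10) ** (PREFIX_EXPONENTS[prefix] - 3)


print(factor_relative_to_gram("nano"))      # nanogram in grams
print(factor_relative_to_kilogram("nano"))  # nanogram in kilograms
```

Either definition describes the same unit: nanogram is 10^-9 gram and, equivalently, 10^-12 kilogram, so switching between them would change only the reference term in each definition.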

HajoRijgersberg commented 3 years ago

E.g. exare instead of exaare? I don't know if there's a precedent either way.

Have never encountered (the spelling of) this and other are units. I only knew about hectare (and kilohm). Learned decare yesterday from you! :) My feeling says only the often-used units, such as hectare, are "optimized" this way. Otherwise we would have many examples of such namings, wouldn't we? Or not? E.g. microohm is not spelled differently as microhm.
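One way to keep such "optimized" spellings to a short list is an exception table consulted before plain concatenation. The sketch below is illustrative only: the table holds just the elided spellings mentioned in this thread, and prefixed_label is a hypothetical name, not the script's label function.

```python
# Elided spellings mentioned in this thread; any pair not listed falls back
# to plain concatenation of prefix name and unit name.
LABEL_EXCEPTIONS = {
    ("hecto", "are"): "hectare",
    ("deca", "are"): "decare",
    ("kilo", "ohm"): "kilohm",
}


def prefixed_label(prefix, unit):
    """Return the label for a prefixed unit, honouring listed exceptions."""
    return LABEL_EXCEPTIONS.get((prefix, unit), prefix + unit)


print(prefixed_label("hecto", "are"))  # listed exception
print(prefixed_label("exa", "are"))    # no exception, plain concatenation
print(prefixed_label("micro", "ohm"))  # no exception, plain concatenation
```

With this approach exaare and microohm keep both vowels unless someone adds them to the table.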

we should have another category in the parser which includes units that can be crossed with metric prefixes but aren't SI: curie, inch? ... I could use some help getting a list of these.

Good point, let's create such a list. Millicurie, milliinch, milliröntgen

Interesting points. I think @HajoRijgersberg and @jamesaoverton aren't in agreement on this one; let's discuss this in our next meeting.

Just indicating what I think is best. :) If you have good reasons to do it differently, is of course no problem with me!

kaiiam commented 3 years ago

@HajoRijgersberg thanks again for the feedback:

I definitely think that referring to gram in prefixed units is Ok.

Great!

newton (= kg m s-2), definitely to SI base unit kg should be referred.

Totally agreed. For all the named SI units I took the definitions from the SI brochure, e.g.,

image

My feeling says only the often-used units, such as hectare, are "optimized" this way. Otherwise we would have many examples of such namings

Totally agree, the fewer exceptions the better. We should make a list of these as well; I didn't know about kilohm. I've started an issue for this, see https://github.com/kaiiam/UO_revamp/issues/8.

Millicurie, milliinch, milliröntgen

Great, I've also started an issue for this, see https://github.com/kaiiam/UO_revamp/issues/7

NC name appropriate IRI's ...

@jamesaoverton recently had another idea involving just using the UCUM codes and letting the standard purl resolution mechanisms deal with the non-valid characters. I'll let him explain it in the upcoming meeting.

HajoRijgersberg commented 3 years ago

Thanx! :) All good, except I'm a bit afraid that the UCUM codes simply do not conform enough to the SI... And I do think the SI (and ISO) should set the standard, not UCUM. Hope I don't spoil the positive energy! If so, please forget about what I said! :)

kaiiam commented 3 years ago

I'm a bit afraid that the UCUM codes simply do not conform enough to the SI...

It's a fair point, but there aren't that many places where UCUM diverges from the SI, from my understanding of the standard SI (base terms + special named + non-SI but accepted for use); see the full mapping here. The following are the divergences (I guess the SI code for year also diverges, but it's not in the base SI).

UCUM_symbol   SI_symbol
Cel           °C
Ohm           Ω
AU            au
deg           °
u             Da
'             ′
''            ′′
ar            a

All other SI codes, as far as I know, are the same, so I'm not particularly worried about this, but I would like to get others' input here. We also have the SI annotation property for SI-conformant terms and term combos.
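Because the list of divergences is short, the SI annotation could in principle be derived from the UCUM code with a small lookup that defaults to the identity. A minimal sketch, using a subset of the rows above; the names UCUM_TO_SI and si_symbol are hypothetical:

```python
# Subset of the UCUM/SI divergences listed above; any code not in the table
# is assumed here to have the same symbol in both systems.
UCUM_TO_SI = {
    "Cel": "°C",
    "Ohm": "Ω",
    "AU": "au",
    "deg": "°",
    "u": "Da",
    "ar": "a",
}


def si_symbol(ucum_code):
    """Return the SI symbol for a UCUM code, defaulting to the code itself."""
    return UCUM_TO_SI.get(ucum_code, ucum_code)


print(si_symbol("Ohm"))
print(si_symbol("ar"))
print(si_symbol("m"))
```

The identity default only makes sense for codes that actually have an SI counterpart; non-SI units would need to be excluded before calling it.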

kaiiam commented 3 years ago

Apologies, I'd missed the last 3 and have refreshed the above, so please re-examine the table on GitHub if viewing this by email.

HajoRijgersberg commented 3 years ago

Good that there is great overlap between UCUM and SI, but it should be fully... UCUM should not have deviated from SI in the first place. But please don't let my opinion spoil the fun! :)

kaiiam commented 3 years ago

@HajoRijgersberg I do see your point, and we have yet to decide how we are going to proceed with IRIs for the new system. @jamesaoverton proposed using canonically ordered versions of UCUM codes, which from a technical standpoint would be the simplest solution. This is still up for discussion, however, so let's do so on Monday.

dr-shorthair commented 3 years ago

'UCUM should not have deviated from SI in the first place.'

The reasons for that are explained in detail in the UCUM Specification. It is because the SI rules do not cover the gamut of all the units used in practice, and also leave some areas of ambiguity. UCUM resolves almost all of these, while sticking very closely to SI as much as practical.

HajoRijgersberg commented 3 years ago

I understand that UCUM needed to tackle such issues, but it should not have done that by deviating from the SI. E.g. rather than changing the symbol of are (a) to ar, it could have chosen to use the full names of units in the URIs. So not 'unit_har', but 'hectare'. For long composed units, additionally symbols could perhaps be used, but only if they are not ambiguous. I don't know the best solution (I use full unit names in OM), but it should not have been done by deviating from the SI. That's not the idea of standardization. So it should have remained closer to the SI....

P.S.: The convention I use in OM is very clear. Using symbols in URIs will lead to problems anyway with composed units, I expect. How to deal with unit_m/s? Is a slash allowed there? unit_m_per_s is also very ugly. So that leaves 'metre_per_second'. And then we immediately run into the problem that there are two seconds, one for time and one for angle. So it should become something like 'metre_persecond(time)'. Are brackets allowed? I hope so...

kaiiam commented 3 years ago

E.g. rather than changing the symbol of are (a) - to ar

I'd refer again to the table above of deviations between SI and UCUM. au, Da and a are the only real deviations in my mind, and I'm assuming this was done to avoid ambiguities with some other units. The other deviations, e.g., changing Ω to Ohm, make perfect sense to me as a way to avoid non-standard characters.

metre_persecond(time)'

Yes your logic is good and this is a reasonable way of doing it. I also think it would be possible to do with codes as long as they are well documented. I think IRIs will be our major topic of discussion for our upcoming meeting. Looking forward to discussing it.

HajoRijgersberg commented 3 years ago

au, Da and a are the only real deviations

I'm sure that this has been done with good intentions, but to avoid ambiguities a standard should not be deviated from. (Ohm may be excused because of the non-standard character Ω. That is, is it not allowed in URIs?) So the solution should be by deviating as little as possible and in my opinion that would be - if one really wants to stick as much as possible to the definitions - to add some disambiguation. So e.g. 'a (are)'. Not nice, but less deviated than 'ar'.

it would be possible to do with codes as long as they are well documented

I understand the intention. But another party will deviate in another way. So the only solution is that everyone sticks to the standard. That's what standardization is.

Looking forward to discussing it.

Me too, but I'm afraid I can only repeat what I have written so far. In the end the team may decide not to stick to the standard, it is no crime! :) ;)

graybeal commented 3 years ago

Though I'm new here, and undoubtedly a little naïve, perhaps I can add that perspective at this point. I'll just say up front that this merits the detailed discussion (sorry @jamesaoverton) because yes, I could imagine this becoming a specification more important than UCUM, SI, OM, or any other existing one. Building on the shoulders of giants, and all that.

It seems clear that we want unique IRIs for each 'official' unit, to accommodate as much of the unit universe as possible. Things that aren't as clear:

For maximum usability, I'd assume the answers are Yes (definitive IRI); Yes (synonym IRIs) and Yes (distinguished as such); and 'someday but not now' (for annotating relationship to sources), because that costs more, offers less, and can be derived later.

All of this context says to me that the definitive IRI does not have to follow any one standard, because (a) this resource is by definition merged, compositional, and likely dynamically resolved, and (b) there will be (at least, can be composed and resolved on the fly) synonyms for those other expansions, so that when an SI user (for example) wants to use the SI 'are' or 'hectare' that works fine, even if it isn't the canonical IRI in this system.

And then the canonical IRIs (which can perhaps include those "common forms" of units that are actually present in the wild) form a collection that can be presented as a static resource, for tools that need to look things up from a simple list. But it won't be authoritative as a formal standard.

kaiiam commented 3 years ago

I could imagine this becoming a [important] specification

Ideally that should be our goal.

Do we intend to create a canonical 'IRI to rule them all' whenever there are multiple spellings (British/American), expanded format (abbreviated vs expanded), plurals, representations from different standards (Ω vs Ohm)?

That is what @jamesaoverton and I are proposing; see https://github.com/kaiiam/UO_revamp/issues/11 for discussion on how that could be done using UCUM codes as the IRIs. The synonyms and other APs I think we should include, but I'd prefer they not be in the IRI (in favor of UCUM code IRIs).

Do we intend to create synonym identifiers for the non-canonical IRIs that mean exactly the same thing? If so, we'll distinguish between the canonical IRI and the synonym IRIs?

I would rather we only have one IRI and add APs like synonyms. We had originally thought of having an SI set of IRIs and a UCUM set, but I think that's overcomplicating things and could allow possibilities for confusion and mistakes. See and please add to https://github.com/kaiiam/UO_revamp/issues/10 for synonyms.

Do we intend to distinguish in any explicit way between the sources of the IRI representations? ...

I'd prefer, if possible, to have annotations for definition sources for at least the non-combinatorial units. Currently all SI unit definitions are sourced from the SI-Brochure-9.pdf. It would be great to get help finding definition sources for non-SI units.

For maximum usability ...

Yes.

All of this context says to me that the definitive IRI does not have to follow any one standard ...

We're intending the resource to be merged, compositional, and dynamically resolved. We want people to be able to compose units on the fly and get back an IRI with appropriate APs right away. We currently have links to SI and UCUM codes so yes we want them to be useful to people who are searching that way too, even if we don't end up with pure SI codes in the IRIs (which isn't actually possible AFAIK). So although the IRI might not have to follow any one standard my vote is still for having it follow UCUM. Again see https://github.com/kaiiam/UO_revamp/issues/11.

And then the canonical IRIs ... form a collection that can be presented as a static resource, for tools that need to look things up from a simple list. But it won't be authoritative as a formal standard.

Yes, we want to have an official dynamic server for this, but of course the latest version of that product should be made available under a CC0 or similar license for anyone to use for any purpose. We're not BIPM or NIST, and therefore not intending to set any formal standard. We're just trying to provide a better solution to this common problem.

HajoRijgersberg commented 3 years ago

Hi John, thanks for your input! :)

All of this context says to me that the definitive IRI does not have to follow any one standard, because (a) this resource is by definition merged, compositional, and likely dynamically resolved, and (b) there will be (at least, can be composed and resolved on the fly) synonyms for those other expansions, so that when an SI user (for example) wants to use the SI 'are' or 'hectare' that works fine, even if it isn't the canonical IRI in this system.

Could you explain a bit more for me? I don't see why (a) is a reason, and do not fully follow (b). Hope that's no problem for you! :)

And then the canonical IRIs (which can perhaps include those "common forms" of units that are actually present in the wild) form a collection that can be presented as a static resource, for tools that need to look things up from a simple list. But it won't be authoritative as a formal standard.

How is that represented in the IRI?

dr-shorthair commented 3 years ago

We need to be careful not to mix concerns here.

The URIs for unit definitions must be unambiguous. But formally they are opaque, so they do not need to follow any existing standard. The normal OBO convention is to use meaningless numbers in URIs. But since we want to generate URIs for units dynamically we need a predictable algorithm.

AFAICT @jamesaoverton and @kaiiam's proposal is to use UCUM because it is unambiguous, generative, and covers a large range of conventional units, but not because it is superior in any other way compared with (for example) SI. i.e. UCUM provides a convenient recipe for generating URIs. That's all.
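As a rough picture of what "a predictable algorithm" could look like, the sketch below percent-encodes a UCUM code into the last path segment of an IRI and decodes it again. It is only an illustration: the base IRI is a placeholder, the function names are hypothetical, and it does not attempt the canonical ordering of codes discussed elsewhere in this thread.

```python
from urllib.parse import quote, unquote

# Placeholder base IRI, used only for illustration.
BASE_IRI = "https://example.org/unit/"


def ucum_code_to_iri(ucum_code):
    """Build an IRI whose final segment is the percent-encoded UCUM code."""
    # safe="" so that '/' is encoded instead of being read as a path separator.
    return BASE_IRI + quote(ucum_code, safe="")


def iri_to_ucum_code(iri):
    """Recover the UCUM code from an IRI produced by ucum_code_to_iri."""
    return unquote(iri[len(BASE_IRI):])


print(ucum_code_to_iri("har"))
print(ucum_code_to_iri("m/s"))
print(iri_to_ucum_code(ucum_code_to_iri("m/s")))
```

The mapping is reversible, so the code can always be recovered from the IRI, though encoded forms such as m%2Fs are less readable than the raw code.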

In particular, we are not saying that the UCUM symbols for units are preferable to SI, not at all.

HajoRijgersberg commented 3 years ago

UCUM provides a convenient recipe for generating URIs. That's all.

Yes, but that should not be all. Since the IRIs are non-opaque, they should follow the standards. Changing a symbol is not following a standard, it should be disambiguated by putting something behind it in parentheses, e.g. 'ha (hectare)'. Not nice, but the symbol has not been altered.

kaiiam commented 3 years ago

Again, see my answer in https://github.com/kaiiam/UO_revamp/issues/11

graybeal commented 3 years ago

Well, let me blend my response to @dr-shorthair and @HajoRijgersberg here.

Yes, I think the problem is one of mixed concerns. It seems the goal was to create a unit controlled vocabulary represented by resolvable and human-recognizable IRIs that are deterministically created, and support existing (standardized or in common practice) unit bases, prefixes, and derivations in an unambiguous way whenever possible. I expect we can agree that this is a hugely challenging goal, and the italicized bits are where we are discussing inevitable compromises.

So we're understandably getting stuck on whether the resulting IRIs can:

These are social tradeoffs. Boiled down, I don't think the goal started as "create an ontology to match the SI unit specification" (we don't own it after all, so not our place). So it bothers me a lot less if the result does not entirely follow the terminology defined in the SI specification—this is not a formal standard in that same way, it's a working solution, and working compromises seem appropriate. Especially because maximizing usability also seems really important.

That was what underlies my thinking about (a). In talking about synonyms (b), I was imagining that we could mint IRIs for the synonyms. I understand this is a bad practice, and my reframing of that wish is "Can the authoritative service please resolve the synonym-based IRIs, even if they aren't the real IRIs?" (Just like Google says 'Giving you answers for Madonna instead of Madona'.)
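The "resolve the synonym, but tell me you did" behaviour could be sketched as a lookup that returns the canonical identifier together with a flag. Everything here is hypothetical (the identifiers, the synonym table, and the function name resolve); it only illustrates the idea, not an agreed design.

```python
# Hypothetical canonical identifiers and synonyms, for illustration only.
CANONICAL = {"har", "ar"}
SYNONYMS = {"hectare": "har", "ha": "har", "are": "ar", "a": "ar"}


def resolve(identifier):
    """Return (canonical_id, was_synonym), or None if the identifier is unknown."""
    if identifier in CANONICAL:
        return identifier, False
    if identifier in SYNONYMS:
        return SYNONYMS[identifier], True
    return None


print(resolve("har"))      # already canonical
print(resolve("hectare"))  # synonym, redirected to the canonical identifier
print(resolve("furlong"))  # unknown in this toy table
```

The flag lets a service answer with the canonical term while still reporting that a redirect happened, much like the search-engine correction described above.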

HajoRijgersberg commented 3 years ago

Thanks, John. I could react to some points; please let me know if you want that.