Prefixes and IRIs

kaiiam commented 3 years ago

Prefixes

As discussed in our 4th meeting, we want to make this more general purpose than the scope of OBO (Open Biological and Biomedical Ontologies): not just units for those communities, but general-purpose units on the web for anyone's use cases. Hence we will probably not go for the OBO-specific prefix <http://purl.obolibrary.org>.

Suggestions were floated by @jamesaoverton, @dr-shorthair and @cmungall, including perhaps using a w3id.org namespace URL prefix. Unfortunately https://w3id.org/unit/ is taken by a French engineering university, but perhaps we could try for something like https://w3id.org/units-on-the-web/? Any other suggestions?

IRIs

Also as discussed we have 2 possible options for IRIs. 1) as proposed by @jamesaoverton to use UCUM codes e.g., m.s-1 and 2) as proposed by @HajoRijgersberg to use OM style label IRIs, e.g. metre_per_second_(time).

1) has the advantage of building off the work already done by UCUM for resolving ambiguities, so we wouldn't need to spend time reinventing anything. The drawbacks are that it deviates from the SI for a few terms and can potentially get ugly for some terms.

Further exploring this idea, @jamesaoverton had suggested that a UCUM code with non-standard NCName characters, for example [in_US], could become https://.../%5Bin_US%5D following standard URL character escaping conventions. I'm thinking that if we force the IRI to use angle brackets in .ttl format anyway, couldn't we just use UCUM codes as is in IRIs and let browsers or other systems deal with it? That leads to prettier IRIs, e.g. https://.../unit_[in_US].

The following example mini ontology:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<https://w3id.org/units-on-the-web/unit_L>
  a owl:NamedIndividual ;
  rdfs:label "litre"@en .

<https://w3id.org/units-on-the-web/unit_[in_US]>
  a owl:NamedIndividual ;
  rdfs:label "US Inch"@en .

is both valid .ttl format according to http://ttl.summerofcode.be/ and works in Protégé.

@jamesaoverton @dr-shorthair @HajoRijgersberg @ddooley @graybeal @cmungall @zhengj2007 @matentzn thoughts on 1) prefix ideas and 2) using UCUM IDs as is within <>'s for IRIs?

HajoRijgersberg commented 3 years ago

Also as discussed we have 2 possible options for IRIs. 1) as proposed by @jamesaoverton to use UCUM codes e.g., m.s-1 and 2) as proposed by @HajoRijgersberg to use OM style label IRIs, e.g. metre_per_second_(time).

I have also proposed a third option: 3) use the symbols, and disambiguate the are and year multiple and submultiple symbols by adding the full name of the unit in parentheses, e.g. ha(hectare), ha(hectoyear).

HajoRijgersberg commented 3 years ago

I guess we will run into disambiguation problems with units more often. For instance the British thermal unit. Very popular with air conditioning. There are six:

British thermal unit (39 °F)
British thermal unit (59 °F)
British thermal unit (60 °F)
British thermal unit (International Table)
British thermal unit (mean)
British thermal unit (thermochemical)

The first, second, third, and fifth all have the symbol Btu. We can tackle that by adding some disambiguation in parentheses, e.g. Btu(39°F).

(The fourth and sixth have the symbols Btu_IT and Btu_th, where _ indicates subscript.)

jamesaoverton commented 3 years ago

@kaiiam: I'll have to double check this, but I think that https://w3id.org/units-on-the-web/unit_[in_US] is not a valid URL because of the square brackets. If I'm right that it doesn't follow the URL spec, then I can't guarantee that any given Turtle reader or RDF triplestore or whatever will process it properly. When any of those tools fails, the bug would be our fault. The escaped version is a valid URL, so if a tool doesn't accept it then that bug would be the tool developer's fault.
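The point about square brackets can be checked mechanically. What follows is a minimal sketch, assuming only the generic URI syntax of RFC 3986, where square brackets are reserved for IP literals in the host and are not allowed unescaped in a path. The example.org base and the function name are placeholders, not part of the project. As far as I can tell, the Turtle grammar excludes only a handful of characters inside <...>, and square brackets are not among them, which would explain why a Turtle validator accepts the unescaped form even though it is not a valid URL.

```python
import re
from urllib.parse import urlsplit

# Characters RFC 3986 allows unescaped in a path: unreserved, sub-delims,
# ":", "@" and the "/" separator, plus percent-encoded triplets.
_PATH_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/]|%[0-9A-Fa-f]{2})*")


def path_is_valid(url: str) -> bool:
    """Return True if the path of `url` uses only RFC 3986 path characters."""
    return _PATH_RE.fullmatch(urlsplit(url).path) is not None


print(path_is_valid("https://example.org/units/unit_[in_US]"))      # False
print(path_is_valid("https://example.org/units/unit_%5Bin_US%5D"))  # True
```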

@HajoRijgersberg: UCUM already uses parentheses in a way that's not compatible with your proposed use. To provide a viable alternative, you'll need to propose an unambiguous alternative grammar. To be better than UCUM for my needs, your grammar would have to cover all the units that UCUM covers.

UCUM provides a grammar for disambiguating SI, non-SI, and a wide range of custom unit strings. After studying UCUM for a few months, I am amazed at how much work they had to do to make this possible. I don't agree with every decision that they made, but I don't have the time or money or expertise to build an alternative to UCUM.

@kaiiam and I have working code for a subset of the UCUM grammar, with a plan to implement all of it. At that point we'll be able to accept any UCUM string and provide useful Linked Data about it. We've got a prototype and a plan to provide a web service, so that you can request an IRI which embeds a UCUM code and get Linked Data about it. I think of this as "UCUM on the Web". It will solve a lot of problems that I'm facing. We have working code for the hardest parts, and a clear path to working code for all of it. I have the time and money and expertise to make this happen.

I guess that's pretty much what I said on the call yesterday.

HajoRijgersberg commented 3 years ago

To be better than UCUM for my needs, your grammar would have to cover all the units that UCUM covers.

I have to think about it; I feel I would like to take that challenge. But would you really favour mine above UCUM if I accomplished that challenge? I can imagine you would still want to conform to UCUM, since you may also have the goal to adapt to UCUM. Am I correct about that? I have to know, since taking that challenge might be... well, a real challenge (regarding - indeed - time and energy).

P.S.: I don't care so much about which kind of parentheses are used, as long as the idea is kept of disambiguating by adding something after the unit symbol in some kind of brackets, like ha_(hectare), or using the %-codes for the parentheses.

dr-shorthair commented 3 years ago

All of these BTU variants are already in UCUM - see Table 17 in https://ucum.org/ucum.html

HajoRijgersberg commented 3 years ago

Ok thanks, I would have to check. I'm afraid now that they have also changed these symbols too much by really altering them rather than putting something behind it in (some kind of) parentheses.

kaiiam commented 3 years ago

I have to think about it; I feel I would like to take that challenge.

@HajoRijgersberg please feel free to try but the point that I think @dr-shorthair @jamesaoverton and I are making is that UCUM has put a TON of work into figuring this out so the chances of us being able to do better are probably not worth the effort we would need to spend on it.

All of these BTU variants are already in UCUM

Thanks @dr-shorthair, again to the point that UCUM has more or less solved things. I think the interesting thing to do here would be to sort through all the base units in QUDT and OM and see if there are any that aren't in UCUM. Those would be the cases where we'd need to do something different or expand on it. @dr-shorthair do you know about their governance processes for adding new terms to it? If we were completely dependent on them for new base terms that could be a limitation, but if we can collaborate then it's not.

I think that https://w3id.org/units-on-the-web/unit_[in_US] is not a valid URL because of the square brackets

@jamesaoverton it would be ideal if we could do this, but if not I guess we should be able to use/re-use URL escape character encoding standards to generate the appropriate string based on an input like [in_US]. I think we should start experimenting with this. Open to better suggestions.

@kaiiam and I have working code for a subset of the UCUM grammar ...

💯 exactly how I feel about this.

HajoRijgersberg commented 3 years ago

please feel free to try but the point that I think @dr-shorthair @jamesaoverton and I are making is that UCUM has put a TON of work into figuring this out so the chances of us being able to do better are probably not worth the effort we would need to spend on it.

I would only need to change a small set, right? Most effort is in running through the entire list. I'm thinking about spending the effort on that, but I am hesitating of course if you guys would like to follow UCUM anyway. Which I think/feel you would want.. So think about whether you would like me to do that. I'll think about it too in the mean time! :)

dr-shorthair commented 3 years ago

Please take a look. It is pretty much as you suggest.

Sorry Hajo, it is unclear if you have reviewed the UCUM specification, since comments like this are speculative rather than reflecting what UCUM actually contains.

kaiiam commented 3 years ago

I would only need to change a small set, right?

@HajoRijgersberg Again please feel free to try, but I guarantee you that the first thing you'll try will be changing ar back to a. Then you'll run into the conflict of not being able to generate Petaare because Pa is supposed to be Pascal. How do you square that circle while conforming strictly to the SI? Answer is we can't. Hence likely why UCUM changed a to ar. Year a isn't crossable with metric prefixes, hence why in UCUM it can be a (which I can't even find as officially being the symbol for year in the SI brochure).

Our point is that UCUM already did an amazing job figuring all this out, so the chances of us doing better are slim. Check out this list of ambiguities from the UCUM spec that, if I understand correctly, they resolved.
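The Pa conflict can be reproduced with a toy model. This is only a sketch: the three-entry prefix table, the unit lists and the takes_prefix flags are illustrative assumptions drawn from this discussion, not copied from the UCUM or SI tables. It builds every prefix + symbol string and reports the strings that have more than one reading.

```python
from collections import defaultdict

# Toy prefix table; "" stands for "no prefix".
PREFIXES = {"": "", "h": "hecto", "P": "peta"}

# (symbol, name, takes_prefix). The flags are illustrative assumptions.
SI_STYLE = [("a", "are", True), ("a", "year", False), ("Pa", "pascal", True)]
UCUM_STYLE = [("ar", "are", True), ("a", "year", False), ("Pa", "pascal", True)]


def ambiguous(units):
    """Map each symbol string with more than one reading to those readings."""
    readings = defaultdict(set)
    for symbol, name, takes_prefix in units:
        for p_symbol, p_name in PREFIXES.items():
            if p_symbol and not takes_prefix:
                continue  # this unit is not combined with prefixes
            readings[p_symbol + symbol].add(p_name + name)
    return {s: sorted(r) for s, r in readings.items() if len(r) > 1}


print(ambiguous(SI_STYLE))    # {'a': ['are', 'year'], 'Pa': ['pascal', 'petaare']}
print(ambiguous(UCUM_STYLE))  # {}
```

In this toy table, renaming the single symbol a to ar for are removes both collisions. If year were also allowed to take prefixes, as some people use it in practice, Pa would collide again, which is the trade-off being argued about here.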

HajoRijgersberg commented 3 years ago

Petaare will then have the symbol Pa(petaare) and pascal will have the symbol Pa(pascal). The symbols are not changed this way, only disambiguated. In practice, by the way, people do use prefixes with year!

Is Table 25 the entire table of conflicts? I don't see hectare for example.

HajoRijgersberg commented 3 years ago

Please take a look. It is pretty much as you suggest.

I have the feeling that you guys want to use all UCUM symbols as they are anyway...

Sorry Hajo, it is unclear if you have reviewed the UCUM specification, since comments like this are speculative rather than reflecting what UCUM actually contains.

I haven't reviewed UCUM. But since they have changed the are symbol, I'm concerned. But I think I'm gonna leave it. You want to use the UCUM symbols as they are anyway.

kaiiam commented 3 years ago

You want to use the UCUM symbols as they are anyway.

That's pretty clearly what @jamesaoverton @dr-shorthair and I are voting for.

I think the next step is to figure out how to make IRIs from UCUM codes. The simplest case would be something like https://.../unit_[in_US] if that works, else some uglier character-escaped version of that, e.g. <https://.../%5Bin_US%5D>. If that doesn't work or is too ugly, then we could maybe consider making our own IRI codes with NCNames, like @jamesaoverton's original idea, something like u_in__US_, and finally if not, just go with fully opaque IDs? Or English labels as IRIs (as in OM), but I'd prefer not to do that. @jamesaoverton thoughts?

HajoRijgersberg commented 3 years ago

That's pretty clearly what @jamesaoverton @dr-shorthair and I are voting for.

Clear!

kaiiam commented 3 years ago

As @dr-shorthair said:

The URIs for unit definitions must be unambiguous. But formally they are opaque, so they do not need to follow any existing standard. The normal OBO convention is to use meaningless numbers in URIs. But since we want to generate URIs for units dynamically we need a predictable algorithm.

AFAICT @jamesaoverton and @kaiiam's proposal is to use UCUM because it is unambiguous, generative, and covers a large range of conventional units, but not because it is superior in any other way compared with (for example) SI. i.e. UCUM provides a convenient recipe for generating URIs. That's all.

Echoing his point, we'll need a strategy to consistently generate dynamic IDs. Unfortunately SI codes have ambiguities and UCUM codes don't, so if we are going to use codes for IRIs then UCUM is a better solution than the SI codes. If not, we could use English labels as IRIs (like in OM), but I'd prefer not to as they're longer and non-universal.

HajoRijgersberg commented 3 years ago

Unfortunately SI codes have ambiguities and UCUM codes don't, so if we are going to use codes for IRIs then UCUM is a better solution than the SI codes.

I would agree if you stated easier or something like that, but not better because UCUM deviates from the SI.. Indeed we can't do anything about the labels having more characters. We can do something about the non-universality of the English language by creating - in theory - IRIs in all languages that exist. I’m not saying we can accomplish that in limited time/energy.., so that could only be a (very) long-term goal.

kaiiam commented 3 years ago

I would agree if you stated easier or something like that, but not better because UCUM deviates from the SI..

I'm not saying UCUM is better or superior to the SI, I'm simply saying that it has resolved ambiguities where the SI hasn't and therefore UCUM codes alone can be used as is for unambiguous IRIs where SI codes can not.

IRIs in all languages that exist.

I'd prefer to have one IRI per term, not separate English and Chinese metres. That's why I think labels should be annotation properties, NOT IRIs. We can and should add new language labels later.

I want some function that either takes UCUM codes as is, or transforms them into an IRI, preferably following some existing convention.

HajoRijgersberg commented 3 years ago

UCUM has resolved things, but not in an appropriate way. One can't change standard symbols (or shouldn't).

After all, I think we can better work with opaque IRIs. We don't have the symbol problem and we don't have the multiple-languages problem.

jamesaoverton commented 3 years ago

If the problem is that you want to refer to "hectare" but "unit:har" is too confusing, then I don't see how an opaque ID like "unit:1243258" is satisfactory.

The usual OBO-way of generating opaque numeric IDs isn't suitable for dynamic resolution of unlimited combinations of units, which is a feature that I want. We would have to assign numeric IDs in advance or on request, or we would need a different approach using distributed IDs, all of which are much more annoying than dealing with the small number of cases where UCUM deviates from SI.

Regarding multiple IRIs, maybe in multiple languages: Semantic Web tools work best when everyone uses the same IRI for the same thing, simply because you can use string equality to compare terms. As soon as you say that two different IRIs mean the same thing, you need to add a mapping table, keep it updated, and adapt all your queries to use it. Sometimes you can't get around this, but it has high costs. In #10 we're discussing adding synonyms, which doesn't have this problem.
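The cost of a mapping table can be made concrete with a small sketch. All IRIs below are made-up placeholders under example.org, and SAME_AS is a hypothetical hand-maintained table, not something the project has.

```python
# One IRI per unit: comparing two terms is plain string equality.
METRE = "https://example.org/units/m"

# Several IRIs per unit (here one per language): every comparison has to
# go through a mapping table that somebody must keep up to date.
METRE_EN = "https://example.org/units/en/metre"
METRE_FR = "https://example.org/units/fr/mètre"
SAME_AS = {METRE_EN: METRE, METRE_FR: METRE}


def same_unit(iri_a: str, iri_b: str) -> bool:
    """Compare two IRIs after normalising them through the mapping table."""
    return SAME_AS.get(iri_a, iri_a) == SAME_AS.get(iri_b, iri_b)


print(METRE_EN == METRE_FR)           # False: string equality no longer works
print(same_unit(METRE_EN, METRE_FR))  # True, but only while SAME_AS is complete
```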

kaiiam commented 3 years ago

@jamesaoverton just suggested we could use https://docs.python.org/3/library/urllib.parse.html as the transformation function, for example

>>> from urllib import parse
>>> parse.quote("[in_US]")

outputs %5Bin_US%5D. As ugly as this is, it is following the URL standard and is a way to ensure we map UCUM to a valid IRI.

Since our function already makes canonical versions of UCUM codes (A.V and V.A both become A.V), I could add this step calling parse.quote at the end and it will generate a unique and unambiguous IRI, e.g. https://.../unit_%5Bin_US%5D for [in_US] or https://.../unit_A.V for A.V and V.A.

I would prefer this to maintaining a separate UCUM to NCName mapping, which would need to be manually updated/curated.
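Put together, the transformation could look roughly like the sketch below. It is not the project's code: BASE is a placeholder, and canonicalize is a hypothetical stand-in that only sorts dot-separated factors, whereas the real function works on the parsed UCUM grammar. One assumption worth flagging: parse.quote leaves "/" unescaped by default, so the sketch passes safe="" to keep a code such as m/s inside a single path segment.

```python
from urllib.parse import quote, unquote

BASE = "https://example.org/units/unit_"  # placeholder, not the real prefix


def canonicalize(code: str) -> str:
    """Hypothetical stand-in: put dot-separated factors in a fixed order."""
    return ".".join(sorted(code.split(".")))


def ucum_to_iri(code: str) -> str:
    """Percent-encode the canonical UCUM code and append it to BASE."""
    return BASE + quote(canonicalize(code), safe="")


def iri_to_ucum(iri: str) -> str:
    """Recover the canonical UCUM code from an IRI built by ucum_to_iri."""
    return unquote(iri[len(BASE):])


print(ucum_to_iri("[in_US]"))  # https://example.org/units/unit_%5Bin_US%5D
print(ucum_to_iri("V.A"))      # https://example.org/units/unit_A.V
print(ucum_to_iri("m/s"))      # https://example.org/units/unit_m%2Fs
print(iri_to_ucum(ucum_to_iri("[in_US]")))  # [in_US]
```

Because the escaping is reversible, the resolver can decode the code straight from the IRI without a lookup table.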

HajoRijgersberg commented 3 years ago

If the problem is that you want to refer to "hectare" but "unit:har" is too confusing, then I don't see how an opaque ID like "unit:1243258" is satisfactory.

Well, an opaque symbol doesn't give the illusion of non-opaqueness. Indeed, I'm a supporter of non-opaque IRIs, but if it leads to confusion, I think opaqueness is better.

The usual OBO-way of generating opaque numeric IDs isn't suitable for dynamic resolution of unlimited combinations of units, which is a feature that I want. We would have to assign numeric IDs in advance or on request, or we would need a different approach using distributed IDs, all of which are much more annoying than dealing with the small number of cases where UCUM deviates from SI.

I understand, but in my opinion not deviating from the standard is more important than whether an approach is annoying to deal with...

Regarding multiple IRIs, maybe in multiple languages: Semantic Web tools work best when everyone uses the same IRI for the same thing, simply because you can use string equality to compare terms. As soon as you say that two different IRIs mean the same thing, you need to add a mapping table, keep it updated, and adapt all your queries to use it. Sometimes you can't get around this, but it has high costs.

I fully agree. I agree so much that, again, in this exercise, I think opaque IRIs are better. And let me emphasize that, in principle, I'm not a supporter of opaque IRIs, so that means something! :)

kaiiam commented 3 years ago

We have two proposals for IRIs: 1) urllib-quoted UCUM codes as non-opaque IDs and 2) opaque IDs.

Proposal 1) is more straightforward to implement than proposal 2), even if it is ugly for some cases.

How would others feel about these 2 options?

jamesaoverton commented 3 years ago

URL-escaped UCUM codes are sometimes ugly, but opaque IDs are always ugly. So that design decision is clear to me, regardless of the implementation questions.

ddooley commented 3 years ago

The debate so far about the goal of bringing a comprehensive units solution to the semantic web has shown the struggle between concise representation which requires compromise between unit systems (as UCUM seems to have achieved) versus a verbose identity for each base and derived unit/symbol, with disambiguation qualifiers, carried in URL representation that potentially would allow each unit system like SI to come onto the web on its own schedule.

Historically this could be considered a question of human vs. machine comprehension, and the tactical question of adoption. Tactically, because humans still need to identify semantic components visually, and are still looking at URLs and crafting them into their databases, I vote for the non-opaque, short-form URLs of the UCUM + code URL system that @kaiiam & co have proposed. I believe this will gain more adoption, and be viewed historically as an adequate compromise with a handful of variances that can be contended with via some mapping.

The UCUM + code URL system does not preclude longer form URLs for each base or derived unit. Driven by preference, I can see someone crafting a long-form representation that neatly maps to each short form, just as each short form maps to available UO, OM, and QUDT entities. But that can happen on a separate project.

HajoRijgersberg commented 3 years ago

I looked into the definitions of the symbols of British thermal unit in UCUM. Actually they solved that a lot better than they did with are:

British thermal unit at 39 °F: [Btu_39]
British thermal unit at 59 °F: [Btu_59]
British thermal unit at 60 °F: [Btu_60]
mean British thermal unit: [Btu_m]
international table British thermal unit: [Btu_IT]
thermochemical British thermal unit: [Btu_th]

The postfixes after the underscore may be regarded as added disambiguation labels, leaving the unit before the underscore intact. This is exactly how I would like to see it. I would have been glad if they had done the same with the are symbol, i.e. a_are rather than ar. The former leaves the symbol 'a' intact, the latter alters it. They have chosen for the latter.

(They have additionally defined the British thermal unit with symbol [Btu], and defined it as being equal to the thermochemical British thermal unit. This is asking for trouble in my opinion.)

HajoRijgersberg commented 3 years ago

How would others feel about these 2 options?

Opaque, since we run into too serious problems with the UCUM-symbols-based non-opaque IRIs. Ugliness is less important than that.

kaiiam commented 3 years ago

Opaque, since we run into too serious problems with the UCUM-symbols-based non-opaque IRIs.

What serious problems are you referring to? are being the code ar (I've made my thoughts on that clear) and the way UCUM handles BTUs? Where [Btu] is equal to thermochemical British thermal unit instead of international table British thermal unit? I'm not a meteorologist so I don't feel qualified to comment on that one.

HajoRijgersberg commented 3 years ago

What serious problems are you referring to? are being the code ar

Indeed. Since 'a' in UCUM is year, without any disambiguation, people like those Damion describes may specify 'a' where they mean are. Instead they have, without knowing, specified year. That is serious.

P.S.: And deviating from a standard is something that should be avoided in principle. 'ar' could have been defined as 'a_(are)' and we wouldn't have had a problem. Or 'a_are' or perhaps even 'a_ar'. All such disambiguations would have altered the symbol less seriously.

kaiiam commented 3 years ago

Indeed. Since 'a' in UCUM is year, without any disambiguation, people like Damion describes may specify 'a' where they mean are. Instead they have - without knowing - specified year. That is serious.

We've been over this many times now. I don't think it's worth rehashing this point any longer: yes, UCUM diverges from the SI standard and you have your objections. Nevertheless UCUM solves the other issues we care about, hence our rationale for using UCUM codes for IRIs. We can agree to disagree on this point. I'd certainly like to put it up to a vote on what our broader assembled community thinks, but it seems @ddooley, @jamesaoverton and I are in favor of the non-opaque, short-form URLs using UCUM codes + the urllib transformation. @dr-shorthair or anyone else, please let us know where you might stand on this.

HajoRijgersberg commented 3 years ago

We've been over this many times now.

Yes, it is not quite fun to have to repeat my worries!

jamesaoverton commented 3 years ago

Rest assured @HajoRijgersberg that we have all listened to your worries. We've thanked you for sharing your worries. And in the end, we (or at least I) simply disagree with you.

I personally spent at least a full working day over the course of weeks trying to find a better solution to the "are" problem. That work convinced me that a truly better solution is too difficult to achieve, and that the "URLs for UCUM" proposal gets me almost everything that I want for a cost I can afford. I've taken the time to explain my reasons, in this issue, on the calls, and in the call notes. That time is precious to me, because there's a lot of other things I want to be doing with my time.

So I've listened. I disagree. I gave my reasons for disagreeing. You haven't convinced me. I haven't convinced you. That's fine. Now I'm being as clear as I can be. I won't waste anyone's time by going over this all again.

I think that those of us who are in agreement should coordinate our time and resources on the proposed solution.

ddooley commented 3 years ago

One thought Hajo to address the UCUM vs SI ambiguities issue is to have ucum stated in the purl of the resolving service, e.g. ucum.whatever.org/s-1.m or whatever.org/ucum/s-1.m . Then at a glance anyone using that will see that UCUM is in play, rather than SI directly.

HajoRijgersberg commented 3 years ago

I think that's a very good solution, Damion!

jamesaoverton commented 3 years ago

We would, of course, make it very clear that the service uses UCUM codes, and link to the UCUM documentation. But I would not want to use "ucum" in the URL without their explicit written consent.

HajoRijgersberg commented 3 years ago

Simon has close contacts at UCUM. Maybe he can manage that?

HajoRijgersberg commented 3 years ago

Making it very clear is not enough, James. People will use the services separately from such statements.

dr-shorthair commented 3 years ago

And when all is said and done, 'are' is an archaic unit, almost never used in practice outside of a few local surveying communities. OTOH 'a' for year is used across a number of scientific applications, which is probably why UCUM chose to give that symbol to year rather than to the rarer usage.

HajoRijgersberg commented 3 years ago

The same goes for hectare and that's definitely not an archaic unit. And I'm not sure if are is. (And I am not sure it would make a difference if it were an archaic unit.)

ddooley commented 3 years ago

I was wary that ucum in a url might require UCUM trademark permission. But perhaps they will see it is an attractive service worthy of their support. If so, that is two wins, on both promotion and on ambiguity clarification.

graybeal commented 3 years ago

My wishes:

a generic service (and therefore IRI), not a UCUM one. This is serving a much broader purpose and has separate governance.
non-opaque IRIs (even if I do find the notation kind of opaque, like RPN ;-) )
any deterministic scheme (ideally documented, and implemented in more than one library) for encoding unacceptable characters (I think the list is determined e.g., as described in https://www.xml.com/pub/a/2001/07/25/namingparts.html, because these may be represented in XML)
alternate spellings and expressions as synonyms where feasible;
multiple language support for labels and descriptions when feasible (as annotations, not as separate IRIs);
for synonyms to resolve in the resolver service, even if they are not defined as acceptable IRIs; the resolving page can make clear the issue and the correct IRI to use. (See my comment to issue 6.) I know this is non-trivial, so maybe later…?

I don't quite understand the general concern about determinism of generated IRIs, I suspect there are already some exceptions being handled as one-offs, and clarifying notations can be deterministically listed and included by the generation algorithm. (And you could make opaque IRIs in a deterministic way. But ick.) On this premise, using clarifying expressions in case of conflicts is viable. But not with parentheses.
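For the record, a deterministic opaque scheme could be as small as the sketch below. It is purely illustrative: the "U" prefix and the 12-character truncation are arbitrary assumptions, and nobody in this thread has proposed hashing.

```python
import hashlib


def opaque_id(ucum_code: str, length: int = 12) -> str:
    """Derive a repeatable opaque identifier from a UCUM code."""
    digest = hashlib.sha256(ucum_code.encode("utf-8")).hexdigest()
    return "U" + digest[:length]


# The same code always yields the same ID, with no registry needed to mint it.
print(opaque_id("m.s-1") == opaque_id("m.s-1"))  # True
```

The catch is that a hash cannot be reversed, so the resolver would still need a stored table from ID back to UCUM code, and truncating the digest leaves a small chance of collisions.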

But really, in the end we are defining a set of codes that will be convenient to use; that have a very minimum element of potential confusion around a very small number of notations that conflict with those in SI; and that are self-describing when they are resolved (so the nature of and reason for any variations from the SI standard can be made quite readily accessible).

I don't think any of this means we want to standardize around UCUM as a goal, but that may be an effect. And I'm OK with that; when SI comes out with their solution I'm sure it will be a standard that people can follow.

kaiiam commented 3 years ago

But I would not want to use "ucum" in the URL without their explicit written consent.

Agreed, but if we do get it then, like @ddooley said, that could be a win-win. If/when we're ready we could present our prototype to the UCUM team and have that conversation. However I do agree with @graybeal's point that ideally we'd want a generic service (and therefore IRI), not a UCUM one, even if heavily based on UCUM.

to @graybeal's other points

non-opaque IRIs

So that makes at least 4 of us in agreement on this point.

any deterministic scheme encoding unacceptable characters ...

Sounds like support for my proposal for generating non-opaque IRIs using standard URL escaping as a transformation function.

alternate spellings and expressions as synonyms where feasible

Agreed see and please feel free to add to https://github.com/kaiiam/UO_revamp/issues/10.

multiple language support for labels and descriptions when feasible (as annotations, not as separate IRIs)

Agreed. I have a collaborator, @aponsero, willing to do French translations. It would be good to be on the lookout for others who can do Chinese, Spanish and perhaps other languages if there is interest.

general concern about determinism of generated IRIs

We want the service to be dynamic hence the need to generate unique IDs, or if someone asks for an existing term V.A when A.V exists the system will say thanks we've already got that please use ...
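A rough sketch of that resolver behaviour follows, under loud assumptions: BASE is a placeholder, SYNONYMS is a hypothetical hand-curated table seeded with labels from the example ontology at the top of this issue, and sorting dot-separated factors is only a stand-in for the real canonicalisation.

```python
from urllib.parse import quote

BASE = "https://example.org/units/"  # placeholder, not a registered prefix
SYNONYMS = {"litre": "L", "US Inch": "[in_US]"}  # hypothetical synonym table


def resolve(request: str):
    """Return (canonical IRI, redirected) for a UCUM code or known synonym."""
    code = SYNONYMS.get(request, request)
    code = ".".join(sorted(code.split(".")))  # stand-in canonical ordering
    return BASE + quote(code, safe=""), code != request


print(resolve("A.V"))    # ('https://example.org/units/A.V', False)
print(resolve("V.A"))    # ('https://example.org/units/A.V', True)
print(resolve("litre"))  # ('https://example.org/units/L', True)
```

The redirected flag is what would let the resolving page say "we've already got that, please use ..." while still answering the request.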

in the end we are defining a set of codes that will be convenient to use ...

yes and since UCUM has accomplished this better than any other system we plan to leverage it. Like you said standardizing around UCUM isn't the goal but a pragmatic solution.

when SI comes out with their solution I'm sure it will be a standard that people can follow.

Yes, and if/when they do we'll write a script that autogenerates mappings to their system's IRIs and add them as another mapping layer to this system, like we've done with QUDT/OM/UO/NERC etc. @dr-shorthair invited @HajoRijgersberg as panelists to cover for him in a recent SI digital workshop, and my take-home from that is that their time frame for doing that is quite slow. Hence in the meantime (which could be a long time) we can make this solution available. We're also going to cover non-SI units, so this system would still be relevant even if a pure SI solution were to arise.

I'm going to move forward with the proposed non-opaque, short-form URLs using UCUM codes + the urllib transformation for IRIs in our working script.

It would be nice to tackle the prefix question. Using the w3id.org namespace was suggested. Unfortunately https://w3id.org/unit/ is taken, maybe something like https://w3id.org/units-on-the-web/? I'd love to hear other suggestions.

dr-shorthair commented 3 years ago

Would https://w3id.org/units be too cheeky/confusing? (The w3id overlords are usually pretty hands-off, but they might catch and object to that.)

CODATA has the DRUM task-group ('Digital Representation of Units of Measurement') so maybe https://w3id.org/drum? I could run it past them, though we don't actually need their permission.

https://w3id.org/uom (unit-of-measurement) is nice and short.

HajoRijgersberg commented 3 years ago

a very minimum element of potential confusion

John, I could react to that. Please let me know if you want that.

kaiiam commented 3 years ago

https://w3id.org/units

👍 from me.

The other 2 are OK but I prefer units. Does anyone else have thoughts?

How about https://w3id.org/units-of-measure/ or https://w3id.org/units-of-measurement/?

graybeal commented 3 years ago

a very minimum element of potential confusion

John, I could react to that. Please let me know if you want that.

I think it’s useful for the record, and I’m interested, so please if you’re willing.

graybeal commented 3 years ago

I’m in for /units. Reserve it! but if there’s a way to check /unit ownership let’s also check in with them after we register /units, to see what their plan for unit is. (Maybe it will be good for SI!)

HajoRijgersberg commented 3 years ago

a very minimum element of potential confusion

John, I could react to that. Please let me know if you want that.

I think it’s useful for the record, and I’m interested, so please if you’re willing.

Certainly, thanx. My point is that we cannot judge whether an element of potential confusion is (at) a very minimum. The Mars orbiter was also lost due to minimum confusion and there are other examples in history. So, we should avoid this minimum potential confusion if we can, and we can. By altering the UCUM symbols:

ar => a_are or a_ar
a => a_yr

Then, it will be prevented that someone who refers to the symbol 'a' while intending to express 'are' will mistakenly refer to year. We can't judge the consequences of such possible mistake and I think no-one can.

UCUM has disambiguated the Btu symbols much better than 'are' and 'year', by adding some disambiguation behind the unit (like in the alternatives I give for 'ar' and 'a' above). In practice four of the British thermal units below have the symbol Btu. In UCUM the following disambiguation has been added, altering the symbols less severely:

British thermal unit at 39 °F: [Btu_39]
British thermal unit at 59 °F: [Btu_59]
British thermal unit at 60 °F: [Btu_60]
mean British thermal unit: [Btu_m]
international table British thermal unit: [Btu_IT]
thermochemical British thermal unit: [Btu_th]

The only thing they shouldn't have done is additionally defining a general British thermal unit with symbol [Btu], being equal to the thermochemical British thermal unit. That is asking for the same kind of trouble as 'a'.

So, we could make a few exceptions to the UCUM symbols. It's a small list (I assume), and if everyone is willing and interested I am willing to think about performing that task.

kaiiam commented 3 years ago

So, we could make a few exceptions to the UCUM symbols. It's a small list (I assume), and if everyone is willing and interested I am willing to think about performing that task.

We're not saying don't do it, just reiterating that, as @jamesaoverton said, it has to be demonstrably at least as good as UCUM for us to consider it. Although he's also said that it would be better to either use UCUM as is or not at all, so I'm not sure. Making small changes like that is similar to James's original NCName idea, which he and I have moved past in favor of simply running UCUM codes through the urllib transformation to deterministically generate unique non-opaque IRIs.

I reiterate that this is just the IRI, the annotation properties of SI code, label and definition will clearly indicate what the unit is, and we can put up warnings about the few divergences in the IRI codes from the SI.

kaiiam commented 3 years ago

See https://github.com/kaiiam/UO_revamp/issues/12 for voting for units, please feel free to suggest new ideas too.

kaiiam commented 3 years ago

Thanks to everyone who voted on a name. So far it looks like https://w3id.org/units has the most support. I'm going to go ahead and use that for now, but it can always be changed later. We'd need to actually secure the namespace before we could use it anyway, so we can revisit this later if needed.

kaiiam / UO_revamp

Prefixes and IRIs #11
