qudt / qudt-public-repo

QUDT - Quantities, Units, Dimensions and dataTypes - public repository

symbol aliases #820

Closed jimkont closed 6 months ago

jimkont commented 7 months ago

Hi, we are observing symbols such as 10^3/uL, 10^6/mL and /24h in user input.

Both 10^3/uL and 10^6/mL are equivalent to /pL, and /24h is equivalent to /d. We were thinking of modelling these as "symbol aliases" and attaching them to the corresponding units. (UCUM talks about etymological equivalent symbols, but that is probably not a 100% match for this. We are using the term "alias" as a placeholder until we come up with a more accurate one.)

For example, qudt:ucumCodeAlias is one possible property that could be used:

unit:NUM-PER-PicoL qudt:ucumCodeAlias "10^3/uL"^^qudt:UCUMcs, "10^6/mL"^^qudt:UCUMcs .
unit:PER-DAY  qudt:ucumCodeAlias "/24h"^^qudt:UCUMcs .
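As a quick check of the equivalences stated above, here is a minimal Python sketch of the prefix arithmetic. The helper name `per_litre` and the small prefix table are illustrative assumptions, not part of QUDT or any UCUM tooling.

```python
from fractions import Fraction

# Exact SI prefix factors for the litre-based codes discussed above.
PREFIX = {
    "m": Fraction(1, 10**3),   # milli
    "u": Fraction(1, 10**6),   # micro
    "n": Fraction(1, 10**9),   # nano
    "p": Fraction(1, 10**12),  # pico
}

def per_litre(count: int, prefix: str) -> Fraction:
    """Convert `count` per one prefixed litre into a count per litre."""
    return Fraction(count) / PREFIX[prefix]

# Codes observed in user input.
observed = {
    "10^3/uL": per_litre(10**3, "u"),
    "10^6/mL": per_litre(10**6, "m"),
}

# Candidate target units, for comparison.
candidates = {
    "/nL": per_litre(1, "n"),
    "/pL": per_litre(1, "p"),
}

for code, value in {**observed, **candidates}.items():
    print(f"{code:>8} = {value} per litre")
```

Under this arithmetic both observed codes come to 10^9 per litre, which is the scale of /nL, whereas /pL is 10^12 per litre. The pico-based unit in the example above may therefore need double-checking before any alias is attached to it.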

Would QUDT be interested in receiving such aliases as contributions? If so, we would of course need to go through the process of naming and defining the required ontology properties. Otherwise, we can maintain this locally.

steveraysteveray commented 7 months ago

We currently allow multiple values for most uses of the qudt:symbol property, so we don't need to distinguish between any one symbol as "primary" and others as "alias". Would it work for you to just populate multiple values?

fkleedorfer commented 7 months ago

I don't think I have seen a single case of multiple symbols. I would appreciate being able to select the 'preferred' one if there are multiple.

steveraysteveray commented 7 months ago

What about the examples provided, such as /24h equivalent to /d?

dr-shorthair commented 7 months ago

Who decides which one is "preferred"? It is probably application specific, thus outside the scope of QUDT

fkleedorfer commented 7 months ago

I am not convinced that the suggestions aren't separate units each, such as PER-24_HR. After all, if you think about a day as a 24h period, you might also want PER-(2|4|6|8|12|48|72)_HR

The other suggested cases likewise look more like an intentional redundant scaling than an alternative label. If you want that redundancy, it might be a good idea to encode it in your choice of unit rather than in a label.
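To illustrate the "separate units" view, here is a minimal Python sketch that generates one unit per interval and computes its scale relative to per-second. The function name `per_n_hours` is hypothetical, and the `PER-<n>_HR` names simply follow the pattern mentioned above; none of these units is claimed to exist in QUDT.

```python
from fractions import Fraction

SECONDS_PER_HOUR = 3600

def per_n_hours(n: int) -> tuple[str, Fraction]:
    """Return a hypothetical unit name and its exact multiplier to 1/s."""
    return f"PER-{n}_HR", Fraction(1, n * SECONDS_PER_HOUR)

# Reference scale for a per-day unit: 1 / 86400 s.
PER_DAY = ("PER-DAY", Fraction(1, 24 * SECONDS_PER_HOUR))

# One separate unit per interval, as in PER-(2|4|6|8|12|48|72)_HR plus 24.
units = dict(per_n_hours(n) for n in (2, 4, 6, 8, 12, 24, 48, 72))

# Units that share PER-DAY's scale while keeping their own identity.
same_scale_as_day = [name for name, mult in units.items() if mult == PER_DAY[1]]
print(same_scale_as_day)
```

In this sketch only the 24-hour unit shares its multiplier with PER-DAY; it stays a distinct unit, which is the point of encoding the redundancy in the unit rather than in a label.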

On 9 December 2023, 05:04:08 CET, Simon Cox wrote:

> Who decides which one is "preferred"? It is probably application specific, thus outside the scope of QUDT

steveraysteveray commented 7 months ago

I'm leaning toward @fkleedorfer's suggestion, although I would name the unit PER-HR_24, because the 24 is really a qualifier of the HR the way I look at it.

jhodgesatmb commented 7 months ago

We have been putting numbers like that in front of the unit with no characters. So you want to change all of those to be qualifiers now? I need to go look at a bunch of examples before I say more, and I cannot do that for several hours.

Jack Hodges, Ph.D.
Arbor Studios

On Dec 9, 2023, at 10:16 AM, steveraysteveray wrote:

> I'm leaning toward @fkleedorfer's suggestion, although I would name the unit PER-HR_24, because the 24 is really a qualifier of the HR the way I look at it.

jimkont commented 7 months ago

Thank you all for your suggestions. A couple of comments:

We already have multiple symbols per unit. For example, unit:PER-DAY has the following:

  qudt:symbol "/day" ;
  qudt:ucumCode "/d"^^qudt:UCUMcs ;
  qudt:ucumCode "d-1"^^qudt:UCUMcs ;
  qudt:uneceCommonCode "E91" ;

UCUM has a grammar-based approach that has different ways of encoding the same unit. If one wanted to have a preferred symbol, both would be equally valid but some heuristic could be used to take e.g. the shortest one.
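A minimal Python sketch of such a heuristic, assuming "preferred" simply means the shortest code, with an alphabetical tie-break so the result is deterministic. The function name `preferred_code` is illustrative only.

```python
def preferred_code(codes):
    """Pick the shortest code; break ties alphabetically for determinism."""
    return min(codes, key=lambda code: (len(code), code))

# The two UCUM codes listed for unit:PER-DAY above.
per_day_codes = ["/d", "d-1"]
print(preferred_code(per_day_codes))
```

For the PER-DAY codes this picks "/d", since it has two characters against three for "d-1". Other applications could plug in a different key function, which fits the point that the choice is application specific.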

> I am not convinced that the suggestions aren't separate units each, such as PER-24_HR. After all, if you think about a day as a 24h period, you might also want PER-(2|4|6|8|12|48|72)_HR

Yes, in theory, one could have many combinations that could be marked as equivalent. My initial thought was to create a separate unit for those too, but then we thought of the complexity of maintaining and curating these units and considered the "alias" approach. For simple cases like the ones mentioned above (10^3/uL, 10^6/mL, /24h) an alias approach could work well. But, I agree, for other cases like /2h or 10^6/dL (which we did not yet observe) a separate unit would be a better fit.
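For the simple cases, the alias approach could look roughly like the Python sketch below. The two lookup tables and the `resolve` function are hypothetical; the alias table stands in for the proposed qudt:ucumCodeAlias values and is not an existing QUDT feature.

```python
from typing import Optional

# Regular UCUM codes already attached to unit:PER-DAY.
UCUM_CODES = {
    "/d": "unit:PER-DAY",
    "d-1": "unit:PER-DAY",
}

# Hypothetical alias table, mirroring the proposed qudt:ucumCodeAlias.
UCUM_ALIASES = {
    "/24h": "unit:PER-DAY",
}

def resolve(code: str) -> Optional[str]:
    """Resolve a user-supplied code: regular codes first, then aliases."""
    return UCUM_CODES.get(code) or UCUM_ALIASES.get(code)

print(resolve("/24h"))  # found through the alias table
print(resolve("/2h"))   # no alias defined; would need its own unit
```

Codes such as /2h deliberately resolve to nothing here, matching the idea that those cases are better served by a separate unit than by an alias.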

fkleedorfer commented 6 months ago

I think there are two different cases:

  1. two units are mathematically equivalent but express different ways to look at the quantity
  2. a unit has two different names (in different contexts, maybe) but the names do not imply a difference in perspective.

Examples of case 1 are all the cases suggested in this thread so far (I believe). An example of case 2 would be 'thou' and 'mil', both denoting a thousandth of an inch and both in use.

I would really suggest keeping it to only one qudt:symbol value per unit. If we must, we should only have multiple values where case 2 applies, because then, arguably, I can just select one at random and it will be valid.

jimkont commented 6 months ago

My initial suggestion was to use a separate RDF property for denoting these symbols (e.g. qudt:ucumCodeAlias), because I agree that the main qudt:symbol should have only one value.

I see your point (1) here. I am closing this issue, and we will see how to handle this internally.