unitsofmeasurement / unit-api

Units of Measurement API
http://unitsofmeasurement.github.io/unit-api/

Where to apply Level of Measurement? #131

Closed keilw closed 6 years ago

keilw commented 6 years ago

Either in Unit or Quantity the new LevelOfMeasurement attribute should be applied for arithmetic decision making.

Needs #130

keilw commented 6 years ago

@unitsofmeasurement/experts, @unitsofmeasurement/contributors Based on discussions and some code snippets (e.g. by @desruisseaux back in June) from #95, I take it that Unit would be the best place for a LevelOfMeasurement attribute or a method like getLevel()?

desruisseaux commented 6 years ago

I think it should rather be in Quantity. I'm not sure that a conversion from "interval °C" to "ratio °C" makes sense, for example (i.e. I do not have any use case in mind where we would want to convert from Units.CELSIUS to Units.CELSIUS_INTERVAL). Instead, the use cases that I see are conversions that preserve the level of measurement of the quantity. For example:

We want Quantity.to(Units.CELSIUS) to automatically preserve the level of measurement of the quantity. We do not want to force the user to check the Quantity unit in order to determine whether (s)he should specify Units.CELSIUS or Units.CELSIUS_INTERVAL as the argument to the to method.

The level of measurement of a Quantity is not changed by unit conversions, but by arithmetic operations applied between two quantities. For example "ratio" - "ratio" = "interval".
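To make the two points above concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: Scale, Level and Temperature are illustration-only types, not part of the Unit API, and the RATIO / INTERVAL labels simply mirror the wording of this comment. It only shows the intended behaviour: to() carries the level along, and only arithmetic changes it.

```java
// Hypothetical sketch: none of these types exist in the Unit API.
enum Scale { CELSIUS, KELVIN }

// Labels mirror the wording of the comment above ("ratio" vs. "interval").
enum Level { RATIO, INTERVAL }

final class Temperature {
    static final double KELVIN_OFFSET = 273.15;

    final double value;
    final Scale scale;
    final Level level;

    Temperature(double value, Scale scale, Level level) {
        this.value = value;
        this.scale = scale;
        this.level = level;
    }

    /** Converts to the target scale; the level is carried over, never chosen by the caller. */
    Temperature to(Scale target) {
        if (target == scale) {
            return this;
        }
        double shift = 0.0;
        if (level == Level.RATIO) {
            // Only a position on the scale is shifted by the offset; a difference
            // between two temperatures has the same size in °C and in K.
            shift = (target == Scale.KELVIN) ? KELVIN_OFFSET : -KELVIN_OFFSET;
        }
        return new Temperature(value + shift, target, level);
    }

    /** "ratio" - "ratio" = "interval"; other combinations are left out of this sketch. */
    Temperature minus(Temperature other) {
        if (level != Level.RATIO || other.level != Level.RATIO) {
            throw new IllegalArgumentException("only ratio - ratio is sketched here");
        }
        return new Temperature(value - other.to(scale).value, scale, Level.INTERVAL);
    }
}
```

Under these assumptions, 20 °C minus 283.15 K yields an interval of about 10 °C, and converting that interval to kelvin leaves the number unchanged, whereas converting the 20 °C reading itself gives about 293.15 K. The caller never has to pick between two Celsius units.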

keilw commented 6 years ago

Well, "describes the nature of information within the values assigned to variables" from the Wikipedia article sounds slightly in that direction. The question then is: where can it be set in the least intrusive way? On the Unit, a natural place would have been the definition of e.g. CELSIUS. But if there is a valid use case for having -271.15°C as RATIO and the same value as INTERVAL, we may have to find a different place, and we could not do that without assuming a default level when none is provided. Or can it always be derived through operations?

This example from June 15 was written under the assumption that it was more beneficial on the Unit:

// Pseudocode: level(...) and getLevel() are the proposed methods, not part of the current API.
Quantity add(Quantity that) {
    Unit u1 = this.getUnit();
    Unit u2 = that.getUnit();
    // Convert to the unit of u1, taking into account the nature of u2 (quantity or increment).
    Unit uc = u1.level(u2.getLevel());
    UnitConverter c = u2.getConverterTo(uc);
    double newValue = this.getValue().doubleValue()
                    + c.convert(that.getValue().doubleValue());

    // The result is in the unit of u1, but is it an absolute value or an increment?
    // Note: the following code could be factored into a convenience method.
    boolean isIncrement1 = u1.getLevel() == LevelOfMeasurement.INTERVAL;
    boolean isIncrement2 = u2.getLevel() == LevelOfMeasurement.INTERVAL;
    boolean isResultAnIncrement = isIncrement1 && isIncrement2;
    Unit uf = u1.level(isResultAnIncrement ? LevelOfMeasurement.INTERVAL
                                           : LevelOfMeasurement.RATIO);

    return new Quantity(newValue, uf);
}

I only changed the enum name; otherwise it is like the one from June. So even with pseudocode, where is a level set? The Wikipedia description of the different levels also says "Most measurement in the physical sciences and engineering is done on ratio scales." It would therefore be a hassle, and not acceptable, to have to write something like Quantities.getQuantity(10, KILOGRAM, RATIO) every time, if that were the place where it had to be set explicitly.

https://en.wikipedia.org/wiki/Level_of_measurement#Ratio_scale states, "The Kelvin temperature scale is a ratio scale because it has a unique, non-arbitrary zero point called absolute zero." So Quantities.getQuantity(10, KELVIN, INTERVAL) seems to make no sense.
Or are you saying 10 K - 1 K or 100 m - 5 m automatically turns their level into INTERVAL?

https://www.isi.edu/~ulf/amr/lib/popup/quantity-types.html

desruisseaux commented 6 years ago

The idea was that 10 K - 1 K automatically turns their level into INTERVAL. But I need more thought; on one side it is true that an ORDINAL level of measurement, for example, applies better to Unit than to Quantity. But for the particular case that we were trying to solve in #95 (i.e. the result of 1°C + 2°C), having the information associated with Quantity rather than Unit allows more convenient conversions, as described in my previous comment. I need to think more about that…

keilw commented 6 years ago

Ok, we could also do a vote, probably running a bit longer than the one for the name of a new type, but it should not take quite as long as #95 itself ;-)

desruisseaux commented 6 years ago

To me, a vote should start only after we are done analyzing the problem, listing the choices and debating the pros and cons…

keilw commented 6 years ago

I would have hoped much of this was done in #95, but no problem doing it in this ticket, although it was meant as an action item. I changed it to a question. @unitsofmeasurement/experts, @unitsofmeasurement/contributors or @unitsofmeasurement/observers please (at least after the busy conference week) share your thoughts and preferences on whether Unit or Quantity should carry the new LevelOfMeasurement attribute, or something else like UnitConverter, although the literature mostly points to either the unit or the quantity (sometimes, in a slightly different context, also called Measure or Measurement). This is a good overview: http://www.indiana.edu/~educy520/sec5982/week_3/measurement_rsm.pdf Here is another source: https://math.tutorvista.com/statistics/scales-of-measurement.html

desruisseaux commented 6 years ago

One difficulty is that the debate on #95 and elsewhere is scattered across many comments, which makes it difficult to get the big picture. I would like a wiki page summarizing the current situation: what is resolved, what still needs to be resolved, and what the alternatives are, with pros and cons. The difference from the issue tracker is that an agreement results in the wiki page being updated and kept short, as opposed to comments being added to a long, tedious-to-follow thread.

keilw commented 6 years ago

If it's just for decision making, then issues like this one serve just like a Wiki, too. And after an API decision has been made, that information is normally not needed any more. Creating a Wiki that helps downstream users and projects make use of those new features, sure, we can't have enough of that. So please let us not put arguments for a particular vote or decision into a Wiki where they have little value later on.

I spoke to @kaikreuzer at the Eclipse IoT WG meeting on Monday, and he confirmed they have a workaround right now in SmartHome. Real-life experience from a project like theirs is also welcome. They certainly won't change the API, but the way they decide how to calculate things differently should help to inspire the standard so it's useful to their and other solutions.

There is of course a Wiki page here: https://github.com/unitsofmeasurement/unit-api/wiki/Arithmetic-operations-on-Quantity so if you could add example cases there to refine the problem based on the newly created LevelOfMeasurement, that would be a good enhancement. Creating another page just for this ticket may be a bit confusing. https://github.com/unitsofmeasurement/unit-api/wiki/Arithmetic-operations-on-Quantity#8-how-to-reduce-surprises-for-users already hints at this new feature, so it could be added as a new paragraph there. I am not sure a whole new page adds value; maybe start there, and if the number of arguments and options becomes too large, it can always be refactored into a separate page.

keilw commented 6 years ago

This article http://psych.colorado.edu/~carey/courses/psyc5741/handouts/Measurement%20Scales.pdf is quite explicit, e.g.

Units of time (msec, hours), distance and length (cm, kilometers), weight (mg, kilos), and volume (cc) are all ratio scales.

Also interesting on the Interval scale (level)

As a result, one can add and subtract values on an interval scale, but one cannot multiply or divide units

Therefore it would be an issue if 100 m - 5 m suddenly became INTERVAL and one could no longer multiply or divide the 95 m by another quantity.

This article refers to the level as MeasurementScale btw, which would be an alternate name for that enum. It is fairly common, but with 606.000 Google hits for the exact term "measurement level" (in quotes) compared to 22.800.000 for "level of measurement", I guess we don't have to revisit or reopen #130 (unless there are serious objections) and can stick to the most common term, which is also used by the Wikipedia page.

https://www.questionpro.com/blog/nominal-ordinal-interval-ratio/ and https://www.questionpro.com/blog/ratio-scale-vs-interval-scale/ also provide great explanations, examples and comparisons between two of them (INTERVAL and RATIO are the most common to use with numeric values), but none of them shows evidence that CELSIUS or FAHRENHEIT could be both RATIO and INTERVAL. Those who know of such requirements, cases or sources, please quote them here.

desruisseaux commented 6 years ago

Let me try to summarize:

My conclusion (for now): Unit seems a natural place for LevelOfMeasurement: gender is NOMINAL, the Beaufort wind scale is ORDINAL, the Celsius degree is INTERVAL and the Kelvin is RATIO. However, while useful, levels of measurement used that way do not resolve the #95 problem well: even if the Celsius degree is an INTERVAL unit, it can be used for both measurements and intervals. An attempt to distinguish those two cases with two different CELSIUS units would force us to define a "Celsius as ratio scale" unit, which seems wrong. I think we rather need a property in Quantity telling us whether the quantity is a measurement or an interval, keeping in mind that:

So I think that LevelOfMeasurement in Unit and a "measurement or interval" property in Quantity are complementary, and that for fixing #95 the important one is the latter.

dautelle commented 6 years ago

I would suggest not changing the current unit definitions ("scaled dimensions"), which currently support different physical models (e.g. relativistic), but adding the "level of measurement" property to the quantity/measurement itself. That makes sense, as the name "level of measurement" indicates :)

desruisseaux commented 6 years ago

Yes, I agree that for #95 purpose the information is more useful in Quantity. My issue is that in such case, "ratio" may not be an appropriate name for a measurement in °C. In other words, I think that LevelOfMeasurement fits well in Unit but is not exactly what we need for #95.

dautelle commented 6 years ago

Hello Martin, what do you mean by "fits well in Units"?

desruisseaux commented 6 years ago

Each unit can be associated with exactly one level of measurement. A Beaufort wind scale unit can be associated with the ORDINAL level of measurement. The Celsius unit can be associated with INTERVAL, most other units can be associated with RATIO, etc.

We may consider that the level of measurement of a unit does not change. A "Beaufort wind scale" unit cannot be upgraded from ORDINAL to INTERVAL, for example, because an increase in wind speed from Beaufort number 2 to number 4 is not twice the increase in wind speed from Beaufort number 2 to number 3. The Celsius unit cannot be upgraded from INTERVAL to RATIO because the amount of heat at 4°C is not twice the amount of heat at 2°C. So we can see LevelOfMeasurement as a useful Unit property, but with a fixed value for each unit. This is nice and clean, but implementors can already get equivalent information with the current API: if unit.getConverterTo(unit.getSystemUnit()).isLinear() returns true, then the level of measurement is RATIO; if false, then the level of measurement is something else, possibly INTERVAL. So LevelOfMeasurement in Unit fits well with the definitions that we can see on Wikipedia and other web sites, but does not help much in resolving #95.
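The isLinear() heuristic can be illustrated without the real API. The sketch below uses a hypothetical stand-in, AffineConverter, for a converter of the form y = factor * x + offset; it is not javax.measure.UnitConverter, and LevelGuess is an illustration-only helper. With the actual API the check would be the call quoted above, unit.getConverterTo(unit.getSystemUnit()).isLinear().

```java
// Stand-in for a converter of the form y = factor * x + offset.
// It is NOT javax.measure.UnitConverter; it only mimics isLinear() for illustration.
final class AffineConverter {
    final double factor;
    final double offset;

    AffineConverter(double factor, double offset) {
        this.factor = factor;
        this.offset = offset;
    }

    double convert(double x) {
        return factor * x + offset;
    }

    /** True when convert(k * x) == k * convert(x), i.e. when there is no offset. */
    boolean isLinear() {
        return offset == 0.0;
    }
}

final class LevelGuess {
    /** The heuristic from the comment above: linear to the system unit means RATIO. */
    static String guess(AffineConverter toSystemUnit) {
        return toSystemUnit.isLinear() ? "RATIO" : "not RATIO, possibly INTERVAL";
    }
}
```

A kilometre-to-metre converter (factor 1000, no offset) is classified as RATIO, while a Celsius-to-kelvin converter (factor 1, offset 273.15) is not, which matches the observation that 4°C is not twice 2°C once both are expressed in kelvin.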

If we put LevelOfMeasurement in Quantity, then we have the capability to instantiate two Quantity objects with the same unit but different answers to the "is it a measurement or an interval" question. This is exactly what we need for #95. But then my problem is: how do we specify that 4°C is a measurement? Do we create a Quantity with the RATIO level of measurement? The problem is that 4°C is not twice the amount of heat of 2°C, so it does not fit the definition of ratio. Conversely, if a user wants to compute the difference between two Beaufort numbers, what would be the LevelOfMeasurement of the result? It is not an INTERVAL for the reason given above (the difference between 2 and 3 is not the same as the difference between 3 and 4), and it is no longer an ORDINAL either.

For resolving #95, we need to distinguish between measurements and intervals. But an interval quantity does not automatically imply LevelOfMeasurement.INTERVAL (the Beaufort wind scale example), and conversely a measurement quantity does not automatically imply LevelOfMeasurement.RATIO (the Celsius example). The Unit level of measurement and the Quantity "measurement or interval" characteristic are closely related, but not the same. I think they are complementary (but only the latter is strictly necessary for #95).

keilw commented 6 years ago

If LevelOfMeasurement may not be used for Quantity, then why did we introduce it? There seems to be no need for it other than improving the operations discussed in #95.

we need to distinguish between measurements and intervals

The term Measurement is already taken, at least in the RI, and other unit frameworks, e.g. in F#, call what we defined as Quantity a Measure or Measurement.

If something really has to be added to Quantity then only there, and let's forget about LevelOfMeasurement. However, even the term "Interval Quantity" means something entirely different: https://www.dummies.com/art-center/music/music-theory-harmonic-and-melodic-intervals/

So what should we define where? And how does it benefit those special cases like CELSIUS or FAHRENHEIT while being unintrusive in all other cases? @kaikreuzer @htreu any hint what you used in SmartHome? Did you primarily check for special units like those? If we cannot simply use the "level" of the Unit of each Quantity, then we are probably back to something like Data Type or MeasurementType.

desruisseaux commented 6 years ago

This is why I wanted a wiki page. Long threads in an issue tracker do not help to see the big picture.

#95 identifies the cause of arithmetic inconsistencies. The outcome is that we need to distinguish between Quantity instances that are measurements and Quantity instances that are intervals.

This issue is about how to make the distinction needed for #95, which is a slightly different topic than identifying that this distinction was needed. The analysis work is not the same.

I'm neutral on whether we should add LevelOfMeasurement in Unit or not. My suggestion is that for the purpose of #95, we need only two enumeration values in Quantity: MEASUREMENT and INTERVAL (or other names if there are suggestions), and that those values are not the same as the RATIO and INTERVAL levels of measurement, even if closely related.
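A minimal sketch of how such a two-valued property could drive arithmetic follows. QuantityKind and KindArithmetic are hypothetical names, not part of the API. The sum rule follows the June pseudocode quoted earlier in this thread, and "measurement - measurement = interval" follows the earlier comments; the remaining cases (measurement - interval, interval - measurement) are assumptions made only for illustration.

```java
// Hypothetical enum; the names MEASUREMENT and INTERVAL come from the comment above.
enum QuantityKind { MEASUREMENT, INTERVAL }

final class KindArithmetic {
    /** Sum: an interval only if both operands are intervals (as in the June pseudocode). */
    static QuantityKind sum(QuantityKind a, QuantityKind b) {
        boolean bothIntervals = a == QuantityKind.INTERVAL && b == QuantityKind.INTERVAL;
        return bothIntervals ? QuantityKind.INTERVAL : QuantityKind.MEASUREMENT;
    }

    /** Difference: measurement - measurement and interval - interval give an interval. */
    static QuantityKind difference(QuantityKind a, QuantityKind b) {
        if (a == b) {
            return QuantityKind.INTERVAL;
        }
        if (a == QuantityKind.MEASUREMENT) {
            return QuantityKind.MEASUREMENT;   // assumption: measurement - interval
        }
        // Assumption: interval - measurement has no obvious meaning, so reject it.
        throw new IllegalArgumentException("interval - measurement is not defined in this sketch");
    }
}
```

Note that nothing here refers to Stevens's levels: the same rules would apply to a difference of two Beaufort numbers or two Celsius readings, which is the reason for keeping this property separate from LevelOfMeasurement.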

keilw commented 6 years ago

If it helps, and has no other dependencies, it could be best to add a static enum like Type, DataType or similar directly to Quantity. I see no problem with INTERVAL, but it would be really easy to confuse it with the proposed LevelOfMeasurement entry (which has been defined that way by literature and experts for many decades). Where are the sources that describe the difference, e.g. Wikipedia or a similar article? If we defined something like that, we mustn't point to our own Wiki page; we should have an official reference, whether it's Wikipedia or a specialized forum, but something that is free and safe to quote.

Since both Unit and Quantity already use asType() with a Class argument, the method should not be getType() but something else, maybe getDataType() or getDatatype().

keilw commented 6 years ago

Conversely, if a user wants to compute the difference between two Beaufort numbers, what would be the LevelOfMeasurement of the result? It is not an INTERVAL for the reason given above (the difference between 2 and 3 is not the same as the difference between 3 and 4), and it is no longer an ORDINAL either.

@desruisseaux Then what is it in this case? If we stick to LevelOfMeasurement, then unless we add some other levels, we must not have an "unknown" or null level; that would be rather bad.

desruisseaux commented 6 years ago

It is not an interval as defined by Stevens's levels of measurement. But it can be an interval as we define it for a different context. The same English words have different meanings depending on the context; this is why ISO standards, Ph.D. studies, etc. begin with a definition of the terms they are going to use. Or if we really feel that it may be a cause of confusion, we may call it DIFFERENCE.

keilw commented 6 years ago

But what would you call the other one, VALUE? MEASUREMENT is quite confusing; there is plenty of literature that differentiates between RATIO and INTERVAL, both being MEASUREMENT. SPSS actually has an enum called MeasurementLevel(!) https://www.ibm.com/support/knowledgecenter/en/SSLVMB_24.0.0/spss/base/dataedit_define_variable_measurement.html, but it makes no distinction between RATIO and INTERVAL either; it calls that level SCALE. What it does seem to do is assign that MeasurementLevel to a particular data entry, which would be closer to our Quantity. I found another library on MavenCentral with an enum literally called LevelOfMeasurement. I have to check out the source JAR and see how it fits into their API and what they do with it. The JAR contains other elements like BaseUnit or SIUnits, therefore it looks like the level may be used on a Unit, but I can't say until I have seen the code in more detail. Having both, even if we invented terms no other piece of software uses, looks like overhead and a source of confusion.

As the two JCP EC Members who supported this effort ever since JSR 275 (IBM and Red Hat) are soon going to be one ;-) and at least via SPSS IBM already uses this term, I guess we should try to also ask them (maybe not the actual EC reps, but they should know someone from the SPSS team) for advice.

desruisseaux commented 6 years ago

I agree with trying to find terms used in the literature; this is the purpose of this issue. But we have to use the right definitions for the problem we are trying to fix, which was the intent of my comment.

keilw commented 6 years ago

Here are 3 sources, one of them actually an (incubating) Apache Project which targets Machine Learning and Big Data:

While both SPSS and SystemML summarize INTERVAL and RATIO under a common level called SCALE, Purifinity is closest to the definition we used so far.

Beside that, all of them have one thing in common, they apply this measurement level to a data point or metadata used to describe a measurement, not the actual unit.

This is another example for Spatial Data, which should be familiar especially to @desruisseaux: https://www.e-education.psu.edu/geog160/c3_p8.html It does not talk about an API, but "An implication of this difference is that a quantity of 20 measured at the ratio scale is twice the value of 10" also sounds quite clear about the scale being meant for a Quantity. So even if we ended up sticking to RATIO and INTERVAL only (IMO, to support use cases like Big Data, Statistics and others, we should still keep the 4 Stevens definitions), we should probably do this on Quantity.

keilw commented 6 years ago

Thanks @dautelle, @desruisseaux for the constructive input. I am not sure, if we still need any Wiki page. It is not written in stone, even the name, although the one we picked or Purifinity matches the most common phrase in literature, so it seems fine. @andi-huber, @filipvanlaenen and others, do you feel issues like https://github.com/unitsofmeasurement/indriya/issues/128 can be worked on based on the current assumption Quantity has a getValue() that can be either RATIO (the default for now because @desruisseaux also said, it's the default 1.0 behavior) or INTERVAL, or others where appropriate?

keilw commented 6 years ago

Based on only a small selection of Java or other APIs that apply the level, all of which do so to a Quantity, "measurement" or "data" (not a Unit), I would like to resolve this. Should anybody run into a serious problem, we may revisit it, but the projects and Java APIs that use this concept would be difficult to interact and exchange data with if we did this very differently.

desruisseaux commented 6 years ago

@keilw: this issue has been closed early again, without evidence that it has been understood. Citing what other projects do does not help. There is no question that Stevens's LevelOfMeasurement with nominal, ordinal, interval and ratio values is widely accepted. This is not the issue I was raising. The issue I was raising is that what we need for #95 may not be LevelOfMeasurement. Do we have an answer to the two questions I asked before?

keilw commented 6 years ago

There are so few projects or APIs (dealing with measurements, not just in Java) that even care about it and are still used a lot. Those that do all apply levels similar to Stevens's definition, but SPSS, for example, does not care at all whether °C is an interval or not; it's just SCALE there. Putting it on a Unit would be wrong. I take it that those who bothered to discuss it so far agreed with what others do. Otherwise IBM SPSS Statistics, "the world’s leading statistical software", and other approaches for Data Science, Machine Learning or Statistics (including Java support) would have done this in a different place. That IMO is the goal of this issue, and it was answered.

As for whether those 4 levels are too many or not enough, please create a new ticket for that, so we don't have super-tickets like #95. I created #138 as a placeholder; please fill it with the relevant parts. I did not see any indication that we should have TWO places or two levels to apply, especially because having one contradict the other would lead to utter confusion. This ticket helped find ONE place (from all evidence both here and elsewhere, Quantity turned out to be the better place).

desruisseaux commented 6 years ago

We agree to put the information in Quantity and I'm not asking to put it in two places. I'm not debating either whether there are too many levels or not enough. I'm questioning whether LevelOfMeasurement as defined by Stevens can address the needs of #95. While I like Stevens's levels of measurement a lot and would be happy to see them in the API, unfortunately I think they do not address the needs of #95. Citing other software like IBM SPSS just because the words "levels of measurement" appear in their documentation does not help - we have to understand what they are using levels of measurement for and see if we are in the same situation.

Please let's focus on just two levels: INTERVAL and RATIO. Forget everything else for now. The two questions are:

My answer is no - again, I love Stevens's Level of Measurement definitions, but they do not apply to what we are trying to do for solving #95.

We cannot say that we don't care. Being able to differentiate "interval" from "measurement" (replace "measurement" by whatever other name you like) is the critical part we need for resolving #95.

keilw commented 6 years ago

This one simply helped find the best place for the level, so please continue in #138.