Closed keilw closed 6 years ago
@unitsofmeasurement/experts, @unitsofmeasurement/contributors Based on the discussions and some code snippets (e.g. by @desruisseaux back in June) from #95, I take it that Unit would be the best place for a LevelOfMeasurement attribute / method like getLevel()?
I think it should rather be in Quantity. I'm not sure that a conversion from "interval °C" to "ratio °C" makes sense, for example (i.e. I do not have any use case in mind where we would want to convert from Units.CELSIUS to Units.CELSIUS_INTERVAL). Instead, the use cases that I see are conversions that preserve the level of measurement of the quantity. For example:
We want Quantity.to(Units.CELSIUS) to automatically preserve the level of measurement of the quantity. We do not want to force the user to check the Quantity unit in order to determine whether (s)he should specify Units.CELSIUS or Units.CELSIUS_INTERVAL as the argument to the to method.
The level of measurement of a Quantity is not changed by unit conversions, but by arithmetic operations applied between two quantities. For example "ratio" - "ratio" = "interval".
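To make the conversion point concrete, here is a minimal sketch that takes the "ratio °C" / "interval °C" labels of this comment at face value. Level and Temperature are hypothetical names invented for illustration; nothing here is part of the Unit API. It only shows that a conversion can keep the level unchanged while still treating an absolute value and a difference differently:

```java
// Hypothetical sketch: Level and Temperature are illustrative names, not Unit API types.
enum Level { RATIO, INTERVAL }

final class Temperature {
    static final double KELVIN_OFFSET = 273.15;

    final double value;     // numeric value in the scale given by 'celsius'
    final boolean celsius;  // true for °C, false for K
    final Level level;      // carried by the quantity, not by the unit

    Temperature(double value, boolean celsius, Level level) {
        this.value = value;
        this.celsius = celsius;
        this.level = level;
    }

    /** Converts to kelvin; the level of the quantity is carried over unchanged. */
    Temperature toKelvin() {
        if (!celsius) {
            return this;
        }
        // Only an absolute value is shifted by the offset; a difference is not.
        double kelvin = (level == Level.INTERVAL) ? value : value + KELVIN_OFFSET;
        return new Temperature(kelvin, false, level);
    }
}
```

With this shape the caller writes a single toKelvin() call in both cases and never has to choose between two target units.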
Well, "describes the nature of information within the values assigned to variables" from the Wikipedia article sounds slightly in that direction. The question then is: where does it have to be set so that it is least intrusive? On the Unit, a natural place would have been when defining e.g. CELSIUS, but if there is a valid use case for having -271.15°C as RATIO and the same as INTERVAL, we may have to find a different place, and we could not do it without assuming a default level when none is provided. Or can it always be derived through operations?
This example from June 15 was written under the assumption that it was more beneficial on the Unit:
Quantity add(Quantity that) {
    Unit u1 = this.getUnit();
    Unit u2 = that.getUnit();
    // Convert to the unit of u1, taking into account the nature of u2 (quantity or increment).
    Unit uc = u1.level(u2.getLevel());
    UnitConverter c = u2.getConverterTo(uc);
    double newValue = this.getValue().doubleValue() + c.convert(that.getValue().doubleValue());
    // The result is in the unit of u1, but is it an absolute value or an increment?
    // Note: the following code could be factored out into a convenience method.
    boolean isIncrement1 = u1.getLevel() == LevelOfMeasurement.INTERVAL;
    boolean isIncrement2 = u2.getLevel() == LevelOfMeasurement.INTERVAL;
    // The sum is an increment only if both operands are increments.
    boolean isResultAnIncrement = isIncrement1 && isIncrement2;
    Unit uf = u1.level(isResultAnIncrement ? LevelOfMeasurement.INTERVAL
                                           : LevelOfMeasurement.RATIO);
    return new Quantity(newValue, uf);
}
I only changed the enum name; otherwise it is like the one from June. So even with pseudocode, where is a level set? The Wikipedia description of the different levels also says "Most measurement in the physical sciences and engineering is done on ratio scales." It would therefore be a hassle, and not acceptable, to have to write something like Quantities.getQuantity(10, KILOGRAM, RATIO) every time, if that were the place where it had to be set explicitly.
https://en.wikipedia.org/wiki/Level_of_measurement#Ratio_scale states, "The Kelvin temperature scale is a ratio scale because it has a unique, non-arbitrary zero point called absolute zero." So Quantities.getQuantity(10, KELVIN, INTERVAL) seems to make no sense.
Or are you saying 10 K - 1 K or 100 m - 5 m automatically turns their level into INTERVAL?
The idea was that 10 K - 1 K automatically turns their level into INTERVAL. But I need more thought; on one side it is true that an ORDINAL level of measurement, for example, applies better to Unit than to Quantity. But for the particular case that we were trying to solve in #95 (i.e. the result of 1°C + 2°C), having the information associated with Quantity rather than Unit allows more convenient conversions, as described in my previous comment. I need to think more about that…
Ok, we could also do a vote, probably running a bit longer than the one for the name of a new type, but it should not take quite as long as #95 itself ;-)
To me, a vote should start only after we are done analyzing the problem, listing the choices and debating the pros and cons…
I would have hoped much of this was done in #95, but there is no problem doing it in this ticket, although it was meant as an action item. I changed it to a question. @unitsofmeasurement/experts, @unitsofmeasurement/contributors or @unitsofmeasurement/observers, please (at least after the busy conference week) share your thoughts and preferences on whether the Unit or the Quantity should carry the new LevelOfMeasurement attribute, or something else like UnitConverter, although the literature mostly points to either the unit or the quantity (sometimes, in a slightly different context, also called Measure or Measurement). This is a good overview: http://www.indiana.edu/~educy520/sec5982/week_3/measurement_rsm.pdf Here is another source: https://math.tutorvista.com/statistics/scales-of-measurement.html
One difficulty is that the debate on #95 and elsewhere is scattered across many comments, which makes it difficult to get the big picture. I would like a wiki page summarizing the current situation: what is resolved, what still needs to be resolved, and what the alternatives are, with pros and cons. The difference from an issue tracker is that agreement results in the wiki page being updated and kept short, as opposed to comments being added to a long, tedious-to-follow thread.
If it's just for decision making, then issues like this one are just like a Wiki, too. And after an API decision was made, that information is normally not needed any more. Creating a Wiki that helps downstream users and projects to make use of those new features, sure, we can't have enough of that, so please let us not just put arguments for a particular vote or decision into a Wiki where it has little value later on. I spoke to @kaikreuzer at Eclipse IoT WG meeting on Monday. And he confirmed, they have a workaround right now in SmartHome. Real life experience from a project like theirs is also welcome. They certainly won't change the API but the way they decide how to calculate things differently should help to inspire the standard so it's useful to their and other solutions. There is of course a Wiki page here: https://github.com/unitsofmeasurement/unit-api/wiki/Arithmetic-operations-on-Quantity so if you could make example cases there to refine the problem based on the newly created LevelOfMeasurement, that would be a good enhancement. Creating another page just for this ticket may be a bit confusing. https://github.com/unitsofmeasurement/unit-api/wiki/Arithmetic-operations-on-Quantity#8-how-to-reduce-surprises-for-users already hints on this new feature, so it could be added as a new paragraph there. Adding a whole new page, not sure, if that adds value, maybe start there, if the number of arguments and options became too many it could always be refactored into a separate page.
This article http://psych.colorado.edu/~carey/courses/psyc5741/handouts/Measurement%20Scales.pdf is quite explicit, e.g.
"Units of time (msec, hours), distance and length (cm, kilometers), weight (mg, kilos), and volume (cc) are all ratio scales."
Also interesting, on the interval scale (level):
"As a result, one can add and subtract values on an interval scale, but one cannot multiply or divide units"
Therefore it would be an issue if 100 m - 5 m suddenly became INTERVAL and one could no longer multiply or divide the 95 m by another quantity.
This article refers to the level as MeasurementScale, btw, which would be an alternative name for that enum. It is fairly common, but with 606.000 Google results for the exact term "measurement level" (in quotes) compared to 22.800.000 for "level of measurement", I guess we don't have to revisit or reopen #130 (unless there are serious objections) and can stick to the most common term, which is also used by the Wikipedia page.
https://www.questionpro.com/blog/nominal-ordinal-interval-ratio/ and https://www.questionpro.com/blog/ratio-scale-vs-interval-scale/ also provide great explanations, examples and comparisons between two of them (INTERVAL and RATIO are the most common to use with numeric values), but neither of them shows evidence that CELSIUS or FAHRENHEIT could be both RATIO and INTERVAL.
Those who know of such requirements, cases or sources, please quote them here.
Let me try to summarize:
My conclusion (for now): Unit seems a natural place for LevelOfMeasurement: gender is NOMINAL, the Beaufort wind scale is ORDINAL, the Celsius degree is INTERVAL and the Kelvin is RATIO. However, while useful, levels of measurement used that way do not resolve the #95 problem well: even if the Celsius degree is an INTERVAL unit, it can be used for both measurements and intervals. An attempt to distinguish those two cases with two different CELSIUS units would force us to define a "Celsius as ratio scale" unit, which seems wrong. I think we rather need a property in Quantity telling us whether the quantity is a measurement or an interval, keeping in mind that:
So I think that LevelOfMeasurement in Unit and a "measurement or interval" property in Quantity are complementary, and that for fixing #95 the important one is the latter.
I would suggest not changing the current unit definitions ("scaled dimensions"), which currently support different physical models (e.g. relativistic), but adding the "level of measurement" property to the quantity/measurement itself. That makes sense, as the name "level of measurement" indicates :)
Yes, I agree that for the purpose of #95 the information is more useful in Quantity. My issue is that in such a case, "ratio" may not be an appropriate name for a measurement in °C. In other words, I think that LevelOfMeasurement fits well in Unit but is not exactly what we need for #95.
Hello Martin, what do you mean by ”fits well in Units”?
Each unit can be associated with exactly one level of measurement. A Beaufort wind scale unit can be associated with the ORDINAL level of measurement. The Celsius unit can be associated with INTERVAL, most other units can be associated with RATIO, etc.
We may consider that the level of measurement of a unit does not change. A "Beaufort wind scale" unit cannot be upgraded from ORDINAL to INTERVAL, for example, because an increase in wind speed from Beaufort number 2 to number 4 is not twice the increase in wind speed from Beaufort number 2 to number 3. The Celsius unit cannot be upgraded from INTERVAL to RATIO because the amount of heat at 4°C is not twice the amount of heat at 2°C. So we can see LevelOfMeasurement as a useful Unit property, but with a fixed value for each unit. This is nice and clean, but implementors are already capable of getting equivalent information with the current API: if unit.getConverterTo(unit.getSystemUnit()).isLinear() returns true, then the level of measurement is RATIO; if false, then the level of measurement is something else, possibly INTERVAL. So LevelOfMeasurement in Unit fits well the definitions that we can see on Wikipedia and other web sites, but does not help much in resolving #95.
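The linearity check mentioned above can be illustrated without the real API. The sketch below uses a stand-in converter of the form y = scale * x + offset; AffineConverter and LevelGuess are hypothetical names and are not javax.measure types. It only demonstrates why an offset (as between °C and K) makes a converter non-linear, which is the signal the comment proposes to use:

```java
// Stand-in for a unit-to-system-unit converter of the form y = scale * x + offset.
// AffineConverter and LevelGuess are hypothetical; they are not javax.measure types.
final class AffineConverter {
    final double scale;
    final double offset;

    AffineConverter(double scale, double offset) {
        this.scale = scale;
        this.offset = offset;
    }

    double convert(double x) {
        return scale * x + offset;
    }

    /** Linear in the strict sense: convert(a + b) == convert(a) + convert(b). */
    boolean isLinear() {
        return offset == 0.0;
    }
}

final class LevelGuess {
    /** Rule from the comment: a linear converter means RATIO, otherwise something else. */
    static String guess(AffineConverter toSystemUnit) {
        return toSystemUnit.isLinear() ? "RATIO" : "not RATIO (possibly INTERVAL)";
    }
}
```

For example, new AffineConverter(1000.0, 0.0) (kilometre to metre) is linear, while new AffineConverter(1.0, 273.15) (°C to K) is not. As the comment says, the check cannot tell INTERVAL from ORDINAL or NOMINAL; it only separates RATIO from everything else.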
If we put LevelOfMeasurement in Quantity, then we have the capability to instantiate two Quantity objects with the same unit but a different answer to the "is it a measurement or an interval" question. This is exactly what we need for #95. But then my problem is: how do we specify that 4°C is a measurement? Do we create a Quantity with the RATIO level of measurement? The problem is that 4°C is not twice the amount of heat of 2°C, so it does not fit the definition of ratio. Conversely, if a user wants to compute the difference between two Beaufort numbers, what would the LevelOfMeasurement of the result be? It is not an INTERVAL for the reason given above (the difference between 2 and 3 is not the same as the difference between 3 and 4), and it is no longer an ORDINAL either.
For resolving #95, we need to distinguish between measurements and intervals. But an interval quantity does not automatically imply LevelOfMeasurement.INTERVAL (the Beaufort wind scale example), and conversely a measurement quantity does not automatically imply LevelOfMeasurement.RATIO (the Celsius example). The Unit level of measurement and the Quantity "measurement or interval" characteristic are closely related, but not the same. I think they are complementary (but only the latter is strictly necessary for #95).
If LevelOfMeasurement may not be used for Quantity, then why did we introduce it? There seems to be no need for it other than improving the operations discussed in #95.
we need to distinguish between measurements and intervals
The term Measurement is already taken, at least in the RI, and other unit frameworks, e.g. in F#, call what we defined as Quantity Measure or Measurement.
If something really has to be added to Quantity, then only there, and let's forget about LevelOfMeasurement. However, even the term "Interval Quantity" means something entirely different: https://www.dummies.com/art-center/music/music-theory-harmonic-and-melodic-intervals/
So what should we define where? And how does it benefit special cases like CELSIUS or FAHRENHEIT while being unintrusive in all other cases? @kaikreuzer @htreu any hint about what you used in SmartHome? Did you primarily check for special units like those? If we cannot simply use the "level" of the Unit of each Quantity, then we are probably back to something like Data Type or MeasurementType.
This is why I wanted a wiki page. Long threads in an issue tracker do not help to see the big picture.
Quantity that are measurements and Quantity that are intervals.
This issue is about how to make the distinction needed for #95, which is a slightly different topic than identifying that this distinction was needed. The analysis work is not the same.
I'm neutral on whether we should add LevelOfMeasurement in Unit or not. My suggestion is that for the purpose of #95, we need only two enumeration values in Quantity: MEASUREMENT and INTERVAL (or other names if there are suggestions), and that those values are not the same as the RATIO and INTERVAL levels of measurement, even if closely related.
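As a rough sketch of how such a two-valued property might drive arithmetic, the enum below combines the two rules that appear in this thread: the June pseudocode (a sum is an interval only when both operands are intervals) and "ratio" - "ratio" = "interval". The name Kind and its methods are hypothetical, and rejecting INTERVAL - MEASUREMENT is an assumption made here for illustration, not something decided in the discussion:

```java
// Hypothetical sketch: Kind and its methods are illustrative, not part of the Unit API.
enum Kind {
    MEASUREMENT, INTERVAL;

    /** Kind of (this + that): an interval only when both operands are intervals. */
    Kind plus(Kind that) {
        return (this == INTERVAL && that == INTERVAL) ? INTERVAL : MEASUREMENT;
    }

    /** Kind of (this - that). */
    Kind minus(Kind that) {
        if (this == MEASUREMENT && that == MEASUREMENT) {
            return INTERVAL; // difference of two measurements
        }
        if (this == INTERVAL && that == MEASUREMENT) {
            // Assumption of this sketch: subtracting a measurement from an interval is rejected.
            throw new ArithmeticException("INTERVAL - MEASUREMENT is not defined in this sketch");
        }
        return this; // MEASUREMENT - INTERVAL = MEASUREMENT, INTERVAL - INTERVAL = INTERVAL
    }
}
```

Note that plus(MEASUREMENT, MEASUREMENT) returns MEASUREMENT only because the pseudocode does so; whether 1°C + 2°C should be allowed at all is exactly the open question of #95.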
If it helps, and has no other dependencies, it could be best to add a static enum like Type, DataType or similar directly to Quantity. I see no problem with INTERVAL, but it would really be easy to confuse it with the proposed LevelOfMeasurement entry (which has been defined that way by literature and experts for many decades). Where are the sources that describe the difference, e.g. Wikipedia or a similar article? If we defined something like that, we mustn't point to our own Wiki page; we should have an official reference, whether it's Wikipedia or a specialized forum, but something that is free and safe to quote.
Since both Unit and Quantity already use asType() with a Class argument, the method should not be getType() but something else, maybe getDataType() or getDatatype().
"Conversely if a user wants to compute the difference between two Beaufort numbers, what would the LevelOfMeasurement of the result? It is not an INTERVAL for the reason given above (the difference between 2 and 3 is not the same than the difference between 3 and 4), and it is no longer an ORDINAL either."
@desruisseaux Then what is it in this case? If we stick to LevelOfMeasurement, then unless we add some other levels we must not have an "unknown" or null level; that would be rather bad.
It is not an interval as defined by Stevens's levels of measurement. But it can be an interval as we define it for a different context. The same English words have different meanings depending on the context; this is why ISO standards, Ph.D. studies, etc. begin with a definition of the terms they are going to use. Or if we really feel that it may be a cause of confusion, we may call it DIFFERENCE.
But what would you call the other one, VALUE? MEASUREMENT is quite confusing; there is plenty of literature that differentiates between RATIO and INTERVAL, both being MEASUREMENT.
SPSS actually has an enum called MeasurementLevel (!) https://www.ibm.com/support/knowledgecenter/en/SSLVMB_24.0.0/spss/base/dataedit_define_variable_measurement.html, but it makes no distinction between RATIO and INTERVAL either; it calls that level SCALE. What it does seem to do is assign that MeasurementLevel to a particular data entry, which would be closer to our Quantity. I found another library on MavenCentral with an enum literally called LevelOfMeasurement. I have to check out the source JAR and see how it fits into their API and what they do with it. The JAR contains other elements like BaseUnit or SIUnits, therefore it looks like the level may be used on a Unit, but I can't say until I have seen the code in more detail. Having both, even if we invented terms no other piece of software uses, looks like overhead and a source of confusion.
As the two JCP EC Members who have supported this effort ever since JSR 275 (IBM and Red Hat) are soon going to be one ;-) and IBM already uses this term, at least via SPSS, I guess we should also try to ask them (maybe not the actual EC reps, but they should know someone from the SPSS team) for advice.
Agree for trying to find terms used by the literature - this is the purpose of this issue. But we have to use the right definitions for the purpose we are trying to fix, which was the intent of my comment.
Here are 3 sources, one of them actually an (incubating) Apache Project which targets Machine Learning and Big Data:
While both SPSS and SystemML summarize INTERVAL and RATIO under a common level called SCALE, Purifinity is closest to the definition we used so far.
Beside that, all of them have one thing in common, they apply this measurement level to a data point or metadata used to describe a measurement, not the actual unit.
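The relationship between the four Stevens levels and the coarser three-level scheme described for SPSS and SystemML can be written down as a simple mapping. This is only an illustration of the grouping mentioned above; StevensLevel and CoarseLevel are hypothetical names and are not taken from any of those products:

```java
// Hypothetical sketch: enum names are illustrative and not taken from SPSS, SystemML or Purifinity.
enum StevensLevel { NOMINAL, ORDINAL, INTERVAL, RATIO }

enum CoarseLevel {
    NOMINAL, ORDINAL, SCALE;

    /** Collapses INTERVAL and RATIO into the single SCALE level. */
    static CoarseLevel of(StevensLevel level) {
        switch (level) {
            case NOMINAL:
                return NOMINAL;
            case ORDINAL:
                return ORDINAL;
            case INTERVAL:
            case RATIO:
                return SCALE;
            default:
                throw new IllegalArgumentException("Unknown level: " + level);
        }
    }
}
```

The mapping is lossy in one direction: once a value is tagged SCALE, the INTERVAL versus RATIO distinction that matters for #95 can no longer be recovered.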
This is another example for Spatial Data, should be familiar especially to @desruisseaux
https://www.e-education.psu.edu/geog160/c3_p8.html
It does not talk about an API, but "An implication of this difference is that a quantity of 20 measured at the ratio scale is twice the value of 10" also sounds quite clear about the scale being meant for a Quantity, so even if we ended up sticking to RATIO and INTERVAL only (IMO, to support use cases like Big Data, Statistics and others, we should still keep the 4 Stevens definitions), we should probably do this on Quantity.
Thanks @dautelle, @desruisseaux for the constructive input. I am not sure, if we still need any Wiki page. It is not written in stone, even the name, although the one we picked or Purifinity matches the most common phrase in literature, so it seems fine.
@andi-huber, @filipvanlaenen and others, do you feel issues like https://github.com/unitsofmeasurement/indriya/issues/128 can be worked on based on the current assumption that Quantity has a getValue() that can be either RATIO (the default for now, because @desruisseaux also said it's the default 1.0 behavior) or INTERVAL, or others where appropriate?
Based on only a small selection of Java or other APIs that apply the level, all of which do so to a Quantity, "measurement" or "data" (not a Unit), I would like to resolve this. Should anybody run into a serious problem, we may revisit it, but doing this very differently would make it difficult to interact and exchange data with the projects and Java APIs that use this concept.
@keilw: this issue has been closed early again without evidence that it has been understood. Citing what other projects do does not help. There is no question that Stevens's LevelOfMeasurement with nominal, ordinal, interval and ratio values is widely accepted. This is not the issue I was raising. The issue I was raising is that what we need for #95 may not be LevelOfMeasurement. Do we have an answer to the two questions I asked before?
RATIO, how do you reconcile that with Stevens's definition of the ratio level of measurement?
INTERVAL, then again how do you reconcile that with Stevens's definition of the interval level of measurement?
There are so few projects or APIs (dealing with measurements, not just in Java) that even care about it and are still used a lot. Those that do all apply levels that are similar to Stevens's definition, but e.g. SPSS does not care at all whether °C is an interval or not; it's just SCALE there. Putting it on a Unit would be wrong. I take it that those who bothered to discuss it so far agreed with what others do. Otherwise IBM SPSS Statistics, "the world’s leading statistical software", and other approaches for Data Science, Machine Learning or Statistics (including Java support) would have done this in a different place. That IMO is the goal of this issue. And it was answered. As for asking whether those 4 levels are too many or not enough, please create a new ticket for that, so we don't have super-tickets like #95. I created #138 as a placeholder; please fill it with the relevant parts. I did not see any indication that we should have TWO places or two levels to apply, especially because having one contradict the other would lead to utter confusion. This ticket helped find ONE place (from all evidence both here and elsewhere, Quantity turned out to be the better place).
We agree to put the information in Quantity, and I'm not asking to put it in two places. I'm not debating either whether there are too many levels or not enough. I'm questioning whether LevelOfMeasurement as defined by Stevens can address the needs of #95. While I like Stevens's levels of measurement a lot and would be happy to see them in the API, unfortunately I think they do not address the needs of #95. Citing other software like IBM SPSS just because the words "levels of measurement" appear in their documentation does not help - we have to understand what they are using levels of measurement for and see if we are in the same situation.
Please let's focus on just two levels: INTERVAL and RATIO. Forget everything else for now. The two questions are:
RATIO scale, with "ratio" as defined by Stevens?
INTERVAL scale, with "interval" as defined by Stevens?
My answer is no - again, I love Stevens's level of measurement definitions, but they do not apply to what we are trying to do for solving #95.
We cannot say that we don't care. Being able to differentiate "interval" from "measurement" (replace "measurement" with whatever other name you like) is the critical part we need for resolving #95.
This one simply helped finding the best place for the level, so please continue in #138
Either in Unit or Quantity the new LevelOfMeasurement attribute should be applied for arithmetic decision making. Needs #130