unitsofmeasurement / uom-systems

Units of Measurement Systems
http://www.uom.systems
Other
36 stars 17 forks source link

UCUMFormat issues formatting multiples or submultiples #97

Closed keilw closed 7 years ago

keilw commented 7 years ago

Somewhat similar to https://github.com/unitsofmeasurement/uom-se/issues/136 many multiples and submultiples of UCUM units applied a prefix cause problems formatting via UCUMFormat.format(). Parsing seems unaffected.

keilw commented 7 years ago

@garretwilson @magnonasc What about this one?

I looked a lot closer at UOMo UCUM over Easter also adjusting things like the UCUM XML version, etc. It contains a whole API modeled extremely close to the UCUM spec and wording: https://github.com/eclipse/uomo/tree/master/bundles/org.eclipse.uomo.ucum/src/main/java/org/eclipse/uomo/ucum Including classes like Term, Symbol or Operator. The UcumUnit implements the Unit interface. At least in UOMo UCUM 0.8 (master) if possible also in the 0.7 branch I will change that to JSR 363 very soon. And the dependency on ICU4J is going to be moved into an optional bundle where conversion between the two systems is possible (e.g. for sophisticated i18n that you get for free with ICU)

It is not trivial to get what UOMo does into the slightly more typesafe SystemOfUnit based module or vice-versa. Right now UcumService does not even do a format of STERE to "st" but methods like parse() or analyze() tell the name "stere" so it's recognized. An important aspect is, that every DefinedUnit (all units that are not a BaseUnit) includes a "class" attribute specific to UCUM. And the two liter units and stere fall into different classes, so that is a key aspect of UCUM telling them apart in the XML file despite the way they are made up from m³ or dm³ are the same. Although UCUMFormat derived from EBNFUnitFormat uses a "BNF" (Backus-Naur Form) the elements are not exactly the same, so some specialties of UCUM are therefore not handled the same way. It seems easier to implement UnitFormat in UOMo UCUM because all units carry the code in themselves. A mix of model and view strictly speaking, but that is among other things how it allows to tell them apart based on its particular definition. UOMo UCUM contains its own Unit implementations with special fields that only make sense for UCUM like "class" or "property" (which reassembles the "type" or "subtype" in ICU, a runtime quantity information that JSR 363 does not usually need because it handles quantities in its own way that offers compile-time type-safety in most cases)

magnonasc commented 7 years ago

I tried to debug to see if I find something about that stere appearing in some units like LITER and GILL_BRITISH, as pointed in a test created by @keilw and I've found a weird behavior of the method UnitFormat.format(...).

Here's the complete method, and I'll write comments with the unit being formatted step-by-step to show what's the problem here.

Let's suppose we're trying to format DECI(LITER_DM3)

public Appendable format(Unit<?> unknownUnit, Appendable appendable) throws IOException {
    AbstractUnit unit = (AbstractUnit) unknownUnit;
    CharSequence symbol;
    CharSequence annotation = null;

...

    String mapSymbol = symbolMap.getSymbol(unit);
    if (mapSymbol != null) { //returns <null> to DECI(LITER_DM3), tottaly understandable
        symbol = mapSymbol;
    } else if (unknownUnit instanceof TransformedUnit) { //then DECI(LITER_DM3) pass through here
        final StringBuilder temp = new StringBuilder();
        final Unit<?> parentUnit = ((TransformedUnit) unit).getParentUnit(); //returns <m³>, why? It should be <L>
        final UnitConverter converter = unit.getConverterTo(parentUnit);
        final boolean printSeparator = !parentUnit.equals(ONE);

        format(parentUnit, temp); //then it will call this method again to <m³>, which will return <st>. (it'll be represented later)
        formatConverter(converter, printSeparator, temp); //formats the converter (CENTI) with the unit <st> to "cst", explained below.

        symbol = temp;
    } 

...

    appendable.append(symbol);

...

    return appendable; //it returns "cst"
    }

First problem: ((TransformedUnit) unit).getParentUnit() returning <m³>. After go deep into the debugging, I noticed that the method AbstractUnit.transform(...) returns the transformation applied to its "systemUnit" <m³>, instead of the unit itself.

  @Override
  public final AbstractUnit<Q> transform(UnitConverter operation) {
    AbstractUnit<Q> systemUnit = this.getSystemUnit(); //this will return <m³>
    UnitConverter cvtr = this.getSystemConverter().concatenate(operation); //this will return <DECI>, because the <L> is composed by <dm³> and will concatenate one more <DECI> converter, which will turn into <DECI(DECI(m³))>, i.e., <CENTI(m³)>
    if (cvtr.equals(AbstractConverter.IDENTITY))
      return systemUnit;
    return new TransformedUnit<>(systemUnit, cvtr); //will return <CENTI(m³)>
  }

this explains, why we get above, because the parent of CENTI(m³) i.e., CENTI(STERE) is in fact ...

We are always looking at the formatter, but it doesn't seem to be the problem, but that method of AbstractUnit does. If I have and I want to transform it in dm³, it's okay.. I keep that. But if I want uL from L, why do I need to get its base units to make the conversion? We should get the unit L itself and transform it to uL.

Can you guys see what I mean? If it's a little bit confusing, I can try to make it a little bit better, but it's because the debugging is really confusing, I needed to keep coming and going through all the code to see what's really happening to those units.

So, my simplified answer about this problem is: We should not call systemUnit = this.getSystemUnit() and return it when creating a new transformed unit everytime, as there are some special cases and they need to be treated.

keilw commented 7 years ago

It would not make a difference because

Because the system unit is unique by quantity type...

So by definition STERE and LITER or LITRE share the same SystemUnit which is (https://en.wikipedia.org/wiki/Volume) The problem is, UCUM has a somewhat twisted idea of "base units", e.g. its base unit for mass is GRAMnot KILOGRAM.

 <base-unit xmlns="" Code="g" CODE="G" dim="M">
      <name>gram</name>
      <printSymbol>g</printSymbol>
      <property>mass</property>
   </base-unit>

While Volume is not a base unit, the official SI unit for it is On the other hand UCUM again twists the foundations for units like STERE

  <unit xmlns="" Code="st" CODE="STR" isMetric="yes" class="misc">
      <name>stere</name>
      <printSymbol>st</printSymbol>
      <property>volume</property>
      <value Unit="m3" UNIT="M3" value="1">1</value>
   </unit>

Compared to LITER

 <unit xmlns="" Code="l" CODE="L" isMetric="yes" class="iso1000">
      <name>liter</name>
      <printSymbol>l</printSymbol>
      <property>volume</property>
      <value Unit="dm3" UNIT="DM3" value="1">1</value>
   </unit>

Its aliasing indeed works because the "second liter" simply points to "1 l" but while STERE is based on "1 m3" the definition for LITER is based on "1 dm3". Both claim to be "isMetric" btw, another bent and twisted UCUM attribute which may mean something like "Being derived from some unit under the SI system". AbstractUnit or TransformedUnit work perfectly fine, for any metric volume unit the system unit is "m3" you must not change that.

UOMo UCUM has its own type hierarchy modeled strictly after the UCUM system, so the abstract base class works quite different. It is possible, the only proper way to support UCUM is by defining a similar hierarchy, but that would prohibit mixing UCUM and other systems like SI or those based on it.

magnonasc commented 7 years ago

But every unit that's based in other that exists in UCUM implementation won't be formatted correctly, look at the test you created with GILL_BRITISH:

    @Test
    public void testFormatUCUMCSGill() {
    assertEquals("[gil_br]", FORMAT_CS.format(UCUM.GILL_BRITISH)); //works fine!
    }

but

    @Test
    public void testFormatUCUMCSGill() {
    assertEquals("[gil_br]", FORMAT_CS.format(MILLI(UCUM.GILL_BRITISH))); //returns <st.454609/3200000000000>
    }

The reason for that is that if you transform a unit you look only at the base unit, I mean that's not wrong to use this approach, BUT, if the unit exists on the symbol map, it should be transformed as itself, not using base units.

For example, if you have LITER and you want to transform it in DECI(LITER), the current implementation will go straight to its base unit and transform it i.e., it will be equivalent to use DECI(DECI(CUBIC_METRE)), then the formatter looks for CUBIC_METRE on the symbol map and what does it find? STERE, which is not wrong, the problem is on the transformation!

What the transformation should see before using that approach is:

  1. Does this unit I want to transform exists on the symbol map? i.e., I want to transform LITER, does LITER exist on UCUM implementation?
  2. I saw that it does! So I'll keep this unit and just apply the converter.
  3. Returns dL

What does it actually do with the current approach is:

  1. Does this unit I want to transform exists on the symbol map? i.e., I want to transform LITER, does LITER exist on UCUM implementation?
  2. I saw that it does, but it doesn't matter to me! I'll ignore it and apply the converter to its base unit.
  3. Returns cm³ which is equivalent to cst.

You're just saying that TransformedUnit is working fine, and I'm giving you examples that it's not.. That's the reason why the formatter returns L for LITER and cst for DECI(LITER).


Its aliasing indeed works because the "second liter" simply points to "1 l" but while STERE is based on "1 m3" the definition for LITER is based on "1 dm3".

And this is what I'm trying to explain.. The implementation of TransformedUnit should see if the current unit being transformed is actually a unit on the UCUM implementation or not, before chosing how it will transform it. Again, if I want to transform LITER, I actually want to transform LITER, not DECI(CUBIC_METRE).

Judging by the way it's implemented, the unit MINIM_BRITISH is equals to:

(FLUID_DRAM / 60) * (FLUID_OUNCE/8) * (GILL/5) * (PINT/4) * (QUART/2) * (GAL/4) * (4.54609 * L) * MILLI

remember, LITER is equal to dm³, and also remember that we'll get a STERE from that, so it'll give you in the end:

st*454609/7680000000000000

when we should simply look at the current unit and format itself to return m[min_br].

And this is general, not specific for volumes, you can see that with MILLI(UCUM.ACRE_BRITISH)) that returns m2.15808008005469801/3906250000000000, DECI(UCUM.CALORIE_AT_20C)) that returns J.41819/100000, and some others.

With MICRO(UCUM.NEWTON) it returns uN, it works, why? because it's a system unit.

It's not a problem that came with LITER.. But LITER was the unit that open our eyes to this problem, and we can say that it's a huge one. And I could also say that it might not be hard to fix.

magnonasc commented 7 years ago

I'm almost creating a unit test for each one of the units :(

keilw commented 7 years ago

That could solve https://github.com/unitsofmeasurement/uom-systems/issues/81 but perverting getSystemUnit() is not the way to fix it. Unfortunately UCUM has some ways of twisting things, there is a good reason UOMo UCUM created its own hierarchy.

magnonasc commented 7 years ago

I'm not saying anything about change getSystemUnit(), I'm just saying that we should change how the method transform() works, because it doesn't check for these cases I mentioned. If we don't, we'll never be able to get a dL, uL or whatever unit transformed from other formatted correctly.

magnonasc commented 7 years ago

Let me give you another simple example @keilw, because maybe you're having some difficult in understand what I mean.

Unit <?> deciLiter = DECI(UCUM.LITER);
deciLiter.getParentUnit(); //returns m³, why????? It should be LITER

In TransformedUnit there are two methods getParentUnit() and getSystemUnit(), but the prefixes use the method AbstractUnit.transform(...) and what AbstractUnit.transform(...) does is: return new TransformedUnit(unit.getSystemUnit(), unit.getSystemConverter());

and by that, we can assume that getParentUnit() will always return the same unit as getSystemUnit(), which is .. This occurs because LITER was lost in the transformation, we have no LITER anymore. In every transformation where we use the method transform() the current unit will be replaced by its base.

Again, this problem is biggest than just the formatter, and maybe there's not even a bug in the formatter, but in other implementation.

keilw commented 7 years ago

getParentUnit() is not just the systemUnit. A TransformedUnit is a combination of the parent and a UnitConverter applied. So LITER is not lost and it should not be liter as long as the conversion is correct. When I put a breakpoint on a microLiter example it gives m³ RationalConverter(1,1000000000)

magnonasc commented 7 years ago

If LITER isn't lost, how can we get it back after the transformation? or ACRE_BRITISH, or CALORIE_AT_20C, or any other unit that's based on other, BUT exists on UCUM implementation?

If we transform it I bet we might be able to make the inverse way.

keilw commented 7 years ago

getParentUnit() is not just the systemUnit. A TransformedUnit is a combination of the parent and a UnitConverter applied. So LITER is not lost and it should not be liter as long as the conversion is correct. When I put a breakpoint on a microLiter example it shows

parent=m³
converter=RationalConverter(1,1000000000)

You see the problem in https://github.com/unitsofmeasurement/uom-demos/blob/master/console/systems/ucum/src/main/java/tec/uom/demo/systems/ucum/UCUMFormatDemo.java

    UnitConverter conv = microliter.getConverterTo(UCUM.STERE);
    System.out.println(conv);
    UnitConverter conv2 = microliter.getConverterTo(Units.CUBIC_METRE);
    System.out.println(conv);

both print

RationalConverter(1,1000000000)
RationalConverter(1,1000000000)

Both converters are identical and they are correct in the conversion factor. Again UCUM twists the Metric standard in some ways like using GRAM instead of KILOGRAM. UCUMFormathas to deal with that in line 279

        if (unit.equals(SI.KILOGRAM)) {
        // A special case because KILOGRAM is a BaseUnit instead of
        // a transformed unit, for compatibility with existing SI
        // unit system.

I don't know what other twists could help to deal with other UCUM peculiarities? The waytransform() or getConverterTo() work are physically correct.

magnonasc commented 7 years ago

If you're telling me that the method transform() is working correctly, can you point me why the formatter isn't working correctly? I debugged the tests, went step-by-step until I see where the formatter is getting lost and I even showed it on a comment above... If you're saying that I'm wrong, prove that I am.

Although getParentUnit() has no javadocs, I don't believe there's a reason to create two method that does the same thing on TransformedUnit.

And for what you said:

getParentUnit() is not just the systemUnit.

It actually is, exactly the same thing. getParentUnit() returns the TransformedUnit's parent unit, which is already the systemUnit, because the method transform() make sure of that.

If you don't think it works like this, please, give me an example of that, because as I said, I went step-by-step through the formatter/transformation.

keilw commented 7 years ago

Only the format() part of UCUMFormat has problems because it was bluntly forked off EBNFUnitFormat with practically no change. parse() so far works perfectly fine, but as a matter of fact the specialized inner classes of UCUMFormat are called Parse. So it's a small subset based on a quick approach (that happened in JScience 5, I heard it will be rescued to GitHub, but I have not seen active development on java.net since 2010) that has problems. UCUM is cumbersome, take the GRAM paradoxon. Those implementations that were tailor-made to follow its core algorithms with elements like Concept, BaseUnit or DefinedUnit (that is all there is for UCUM) are also centered around parsing. I don't recall a dedicated format() or print() method in UOMo UCUM or https://github.com/jmandel/ucum.js. The problem is, UCUM effectively mixes at least 3 or more different unit systems that JSR 363 SystemOfUnits is meant to tell apart. The class attribute in the XML file says "misc" for STERE and "iso1000" for the LITER "twins". The conversion is meant to work within a system, I'm afraid UCUM as it was attemted here is more than one system and should be broken into smaller pieces. Then the ambiguity could be resolved more easily.

Conversion within a particular quantity like Volume won't work if one system has X as fundamental systemUnit and another one arbitrarily picks Y.

magnonasc commented 7 years ago

Can you give me an example of a TransformedUnit where the parentUnit is different than the systemUnit?

keilw commented 7 years ago

JavaDoc was always there since JSR 275 https://github.com/unitsofmeasurement/jsr-275/blob/master/src/main/java/javax/measure/unit/TransformedUnit.java but by using Functional interfaces e.g. for getConverter()the docs were covered. They are now there and getParentUnit() reads for what it does:

The parent unit is the untransformed unit from which this unit is derived

So for certain systems the parent unit of a TransformedUnitand system unit of the general Unitmay be the same. Although TransformedUnithas a constructor it is not meant to be called that way in most cases, but via Unit.transform().

magnonasc commented 7 years ago

We surely have different interpretations for that..

The parent unit is the untransformed unit from which this unit is derived

With that I understand that if I make the transformation DECI(LITER) the parent unit will be LITER, for me it's obvious.. Where in the javadoc for getParentUnit() says that it'll return the systemUnit?

For me, DECI(LITER) is derived from LITER, where does it say that it would be ?

keilw commented 7 years ago

That is implementation-specific, it is not always the same, but if a unit is derived from another one like LITRE is derived from CUBIC_METRE, then the base unit is not LITRE which is itself just a derived unit. You already have the answer

untransformed unit from which this unit is derived

LITRE is not an untransformed unit, thus you won't get LITRE or LITER but always something that feels like CUBIC_METRE. Speaking of which, the results of printing look somewhat different after I tweaked the STEREdefinition ;-)

magnonasc commented 7 years ago

We're not talking about base units, we're talking about derivations..

if I have m³ -> L -> uL, then uL is derived from L and L is derived from ...

If you assume that uL is directly derived from i.e., if you don't alow a TransformedUnit derived from other, I'm sorry.. but there's a huge implementation problem and you won't be able to get rid of that problem with STERE , or with MINIM_BR or any other that is derived from a TransformedUnit.

The reason why parsing works and formatting doesn't, is that the parser is able to look at those units on the middle of the "unit hierarchy", and by saying that the parentUnit of a TransformedUnit will always be an "untransformed unit" you'll never keep track on those units the same way as parser does.

keilw commented 7 years ago

I'm afraid while most of it can probably be handled on an implementation level this must be discussed with the (former) EG especially all Maintenance Leads (@leomrlima, @dautelle and myself) Taking the JavaDoc of TransferredUnit with a slight variation (DAY) you have some examples:

         CELSIUS = KELVIN.shift(273.15);
         FOOT = METRE.multiply(3048).divide(10000);
         DAY = new TransformedUnit<Time>("d", SECOND, new RationalConverter(24 * 60 * 60, 1));

All of these are derived or "transferred" units. And e.g. DAY is a TransformedUnit based on a RationalConverter. Currently the chain of conversions will be based on the SI unit SECOND, so if you create a "milliday" the parent is likely to be based on the underlying SECOND like DAY itself. That makes sense because having to keep track of every single conversion let's say s -> d -> md -> jiffy (see https://en.wikipedia.org/wiki/Unit_of_time) you start losing track very easily. Thus the conversion chain has just one root which is SECOND in this example.

On an API level http://unitsofmeasurement.github.io/unit-api/site/apidocs/javax/measure/Unit.html has several key methods. getSystemUnit() which applies to pretty much all implementations while getBaseUnits() so far only applies to to a ProductUnit only. Not to a TransformedUnit although the lines between the two implementations can be blurred in many cases (take STERE, it was considered a ProductUnit earlier, but that also did not help us much in this case;-) Should any of this be missing something, I'm afraid, there is only a future API improvement to address that. In https://groups.google.com/forum/#!topic/units-dev/5rD6eljjNlQ I discussed the idea and a drastic redefinition of the SI standard while not necessarily changing that much of the way physics works or our API gives us a chance to adjust not only to these changes by other standard bodies.

Please feel free to discuss aspects you think to be missing or not covering all scenarios in the mailing list, too (probably units-dev in this case)

desruisseaux commented 7 years ago

@keilw Could you summarize for me the issue? If possible with one or few lines of code reproducing the issue, the expected result and the actual result?

keilw commented 7 years ago

As @magnonasc mentioned earlier, especially the UCUM format implementation (which was very much modeled after the JavaCC precompilation for EBNFUnitFormat, something once called UnitFormat in JSR 275) faces issues to get to the "parent" unit. E.g. if he does m³ -> L -> uL, then uL is derived from L and L is derived from . ("u" stands for "µ" in UCUM;-)

Ideally he would like to have LITRE as "parent" of MICROLITRE, but currently it seems the chain of converters is based on one parent, not a whole chain (at least for TransformedUnit)

Is that the case, or is there a way to solve this with existing Unit implementations?

magnonasc commented 7 years ago

Heey @desruisseaux, to complete @keilw comment, it began with two units based on the same base unit STERE and LITER based on CUBIC_METRE, now every unit based on LITER will be formatted to something based on stere, if we use CUBIC_METRE as parent of the transformation, i.e., the untransformed unit, we'll never keep track of what's based on LITER and what's based on STERE.

So we get nst from MICRO(LITER), where nst means NANO(STERE). And it happens for every single unit based on LITER.

So if we had the parent being the parent of the transformation (not the system unit), in this case LITER, the formatter would simply look at the parent, which is LITER, would look to its converter, and would return uL.

The same would happen for STERE.

The TransformedUnit working only with the systemUnit doesn't allow units with the same base, that's what I'm trying to fix here.

STERE is based on CUBIC_METRE, LITER is based on DECI(CUBIC_METRE), so I bet STERE will always have preference over LITER, and that's an even worse mistake when we know that LITER must be one of the most used units.

One of the solutions of @keilw in #96 was to take STERE out as it's considered a "legacy unit" according to Wikipedia, but the implementation is based on UCUM specs, not the inverse, so take it out is something that I'm not considering at the moment.

keilw commented 7 years ago

@magnonasc @desruisseaux By changing the definition of STERE

//public static final Unit<Volume> STERE = addUnit(new ProductUnit<Volume>(METER.pow(3)));
    public static final Unit<Volume> STERE = addUnit(new TransformedUnit<Volume>("st", Units.CUBIC_METRE, MultiplyConverter.IDENTITY));

The overlap with "st" is gone, but the general issue remains that UCUMFormat was derived from exactly the same precompiler code https://github.com/unitsofmeasurement/jsr-275/blob/master/src/main/javacc/javax/measure/unit/format/UnitParser.jj that fuelled all other UnitFormat implementations, but UCUM has a different BNF than what's used for EBNFUnitFormat in uom-se.

Can't say if there is a way to get any of these JavaCC files revived (they are certainly outdated now with JSR 363) but the UCUM formatting/printing part does not reflect the way UCUM deals with units.

magnonasc commented 7 years ago

With this approach you're free from the overlap of st over L, but if you try to get DECI(STERE) you'll still get a LITER.

And it doesn't fix the formatter aswell, if you try to format DECI(LITER), you'll get cm³, how would the formatter would know that this is a LITER? The formatter would make tests to see if there's a unit that fits with its definition..

It would go through: cm³ here's our base unit

test 1 : m³E10-2 -> stE10-2 -> CENTI(STERE)

test 2 dm³: dm³E10-1 -> LE10-1 -> DECI(LITER)

So even if you put an end on the overlap of those units, you won't be able to format it properly, even if you test all the possibilities on your code, you'd get two possible outputs.

So you'd never get dL without keeping track of LITER on the transformation.

keilw commented 7 years ago

It's not about TransformedUnitalone, there are several different implementations. Although using a prefix like DECI etc. usually does end up in a transformation by a UnitConverter, but other operations can result in a ProductUnit instead.

desruisseaux commented 7 years ago

I would not be of useful advise on this topic since it is an implementation-specific issue (I'm using another implementation with a different approach). But I don't think that UCUM should be used as an authoritative source of units definition. It may be used as formatting guideline, but this is a different topic than units definition. The designers of SI system have been very careful to create a coherent system, where no overlap exists. But adding units outside the SI system (or outside the unit approved for use with SI), we are at risk of breaking this consistency.

keilw commented 7 years ago

@desruisseaux Thanks for the advise. I also just created https://github.com/unitsofmeasurement/uom-systems/issues/100 since UCUM makes clear distinctions between parsing and printing. The latter is not even strongly formalized. And most compatible implementations (including Eclipse UOMo which I took over from Grahame who still does a lot in the HL7 space) care very little about pretty printing or UI. UOMo UCUM also has its own implementation (without strong compile-time unit binding) for a similar reason. And I am not sure, if full UCUM support envisioned by those who want to use it like @garretwilson or @magnonasc can be met without something similar? Let's not forget, JSR 363 implementations like the RI or SE port contain all kinds of converters like LogConverter. They are perfectly fine to apply in a TransformedUnit but in the UCUM terminology all these are Special Units (http://unitsofmeasure.org/ucum.html#section-Special-Units-on-non-ratio-Scales) and except CELSIUS it seems, they must neither be used in algorithmic operations nor applied a prefix like KILO, DECI, etc. We don't have the notion of "special" so we cannot distinguish them right now, but IMO if you tried to apply a MetricPrefix to such special unit, the operation should fail. Again, it could be a case for at least a special sub-set of Unit implementations. Most likely on top of AbstractUnit in uom-se, but not using any of the standard units.

garretwilson commented 7 years ago

And I am not sure, if full UCUM support envisioned by those who want to use it like @garretwilson or @magnonasc can be met without something similar?

@keilw I have an important question. I am only using UCUM because it gives me a standard, consistent computer-oriented (not human-oriented) representation for common units. I don't need strange or outdated units. I just need to know what when I store and parse "microliters" it will always be "uL" and not sometimes "microliters" or "micro-liters" or "μl" or some other permutation (or "nst" or something off-the-wall, as they say).

I need to use this across several departments in a large bioinformatics company. I chose UCUM because I looked around and there seemed to be no other standard to do this --- plus it was supported by international specifications such as HL7.

So @keilw , if UCUM is not a good standard for this, which computer-oriented representation for unit serialization would you recommend? Seriously, I need to know. I mean, just saying "SI" doesn't cut it, because SI doesn't cover serialized representations, does it. So please, please tell me what I should be using instead. Can you give me a link to some specification?

keilw commented 7 years ago

I can only point to what UCUM itself provides: http://unitsofmeasure.org/trac#ImplementationSupport The maintainers of UCUM don't seem to be maintaining that page a lot (it should however be possible to help them using Trac, so if you filed an issue there the whole site is a Wiki based on Trac)

Some links like that Applet by GShadow are gone, so the links should also be removed or updated to a new URL. Leaving aside, that Applets are very much a thing of the past and may not be supported by most browsers or modern JVM versions soon. UOMo UCUM works and while the libraries and Eclipse version could be out of date now, too, OHF is still available in Eclipse archives like e.g. older projects are in Apache attic.

I actively maintain the latest versions of UOMo UCUM: https://github.com/eclipse/uomo/tree/master/bundles/org.eclipse.uomo.ucum but while cutting the ties to ICU4J (or dealing with it in a different way) and also planning to make it available in Eclipse Maven repos (for official milestone releases maybe also JCenter or MavenCentral) I don't think any compile-time typesafe approach will be met by it. Calling it is demonstrated by JUnit tests or https://github.com/eclipse/uomo/blob/master/examples/ucum/org.eclipse.uomo.examples.ucum.console/src/main/java/org/eclipse/uomo/examples/ucum/console/UcumDemo.java

final UcumService service = new UcumEssenceService(UcumDemo.class
                .getClassLoader().getResourceAsStream("ucum-essence.xml"));
System.out.println("1 mStere to l   = "
                + service.convert(BigDecimal.ONE, "ust", "l"));     
System.out.println("1 dl to m3   = "
                + service.convert(BigDecimal.ONE, "dL", "m3"));

It converts STERE to LITER (both "l" or "L";-) etc. All based on the ucum-essence.xml.

Can't find tests using UcumFunctionalTests.xml in https://github.com/eclipse/uomo/tree/master/bundles/org.eclipse.uomo.ucum.tests/src (seems Grahame didn't get to that in OHF) but judging from the functionality UOMo UCUM can clearly be called a "fully conformant parser". Not entirely sure, if ucum.js uses those tests in https://github.com/jmandel/ucum.js/tree/master/vendor or not. We could certainly add those to UOMo at least in the current branch/release.

It is possible to offer "limited conformace" with UCUM (whatever they mean with that, it is quite vague) here either by leaving out units that are incompatible with the SI based implementation or not doing any printing at all (that is not in scope for UCUM after all, only the "c/i" and "c/s" codes matter plus conversion factors of its own)

The services offered by https://github.com/jmandel/ucum.js are fairly similar to those UOMo provides: https://github.com/eclipse/uomo/blob/master/bundles/org.eclipse.uomo.ucum/src/main/java/org/eclipse/uomo/ucum/UcumService.java

As @desruisseaux also mentioned, different libraries often need their own implementation of a standard like JSR 363. I am using a very minimalistic enum based approach for a client that suits all their needs. There is currently just one class UOMo UCUM reuses from the Units bundle. It should not be required at all since the constant can be easily reproduced. That makes UOMo UCUM self-contained, it only depends on an API either JSR 363 or the prior Unit-API 0.6.

garretwilson commented 7 years ago

I can only point to what UCUM itself provides ...

No, @keilw , you misunderstand my question. You and @desruisseaux seem to say that UCUM is a bad standard for representing units, that it perverts things and has unusual ways of looking at things.

So I'm asking you: what specification for standardized serialization of units would you recommend instead of UCUM? If UCUM is so bad or difficult to implement, surely you can recommend an alternative specification.

keilw commented 7 years ago

@garretwilson UCUM is just an attempt to combine a subset of ISO 80000 (which for legal and license reasons I am not sure you can even put out in the open due to ISO ownership) and others like ISO1000 or the general SI system plus a whole lot of legacy systems. It keeps mixing and matching various unit systems while the SI system is well-defined and self-contained. E.g. there is only ONE system base unit per quantity like m3for Volume or mfor Length.

I'm not saying don't use it, but if you do it requires all the meta-information like "isSpecial" or a registry of Special Units that Grahame built over nearly a decade, see https://github.com/eclipse/uomo/tree/master/bundles/org.eclipse.uomo.ucum/src/main/java/org/eclipse/uomo/ucum/special.

If this library should be able to deal with every UCUM unit, it won't be able to do so without the meta-information in a similar way https://github.com/eclipse/uomo/tree/master/bundles/org.eclipse.uomo.ucum/src/main/java/org/eclipse/uomo/ucum/model does and a few other packages. So you can use those classes written for UCUM over many years or help replicating something similar here, otherwise there will be cases like metric prefixes UCUM just does not permit and you cannot catch them, nor will e.g. format() or parse() ever make any sense of those operations and units involved.

garretwilson commented 7 years ago

... the SI system is well-defined and self-contained ...

So which specification provides me a single, unambiguous serialization of SI units? Does the SI specification provide symbols appropriate for computer serialization in e.g. XML files?

(But if I use SI then I'm precluding users from ever using any unit that is not included in SI, and for my client I don't know that I can make that assumption.)

Is there any other specification that is simpler than UCUM but provides a standardized serialization of SI and English/American units suitable for interchange in e.g. XML or JSON?

keilw commented 7 years ago

Please see http://uom.si/, a dedicated site for other modules here on GitHub (currently also on top of UCUM or other systems, if UCUM is actively continued, that may have to change, so clashes between systems are avoided)

SI so far does not require its own formatter. Use EBNFUnitFormat for greater flexibility or SimpleUnitFormat (which is not backed by a BNF algorithm, so while it covers all units some limitations may apply) Except for very derived units e.g. in Table 4 of http://www.bipm.org/en/publications/si-brochure/section2-2.html#section2-2-2 SI (extends Units in implementation modules) the others are supposed to be there. If you're missing something please either raise a PR or issue in https://github.com/unitsofmeasurement/si-units/issues

Any file format, whether it's XML, JSON or plain text (CSV,...) should work with the textual representation. Unlike UCUM where parsing is the focus both SimpleUnitFormat and EBNFUnitFormat for SI-based units should format() and parse() known units. Implementations have unit tests for several cases like inversion of a unit. Further help is appreciated but the number of units is not as excessive as in UCUM.

I am not aware of too many other catalogs. oBIX maybe but it is specific to a niche like building and architecture. Otherwise Unicode CLDR provides a "representative" subset of the SI system plus some common units from other systems like US or Imperial. It is specialized on printing and formatting in many different languages, so for M2M data exchange you normally don't need that. A reason why ICU4J is probably not used directly in UOMo or similar modules, but we hope to create a "bridge" on top of the https://github.com/unitsofmeasurement/uom-systems/tree/master/unicode-java8 module here (ICU4J is currently much too big for Java ME 8, it barely fits into modern Android versions) where powerful UI representation of those units is required.

garretwilson commented 7 years ago

@keilw you've pointed me to a library that is your idea of some unit systems and even includes UCUM. But you haven't pointed me to any formal specification for consistent serialization of units. I would rather not put μ into a persistence file. Even worse, I don't want to put m2 with a superscript. (Look, I can't even seem to include a superscript here in GitHub comments --- see the problems that causes with computer serializations? Yet the SI clearly indicates that square meters should be m2 with a superscript.)

It seems to me that BIPM by itself doesn't cut it for unambiguous computer serializations. I need a specification I can use --- we're working with multiple developers with multiple modules across multiple departments, in multiple programming languages. I can't just say "use Werner's Java uom.si library if you want to know how to serialize some units". (The Python developers wouldn't be very happy about that, for example.) So unless you can point me to some other specification, it seems that UCUM is the only game in town.

So I'd like to get back to this issue #97. I need to come to some conclusion soon, because I'm devoting a lot of resources to this, and @magnonasc and I have other things we need to be working on.

@magnonasc has indicated to me that there are some specific unit tests that are currently failing with the current approach. Please indicate to @magnonasc how you would like to improve the UCUM implementation so that these tests no longer fail, so that we can release a working version as soon as possible. Thank you.

keilw commented 7 years ago

@garretwilson In the "open" (for all we know free to use for both commercial and non-commercial purposes, at least based on the UCUM license terms) I don't think you'll find more extensive gatherings of units than UCUM did. If you are willing to pay for it, then ISO 80000 would be the way to go. However it is commercial and even UCUM is vague in what they use from it (https://en.wikipedia.org/wiki/ISO/IEC_80000 and the ISQ which goes beyond the SI but unlike UCUM there should also not be overlapping or competing definitions of a "basis" e.g. using dm3in one case and m3 in another or gram instead of kg for mass)

UOMo UCUM is working and other Eclipse projects like OSBP (Open Standard Business Platform) are already using it in production. It makes use of the ucum-essence.xml exactly as UCUM intended (primarily for parsing but if you take e.g. the code string from a Concept you can easily use that for whatever purpose you need it) While originally created by Grahame and other contributors (who dedicated a lot of time before some OHF stakeholders decided they no longer want to do that at Eclipse) for an OSGi/Eclipse P2 setup, it's all Java and works in plain Java from Enterprise to at least SE Embedded. It is working NOW, all 3 branches build on CI https://hudson.eclipse.org/uomo/ but you have to use it the way UCUM has planned and other implementions in JavaScript, Python or whatever also follow.

If you wanted @magnonasc to help create a dedicated UCUM implementation, well, there is none (except in UOMo) that is tailor-made for UCUM, so he and others would have to start it. The UCUM module will have to be more or less autonomous, but I see no reason to do a completely independent implementation of JSR 363 in this module (like UOMo does, the migration to 363 is ongoing, but it does not matter to UOMo because the UCUM module won't change its behavior much and still work against the UCUM XML or tests) For some strange reason only German Wikipedia has a rich overview of UCUM: https://de.wikipedia.org/wiki/Unified_Code_for_Units_of_Measure (the English version however points out UOMo as a key if not "reference" implementation of UCUM for Java) It quotes the UCUM spec about the key elements

https://github.com/eclipse/uomo/tree/master/bundles/org.eclipse.uomo.ucum/src/main/java/org/eclipse/uomo/ucum/model contains representations of most in UOMo. UOMo UCUM and other implementations parsing the XML file directly contain mutable elements, a bit like the Spring Framework (also full of getters and setters to modify "beans") while JSR 363 implementations are meant to be mostly immutable. If you do static declarations like the UCUM class aims for, this may not be necessary, so BaseUnit could probably be reused from the SE or another implementation, but at least the DefinedUnit or equivalents have to be created for UCUM.

http://unitsofmeasure.org/ucum.html#section-Semantics mentions at least two or three special cases. Special Units and Arbitrary Units (http://unitsofmeasure.org/ucum.html#section-Arbitrary-Units) both must be treated differently, e.g. conversions into other units or a metric Prefix is not allowed for those units. It is quite possible this also makes a dedicated UCUMPrefix (similar to the Prefix class in UOMo) necessary to ensure, you cannot apply them to certain units and e.g. a MeasurementException is thrown in that case.

With that plus a specially tailored and improved UCUMFormat, where necessary also specialized UnitConverter implementations it should work to implement UCUM in an "early binding" fashion (UOMo already does "late binding" so it is all runtime exceptions or errors not compile time) if you or your clients have a strong requirement and wish to complete that.

keilw commented 7 years ago

Most UCUM implementations are proprietary closed-source projects by big pharma companies or other healthcare providers. Except the official implementations here, at Apache or Eclipse there are several of us who use JSR 363 in corporate closed-source implementations because clients and their work attitude or business demand it. We can only offer to do some of it open source here but not force people. As with UCUM there are also closed-source commercial implementations of nearly every JSR.

garretwilson commented 7 years ago

Thank you for all that wonderful information. You made clear that the UCUM is the best standard we have at the moment.

If you wanted @magnonasc to help create a dedicated UCUM implementation, well, there is none (except in UOMo) that is tailor-made for UCUM, so he and others would have to start it.

I haven't asked @magnonasc "to help create a dedicated UCUM implementation". I have asked @magnonasc to help you fix yours and get it to to the point so that it doesn't break with the very basic of use.

So please, please, please, let's get back to this issue. I'll repeat what I said above: @magnonasc has indicated to me that there are some specific unit tests that are currently failing with the current approach. @magnonasc has proposed a solution to fix your implementation so that the unit tests pass. You don't like @magnonasc's proposed solution. So please indicate to us the solution you would like @magnonasc to implement.

Will you let @magnonasc fix your library? If so, please let him know how you would like him to fix it. We are here trying to help. I am using my company's resources to help you fix your project. You are getting free help. The unit tests are breaking, and @magnonasc is waiting for you to let him know how to proceed. Please let him know so that we can go forward. Thank you.

keilw commented 7 years ago

@garretwilson Thanks for your feedback.

First of all, while as one of the Maintenance Leads for JSR 363 myself, @leomrlima and @dautelle have a certain ownership, that primarily applies to the RI and derived fully compatible implementation for Java SE (https://github.com/unitsofmeasurement/uom-se) other modules were almost exactly based on prior libraries like JScience, so the UCUM library has not been modified much before @magnonasc and myself started adding e.g. new units recently. Other forks of JScience 5 were created by Opower (now part of Oracle, but can't say if they still work on it now) https://github.com/jeffkole/jackson-module-unitsofmeasure contains an entire working JScience 5 artifact. And the Jackson Module by Opower used the very same UCUMFormat (see the opower-sources.jar if you want in https://github.com/jeffkole/jackson-module-unitsofmeasure/tree/master/repo/org/jscience/jscience-physics/5.0-opower) to serialize unit-relevant JSON strings and RESTful Web Services ever since. And never complained that what they used in this JSON serializer would not meet their requirements or that of their customers.

So it is not "our" or "my" library unless you mean the wider community behind JScience, Unit-API 0.6 or even JSR-275 and 108.

I am happy to proceed and have @magnonasc help with THIS library, systems-ucum-java8. Where appropriate and it meets the needs of a UCUM implementation, it can certainly use uom-se, but it must not make any breaking changes to basic implementation classes like say adding isSpecial(). @desruisseaux mentioned, that he got "his own" implementation at Apache SIS: https://github.com/apache/sis/tree/trunk/core/sis-utility/src/main/java/org/apache/sis/measure There AbstractUnit exists with a special scope attribute that is specific to EPSG (http://www.epsg.org/) and its own unit catalogue. Similar to UCUM, but in that case specific to Oil and Gas, but I hope you get the picture. Anything that can be reused from either unit-ri or uom-se fine, it'll save all of us time, but for domain specific extensions that only help a particular niche or catalog, either extend it here or create a whole new JSR 363 implementation like UOMo UCUM or Apache SIS did.

Cheers, Werner

garretwilson commented 7 years ago

Can we talk specifics, @keilw ? I love your long stories about history of the project and such, but right now I just need to get this working.

@magnonasc tells me the following unit tests are broken:

@Test
public void testFormatUCUMCSMICROLiter() {
  assertEquals("uL", FORMAT_CS.format(MICRO(UCUM.LITER))); // returns nst
}

@Test
public void testFormatUCUMCSDECILiter() {
  assertEquals("dL", FORMAT_CS.format(DECI(UCUM.LITER))); // returns st/10000
}

@Test
public void testFormatUCUMCSDECIStere() {
  assertEquals("dst", FORMAT_CS.format(DECI(UCUM.STERE))); // returns L
}

@Test
public void testFormatUCUMCSCENTIStere() {
  assertEquals("cst", FORMAT_CS.format(CENTI(UCUM.STERE))); // returns cst
}

@Test
public void testFormatUCUMCSGill() {
  assertEquals("[gil_br]", FORMAT_CS.format(UCUM.GILL_BRITISH)); // returns st.454609/3200000000000
}

Will you be able to coordinate with @magnonasc to find a way forward to get these unit tests working? If not I'm going to pull @magnonasc off the project. I have no choice---I need him elsewhere. We're trying to help you here.

@magnonasc wants to fix these unit tests. We're now waiting on you to tell him how he can proceed.

Please, @keilw . I don't need a long story about the history of JScience or why the current architecture whatever whatever. Right now the history doesn't matter. Right now the reasons don't matter. Right now I don't care how you fix them. Right now there is only one thing important: These unit tests are breaking; will you let @magnonasc fix them? I cannot use your library with these unit tests breaking. I have to know now if you will let us work with you to fix them or not.

keilw commented 7 years ago

@garretwilson Of course @magnonasc is welcome to fix them, but any "special attribute" or UCUM-specific requirement like storing multiple levels of parent or base attributes must be done entirely in this module via a decicated class that implementsjavax.measure.Unit or a subclass of AbstractUnitwith the necessary stuff and algorithms overridden here. Adding extra quantities under the quantity module here, sure, whatever is justified and can be reused, that's what it's there for, but no breaking or incompatible changes to the RI or its drop-in-replacement uom-se. These are used by various projects and we must preserve the way they use them. That's a natural responsibility of a standard like a JSR. The guys at Spring or Angular can simply add some fancy new feature even if it breaks backward-compatibility, but that's not the way good standards work. Even if they may sometimes feel a bit "slow" but that's how many stanards work. The SI system is about to get the first major redefinition after nearly 200 years in 2018 ;-)

magnonasc commented 7 years ago

I took a look at the api of uom-se to have an idea of what we'd have to change..

The methods on AbstractUnit are all declared as final, so unless you take it out on uom-se we wouldn't be able to extend AbstractUnit to change an implementation.. With that, the only other way would be to create or copy the implementation of uom-se to UCUM and change it, but it's a very, very, very bad thing to do, as we'd have duplicated code.

The other possibility i.e., implements Unit, is okay.. we could do that either, BUT, I saw that the methods of AlternateUnit, TransformedUnit, ProductUnit and AnnotatedUnit all are based on AbstractUnit (not Unit) and they have parameters for AbstractUnit aswell, so we'd need to create implementations for those too.

So if you're not able to at least take that final out from the methods of AbstractUnit, this fix will be a little bit impraticable.

keilw commented 7 years ago

Can cou give a list of methods you would like to see non-final? We cannot make all of them open, e.g. getSystemUnit() is already a wrapper around the abstracttoSystemUnit() method, so toSystemUnit() shall be implemented or overriden in the proper way for a particular subclass. Other methods like alternate() have a clear purpose, but the API does not mandate creating an AlternateUnit, so we could discuss making some non-final, allowing to override them where it helps.

keilw commented 7 years ago

We can also take some inspiration from SIS: https://github.com/apache/sis/blob/trunk/core/sis-utility/src/main/java/org/apache/sis/measure/AbstractUnit.java

A few methods are not declared in AbstractUnitwhile others are final there, too. Let's not forget, uom-se and base classes there must remain compatible with the RI, but as long as the behavior isn't changed, some methods could either shift from the abstract base class to concrete classes or be less final to give those a certain freedom. As for AbstractUnit, of course all concrete subclasses extend that, Unit in the API is just an interface and until some day when maybe the API could have Java SE 8 as a minimum version, the interface cannot use a default implementation there, so an abstract base class is usually necessary. SIS has it under the same name, in UOMo the base class is called UcumUnit(on top of a more abstract Conceptclass which is UCUM specific)

magnonasc commented 7 years ago

I'd like to have at least transform() as not final at the moment, I just need to change the implementation to see if I could fix the problem by this way. I'll probably change just one line of it, make the formatter compatible with its changes and then test, and test, and test again.

There's no need for that, @keilw, I tried to make the changes by myself on my machine.. The current implementation is nothing flexible, I can't change a single line in a method without breaking everything, and if I simply create other implementations I'd need to almost create another implementation of the whole project, it's just not worth it at all.

So all I can say now is that our only solution is to take stere out.. There's not anything else to do :/

keilw commented 7 years ago

@magnonasc There's a reason UOMo UCUM or SIS created their own implementations. It works for almost every case so far, but UCUM is a different beast. If removing STERE works for your project or product, that sounds fine to me. Unless there's demand, we can live with a partial UCUM implementation by this library.

magnonasc commented 7 years ago

I have a question about this.. when we try to format GILL_BR, we simply get [GILL_BR], but if there's a MILLI(GILL_BR), then the formatter will separe the converter and the parent unit of the transformation, it would get MILLI converter and m3.454609/3200000000000 as a ProductUnit but if we want a GILL_BR from that, we don't have something like a dictionary of units with its definition based on the system unit, right? With that, the only way to get the unit is in a brute force way, which would make the implementation have a very, very bad efficiency.. So I think I got back to 0, and I don't know what to do anymore. Any idea? This implementation works really bad with a TransformedUnit and ProductUnit.

keilw commented 7 years ago

I assume you mean the UCUMFormat implementation of UnitFormat? I mentioned here or in other threads that it was more or less a quick and dirty clone of what now became EBNFUnitFormat without really looking into the needs and definitions of the UCUM catalog. Sorry if you had a different impression, but I said it may need a lot of work. Aside from that, GILL_BR is among the reasons I asked https://github.com/unitsofmeasurement/uom-systems/issues/100. The Parser implementation in UCUMFormat works extremely well compared to print/format (not a single test fails there). And what would you really plan for MILLI(GILL_BR)? Should it say M[GILL_BR], [MGILL_BR] or something else? Unlike the Print column none of the others make real sense for output at least combined with prefixes.

Actually the GILL_US, GILL_BR and others in tables like http://unitsofmeasure.org/ucum.html#para-34 don't even have a Print column. Meaning, we should not have a properties entry here either because UCUM has none and does not intend to print it at all. So what you get m3.454609/3200000000000 is perfectly fine for this case IMO. It is mathematically correct. If no print label is defined, that is similar to other UnitFormat implementations. It is the best representation based on available information about these units and the desired operation. The only other alternative would be to throw an exception but that would require UCUM specific meta-information similar to isSpecial(). Something like canFormat() etc. but it would also need a special attribute in a special implementation or extension to AbstractUnit like we spoke.

magnonasc commented 7 years ago

GILL_BR was just an example, that happens to LITER too. The formatter works fine when the unit is present on the symbolMap, i.e., GILL_BR by itself, but as I mentioned before, we get a productUnit with as the base and not GILL_BR itself. I'm pretty sure that the problem here is the implementation of those type of units, but after this long discussion, we saw that we can't change that.. So if the only way to format those units is the formatter making calculations, then we need to find the better approach.

first approach, brute force: decompose the unit and go through all the others and compare their definition. Which is really, really bad.

second approach, definition property file: we'd have to create a file to keep the definition for the units, and the formatter would get the closest match and use it, applying the remained as a converter of that unit. Which is still bad, but better than the other.

In fact we can't have the most efficient way of fixing this problem, because we're not able to implement it, but in the future maybe it's a good idea having a dedicated UCUM implementation without depending on uom-se, if you want it, off course, I know you suggested dedicated UCUM apis in other comments, but this one is yours so it's your decision.

keilw commented 7 years ago

Not yet sure, how it works in UCUM, but a problem in TransformedUnit that also affected a few tests in uom-se seems tracked down:-) In most cases a TransformedUnit is not supposed to have a symbol, but to fix some other issue earlier, it was introduced. However there are just 3 or 4 cases this constructor is used. All other cases should have a null symbol.

magnonasc commented 7 years ago

What was this issue? Can you point me that so I can take a look if that can give me any idea for this?