unitsofmeasurement / uom-systems

Units of Measurement Systems
http://www.uom.systems

Syntax validation and differences between uom-systems/ucum and ucum-essence.xml #173

Open JohnTimm opened 4 years ago

JohnTimm commented 4 years ago

I am looking for a JSR-385 based library to parse and validate UCUM units in our FHIR server implementation: http://github.com/ibm/fhir with the potential for supporting unit conversion in the future. I wrote a unit test to test parsing on: http://hl7.org/fhir/valueset-ucum-common.html and found a number of issues:

  1. Handling of annotations in the numerator or denominator (e.g. code: %/100{WBC}, display: percent / 100 WBC): Encountered " <ANNOTATION> "{WBC} "" at line 1, column 6. The same problem occurs with syntax such as /{oif}, where the denominator (or the numerator) consists of an annotation only.

  2. Handling of annotations that contain spaces (e.g. code: %{Negative Control}, display: percent Negative Control): Lexical error at line 1, column 11. Encountered: " " (32), after : "{NEGATIVE"

  3. Missing symbols, e.g. [iU] (or [IU]) for international units, bit_s, bd, etc.

Here are the numbers from the codes listed in that valueset:

Total: 1364 Success: 1117 Error: 247
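For anyone who wants to reproduce a tally like this, the loop has roughly the following shape. This is a minimal sketch, not the actual test: `ParseTally` is a hypothetical helper, and the parser is passed in as a plain `Consumer<String>`, on the assumption that the real parser reports failures as unchecked exceptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical tally helper; not part of uom-systems or the FHIR server. */
final class ParseTally {
    int success;
    final List<String> failures = new ArrayList<>();

    /** Feeds every code to the parser and records which ones throw. */
    static ParseTally run(List<String> codes, Consumer<String> parser) {
        ParseTally tally = new ParseTally();
        for (String code : codes) {
            try {
                parser.accept(code);      // assumption: throws on invalid input
                tally.success++;
            } catch (RuntimeException e) {
                tally.failures.add(code); // keep the code for the failure report
            }
        }
        return tally;
    }

    @Override
    public String toString() {
        return "Total: " + (success + failures.size())
                + " Success: " + success + " Error: " + failures.size();
    }
}
```

Keeping the failing codes rather than only a count makes it easy to print the list of rejected codes afterwards.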

It looks like part of the problem is how strict the UCUM format parser is. In the short term, I can look at "fixing up" some of the codes before passing them to the parser (e.g. remove spaces from annotations). The thing that concerns me the most, however, is how many missing symbols there are. Especially if you consider what's in ucum-essence.xml and compare that to the resource files that the format parser uses.
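The short-term "fix up" step could look something like the sketch below. It only strips whitespace inside `{...}` annotations and leaves everything else untouched. `UcumCodeFixup` and `normalizeAnnotations` are hypothetical names, not part of uom-systems, and since the fix-up changes the annotation text, the original code would need to be kept alongside the normalized one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-processing helper; not part of uom-systems. */
final class UcumCodeFixup {
    // One annotation: '{', any run of characters other than braces, '}'.
    private static final Pattern ANNOTATION = Pattern.compile("\\{([^{}]*)\\}");

    /** Removes all whitespace inside each annotation; text outside braces is unchanged. */
    static String normalizeAnnotations(String code) {
        Matcher m = ANNOTATION.matcher(code);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1).replaceAll("\\s+", "");
            // quoteReplacement guards against '$' or '\' in the annotation text.
            m.appendReplacement(out, Matcher.quoteReplacement("{" + body + "}"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this, `%{Negative Control}` becomes `%{NegativeControl}`, while a code without annotations such as `mg/dL` passes through unchanged.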

Is there a way to configure UCUMFormatParser to use ucum-essence.xml as a starting point for its symbol map? I looked into using Eclipse uomo but the activity there isn't the same as this project and it doesn't look like it is up to speed on its JSR 385 compliance. Please advise.

Here's a list of the 247 codes that generated exceptions:

%/100{WBC} %{Negative Control} /[arb'U] /[HPF] /[iU] /[LPF] /[HPF] /[LPF] /1010 /1012 /1012{rbc} /106 /109 /100{cells} /100{neutrophils} /100{spermatozoa} /100{WBC} /100{WBCs} /cm[H2O] [APL'U] [APL'U]/mL [arb'U] [arb'U]/L [arb'U]/mL [AU] [BAU] [beth'U] [beth'U] [CFU] [CFU]/L [CFU]/mL [Ch] [drp] [drp]/[HPF] [drp]/h [drp]/min [drp]/mL [drp]/s [GPL'U] [iU] [IU]/(2.h) [IU]/(24.h) [IU]/109{RBCs} [IU]/d [IU]/dL [IU]/g [IU]/g{Hb} [iU]/g{Hgb} [IU]/h [IU]/kg [IU]/kg/d [IU]/L [IU]/min [IU]/mL [MPL'U] [tb'U] [todd'U] [todd'U] {# of calculi} {# of donor informative markers} {# of fetuses} {# of informative markers} {2 or 3 times}/d {3 times}/d {4 times}/d {5 times}/d {cells}/[HPF] {clock time} U{G} {P2Y12 Reaction Units} 1012/L 103 103.{RBC} 103.U 103/L 103/mL 103/uL 103{Copies}/mL 10-3{Polarization'U} 105 106 106.[iU] 106.eq/mL 106.U 106/{Specimen} 106/kg 106/L 106/mL 106/mm3 106/uL 10-6{Immunofluorescence'U} 108 109/L 109/mL 109/uL cm[H2O] cm[H2O]/(s.m) cm[H2O]/L/s cm[Hg] dB eq eq/L eq/mL eq/mmol eq/umol GBq [iU] k[IU]/L k[IU]/mL kPa m[iU] m[IU]/L m[IU]/mL meq meq/(12.h) meq/(2.h) meq/(24.h) meq/(8.h) meq/(8.h.kg) meq/(kg.d) meq/{Specimen} meq/d meq/dL meq/g meq/g{Cre} meq/h meq/kg meq/kg/h meq/kg/min meq/L meq/m2 meq/min meq/mL mg/d/(173.10-2.m2) mL/cm[H2O] mL/min/(173.10-2.m2) mm[H2O] mm[Hg] mosm mosm/kg mosm/L mPa ng/106 osm/kg osm/L U/1010{cells} U/1012 U/106 U/109 u[IU] u[IU]/L u[IU]/mL ueq ueq/L ueq/mL 104/uL [bdsk'U] cm[H2O]/s/m {CPM}/103{cell} U/1010 U/(10.g){feces} U{25Cel}/L U{37Cel}/L U/1012{RBCs} {Globules}/[HPF] g/(8.h){shift} g/kg/(8.h){shift} [HPF] [GPL'U]/mL [MPL'U]/mL [in_i'H2O] [IU] [IU]/L{37Cel} [IU]/mg{creat} [ka'U] [LPF] [mclg'U] meq/g{creat} meq/{specimen} meq/{total_volume} 106.[CFU]/L 106.[IU] 106/(24.h) mPa.s ng/106{RBCs} nmol/min/106{cells} {#}/[HPF] {#}/[LPF] osm /104{RBCs} /[IU] /103 /103.{RBCs} /1012{RBCs} 103{copies}/mL 103{RBCs} %[slope] /100{Spermatozoa} [Amb'a'1'U] [CCID_50] [D'ag'U] [diop] [dye'U] [FFU] [hnsf'U] [hp_C] [hp_M] [hp_Q] [hp_X] 
[in_i'Hg] [iU]/dL [iU]/g [iU]/kg [iU]/L [iU]/mL [knk'U] [Lf] [mesh_i] [MET] [p'diop] [PFU] [PNU] [S] [smgy'U] [smoot] [TCID_50] [USP'U] 10 10^ a_g a_j a_t b B B[kW] B[mV] B[SPL] B[uV] B[V] B[W] Bd bit_s k[iU]/mL m[H2O] m[Hg] R REM

Needs #59

keilw commented 4 years ago

Thanks for the input and for creating the JUnit test. Is there a chance it could be run here, e.g. under a special Maven profile? There are a few units we found missing, but those who were helping us could not contribute further: https://github.com/unitsofmeasurement/uom-systems/issues/59. Does that match the missing symbols or units? You are right about UOMo UCUM: it is fully functional and supports the latest ucum-essence.xml, but it is currently based on version 1.x of the API and Indriya (JSR 363). The biggest difference between the UCUM class and the XML file is that the class implements the SystemOfUnits interface and supports the type-safe unit model of JSR 385, while UOMo UCUM implements only the most basic types like Unit, with late binding via a quantity wildcard. The UnitFormat implementations also do that for parsing, but I can't say for sure whether it would be possible to use it for UCUM the same way UOMo does.

keilw commented 4 years ago

In https://ucum.org/ucum.html#chemical the "international unit" exists twice, with variations in both the print format and c/s. The only way to represent that is via an alias like INTERNATIONAL_UNIT_ALT ("alternate"; happy to hear other name suggestions), because there is no UnitFormat.alias() that would work for a variant. Parsing the c/i variant leads to an ambiguity; AFAIK the first one is picked there. If we should eliminate one, please advise, but it seems ucum-essence contains a few of those irregularities: not many, but maybe a handful.

keilw commented 4 years ago

Btw, how come BAUD is missing? It has been there since 2018. I also added tests for Baud to https://github.com/unitsofmeasurement/uom-systems/blob/master/ucum/src/test/java/systems/uom/ucum/format/UCUMFormatTable4Test.java, so @JohnTimm could you elaborate on what fails with "Bd"?

keilw commented 4 years ago

@JohnTimm I hope you are well; we haven't heard any feedback for almost a month. A significant number of these are combinations with previously missing units like "eq", but most of those are there now. Could you repeat the test with 2.1-SNAPSHOT of uom-systems?

alexanderkiel commented 1 year ago

I also need all units from https://build.fhir.org/valueset-ucum-units.html. I would be happy to contribute, given some guidance.

keilw commented 1 year ago

@alexanderkiel Is that list from FHIR identical to the 1364+ entries in UCUM? It seems many of them are not in the latest UCUM files, and a large portion are combined units like "pmol/min" which should be derived from either UCUM or other system units like PICO(MOL).divide(MINUTE). Others are annotated units created like RED_BLOOD_CELLS = ((AbstractUnit)Units.ONE).annotate("RBC") or Unit<Volume> PERCENT_VOL = ((AbstractUnit)Units.PERCENT).annotate("vol").

There is no system for that, and it does not seem to be part of UCUM, so it would be either something application-specific or a domain-specific system under uom-domain. I'd say a module under health sounds appropriate. You'd be more than welcome to contribute if you have time.

alexanderkiel commented 1 year ago

Hi @keilw, I'm not an expert in UCUM. I work on a FHIR server written in Clojure/Java and use the systems.uom/systems-ucum and systems.uom/systems-quantity dependencies inside a query engine to represent quantities, so that calculations can make use of unit conversions.

Both the data and the queries can contain UCUM units from the FHIR UCUM Valueset I mentioned above. All the quantities have to pass a parsing step before I can evaluate queries. So I have a problem if I encounter a unit that can't be parsed.

Although it would be good to support as many units as possible in the future, maybe you have a recommendation for how I can deal with unknown units. Is there a hook where I can just return, for example, an annotated dimensionless unit for unknown units?
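In the absence of such a hook in the library, the same effect could be approximated on the caller's side by wrapping the parser. The sketch below is deliberately library-agnostic: `LenientParser` is a hypothetical name, both the strict parser and the fallback are supplied as functions, and it assumes parse failures surface as unchecked exceptions. In real use the strict function would delegate to the UCUM format's parse method, and the fallback could build an annotated dimensionless unit along the lines of the `annotate` calls shown earlier in this thread.

```java
import java.util.function.Function;

/** Hypothetical wrapper; the parser and the fallback are supplied by the caller. */
final class LenientParser<U> {
    private final Function<String, U> strict;
    private final Function<String, U> fallback;

    LenientParser(Function<String, U> strict, Function<String, U> fallback) {
        this.strict = strict;
        this.fallback = fallback;
    }

    /** Tries the strict parser first; on failure, maps the raw code through the fallback. */
    U parse(String code) {
        try {
            return strict.apply(code);
        } catch (RuntimeException e) {
            // Assumption: unknown or malformed units are reported via unchecked exceptions.
            return fallback.apply(code);
        }
    }
}
```

The trade-off is that a fallback unit carries no dimension, so quantities using it can be stored and compared for equality of code, but not converted.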