Closed joergklausen closed 2 years ago
This is a similar issue to the one CF is experiencing with UGRID and NUG (and potentially other activities as well). I haven't seen a conclusion on the CF workflow in this context, but I assume it could be used as a model?
@gaochen-larc review notation for consistency and suggest descriptions
https://physics.nist.gov/cuu/Units/index.html. This site has a lot of useful information and is a good reference for what we need to do.
More information from NIST perspective: https://physics.nist.gov/cuu/Units/checklist.html in terms of governance...
Discussion Topics:
1. Geopotential metre, gpm: It appears this unit is for geopotential height. However, geopotential height is always reported in a length unit, e.g., m. No precise definition of this unit was found through a Google search and a limited textbook search. Comments are solicited on the definition of this unit and the need for it, to decide whether it should remain in the current code list.
2. Consolidate "ug_C_m-3", "ug_Cl_m-3", "ug_N_m-3", "ug_S_m-3", "ug_m-3_20C", "ug_m-3_25C" to "ug m-3": These units are on the current code list to represent measurements of specific species (C, Cl, N, and S) or measurements reported at a specific temperature. However, all these measurements are reported in "ug m-3". The additional information is used to define the variable or how the variable is reported. This practice mixes physical quantities with the units of physical quantities and is incompatible with SI governance. The specific variable information and/or variable reporting information can be incorporated into the variable name and the variable definition/description. Comments are solicited to confirm whether it is feasible to consolidate these non-SI units into the SI unit ug m-3.
3. Consolidate "mg_N_l-1" and "mg_S_l-1" to "mg l-1": For the reason stated in 2. Comments are solicited to move forward.
4. Change mg_m-3_25C to mg m-3: Similar to 2. "25 C" can be stated in the variable description/definition or be part of the variable name. Please provide comments.
5. Label correction for "pptv", "ppbv", and "ppmv": the term "dry air" should be removed, as these units are commonly used to represent volume fractions for both dry air and ambient air. In addition, the SI equivalent units should be provided in the description column. Comments?
6. Request addition of "ppt", "ppb", and "ppm": these are the units commonly used to report mass fractions or mole fractions for atmospheric composition measurements. SI equivalent units should be provided in the description column. Comments, objections?
7. Request addition of "km-1" and "Mm-1": these are the units used to report particle scattering, absorption, and extinction coefficients. These are SI units. Comments, objections?
8. Question about "m2_Hz-1": should this unit be "m2 s"? Need feedback here...
9. Unit notation issue: all units in the current code list use "_", which is not compatible with SI notation. However, the SI notation may not be practical when data are reported in ASCII files, like .ict files... Comments?
Re: https://github.com/wmo-im/wmds/issues/159#issuecomment-776770571:
Practical data management issues with SI unit symbols in ASCII files:
Should we deal with these potential problems while resolving the compatibility with SI unit issue?
@gaochen-larc, I'm afraid I don't understand what you mean by this:
- Space between terms would be an issue for space delimited files.
Please elucidate.
"m s-1" => "m" and "s-1" when parsing the file
Greek and other 'strange' characters are not per se a problem, nor are spaces in terms ... but it is prudent to avoid both. We should use UTF-8 character encoding for everything we do, and we should double-quote all strings we use in CSV files. Unfortunately, some very common tools like Excel (arrgh) don't support this out of the box. I think we are okay for definitions, descriptions, etc., but I would recommend we stick to ASCII alphabet 5 and no spaces for any identifiers (called 'notations' on codes.wmo.int/wmdr).
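As a rough illustration of the "UTF-8 plus double-quoted strings" suggestion, here is a sketch using Python's standard `csv` module. The two column names and the sample row are assumptions for illustration only, not the actual layout of the code table.

```python
import csv
import io

# Hypothetical rows: an ASCII-only notation and a free-text description.
rows = [
    ["notation", "description"],
    ["ug.m-3", "micrograms per cubic metre (µg m-3)"],
]

# Write to an in-memory buffer; for a real file one would use
# open(path, "w", encoding="utf-8", newline="") instead.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)  # double-quote every field
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original strings, including "µ".
parsed_rows = list(csv.reader(io.StringIO(csv_text, newline="")))
```

Quoting every field keeps spaces and commas inside descriptions from being mistaken for delimiters, while the notation column itself stays plain ASCII.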
The US NIST (and I believe the BIPM) recommends a middle dot "·" (or a period if a middle dot can't be used) to separate the individual units of a compound unit specification.
good idea! We can use this: ASCII CODE 250. any objections?
Agree with you. Need clarification for "ASCII alphabet 5". I do not know what "ASCII alphabet 5" is and could not find it in google search...
I'm not sure this is the correct place for this, apologies if not.
We’re in the process of mapping current metocean metadata (from Voluntary Observing Ships and Ships of Opportunity) to the WMDS and have some questions on the units field / table. We’ll also be mapping historic metadata in the future, going back pre 1900s and have a few questions over the intent and how to use the field appropriately.
Thanks,
Dave Berry.
@gaochen-larc, I would like to add another adjustment to your list:
@gaochen-larc "Unknown": What is this intended for? An unrecognized unit? A more specific word should be used.
ISO CodeList for nilReason values:
- inapplicable
- missing
- template
- unknown
- withheld
1. I'm not sure either, what "N_units" means, but it can't stand for newton, because there is already an entry for this (notation "N", see https://codes.wmo.int/wmdr/unit/N). "N_units" also appears in the BUFR table C-6 (see https://codes.wmo.int/common/unit/N_units), on which the WIGOS measurement unit table was based.
N Units comes from Common Code Table C-6 and is used to report the atmospheric refractivity (http://codes.wmo.int/bufr4/b/15/_036). The comment below appears in the manual on codes, Volume I.2
(5) The refractivity, N, is related to the refractive index, n, by the formula N = 10^6 (n – 1). N is therefore dimensionless but values computed by the formula are by convention described as being in "N units".
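A small worked example of that formula, as a sketch only. The refractive index used below is an illustrative value, not taken from the Manual on Codes.

```python
def refractivity(refractive_index: float) -> float:
    """Return the refractivity N = 10^6 (n - 1), expressed in "N units"."""
    return 1e6 * (refractive_index - 1.0)


# Illustrative value only: n = 1.000300 corresponds to N of about 300.
n_units = refractivity(1.000300)
print(round(n_units, 3))
```

The result is a pure number, which is why "N units" is a naming convention rather than a unit with a physical dimension.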
More thoughts on the middle dot: 1) The middle dot and the Greek letter "µ" are part of the extended ASCII codes (character codes 128-255), ISO 8859-1. 2) Should we keep everything within the ASCII printable characters (character codes 32-127)? There were issues when a legacy text editor was used. @joergklausen @semmerson @amilan17 and Franziska, comments or suggestions?
To me, 'unknown' is an adequate nilReason in this context, as it expresses that no value has been reported.
Thanks @DavidBerryNOC very useful. @fstuerzl Please include a definition along these lines.
I would not recommend the use of any character for notation (WMO306_CD) that I cannot easily find on my keyboard without resolving to
@gaochen-larc, I would like to add another adjustment to your list:
- The notations log_(m-1) and log_(m-2) are confusing, because an underscore normally stands for a multiplication, which in this case makes no sense. With logarithmic scaling the units should be log(m-1) and log(m-2).
I agree, this should be fixed, also the name is incorrect, there is no "log per m". If anything, it would be "logarithm of inverse meter", but there is a more fundamental problem here: logarithm is only defined for dimensionless numbers! So, the logarithm of a number of inverse meters divided by inverse meters exists, but not the logarithm of a number of inverse meters. Also, the base (presumably, base 10) needs to be specified. There are 2 entries that suffer from this as far as I can see. This notation is also used in the C-6 code list.
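To spell out the point about dimensionless arguments, here is a sketch in which β is a symbol chosen only for illustration and stands for a quantity measured in inverse metres. Base 10 is assumed, as presumed above. The logarithm is defined for the ratio on the left, not for the expression on the right:

```latex
\log_{10}\!\left(\frac{\beta}{1\ \mathrm{m^{-1}}}\right)
\qquad \text{rather than} \qquad
\log_{10}(\beta)
```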
@DavidBerryNOC I'm not sure the UDUNITS package can handle a unit that's a Galilean transformation of a dimensionless unit.
I'll have to check.
At https://github.com/wmo-im/wmds/issues/159#issuecomment-781475053, @joergklausen wrote
... Also, the base (presumably, base 10) needs to be specified. There are 2 entries that suffer from this as far as I can see. This notation is also used in the C-6 code list.
One could use "lg" for base 10, "ln" for base e, and "lb" for base 2. I believe these symbols are relatively common.
Regarding the "log" issue: Similar to what Jörg wrote, I would interpret the given "units" as "inverse (square) meter displayed on a logarithmic scale", which means the actual units are m^(-1) and m^(-2). Therefore I would propose to remove these two items from the table and, if necessary, include m^(-1) and m^(-2). I checked the OSCAR database and found no instance where they are used.
At https://github.com/wmo-im/wmds/issues/159#issuecomment-781416666, @gaochen-larc wrote
- Should we keep everything under ASCII printable characters (character code 32-127)? There were issues encountered when legacy text editor was used. @joergklausen @semmerson @amilan17 and Franziska comments or suggestions?
I suppose it depends on whether the unit is produced by a human or by a computer:
- Human produced: Use US-ASCII (e.g., "kg.m/s2", "kg.m/s^2", "ug") (Tip of the hat to @joergklausen)
- Computer produced: Use UTF-8-encoded Unicode (e.g., "kg·m/s²", "µg")
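A minimal sketch of how a computer could turn the human-typed US-ASCII form into the UTF-8 form. The function name `to_unicode` and the rule that a leading "u" means "micro" are assumptions made for this illustration. This is not an existing tool and not how UDUNITS does it.

```python
import re

MICRO = "\u00b5"       # µ
MIDDLE_DOT = "\u00b7"  # ·
# Maps "-0123456789" to the superscript glyphs ⁻⁰¹²³⁴⁵⁶⁷⁸⁹.
SUPERSCRIPTS = str.maketrans(
    "-0123456789",
    "\u207b\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)

# A term is a run of letters, optionally followed by "^" and/or a signed integer.
TERM = re.compile(r"([A-Za-z]+)(?:\^?(-?\d+))?")


def to_unicode(ascii_notation: str) -> str:
    """Convert a US-ASCII unit string to a UTF-8 display form."""

    def convert_term(match: re.Match) -> str:
        symbol = match.group(1)
        exponent = match.group(2) or ""
        # Naive assumption: a leading "u" on a multi-letter symbol means "micro".
        if symbol.startswith("u") and len(symbol) > 1:
            symbol = MICRO + symbol[1:]
        return symbol + exponent.translate(SUPERSCRIPTS)

    return TERM.sub(convert_term, ascii_notation).replace(".", MIDDLE_DOT)


print(to_unicode("kg.m/s2"))
print(to_unicode("ug"))
```

The "u" rule is deliberately crude: it would mangle a non-unit notation such as "unknown", so a real converter would need a list of known symbols.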
So based on the discussion, middle dot is out. Could we use "." (period) to represent multiplication?
I came across this paper: UNITS FOR USE IN ATMOSPHERIC CHEMISTRY by Schwartz and Warneck (http://www.iupac.org/publications/pac/1995/pdf/6708x1377.pdf).
This paper highlighted the needs and issues to apply SI units in the field of atmospheric chemistry. It has some useful recommendations. I believe this paper is highly relevant to our discussion here.
At https://github.com/wmo-im/wmds/issues/159#issuecomment-786306054, @gaochen-larc wrote
So based on the discussion, middle dot is out.
I didn't get that from the discussion. Computers can easily interpret and generate a middle dot, so I'd keep it for them.
Because 1) it can be difficult for humans to generate a middle dot; 2) humans can generate a period easily; and 3) computers can be told to interpret a period as a middle dot; I'd keep both.
Could we use "." (period) to represent multiplication?
Easily. I believe the NIST document recommends a period as a substitute for a middle dot.
I didn't get that from the discussion. Computers can easily interpret and generate a middle dot, so I'd keep it for them.
Sorry, I meant we use "." period for notation, i.e., kg.m/s2
I understand. My point is that there's really no reason to rule out use of the middle dot.
My concern is mainly with data reported in ASCII files. A space or middle dot is not a problem for HDF and netCDF files. However, the chemical measurements I deal with are mostly reported in ICARTT files... CSV and other ASCII files are still popular for reporting air quality monitoring data.
The attached xlsx file is a version of the updated unit table. This spreadsheet contains the original labels and notations, as well as the updated ones, descriptions, and comments highlighting the changes. 15 original units were removed or replaced. 11 new units were added.
There are still two minor issues:
1) Acceleration due to gravity, g. Obviously “g” is often used for grams. However, this will not cause confusion if one reads both the label and notation. To avoid this potential conflict, we may choose to use “g0” for standard acceleration due to gravity.
2) Centibars per 12 hours. The original notation does not make sense to me. I hope this is not a unit the data providers have used for a long time. I wonder if we should remove this unit and add one for centibars per day, provided this will not upset the data providers too much.
Looking for comments, suggestions, and/or corrections.
Branch: https://github.com/wmo-im/wmds/tree/issue159
Summary and Purpose: Improve the unit table: remove redundant or incorrect entries, add descriptions, correct labels and notations and add missing units
Stakeholder(s): @gaochen-larc
Notation, general
Proposal: Change "_" to ".", where it symbolizes multiplication
Reason: The use of the underscore is inconsistent in table 1-02 (e.g. notation “N_units”). It is useful to select one symbol to indicate a multiplication and only that. For compatibility reasons we choose “.”.
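A sketch of what this replacement could look like when applied mechanically. The function name and the exception set are assumptions for illustration. Entries that change for other reasons (e.g. nautical_mile or the consolidated ug_*_m-3 units) would still need to be mapped by hand.

```python
# Assumed hand-maintained exceptions: notations whose "_" is not a multiplication sign.
KEEP_AS_IS = {"N_units"}


def superseding_notation(old_notation: str) -> str:
    """Return the period-separated notation that supersedes an underscore one."""
    if old_notation in KEEP_AS_IS:
        return old_notation
    return old_notation.replace("_", ".")


for old in ["deg_s-1", "cb_s-1", "N_units"]:
    print(old, "->", superseding_notation(old))
```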
Removal of units
Proposal: Remove entry
label | notation |
---|---|
degree Celsius | Cel |
Reason: Entry is not needed, because the table also contains the unit “degC”. Semantically “degC” makes more sense.
Proposal: Remove invalid entries
label | notation |
---|---|
logarithm per metre | log_(m-1) |
logarithm per square metre | log_(m-2) |
Reason: ill-defined, mathematically incorrect.
Proposal: Remove the following entries and supersede them by existing concepts
label | notation | superseded by |
---|---|---|
milligram N per litre | mg_N_l-1 | mg.l-1 |
milligram S per litre | mg_S_l-1 | mg.l-1 |
microgram C per cubic metre | ug_C_m-3 | ug.m-3 |
microgram Cl per cubic metre | ug_Cl_m-3 | ug.m-3 |
microgram N per cubic metre | ug_N_m-3 | ug.m-3 |
microgram S per cubic metre | ug_S_m-3 | ug.m-3 |
milligrams per cubic metre at 25 degrees | mg_m-3_25C | mg.m-3 |
micrograms per cubic metre at 20 degrees | ug_m-3_20C | ug.m-3 |
micrograms per cubic metre at 25 degrees | ug_m-3_25C | ug.m-3 |
degrees true | deg_true | deg |
Reason: Concept too narrow, misleading
Change labels, notations and/or descriptions
Proposal: Replace notations
label | notation (old) | notation |
---|---|---|
degrees Celsius per metre | C_m-1 | degC.m-1 |
degrees Celsius per 100 metres | C_(100m)-1 | degC.hm-1 |
Reason: Consistency, degree Celsius should always be represented as “degC” in the notations.
Proposal: Change label
label (old) | label | notation |
---|---|---|
degrees Celsius | degrees Celsius (°C) | degC |
Reason: label contains symbol °C
Proposal: Replace notation and add descriptions
label | notation (old) | notation | description |
---|---|---|---|
centibars per 12 hours | cb_-1 | cb.(12h)-1 | 1 centibar is equivalent to SI unit 1 kPa |
centibars per second | cb_s-1 | cb.s-1 | 1 centibar is equivalent to SI unit 1 kPa |
Reason: ill-defined
Proposal: Replace notation
label | notation (old) | notation |
---|---|---|
nautical mile | nautical_mile | nmi |
Reason:
Proposal: Replace notation, add description
label | notation (old) | notation | description |
---|---|---|---|
degree (angle) | degree_(angle) | deg | plane or phase angle |
Reason: unnecessary parts from notation removed, missing description added
Proposal: Change label and add descriptions
label (old) | label | notation | description |
---|---|---|---|
parts per billion by volume dry air | parts per billion by volume | ppbv | Ratio of the volume of a certain substance to the volume of medium/matrix in which it is contained. |
parts per million by volume dry air | parts per million by volume | ppmv | Ratio of the volume of a certain substance to the volume of medium/matrix in which it is contained. |
parts per trillion by volume dry air | parts per trillion by volume | pptv | Ratio of the volume of a certain substance to the volume of medium/matrix in which it is contained. |
Reason: old label incorrect, missing descriptions provided
Proposal: Add description to existing units
label | notation | description |
---|---|---|
geopotential metre | gpm | The height of a given point in the atmosphere in units proportional to the potential energy of unit mass (geopotential) at this height relative to sea level. (AMS Glossary of Meteorology) |
Dobson Unit | DU | Equivalent to 2.687×10^20 molecules m-2 at standard temperature and pressure (273 K, 1 atm pressure), https://glossary.ametsoc.org/wiki/Dobson_unit |
N units | N_units | Unit of atmospheric refractivity. The refractivity, N, is related to the refractive index, n, by the formula N = 10^6 (n – 1). N is therefore dimensionless but values computed by the formula are by convention described as being in "N units". (WMO Manual on Codes I.2) |
dekapascal | daPa | dekapascal or decapascal |
square degrees | deg2 | square of phase or plane angle |
degrees per second | deg.s-1 | plane or phase angle per second |
nanomoles per mole | nmol.mol-1 | Equivalent to expressing mole fraction in ppb |
picomoles per mole | pmol.mol-1 | Equivalent to expressing mole fraction in ppt |
(unknown) | unknown | The correct value is not known to, and not computable by, the sender of this data. However, a correct value probably exists. |
Reason: description added to enhance usability and clarity
Addition of new units
Proposal: Add new units
label | notation | description |
---|---|---|
Per kilometre | km-1 | |
Per megametre | Mm-1 | |
(missing) | missing | The correct value is not readily available to the sender of this data. Furthermore, a correct value may not exist. |
(inapplicable) | inapplicable | There is no value (categorical data). |
Reason: make code list more complete to represent common measurements or cases
Proposal: Add new units
label | notation | description |
---|---|---|
micrograms per cubic metre | ug.m-3 | |
milligrams per litre | mg.l-1 | |
milligrams per cubic centimetre | mg.cm-3 | |
nanograms per kilogram | ng.kg-1 | SI unit for mass mixing ratio, equivalent to pptm or ppt |
micrograms per kilogram | ug.kg-1 | SI unit for mass mixing ratio, equivalent to ppbm or ppb |
milligrams per kilogram | mg.kg-1 | SI unit for mass mixing ratio, equivalent to ppmm or ppm |
cubic micrometres per cubic centimetre | um3.cm-3 | SI unit for volumetric mixing ratio, equivalent to pptv |
cubic millimetres per cubic metre | mm3.m-3 | SI unit for volumetric mixing ratio, equivalent to ppbv |
cubic centimetres per cubic metre | cm3.m-3 | SI unit for volumetric mixing ratio, equivalent to ppmv |
per cubic centimetre | cm-3 | |
per cubic metre | m-3 | |
Reason: add new SI units to promote use of SI units
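The equivalences claimed in the description column can be checked with exact arithmetic. The following is a sketch only: the helper names and the prefix table are made up for this illustration, and "ppt" is taken to mean parts per trillion (10^-12).

```python
from fractions import Fraction

# SI prefix factors relative to the unprefixed unit (metre or gram).
PREFIX = {
    "n": Fraction(1, 10**9),
    "u": Fraction(1, 10**6),
    "m": Fraction(1, 10**3),
    "c": Fraction(1, 10**2),
    "": Fraction(1),
    "k": Fraction(10**3),
}
PPM, PPB, PPT = Fraction(1, 10**6), Fraction(1, 10**9), Fraction(1, 10**12)


def mass_ratio(numerator_prefix: str, denominator_prefix: str) -> Fraction:
    """Dimensionless value of e.g. ng.kg-1 (prefixes applied to the gram)."""
    return PREFIX[numerator_prefix] / PREFIX[denominator_prefix]


def volume_ratio(numerator_prefix: str, denominator_prefix: str) -> Fraction:
    """Dimensionless value of e.g. um3.cm-3 (prefixed metres, cubed)."""
    return (PREFIX[numerator_prefix] / PREFIX[denominator_prefix]) ** 3


# notation -> (computed ratio, ratio claimed in the description column)
checks = {
    "ng.kg-1": (mass_ratio("n", "k"), PPT),
    "ug.kg-1": (mass_ratio("u", "k"), PPB),
    "mg.kg-1": (mass_ratio("m", "k"), PPM),
    "um3.cm-3": (volume_ratio("u", "c"), PPT),
    "mm3.m-3": (volume_ratio("m", ""), PPB),
    "cm3.m-3": (volume_ratio("c", ""), PPM),
}
for notation, (computed, claimed) in checks.items():
    print(notation, computed == claimed)
```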
Propose to add two more units:
per cubic centimetre, cm-3 and per cubic metre, m-3.
These units are commonly used to report aerosol particle concentrations and cloud particle concentrations.
Sorry for not putting these with the other additions!
"Ratio of the amount (as mass, volume or number) of pure substance to the amount of medium/matrix in which it is contained." This description is more suitable for ppb, ppt, or ppm. "pptv", "ppbv", and "ppmv" are specific for volume mixing ratio.
Branch created https://github.com/wmo-im/wmds/blob/issue159/tables_en/1-02.csv, superseded terms included here: https://github.com/wmo-im/wmds/blob/issue159/tables_en/superseded.txt. @amilan17, do you have any updates on the use of "." in a URL?
Discussion required for:
Additional variables requested by @ejwelton in #269:
Proposal: Add new units
label | notation | description |
---|---|---|
micrometre | um | |
nanometre | nm | |
Reason: New entries are needed for the specification of wavelengths.
@gaochen-larc - I cannot find any external conversations about units. Please assume that you can move forward with current decisions.
@gaochen-larc Please confirm this can be considered 'validated'.
I confirm this version can be considered as "validated".
@gaochen-larc I am afraid this issue is still not ready. I found 2 entries for nm (155, 175), one with British, the other with US spelling. Entry 175 should be dropped and entry 155 renamed to the British spelling (nanometre). After that, all entries use British spelling. Also, many descriptions are still empty. @fstuerzl: Please provide descriptions where missing. If the name itself is sufficient as a description, please copy it over.
@ferrighi Can you please assist in completing descriptions?
Second entry for nanometre is removed.
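Checks like these (duplicate notations, empty descriptions) could be automated before a table is marked as validated. A sketch follows, assuming hypothetical column names `notation`, `name` and `description`, with an in-memory sample standing in for the real 1-02.csv.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; the real table layout may differ.
SAMPLE_TABLE = """notation,name,description
nm,nanometer,
nm,nanometre,
um,micrometre,"placeholder description"
"""


def check_table(csv_text: str):
    """Return (duplicated notations, names of rows with empty descriptions)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row["notation"] for row in rows)
    duplicates = sorted(n for n, count in counts.items() if count > 1)
    missing = [row["name"] for row in rows if not row["description"].strip()]
    return duplicates, missing


duplicates, missing_descriptions = check_table(SAMPLE_TABLE)
print("duplicate notations:", duplicates)
print("rows without description:", missing_descriptions)
```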
@ferrighi, @gaochen-larc, here is my proposal for some missing descriptions: 1-02_descriptions.xlsx
Updated branch: https://github.com/wmo-im/wmds/blob/issue159/tables_en/1-02.csv
Table updated with @gaochen-larc and @joergklausen's additions: 1-02_descriptions_gc_v01_stf.xlsx [edit: corrections]
Descriptions for the following units are still missing:
notation | name |
---|---|
mm6.m-3 | millimetres to the sixth power per cubic metre |
m2.s | square metres second |
m2.s-2 | square metres per square second |
m2.rad-1.s | square metres per radian second |
m2.Hz-1 | square metres per hertz |
m3.m-3 | cubic metres per cubic metre |
m(2.3-1).s-1 | metres to the two thirds power per second |
kg.m-2.s-1 | kilograms per square metre per second |
kg-2.s-1 | per square kilogram per second |
s.m-1 | seconds per metre |
K.m2.kg-1.s-1 | kelvin square metres per kilogram per second |
W.m-2.sr-1.cm | watts per square metre per steradian centimetre |
W.m-2.sr-1.m | watts per square metre per steradian metre |
Updated branch: Unit table https://github.com/wmo-im/wmds/blob/issue159/tables_en/1-02.csv Superseded file https://github.com/wmo-im/wmds/blob/issue159/tables_en/superseded.txt
Branch
https://github.com/wmo-im/wmds/blob/issue159/tables_en/1-02.csv https://github.com/wmo-im/wmds/blob/issue159/tables_en/superseded.txt
Summary and Purpose
Improve the unit table by removing redundant or incorrect entries, adding or correcting names and descriptions, changing the notations to a more consistent syntax, and adding missing units.
Stakeholder(s)
@gaochen-larc
Proposal
Replace "_" in the notations by "." to symbolise multiplication. All entries in the table that use the underscore will be superseded by new entries using a period. Example: deg_s-1 is superseded by deg.s-1.
Add unique description to every unit to provide an unambiguous definition and correct names when necessary.
Add the following new variables:
Remove the following units from the code table
Reason
original comment:
Consider how to link with CF / unidata, who maintain an excellent table of units at https://www.unidata.ucar.edu/downloads/udunits/ to augment/change how we manage units at https://codes.wmo.int/wmdr/_unit
old link: https://www.unidata.ucar.edu/software/udunits/udunits-current/doc/udunits/udunits2.html#Database