hgrecco / pint

Operate and manipulate physical quantities in Python
http://pint.readthedocs.org/

Suggestion: Add unitless dimensions #1277

Closed edmundsj closed 1 year ago

edmundsj commented 3 years ago

Not sure if this would be difficult to implement internally, but one common thing I do with units is convert between various unitless representations (e.g., I want to convert a percent to ppm).
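The conversion being asked for is a plain rescaling between dimensionless factors. A minimal sketch in plain Python, not pint's API; the `SCALE` table and the `convert` helper are hypothetical names used only for illustration:

```python
from fractions import Fraction

# Hypothetical table: each name maps to the pure number it stands for.
SCALE = {
    "percent": Fraction(1, 100),
    "permille": Fraction(1, 1000),
    "ppm": Fraction(1, 10**6),
}


def convert(value, src, dst):
    """Rescale a dimensionless value from one representation to another."""
    return value * SCALE[src] / SCALE[dst]


# 1 percent expressed in ppm (exact arithmetic via Fraction).
one_percent_in_ppm = convert(1, "percent", "ppm")
```

Exact fractions are used here only to keep the arithmetic free of floating-point noise; a unit library would typically store the factors as floats.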

Suggested additional dimensionless units:

hgrecco commented 3 years ago

I would be ok with this.

keewis commented 3 years ago

Note that "ppb", "ppt", and "ppq" are ambiguous ("billion" might mean either 10^9 or 10^12, depending on the language / country), and that "ppthousand" should probably be an alias of "permille".
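To make the size of that ambiguity concrete, here is a small sketch comparing the short-scale and long-scale readings of "billion". The dictionaries and the `parts_per` helper are illustrative names, not anything pint defines:

```python
# Powers of ten behind the "-illion" names in the two naming conventions.
SHORT_SCALE = {"million": 10**6, "billion": 10**9, "trillion": 10**12}
LONG_SCALE = {"million": 10**6, "billion": 10**12, "trillion": 10**18}


def parts_per(name, scale):
    """Return the fraction that 'one part per <name>' denotes."""
    return 1 / scale[name]


ppb_short = parts_per("billion", SHORT_SCALE)
ppb_long = parts_per("billion", LONG_SCALE)

# Factor by which the two readings of "ppb" disagree.
disagreement = LONG_SCALE["billion"] // SHORT_SCALE["billion"]
```

"Million" is the same in both conventions, which is why "ppm" does not suffer from the same problem.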

hgrecco commented 3 years ago

@keewis True, this is ambiguous. I think we should not add it.

jules-ch commented 3 years ago

The internationally recognized symbol % (percent) may be used with the SI. When it is used, a space separates the number and the symbol %. The symbol % should be used rather than the name “percent”. In written text, however, the symbol % generally takes the meaning of “parts per hundred”. Phrases such as “percentage by mass”, “percentage by volume”, or “percentage by amount of substance” shall not be used; the extra information on the quantity should instead be conveyed in the description and symbol for the quantity. The term “ppm”, meaning 10^−6 relative value, or 1 part in 10^6, or parts per million, is also used. This is analogous to the meaning of percent as parts per hundred. The terms “parts per billion” and “parts per trillion” and their respective abbreviations “ppb” and “ppt”, are also used, but their meanings are language dependent. For this reason the abbreviations ppb and ppt should be avoided.

From the SI Brochure BIPM : https://www.bipm.org/utils/common/pdf/si-brochure/SI-Brochure-9-EN.pdf

cpascual commented 3 years ago

In view of the above info, in my opinion percent and ppm should definitely be supported out of the box (and ppthousand / permille maybe too), while ppb, ppt, ppq are better left out.

I also think that the % symbol should be supported for percent. I am aware that thanks to #911 it is now possible to add a custom preprocessor to support it, but IMHO it should be available out of the box.

hCraker commented 2 years ago

Any updates on adding ppm and other concentration units to pint?

dalito commented 2 years ago

@hCraker - Note that "ppm" is not a unit for concentration according to the citation from the SI Brochure BIPM above.

Converting between concentration units is complicated (at least for concentrations in chemistry). pint would need a lot more information to do the conversion (molar mass, density, non-ideal mixing effects etc.). So this is best handled in a separate chemistry / chemical engineering package that may build upon pint (like https://thermo.readthedocs.io/index.html does).
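As a rough illustration of why the extra inputs are unavoidable, the sketch below turns a mass-fraction "ppm" into a molar concentration. The function name and parameters are hypothetical, and the calculation assumes the solution density is simply known and ignores non-ideal mixing:

```python
def mass_fraction_to_molarity(mass_fraction, density_g_per_l, molar_mass_g_per_mol):
    """Molar concentration (mol/L) of a solute given its mass fraction.

    The solution density is treated as a known input; non-ideal mixing
    effects are ignored.
    """
    solute_g_per_l = mass_fraction * density_g_per_l
    return solute_g_per_l / molar_mass_g_per_mol


# 100 ppm (by mass) of NaCl in a solution with roughly the density of water.
molarity = mass_fraction_to_molarity(
    mass_fraction=100e-6,
    density_g_per_l=1000.0,      # assumed: dilute aqueous solution
    molar_mass_g_per_mol=58.44,  # NaCl
)
```

The same "100 ppm" gives a different molarity for every solute and solvent, so a bare scale factor of 1e-6 cannot carry this conversion on its own.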

hCraker commented 2 years ago

Thanks for the response @dalito. It sounds like this issue should be closed since these sorts of unit conversions belong in an extension to pint and not in pint itself.

jules-ch commented 1 year ago

Anyone want to contribute to this? I think we can all agree on percent and ppm support for new releases of pint.

It's gonna be a breaking change since % is interpreted as modulo by the parser. Idk if that's still the case.
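For context on why "%" is awkward in an expression parser: in Python-style expressions it lexes as a binary operator, not as part of a name. The sketch below uses the standard-library `tokenize` module purely as an analogy; whether a given pint release relies on the same machinery is an assumption here, and `lex` is a hypothetical helper:

```python
import io
import tokenize


def lex(expression):
    """Return (token type name, text) pairs for the non-empty tokens."""
    readline = io.StringIO(expression).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.string.strip()
    ]


# "%" comes out as an operator token with nothing on its right-hand side.
tokens = lex("10 %")

# As an operator, "%" means modulo in ordinary Python arithmetic.
remainder = 10 % 3
```

A string preprocessor that rewrites "%" to "percent" before parsing sidesteps this, at the cost of no longer being able to use "%" as modulo in parsed expressions.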

Thoughts, @hgrecco?

cpascual commented 1 year ago

> Anyone want to contribute to this?

If we are talking about adding a pre-processor for replacing % with percent and then adding percent and ppm to the registry (and add some unit tests and docs), I can do it.

Please just confirm that this is the wanted approach (@jules-ch , @hgrecco , ...)

jules-ch commented 1 year ago

Yes, that's what I'm thinking! @hgrecco, agreed?

cpascual commented 1 year ago

So, I have this preliminary scratch that runs without issues:

from pint import UnitRegistry
from pint.testing import assert_equal

pct_preproc = lambda string: string.replace("%", "percent")
ureg = UnitRegistry(preprocessors=[pct_preproc])

ureg.define("percent = 0.01 = %")
ureg.define("ppm = 1e-6")

assert ureg("%") == ureg("percent") == ureg.percent
assert ureg("ppm") == ureg.ppm

a = ureg.Quantity("10 %")
b = ureg.Quantity("100 ppm")
c = ureg.Quantity("0.5")

assert f"{a}" == "10 percent"
assert f"{a:~}" == "10 %"
assert f"{b}" == "100 ppm"
assert f"{b:~}" == "100 ppm"

assert_equal(a, 0.1)
assert_equal(1000 * b, a) 
assert_equal(c, 5 * a)

assert_equal((1 * ureg.meter) / (1 * ureg.kilometer), 0.1 * ureg.percent)
assert c.to("percent").m == 50
# assert c.to("%").m == 50   # <-- this raises an exception

As you can see, it all works as expected except for the last (commented) line, where I check the use of % in Quantity.to(), and which raises the following exception:

Traceback (most recent call last):
  File "/home/carlos/.config/VSCodium/User/globalStorage/buenon.scratchpads/scratchpads/56b8016bc5c16deae167b5cfb0a76e9f/scratch52.py", line 29, in <module>
    assert c.to("%").m == 50  # <-- this raises an exception
  File "/home/carlos/src/pint/pint/facets/plain/quantity.py", line 520, in to
    other = to_units_container(other, self._REGISTRY)
  File "/home/carlos/src/pint/pint/util.py", line 901, in to_units_container
    return registry._parse_units(unit_like)
  File "/home/carlos/src/pint/pint/facets/nonmultiplicative/registry.py", line 66, in _parse_units
    return super()._parse_units(input_string, as_delta, case_sensitive)
  File "/home/carlos/src/pint/pint/facets/plain/registry.py", line 1095, in _parse_units
    units = ParserHelper.from_string(input_string, self.non_int_type)
  File "/home/carlos/src/pint/pint/util.py", line 625, in from_string
    ret = build_eval_tree(gen).evaluate(
  File "/home/carlos/src/pint/pint/pint_eval.py", line 120, in evaluate
    raise DefinitionSyntaxError('missing unary operator "%s"' % op_text)
pint.errors.DefinitionSyntaxError: missing unary operator "%"

It seems that the registry preprocessors are not used when calling .to(). I can try to debug that myself, but if any of you knows the solution straight away, it would save me some time.
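Until the root cause is found, one possible stopgap is to run the preprocessors by hand before calling `.to()`. This is only a sketch: `to_preprocessed` is a hypothetical helper, not part of pint, and it assumes nothing about the quantity beyond it having a `.to()` method that accepts a unit string:

```python
def pct_preproc(text):
    """Same substitution as the registry preprocessor in the scratch above."""
    return text.replace("%", "percent")


def to_preprocessed(quantity, unit_text, preprocessors=(pct_preproc,)):
    """Apply the preprocessors by hand, then delegate to quantity.to()."""
    for preprocess in preprocessors:
        unit_text = preprocess(unit_text)
    return quantity.to(unit_text)
```

With the scratch above, `to_preprocessed(c, "%")` would hand the string "percent" to `c.to()`, which is the call that already works.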

cpascual commented 1 year ago

Also, a question about design: do we want percent and ppm to be unitless (as in my example above) or to be based on the dimensionless count unit?

i.e., do we want percent = 0.01 = % or percent = 0.01 * count = %?

keewis commented 1 year ago

I don't think it should be based on count, since percent is a relation (ratio?) of two count values: n_subset / n_total. We could introduce a new dimensionless unit (ratio? proportion? fraction?), though, and base percent (%), permille (‰), and ppm on that.

jules-ch commented 1 year ago

We can use what has been defined in MetPy: both the preprocessors and the percent definition. It works in MetPy without any hiccups.

jules-ch commented 1 year ago

See https://github.com/Unidata/MetPy/blob/main/src/metpy/units.py

jules-ch commented 1 year ago

I think we can treat the bug you found in another issue @cpascual

cpascual commented 1 year ago

Good, then I'll submit a PR with the approach that I drafted in my example (which is essentially the same as in MetPy, AFAICT) and leave the support of .to("%") to a different issue.

Is that ok with you @jules-ch ?