hgrecco / pint

Operate and manipulate physical quantities in Python
http://pint.readthedocs.org/
Other
2.4k stars 472 forks source link

Automatically alias definitions where 1 X = 1 Y, and hash-equivalent aliases #1433

Open clbarnes opened 2 years ago

clbarnes commented 2 years ago

Example: microns and micrometers are the same thing: i.e. Quantity(1, "micron") == Quantity(1, "micrometer") because in the default registry configuration, there's the line micron = micrometer. However, Unit("micron") != Unit("micrometer"), i.e. they are not aliases, unless you set it in your local registry.

Would it be possible to have unit definitions for X and Y where N X = N Y automatically be added as an alias as well?

Additionally, while aliases affect __eq__, they don't affect __hash__, which means they are treated differently in sets and dicts. Would it be too breaking a change to implement this, possibly by determining a normative name for all unit aliases (in the same way that abbreviations map to the full unit name) to hash against?

hgrecco commented 2 years ago

I think I am not following. In Pint two quantities are equal if they have the same magnitude and unit (after converting one to the other units). So it is not possible to have Quantity(1, "micron") == Quantity(1, "micrometer") unless Unit("micron") == Unit("micrometer"). And in fact this is what happens.

>>> import pint
>>> ureg = pint.UnitRegistry()
>>> ureg.Unit("micrometer") == ureg.Unit("micrometer")
True
>>> ureg.Unit("micrometer") != ureg.Unit("micrometer")
False
clbarnes commented 2 years ago

Is there a typo in your snippet? For my point, at least, I think the second comparison should be between micron and micrometer.

In a clean virtual environment with brand-new pip install pint, I get

In [1]: import pint

In [2]: ureg = pint.UnitRegistry()

In [3]: ureg.Unit("micron") == ureg.Unit("micrometer")
Out[3]: False

In [4]: ureg.Quantity("1 micrometer") == ureg.Quantity("1 micron")
Out[4]: True

I think this is because, as you say, when quantities are compared, one is converted to the other before checking and then the magnitudes are compared. The definitions file tells you how to convert one unit into another. However, when there's no magnitude, no conversion takes place, and so the two units are simply compared (and the names are different, so it's false). If they're defined as an alias, they are equal, so my proposal is that trivial conversions like micron/micrometer automatically define aliases as well.

hgrecco commented 2 years ago

You are right, sorry about that.

But I still do not follow. What do you mean by defining them as alias. They are alias. The definition file defines units, not quanities.

For the record, in the past Units were NOT exposed to the user and were just dicts. When the Unit Class was introduced the previous behavior was kept. I wonder if there is any penalty or problem to change to canonic names upon comparison (or on definition while keeping what the user has chosen as well)

clbarnes commented 2 years ago

As I understand it, the definition file, in this context, has two modes of operation: conversions ("how to turn a quantity with one unit into a quantity with another unit") and aliases ("which sets of units mean the same thing"). Aliases imply a conversion (I think), but trivial conversions do not, at present, imply an alias.

Again, as I understand it:

# tell pint how to convert into microns into half-microns
half_micron = 2 * micron

# tell pint how to convert microns into micrometers
micron = micrometer
# which is implicitly the same as
micron = 1 micrometer

# tell pint that microns and micrometers are the same thing
@alias micron = micrometer

Saying that a micron is equal to 1 micrometer is not the same thing, in pint's eyes, as saying that micron is an alias for micrometer. My suggestion is that micron = 1 micrometer is parsed and implicitly creates the alias as well.

You can check this:

>>> import pint
>>> ureg1 = pint.UnitRegistry()
>>> ureg1.Unit("micron") == ureg2.Unit("micrometer")
False
>>> ureg2 = pint.UnitRegistry()
>>> ureg2.define("@alias micron = micrometer")
>>> ureg2.Unit("micron") == ureg2.Unit("micrometer")
True

You need to define 2 uregs for this to work, I suspect because the new definition doesn't remove the cached comparison (maybe it should).

hgrecco commented 2 years ago

The definition file only contains one thing relating: units definitions. For each unit you can define the name, conversion to other units, symbol and aliases. That is all.

For example

>>> ureg1.Unit("meter") == ureg1.Unit("metre")
True
>>> ureg1.Unit("meter") == ureg1.Unit("m")
True

because the definition file states:

meter = [length] = m = metre

So if you change the defintion file to

micron = 1e-6 * meter = µ = micrometer

you should get

>>> ureg.Unit("micron") == ureg.Unit("micrometer")
True

(I am guessing here, haven't tested)

What you are seeing is a bug. I think do to the fact that we do put the conversion in a canonical form.

We should fix it, thanks for spotting this.

clbarnes commented 2 years ago

This is already in the default definition file: https://github.com/hgrecco/pint/blob/master/pint/default_en.txt#L147 and unfortunately, that last test does not pass, unless an explicit @alias is given as documented here https://pint.readthedocs.io/en/stable/defining.html If it's just a bug, then great! No design decisions to be made :)

hgrecco commented 2 years ago

No. The current definition that you are linking states:

micron = micrometer = µ

Have you tried replacing it for my suggestion?

jules-ch commented 2 years ago

careful micrometers are 1e-6 meters :)

hgrecco commented 2 years ago

Yes! An ugly error on my side. But I still wonder if thia approach fixes the bug

(I edit the comment to avoid having something THAT wrong around :-D)

hgrecco commented 2 years ago

Indeed, changing the definition to this micron = 1e-6 * meter = µ = micrometer fixes the problem. But I think it would be better to always convert units in the converters to base units to avoid these problems.

clbarnes commented 2 years ago

PR raised!

I think it would be better to always convert units in the converters to base units to avoid these problems.

Would canonicalising units in __hash__ also be on the table?

hgrecco commented 2 years ago

Yes, but in another PR. We need to see how it's impact. Wha tdo you think @jules-ch ?