yt-project / yt

Main yt repository
http://yt-project.org
Other
461 stars 276 forks source link

Cosmology unit arithmetic is behaving oddly #4622

Open jzuhone opened 1 year ago

jzuhone commented 1 year ago

Bug report

Bug summary

For cosmological datasets, conversions between code_length, which are typically comoving, and physical lengths, such as cm and Mpc, are time-dependent, since the notion of a comoving length scale is constant with time but the actual physical distance associated with that scale increases with scale factor. yt and unyt allow for the possibility of multiplying unyt_array and unyt_quantity instances across different datasets with different scale factors, and the results should be consistent. I found some ways in which they do not appear to be.

Code for reproduction

@ngoldbaum's original text for the yt 3.0 (now 4.0) paper has a code snippet that goes like this at first:

>>> import yt
>>> ds1 = yt.load('Enzo_64/DD0002/data0002')
>>> ds2 = yt.load('Enzo_64/DD0043/data0043')
>>> print(ds1.length_unit, ds2.length_unit)
128 Mpccm/h, 128 Mpccm/h
>>> print(ds1.length_unit.in_cgs())
6.26145538088e+25 cm
>>> print(ds2.length_unit.in_cgs())
5.55517285026e+26 cm 

So far, so good. Now if we multiply the length_units by each other, in his example we get this:

>>> print(ds1.length_unit*ds2.length_unit)
145359.100149 Mpccm**2
>>> print(ds2.length_unit*ds1.length_unit)
1846.7055432 Mpccm**2

This seems not intuitive, but what's happening here is that in binary operations the first quantity's unit registry is used, and the of the second is converted.

The current versions of yt and unyt instead give the same result regardless of multiplication order:

>>> print(ds1.length_unit*ds2.length_unit)
32501.48780004 Mpccm**2
>>> print(ds2.length_unit*ds1.length_unit)
32501.48780004 Mpccm**2

Which at first glance appears that the above behavior has changed (I'm not sure when) to be more intuitive, since in reality of course the unit Mpccm/h is the same between the datasets even if the scale factor "comoving" changes between them. However, this result is wrong, given the scale factors don't agree:

>>> print((ds1.length_unit*ds2.length_unit).to("Mpc**2"))
3653.18593144, 'Mpc**2
>>> print((ds2.length_unit*ds1.length_unit).to("Mpc**2"))
3653.18593144, 'Mpc**2'
>>> print((ds1.length_unit*ds2.length_unit).units.registry.lut["a"])
(0.11255715934195751, 1, 0.0, '\\rm{a}', False)
>>> print((ds2.length_unit*ds1.length_unit).units.registry.lut["a"])
(0.9986088499304776, 1, 0.0, '\\rm{a}', False)

and this result is downright pathological:

>>> print(ds1.length_unit*ds2.length_unit)
32501.48780004 Mpccm**2
>>> print((ds1.length_unit*ds2.length_unit).to("Mpccm**2"))
288353.69995810196 Mpccm**2
>>> print(ds2.length_unit*ds1.length_unit)
32501.48780004 Mpccm**2
>>> print((ds2.length_unit*ds1.length_unit).to("Mpccm**2"))
3663.371440594022 Mpccm**2

I'm not sure what to do here, but we need to fix this so that it gives a sensible result every time, regardless of what we think that result should be. Pinging various stakeholders.

Version Information

neutrinoceros commented 1 year ago

Sounds like something I would break, as I care about symmetry of multiplication but I'm mostly unaware of how comoving lengths work. I bet this regression was introduced sometime between unyt 2.8.0 and 2.9.5

neutrinoceros commented 1 year ago

Actually I get the same (broken) behaviour on (yt 4.0.0 + unyt 2.8.0) and (yt 4.2.1 + unyt 2.9.5). Now I'm suspecting it could be a change of behaviour in one of unyt's dependencies (I've used numpy 1.25 + sympy 1.12 in both tests), or that the regression is older than yt 4.0.0 ?

brittonsmith commented 1 year ago

Comoving units have always worried me in this context. Regardless of the changing behavior, my recollection is that arithmetic is done by converting to a common unit, doing the math, and converting back. I also seem to remember that this common unit for length was cm. Would it be difficult to add the ability to control the units in which math operations are performed? If so, then we could designate a comoving unit for this and these problems should go away.

Barring that, it would probably be better to raise an exception if the common arithmetic unit is not the same between two numbers.