yt-project / yt

Main yt repository
http://yt-project.org

BLD: require numpy 2.0.0 (stable) at build time #4930

Closed neutrinoceros closed 1 week ago

neutrinoceros commented 3 weeks ago

PR Summary

Numpy 2.0 just dropped on PyPI (🎉). This change is a pretext to make sure everything still runs smoothly. From the conversation on #4874, I expect I'll need to bump a couple of answer tests, but I expect everything else to stay green.

Closes #4874

neutrinoceros commented 3 weeks ago

I'm puzzled. I don't understand how one of the tests can still be failing after bumping.

matthewturk commented 2 weeks ago

Is it possible there's a NaN? I'm not sure how NaNs would impact the typical equality/hash comparisons.

neutrinoceros commented 2 weeks ago

Good thinking! Indeed, hashing an array isn't a stable operation if it contains a NaN:

In [8]: a = np.arange(10, dtype="float")

In [9]: a
Out[9]: array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [10]: a[0] = np.nan

In [11]: hash(tuple(a))
Out[11]: -5529612596851953840

In [12]: hash(tuple(a))
Out[12]: -5371981722103082231

In [13]: hash(tuple(a))
Out[13]: -7440010721638526525

In [14]: hash(tuple(a))
Out[14]: -4675353377044086749

In [15]: hash(tuple(a))
Out[15]: 1634418349654970603

In [16]: hash(tuple(a))
Out[16]: 5494725379304409846

In [17]: a = np.arange(10, dtype="float")

In [18]: hash(tuple(a))
Out[18]: -2040549277248155741

In [19]: hash(tuple(a))
Out[19]: -2040549277248155741

In [20]: hash(tuple(a))
Out[20]: -2040549277248155741

In [21]: hash(tuple(a))
Out[21]: -2040549277248155741

In [22]: hash(tuple(a))
Out[22]: -2040549277248155741

In [23]: hash(tuple(a))
Out[23]: -2040549277248155741

(I should point out that this little experiment was conducted with numpy 1.26.) So maybe what's happening is that we have some NaN in an output field that wasn't present when running with numpy 1.x 🤔 Could be a bug on our side or in NumPy (or both?? 🙀) @chrishavlin, any chance you still have whatever scripts you used to decipher this situation a couple days ago?
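For the record, the instability above comes from how CPython hashes NaN. This is a minimal check (assuming CPython): since Python 3.10, the hash of a NaN float is derived from the object's id rather than being the constant 0, so the fresh scalar objects produced by each `tuple(a)` call hash differently every time.

```python
import sys

nan1 = float("nan")
nan2 = float("nan")

# Since Python 3.10, hash() of a NaN float is derived from the object's id
# (before 3.10 it was the constant 0). Two distinct NaN objects -- like the
# fresh np.float64 scalars each tuple(a) call produces -- therefore hash
# differently, making hash(tuple(array_with_nan)) unstable across calls.
if sys.version_info >= (3, 10):
    print(hash(nan1) != hash(nan2))  # True: id-based hashes differ
else:
    print(hash(nan1) == hash(nan2) == 0)  # True: older Pythons hash all NaNs to 0
```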

matthewturk commented 2 weeks ago

I really need to learn more about NaNs. I had no idea there was, officially, a 'payload': https://en.wikipedia.org/wiki/NaN

chrishavlin commented 2 weeks ago

oh interesting, it's not even the same tests that are failing -- still enzo, but the velocity_divergence field rather than the velocity_magnitude. Likely a very similar issue though, I'll take a closer look this morning.

neutrinoceros commented 2 weeks ago

Awesome, thank you very much. Feel free to push to this branch if you need to!

chrishavlin commented 2 weeks ago

Well, I'm thoroughly confused. Manually comparing values didn't show any difference in grid values for ('gas', 'velocity_divergence') between np 2 and np 1.26.4. Then I spent a while getting nose working with py 3.10 to match what Jenkins is doing... and I reproduced the error with just np2:

in an environment with np2, store the answer tests for enzo:

nosetests --answer-store --local --with-answer-testing --local-dir /Users/chavlin/data/yt_answer_store/enzo_test yt/frontends/enzo/tests/test_outputs.py  --answer-name=local-ezno yt.frontends.enzo --answer-big-data --verbose

and then immediately run a comparison in the same np2 environment

nosetests --local --with-answer-testing --local-dir /Users/chavlin/data/yt_answer_store/enzo_test yt/frontends/enzo/tests/test_outputs.py  --answer-name=local-ezno yt.frontends.enzo --answer-big-data --verbose

and that test fails for me:


======================================================================
FAIL: GridValues_galaxy0030_all_('gas', 'velocity_divergence')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/chavlin/.pyenv/versions/3.10.12/envs/yt_np2_delete/lib/python3.10/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/chavlin/src/yt_/yt_dev/yt/yt/utilities/answer_testing/framework.py", line 418, in __call__
    self.compare(nv, ov)
  File "/Users/chavlin/src/yt_/yt_dev/yt/yt/utilities/answer_testing/framework.py", line 700, in compare
    assert_equal(new_result[k], old_result[k])
  File "/Users/chavlin/.pyenv/versions/3.10.12/envs/yt_np2_delete/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 414, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: 'ec255050b61b671af5425c0e4ae7a17d'
 DESIRED: 'f16afe8932b49e9b2acadf304ae3a168'

But this is only happening within nose -- I made a little manual answer test and couldn't reproduce the failure (script is here). Not even sure where to look now...

EDIT: on my desktop (Ubuntu), I don't have the above issue -- storing the enzo tests then immediately re-running passes as expected.

neutrinoceros commented 2 weeks ago

Interesting that you get

 ACTUAL: 'ec255050b61b671af5425c0e4ae7a17d'
 DESIRED: 'f16afe8932b49e9b2acadf304ae3a168'

while the latest attempt on Jenkins has

 ACTUAL: 'f16afe8932b49e9b2acadf304ae3a168'
 DESIRED: 'ec255050b61b671af5425c0e4ae7a17d'

So the hash isn't stable but it's not completely random either.

On my desktop (Ubuntu), I don't have the above issue -- storing the enzo tests then immediately re-running passes as expected.

Just curious, how many attempts did you make? Was it one (un)lucky try?
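As an aside, the 32-character digests above look like md5 hexdigests of the raw array bytes (an assumption about the answer-testing internals, not confirmed here). If so, the "not stable but not completely random" pattern makes sense: a byte-level hash is all-or-nothing, so two possible array states give exactly two possible digests.

```python
import hashlib

import numpy as np

a = np.arange(4, dtype="float64")
b = a.copy()
b[0] += 1e-32  # a difference far below any physically meaningful tolerance

# Hashing raw bytes is an all-or-nothing comparison: any bit-level
# difference, however tiny, produces a completely different digest.
print(hashlib.md5(a.tobytes()).hexdigest() == hashlib.md5(b.tobytes()).hexdigest())  # False
```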

matthewturk commented 2 weeks ago

Maybe there's an unstable ordering issue -- is it possible argsort could be involved?

neutrinoceros commented 2 weeks ago

Wait, sorting can be unstable??? In what sense?

matthewturk commented 2 weeks ago

So I'm speculating. :) What I'm wondering is if argsort could potentially have an instability. I suspect that I'm way off in the wild on this, though. My concern was what if:

indices = np.argsort(some_array_with_duplicates)

and then sorting by those indices could potentially result in different orderings. But I'm kind of grasping at straws here and I don't think it's the source of the error.

I will note that we've struggled with divergence in the past -- it has been the source of differences (but reliable, not stochastic) as a result of order-of-operations, subtraction of tiny numbers from single precision, etc.
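For reference, here is what (in)stability means for argsort on duplicates: a stable sort keeps equal elements in their original relative order, while the default introsort makes no such promise. A minimal sketch:

```python
import numpy as np

a = np.array([1, 0, 1, 0, 1, 0])

# With ties, a stable sort keeps equal elements in input order: the zeros
# at positions 1, 3, 5 come out first, in that order, then the ones.
idx = np.argsort(a, kind="stable")
print(idx)  # [1 3 5 0 2 4]

# The default kind="quicksort" (introsort) gives no such guarantee; its
# ordering of ties is deterministic for a given input but unspecified.
```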

neutrinoceros commented 2 weeks ago

Oh I missed that you were talking about *arg*sort, which indeed may be unstable, if the fact that it accepts an optional keyword argument kind="stable" is any indication

matthewturk commented 2 weeks ago

oh, and stable is new in 2.0.0!

matthewturk commented 2 weeks ago

Ah, but while I thought we used argsort in testing, turns out we kinda don't.

$ grep -lr argsort --include="*.py"
yt/visualization/volume_rendering/create_spline.py
yt/visualization/plot_modifications.py
yt/fields/tests/test_fields.py
yt/frontends/tipsy/io.py
yt/frontends/stream/data_structures.py
yt/frontends/stream/definitions.py
yt/frontends/art/io.py
yt/frontends/gadget/io.py
yt/frontends/flash/data_structures.py
yt/data_objects/selection_objects/ray.py
yt/data_objects/level_sets/tests/test_clump_finding.py
yt/data_objects/level_sets/clump_tools.py
yt/data_objects/particle_trajectories.py
yt/geometry/coordinates/cartesian_coordinates.py
yt/utilities/particle_generator.py
yt/utilities/lib/tests/test_geometry_utils.py
yt/utilities/lib/cykdtree/tests/test_utils.py
yt/utilities/flagging_methods.py

neutrinoceros commented 2 weeks ago

oh, and stable is new in 2.0.0!

is it? The docs say:

Changed in version 1.15.0: The 'stable' option was added.

matthewturk commented 2 weeks ago

Right -- kind="stable" was in 1.15.0, but stable: bool was added in 2.0.0. (The two are not unrelated though :) )
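To spell out the two APIs: `kind="stable"` has been accepted since NumPy 1.15, while the boolean `stable=` keyword arrived with the array-API work in NumPy 2.0. A sketch, guarded so it also runs on NumPy 1.x:

```python
import numpy as np

a = np.array([3, 1, 2, 1])

# The long-standing spelling (NumPy >= 1.15):
idx_kind = np.argsort(a, kind="stable")
print(idx_kind)  # [1 3 2 0]: the duplicate 1s keep their input order

# The NumPy 2.0 spelling; stable=True forces a stable sort:
if np.lib.NumpyVersion(np.__version__) >= "2.0.0":
    idx_bool = np.argsort(a, stable=True)
    assert (idx_kind == idx_bool).all()
```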

neutrinoceros commented 2 weeks ago

Ooooow. To think I actually worked on this a couple months back and apparently obliterated it from my memory...

chrishavlin commented 2 weeks ago

Well, I don't have an answer, but I have found some suspicious behavior. I'm going to summarize it all here.

how to reproduce failure

install yt, nose

Working from a fresh py3.10 environment:

pip install -e .[enzo,test]

For running nose with python 3.10, you need to edit nose a bit. The jenkins build does the following:

find .venv/lib/python3.10/site-packages/nose -name '*.py' -exec sed -i -e s/collections.Callable/collections.abc.Callable/g '{}' ';'

on a mac and with a pyenv-virtualenv called yt_np2_test_again i used:

find /Users/chavlin/.pyenv/versions/yt_np2_test_again/lib/python3.10/site-packages/nose -name "*.py" -exec sed -i '' s/collections.Callable/collections.abc.Callable/g {} +

running the tests

In an effort to match more closely what jenkins is doing, I hacked away at tests/nose_runner.py and tests/tests.yaml so that I could call python tests/nose_runner.py and have it just run the suspect enzo test (code on this branch). My initial comment above (this comment) actually has a mistake in how nosetests is called: it included both a path to the enzo test file and the module specification, so it was running tests twice (though it turns out I think it was exposing the same or a very similar error -- see below).

the failure

Here's the output of running python tests/nose_runner.py (the hacked version that will just call the suspect enzo test):

(yt_np2_test_again) chavlin@lism01-chavlin yt % python tests/nose_runner.py               
NoseWorker-1: WILL DO self.name = py310_local_enzo_010
GridHierarchy_galaxy0030_all ... ok
ParentageRelationships_galaxy0030_all ... ok
GridValues_galaxy0030_all_('gas', 'velocity_divergence') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_0_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_0_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_0_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_0_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_1_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_1_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_1_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_1_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_2_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_2_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_2_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_2_('gas', 'density') ... ok

----------------------------------------------------------------------
XML: /Users/chavlin/src/yt_/yt_dev/np2_testing/yt/nosetests.xml
----------------------------------------------------------------------
Ran 15 tests in 58.996s

OK
Updating and precise version information requires 
gitpython to be installed.
Try: python -m pip install gitpython
GridHierarchy_galaxy0030_all ... ok
ParentageRelationships_galaxy0030_all ... ok
GridValues_galaxy0030_all_('gas', 'velocity_divergence') ... FAIL
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_0_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_0_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_0_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_0_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_1_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_1_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_1_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_1_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_2_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_2_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_2_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_2_('gas', 'density') ... ok

======================================================================
FAIL: GridValues_galaxy0030_all_('gas', 'velocity_divergence')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/chavlin/.pyenv/versions/yt_np2_test_again/lib/python3.10/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/chavlin/src/yt_/yt_dev/np2_testing/yt/yt/utilities/answer_testing/framework.py", line 418, in __call__
    self.compare(nv, ov)
  File "/Users/chavlin/src/yt_/yt_dev/np2_testing/yt/yt/utilities/answer_testing/framework.py", line 700, in compare
    assert_equal(new_result[k], old_result[k])
  File "/Users/chavlin/.pyenv/versions/yt_np2_test_again/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 414, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: 'f16afe8932b49e9b2acadf304ae3a168'
 DESIRED: 'ec255050b61b671af5425c0e4ae7a17d'

----------------------------------------------------------------------
XML: /Users/chavlin/src/yt_/yt_dev/np2_testing/yt/nosetests.xml
----------------------------------------------------------------------
Ran 15 tests in 60.008s

FAILED (failures=1)
NoseWorker-1: Exiting

Initially the answer store does not exist -- it runs the tests and stores them. Then it immediately runs the tests again and fails with the exact same ACTUAL/DESIRED hashes in the failing test here.

getting it to pass?

run again

If I immediately run tests again, with the answer store already existing, tests pass:

(yt_np2_test_again) chavlin@lism01-chavlin yt % python tests/nose_runner.py
NoseWorker-1: WILL DO self.name = py310_local_enzo_010
Updating and precise version information requires 
gitpython to be installed.
Try: python -m pip install gitpython
GridHierarchy_galaxy0030_all ... ok
ParentageRelationships_galaxy0030_all ... ok
GridValues_galaxy0030_all_('gas', 'velocity_divergence') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_0_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_0_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_0_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_0_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_1_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_1_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_1_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_1_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_2_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_2_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_2_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_2_('gas', 'density') ... ok

----------------------------------------------------------------------
XML: /Users/chavlin/src/yt_/yt_dev/np2_testing/yt/nosetests.xml
----------------------------------------------------------------------
Ran 15 tests in 58.911s

OK

dataset state?

In trying to find the actual cause of the error, I came across a couple of ways to get the test to pass that are likely fixing a symptom and not the cause...

First, test_galaxy0030 instantiates a ds that gets passed to the big_patch_amr function:

https://github.com/yt-project/yt/blob/01db599ba50624e74865c465c6c378502598c617/yt/frontends/enzo/tests/test_outputs.py#L117-L126

If I edit that to instead pass the filename of the dataset, then tests pass initially.

Second, the actual calculation of the divergence is very sensitive to small changes in the value of grid-spacing. Here's the relevant code:

https://github.com/yt-project/yt/blob/01db599ba50624e74865c465c6c378502598c617/yt/fields/vector_operations.py#L355-L367

If I change those ds lines to convert to cm (e.g., ds = div_fac * just_one(data["index", "dx"]).to('cm')) then tests pass initially.

chrishavlin commented 2 weeks ago

No update on a fix, but it looks like the failure isn't np 2 related. I think this behavior has been around a while and is likely only being exposed now because the answers were bumped. Here's a test script:

import yt
import numpy as np

yt.set_log_level(50)

def _get_vals():
    ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
    g1 = ds.index.grids[0]
    vdiv_1 = g1['gas', 'velocity_divergence']
    return vdiv_1

if __name__ == "__main__":
    print(np.version.version)
    v1 = _get_vals()
    v2 = _get_vals()
    print(np.max(np.abs(v1 - v2))) # 4.930380657631324e-32 1/s
    print(np.all(v1==v2)) # False
    v3 = _get_vals()
    print(np.all(v1==v3)) # False
    print(np.all(v2==v3)) # True

The script above behaves the same for np>2 and np<2 (including when building yt with np<2 -- I went back as far as yt-4.1.4).

neutrinoceros commented 2 weeks ago

So, yt 4.1.4 was in January 2023, but #3856[^1] was first released in yt 4.1.0. Maybe that's actually a regression that happened somewhere in between?

[^1]: last time these answers were bumped successfully

chrishavlin commented 2 weeks ago

Ok -- kinda figured out why this is happening, but still not certain of the root cause:

So the issue is related to unit registry state. Within the _divergence calculation:

https://github.com/yt-project/yt/blob/01db599ba50624e74865c465c6c378502598c617/yt/fields/vector_operations.py#L355-L367

Those ds = div_fac * just_one(data["index", "dx"]) lines end up with the unit registry of the previous dataset.

Here's a standalone script that illustrates the bug:

import yt

yt.set_log_level(50)

print("load ds 1\n")
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
print(f"Initial registry id on load 1: {id(ds.unit_registry)}") # 139759425784224
dx = 1. * ds.index.grids[0]['index', 'dx'][0,0,0]
print(f"dx registry: {id(dx.units.registry)}") # 139759425784224

print("\nload ds 2\n")
ds2 = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030") 
print(f"Initial registry id on load 2: {id(ds2.unit_registry)}") # 140563377278064

dx0 = ds2.index.grids[0]['index', 'dx'][0,0,0]
print(f"dx0 registry: {id(dx0.units.registry)}") # 140563377278064

dx = 1. * dx0
print(f"dx registry: {id(dx.units.registry)}") # 139759425784224

for which you get

load ds 1

Initial registry id on load 1: 140180192729504
Parsing Hierarchy : 100%|███████████████████████████████████████████████████████████████████████████| 173/173 [00:00<00:00, 23574.99it/s]
dx registry: 140180192729504

load ds 2

Initial registry id on load 2: 140180163540928
Parsing Hierarchy : 100%|███████████████████████████████████████████████████████████████████████████| 173/173 [00:00<00:00, 25036.73it/s]
dx0 registry: 140180163540928
dx registry: 140180192729504

Note that the unit registry of the final dx variable points to the registry of the previous dataset.
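One way this kind of cross-dataset leakage can arise is a cache keyed only on the unit description, so whichever registry existed at the time of the first call is baked into the cached result forever. This is a hypothetical stand-in (`Registry` and `make_unit` are invented names, not unyt's actual code):

```python
from functools import lru_cache


class Registry:
    """Stand-in for a per-dataset unit registry (hypothetical)."""


@lru_cache(maxsize=None)
def make_unit(name):
    # Cached on `name` alone: the Registry captured on the first call is
    # reused on every later call, even after a "new dataset" has created
    # a fresh registry of its own.
    return (name, Registry())


# "Dataset 1" builds the unit; "dataset 2" gets dataset 1's registry back.
unit1 = make_unit("1/s")
unit2 = make_unit("1/s")
print(unit1 is unit2)  # True: same cached object, same stale registry
```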

If I comment out all the @lru_cache decorators in unyt/array.py then that final dx points to its proper registry and the subsequent hashes in the failing nose tests match. So for some reason unyt is picking up the registry from the prior dataset. But it's not immediately clear why that would matter: in stepping through the code, I've found that the following spot results in a change in hash despite being a conversion from '1/s' to '1/s':

https://github.com/yt-project/yt/blob/01db599ba50624e74865c465c6c378502598c617/yt/data_objects/selection_objects/data_selection_objects.py#L288

Adding the following will print out the conversion factor that ends up being used

                        if "velocity_divergence" in str(field):
                            print("_generate_fields")
                            print(id(fd.units.registry))
                            from unyt.array import _sanitize_units_convert
                            new_units = _sanitize_units_convert(fi.units, fd.units.registry)
                            (conv_factor, offset) = fd.units.get_conversion_factor(
                                new_units, fd.dtype
                            )
                            print(f"conversion factor: {conv_factor}")

when you access

import yt
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
g1 = ds.index.grids[0]['gas', 'velocity_divergence']

it hits this line twice; the first time through, the conversion factor is not exactly 1:

_generate_fields
140347495891200
conversion factor: 1.0000000000000002

the second time through it is exactly one:

_generate_fields
140347495891200
conversion factor: 1.0

and with the unit registry state issue, when you open the dataset again, the initial conversion factor is identically 1.0, so the final hash differs between the two cases.
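For scale: a conversion factor of 1.0000000000000002 is 1 + 2**-52, one ulp away from unity, and multiplying by it is not an identity operation, which is enough to change a byte-level hash of the array. A minimal sketch with a made-up value:

```python
# 1 + 2**-52: the smallest double greater than 1.0, matching the
# conversion factor printed above.
factor = 1.0000000000000002

x = 3.0867e-18  # a made-up value on the scale of the divergence field
print(x * factor == x)  # False: the product differs from x by a couple of ulps
```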

a quick fix?

It turns out that if you change this line:

https://github.com/yt-project/yt/blob/01db599ba50624e74865c465c6c378502598c617/yt/fields/vector_operations.py#L365

to pass in the str repr of the unit:

new_field = data.ds.arr(np.zeros(data[xn].shape, dtype=np.float64), str(f.units))

the unit-registry state issue is fixed (because it forces unyt to pull in the correct registry, the one associated with data.ds rather than f?), so maybe we should just do that and be done?

neutrinoceros commented 2 weeks ago

I've found that the following spot results in a change in hash despite being a conversion from '1/s' to '1/s'

If the str '1/s' (or more specifically, its id) is involved in the hash computation, I would expect a change: for short strings that only contain alphanumeric (and _) characters, Python only stores one copy of the string, but / isn't part of these, so two instances of '1/s' will have different ids. Not sure that's relevant though.
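As a quick check (CPython-specific): equal strings built at runtime are indeed separate objects, but str hashing is by value, not by id, so distinct '1/s' instances still hash alike and shouldn't perturb a hash on their own.

```python
# Build "1/s" at runtime so the compiler can't fold the two into one constant.
parts = ["1", "/", "s"]
s1 = "".join(parts)
s2 = "".join(parts)

print(s1 == s2, s1 is s2)    # True False: equal but distinct objects
print(hash(s1) == hash(s2))  # True: str.__hash__ depends only on contents
```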

But your fix seems safe anyway, so let's do it! Thanks again for this very impressive inquiry!

chrishavlin commented 2 weeks ago

If the str '1/s' (or more specifically, its id) is involved in the hash computation, I would expect a change: for short strings that only contain alphanumeric (and _) characters, Python only stores one copy of the string, but / isn't part of these, so two instances of '1/s' will have different ids.

The hashes were just based on the array values though -- there are actual differences in the arrays. Here's a simpler example that illustrates the problem:

import yt

yt.set_log_level(50)

ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
dx = ds.index.grids[0]['index', 'dx'][0,0,0]
x1 = ds.index.grids[0]['gas', 'relative_velocity_x'][0,0,0]
x2 = x1 / dx
x2c = x2.to(str(x2.units))
print(x2c.units, x2.units, x2c.units == x2.units)
print(x2c - x2)

shows

1/s 1/s True
-5.877471754111438e-39 1/s

In any case, the quick fix is probably the way to go -- I have to hop off right now though, may not get to it before next week.

neutrinoceros commented 2 weeks ago

there are actual differences in the arrays

ooow, I completely missed that. Yes, I think your fix is the way to go, so I'll push it now (with you as the commit author).

neutrinoceros commented 2 weeks ago

just pushed it now. I wasn't sure how to phrase the commit message so feel free to amend it. I'll come back in about an hour to bump the answers again if needed, otherwise I'll just undraft the PR.

neutrinoceros commented 2 weeks ago

Ok, Jenkins is back to green, but just to be sure, let's double check that it's really stable: @yt-fido test this please