yt-project / unyt

Handle, manipulate, and convert data with units in Python
https://unyt.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Particle IDs are converted to floats although saved as unsigned integers #517

Status: Closed (arkordt closed this issue 1 month ago)

arkordt commented 1 month ago

Description

I was working with an Arepo snapshot and noticed that the ParticleIDs field has dtype float64 although it is saved as uint64 in the HDF5 file. I originally suspected a data type conversion in yt, but it turns out that the conversion is done in unyt. Most physical quantities can be represented by floating point numbers, but this is not appropriate for ID variables.
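To make the concern concrete, here is a minimal sketch of why a float cast is risky for IDs. It uses only NumPy, not yt or unyt; the helper name `survives_float_cast` and the sample ID values are illustrative assumptions, not part of either library. float32 represents every integer exactly only up to 2**24, and float64 only up to 2**53, so larger IDs can silently change.

```python
import numpy as np


def survives_float_cast(ids, float_dtype):
    """Hypothetical helper: True if every ID is unchanged after a
    round trip through float_dtype and back to its integer dtype."""
    ids = np.asarray(ids)
    round_trip = ids.astype(float_dtype).astype(ids.dtype)
    return bool(np.array_equal(ids, round_trip))


# IDs below 2**24 are exactly representable in float32.
small_ids = np.array([1710352, 5843259], dtype=np.uint32)

# One past the exact-integer limit of each float type (illustrative values).
large_ids32 = np.array([2**24 + 1], dtype=np.uint32)
large_ids64 = np.array([2**53 + 1], dtype=np.uint64)

print(survives_float_cast(small_ids, np.float32))
print(survives_float_cast(large_ids32, np.float32))
print(survives_float_cast(large_ids64, np.float64))
```

The IDs in the test snapshot shown later happen to be small enough to survive, but nothing guarantees that for other datasets.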

In the current yt main branch (66ddd0eb1), the unit conversion occurs in yt/data_objects/selection_objects/data_selection_objects.py:217. It calls convert_to_units on a YTArray, and in unyt/array.py:712 there is an explicit conversion of all integer-like types to float32 or float64 (depending on the number of particles). There is an additional conversion of all values to float64 in yt (see here), but even with that conversion commented out, IDs are returned as floats.

Although this bug can be traced back to unyt, I am not sure whether it would be better to fix it in yt, e.g. by skipping the unit conversion for certain dimensionless quantities.

What I Did

I was using my own Arepo snapshot, but the behavior is the same for the yt test data. I traced the location where the conversion occurs using pdb.

>>> import yt
>>> ds = yt.load('yt-test-data/ArepoBullet/snapshot_150.hdf5')
>>> ds.all_data()['PartType0', 'ParticleIDs']
unyt_array([1710352., 1710346., 1710350., ..., 5721056., 5843308., 5843259.],
      dtype=float32, units='(dimensionless)')

chrishavlin commented 1 month ago

Here's a simpler unyt-only illustration:

import unyt
x = unyt.unyt_array([1, 2], "1", dtype='int')
print(x, x.dtype)
y = x.to("1")
print(y, y.dtype)

[1 2] dimensionless int64
[1. 2.] dimensionless float64

But this is actually the currently expected behavior (see https://unyt.readthedocs.io/en/stable/usage.html#dealing-with-data-types).

"Most physical quantities may be represented by floating point numbers but this is not appropriate for ID variables."

I think that this point (which I agree with) is more of a yt issue. Maybe particle IDs need to be handled with a special case since they are not physical quantities.
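As a rough sketch of what such a special case could look like, the snippet below skips the conversion for dimensionless integer arrays. It is not yt's or unyt's actual code: the function `convert_values` and its parameters `factor` and `dimensionless` are hypothetical names chosen for illustration, and only NumPy is used.

```python
import numpy as np


def convert_values(values, factor, dimensionless):
    """Hypothetical helper: apply a unit conversion factor, but return
    dimensionless integer arrays (such as particle IDs) unchanged."""
    values = np.asarray(values)
    if dimensionless and np.issubdtype(values.dtype, np.integer):
        # IDs are labels, not physical quantities: keep the integer dtype.
        return values
    # Everything else follows the usual "convert to float" path.
    return values.astype(np.float64) * factor


ids = np.array([1710352, 1710346], dtype=np.uint64)
lengths = np.array([1, 2], dtype=np.int64)

print(convert_values(ids, 1.0, dimensionless=True).dtype)
print(convert_values(lengths, 100.0, dimensionless=False).dtype)
```

Whether the check belongs on the field definition, in the frontend, or in the selection code is a design question for yt; the sketch only shows the dtype test itself.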

neutrinoceros commented 1 month ago

Indeed, I don't think there's anything to fix on unyt's side, but it should be addressed in yt.