Closed arkordt closed 1 month ago
Here's a simpler unyt-only illustration:
import unyt
x = unyt.unyt_array([1, 2], "1", dtype='int')
print(x, x.dtype)
y = x.to("1")
print(y, y.dtype)
[1 2] dimensionless int64
[1. 2.] dimensionless float64
But this is actually the currently expected behavior (see https://unyt.readthedocs.io/en/stable/usage.html#dealing-with-data-types ).
Most physical quantities may be represented by floating point numbers but this is not appropriate for ID variables.
I think that this point (which I agree with) is more of a yt issue. Maybe particle IDs need to be handled with a special case since they are not physical quantities.
Indeed I don't think there's anything to fix on unyt's side, but it should be addressed in yt.
Description
I was working with an arepo snapshot and noticed that the ParticleIDs field is of type float64 although it is saved as uint64 in the hdf5 file. Originally suspecting a data type conversion in yt, it turns out that the conversion is done in unyt. Most physical quantities may be represented by floating point numbers but this is not appropriate for ID variables.
In the current yt main branch version (66ddd0eb1), the unit conversion occurs in yt/data_objects/selection_objects/data_selection_objects.py:217. It calls
convert_to_unit
on aYTArray
and in unyt/array.py:712, there is an explicit conversion of all integer-like types to float32 or float64 (depending on the number of particles). There is an additional conversion of all values to float64 in yt (see here), too, but even commenting out the conversion part, IDs are returned as floats.Although this bug can be traced back to unyt, I am not sure if it may be better to fix it in yt, e.g. by skipping the unit conversion for certain dimensionless quantities.
What I Did
I was using my own arepo snapshot but it is the same for the yt test data. Tracing back the location where the conversion occurs was done using pdb.