Particle IDs are converted to floats although saved as unsigned integers

arkordt commented 2 months ago

unyt version: 3.0.2
Python version: 3.12.6
Operating System: macOS

Description

I was working with an arepo snapshot and noticed that the ParticleIDs field is of type float64 although it is saved as uint64 in the hdf5 file. Originally suspecting a data type conversion in yt, it turns out that the conversion is done in unyt. Most physical quantities may be represented by floating point numbers but this is not appropriate for ID variables.

In the current yt main branch version (66ddd0eb1), the unit conversion occurs in yt/data_objects/selection_objects/data_selection_objects.py:217. It calls convert_to_unit on a YTArray and in unyt/array.py:712, there is an explicit conversion of all integer-like types to float32 or float64 (depending on the number of particles). There is an additional conversion of all values to float64 in yt (see here), too, but even commenting out the conversion part, IDs are returned as floats.

Although this bug can be traced back to unyt, I am not sure if it may be better to fix it in yt, e.g. by skipping the unit conversion for certain dimensionless quantities.

What I Did

I was using my own arepo snapshot but it is the same for the yt test data. Tracing back the location where the conversion occurs was done using pdb.

yt version: 4.3 or current main branch

>>> import yt
>>> ds = yt.load('yt-test-data/ArepoBullet/snapshot_150.hdf5')
>>> ds.all_data()['PartType0', 'ParticleIDs']
unyt_array([1710352., 1710346., 1710350., ..., 5721056., 5843308., 5843259.],
      dtype=float32, units='(dimensionless)')

chrishavlin commented 2 months ago

Here's a simpler unyt-only illustration:

import unyt 
x = unyt.unyt_array([1, 2], "1", dtype='int')
print(x, x.dtype)
y = x.to("1")
print(y, y.dtype)

[1 2] dimensionless int64
[1. 2.] dimensionless float64

But this is actually the currently expected behavior (see https://unyt.readthedocs.io/en/stable/usage.html#dealing-with-data-types ).

Most physical quantities may be represented by floating point numbers but this is not appropriate for ID variables.

I think that this point (which I agree with) is more of a yt issue. Maybe particle IDs need to be handled with a special case since they are not physical quantities.

neutrinoceros commented 2 months ago

Indeed I don't think there's anything to fix on unyt's side, but it should be addressed in yt.

yt-project / unyt

Particle IDs are converted to floats although saved as unsigned integers #517

Description

What I Did