All data types defined in ml_dtypes/_src/dtypes.cc are assigned kNpyDescrKind = 'V', except for float8_e5m2.
Why is this an issue?
This discrepancy affects the saving and loading of arrays using .npy or .npz file formats. Specifically, numpy.load() can successfully load an .npy file if its header contains 'descr': '<V1'. However, it fails when the header contains 'descr': '<f1', resulting in the following error:
  File "/home/user/.local/lib/python3.9/site-packages/numpy/lib/format.py", line 655, in _read_array_header
    raise ValueError(msg.format(d['descr'])) from e
ValueError: descr is not a valid dtype descriptor: '<f1'
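The asymmetry can be reproduced with NumPy alone, without ml_dtypes installed. The sketch below assumes that the .npy header parser ends up passing the descr string to numpy.dtype(), which is consistent with the traceback above; the helper name descr_is_loadable is made up for illustration.

```python
import io

import numpy as np


def descr_is_loadable(descr: str) -> bool:
    """Return True if NumPy can turn a header 'descr' string into a dtype."""
    try:
        np.dtype(descr)
    except (TypeError, ValueError):
        return False
    return True


# A 1-byte void descriptor is a valid built-in dtype string.
v1_ok = descr_is_loadable("<V1")

# There is no built-in 1-byte float, so this string is rejected.
f1_ok = descr_is_loadable("<f1")

# Round trip of a plain 1-byte void array: it loads, but only as opaque bytes.
buf = io.BytesIO()
np.save(buf, np.zeros(3, dtype="V1"))
buf.seek(0)
loaded = np.load(buf)
```

Here loaded.dtype.kind is 'V', which is the "loads, but loses type information" behavior described for the other ml_dtypes types.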
If an array serialization solution relies on the numpy.save() / numpy.load() APIs, it will experience inconsistent behavior for types defined in ml_dtypes. All ml_dtypes types with kind 'V' (Void) can be saved and loaded, albeit with a loss of type information. However, float8_e5m2 requires special handling, as numpy.load() fails when encountering a header with 'descr': '<f1'.
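As an illustration of what such special handling might look like today, one possible workaround is to store the raw bytes together with the dtype name and shape, and to reinterpret the bytes on load. This is only a sketch, not something ml_dtypes provides: save_raw, load_raw, and the known_dtypes mapping are hypothetical names, and the demo uses np.float16 as a stand-in so that it runs without ml_dtypes.

```python
import io

import numpy as np


def save_raw(file, array):
    """Save an array as raw uint8 bytes plus the metadata needed to rebuild it."""
    array = np.asarray(array)
    np.savez(
        file,
        raw=np.frombuffer(array.tobytes(), dtype=np.uint8),
        shape=np.asarray(array.shape, dtype=np.int64),
        dtype_name=np.asarray(array.dtype.name),
    )


def load_raw(file, known_dtypes):
    """Rebuild an array written by save_raw.

    known_dtypes maps a stored dtype name to the dtype to reinterpret as
    (hypothetical parameter; the caller decides which names are trusted).
    """
    with np.load(file) as data:
        dtype = np.dtype(known_dtypes[data["dtype_name"].item()])
        shape = tuple(int(n) for n in data["shape"])
        return data["raw"].view(dtype).reshape(shape)


# Demo with a built-in dtype standing in for a custom one.
values = np.array([[1.0, 2.5], [-3.0, 0.0]], dtype=np.float16)
buf = io.BytesIO()
save_raw(buf, values)
buf.seek(0)
restored = load_raw(buf, {"float16": np.float16})
```

With ml_dtypes installed, the same functions could presumably be called with a mapping such as {"float8_e5m2": ml_dtypes.float8_e5m2}, assuming the dtype's name attribute reports float8_e5m2. Because only uint8, int64, and string arrays reach the file, the header never contains a custom descr.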
To ensure consistency and robustness, I propose that all "custom" NumPy types in ml_dtypes be assigned kind 'V' (Void). This would bring float8_e5m2 in line with the convention the other types already follow and avoid the serialization failure described above.
Risk/Pain Assessment of the Transition
The transition should have minimal impact on platform-independent formats, such as .npy or .npz, since they do not currently work with the float8_e5m2 type at all: numpy.load() already fails on a header containing 'descr': '<f1'.
Pickle-based binary serialization (for example, data written with pickle.dump) would be affected by this change. However, binary incompatibility is an expected risk for such formats, as they are not intended to serve as reliable interchange formats.