If the user passes a Features value to datasets.Dataset whose members are Array2D features with an explicit dtype, that dtype is not respected by with_format("numpy"), which should return an np.array with the dtype the user specified for the Array2D. Instead, floats appear to be returned as float32 and ints as int64.
Steps to reproduce the bug
import numpy as np
import datasets
from datasets import Dataset, Features, Array2D
print(f"datasets version: {datasets.__version__}")
data_info = {
    "arr_float": "float64",
    "arr_int": "int32",
}
sample = {key: [np.zeros([4, 5], dtype=dtype)] for key, dtype in data_info.items()}
features = {key: Array2D(shape=(None, 5), dtype=dtype) for key, dtype in data_info.items()}
features = Features(features)
dataset = Dataset.from_dict(sample, features=features)
ds = dataset.with_format("numpy")
for key in features:
    print(f"{key} feature dtype: ", ds.features[key].dtype)
    print(f"{key} dtype:", ds[key].dtype)
Output:
Expected behavior
It should return a np.array with the dtype that the user provided for the corresponding member in the Features type value.
Environment info
datasets version: 3.0.2
huggingface_hub version: 0.26.1
fsspec version: 2024.5.0
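Until the formatter honors the declared dtype, one possible stopgap is to cast the returned arrays back by hand. The sketch below is only an illustration using plain NumPy: cast_to_declared_dtypes is a hypothetical helper, not part of datasets, and the sample batch simply mimics the float32/int64 arrays described above.

```python
import numpy as np


def cast_to_declared_dtypes(batch, declared_dtypes):
    """Return a new dict with each array cast to its declared dtype.

    Hypothetical helper (not a `datasets` API). Keys missing from
    `declared_dtypes` keep whatever dtype they already have.
    """
    return {
        key: np.asarray(value, dtype=declared_dtypes.get(key))
        for key, value in batch.items()
    }


# Stand-in for what the numpy formatter reportedly returns in this issue.
returned = {
    "arr_float": np.zeros((1, 4, 5), dtype=np.float32),
    "arr_int": np.zeros((1, 4, 5), dtype=np.int64),
}
# The dtypes the user declared on the Array2D features.
declared = {"arr_float": "float64", "arr_int": "int32"}

fixed = cast_to_declared_dtypes(returned, declared)
for key, arr in fixed.items():
    print(key, arr.dtype)
```

With the objects from the reproduction script, the declared mapping could presumably be built as {key: ds.features[key].dtype for key in ds.features}, since the script already reads that attribute for the Array2D features. The cast copies the data, so this is a workaround and not a fix for the formatter itself.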