Closed alex-hh closed 1 month ago
Describe the bug
When calling filter on an iterable dataset, the features get set to None.
Steps to reproduce the bug
    import numpy as np
    from datasets import Array3D, Dataset, Features

    features = Features({
        "array0": Array3D((None, 10, 10), dtype="float32"),
        "array1": Array3D((None, 10, 10), dtype="float32"),
    })
    dataset = Dataset.from_dict(
        {f"array{i}": [np.zeros((x, 10, 10), dtype=np.float32) for x in [2000, 1000] * 25] for i in range(2)},
        features=features,
    )
    ds = dataset.to_iterable_dataset()
    orig_column_names = ds.column_names
    ds = ds.filter(lambda x: True)  # no-op filter that keeps every row
    assert ds.column_names == orig_column_names  # fails: features are lost after filter
Expected behavior
Filter should preserve the features information.
Environment info
3.0.2
Closed by https://github.com/huggingface/datasets/pull/7209, thanks @alex-hh!
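On installs that still show this behavior, one possible stopgap is to remember the features before filtering and re-attach them afterwards. The sketch below is only an illustration under stated assumptions: `filter_keeping_features` is a hypothetical helper name, not part of `datasets`, and it assumes the object passed in exposes `.features`, `.filter(fn)` and `.cast(features)` the way an iterable dataset does. It is not the change made in the linked PR.

```python
def filter_keeping_features(ds, predicate):
    """Filter `ds`, then restore its features if the filter dropped them.

    Hypothetical helper (not a `datasets` API). `ds` is assumed to behave
    like an iterable dataset: it has `.features`, `.filter(fn)` and
    `.cast(features)`.
    """
    original_features = ds.features
    filtered = ds.filter(predicate)
    # Re-attach the schema only when it was actually lost, so that versions
    # which already preserve features are left untouched.
    if filtered.features is None and original_features is not None:
        filtered = filtered.cast(original_features)
    return filtered
```

With the reproduction from this issue, the call would look like `ds = filter_keeping_features(dataset.to_iterable_dataset(), lambda x: True)`. Whether re-casting is appropriate depends on the filter leaving the schema unchanged, which is the case for a plain row filter.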