alex-hh opened 1 month ago
The snippet below easily eats up 32 GB of RAM; left running for a while, it locked up a laptop with 16 GB.
dataset = load_dataset("Voxel51/OxfordFlowers102", data_dir="data").with_format("numpy")
processed_dataset = dataset.map(lambda x: x)
Similar problems occur when using a real transform function in .map().
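The datasets internals are not shown in this report, but the memory blow-up pattern can be illustrated in isolation: an identity transform that materializes the formatted arrays allocates a full copy on top of the original data. This is a toy sketch using plain numpy and tracemalloc, not the datasets pipeline; `identity_map` and the array shape are made up for illustration.

```python
import tracemalloc

import numpy as np

def identity_map(batch):
    # A "do nothing" transform that still materializes each column
    # as a fresh array (np.array copies by default), so the batch
    # is held in memory twice while the transform runs.
    return {k: np.array(v) for k, v in batch.items()}

# ~4 MB of float32 data standing in for formatted image arrays.
batch = {"image": np.zeros((64, 128, 128), dtype=np.float32)}

tracemalloc.start()
out = identity_map(batch)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Peak traced allocation includes the full copy made inside the map.
print(f"peak allocation during map: {peak / 1e6:.1f} MB")
```

Scaled up to a full image dataset held per-example, the same copy-on-transform pattern would plausibly account for tens of gigabytes of resident memory.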
Describe the bug
When working with large arrays, setting with_format (e.g. to numpy) and then applying .map() causes a significant slowdown for iterable datasets.
Steps to reproduce the bug
Then iterating over the dataset after .map() takes 27 s, whereas iterating without .map() takes ~1 s.
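The exact snippets that produced the 27 s vs ~1 s numbers are not shown above, but timings like these can be collected with a small harness that simply consumes the iterable and measures wall-clock time. `time_iteration` below is a hypothetical helper, not part of the datasets API.

```python
import itertools
import time

def time_iteration(iterable, limit=None):
    # Consume the iterable (optionally only the first `limit` items)
    # and return (elapsed_seconds, items_consumed).
    it = itertools.islice(iterable, limit) if limit is not None else iterable
    start = time.perf_counter()
    count = sum(1 for _ in it)
    return time.perf_counter() - start, count

# Stand-in iterable; in the real repro this would be the formatted
# iterable dataset, with and without the .map() call applied.
elapsed, n = time_iteration(range(1_000_000))
print(f"consumed {n} items in {elapsed:.3f}s")
```

Running the same harness over the formatted dataset with and without `.map()` would make the comparison reproducible on other machines.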
Expected behavior
.map() should not introduce a slowdown when formatting is enabled.
Environment info
datasets 3.0.2