Closed pminervini closed 3 days ago
Describe the bug
I'm building train, dev, and test splits using `from_generator`; however, in all three cases the logger prints `Generating train split:`. It's not possible to change the split name, since it seems to be hardcoded: https://github.com/huggingface/datasets/blob/main/src/datasets/packaged_modules/generator/generator.py

Steps to reproduce the bug
```python
In [1]: from datasets import Dataset

In [2]: def gen():
   ...:     yield {"pokemon": "bulbasaur", "type": "grass"}
   ...:

In [3]: ds = Dataset.from_generator(gen)
Generating train split: 1 examples [00:00, 133.89 examples/s]
```

Expected behavior
It should be possible to specify any split name.
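Until the split name can be passed directly, one possible workaround is to build each split with `Dataset.from_generator` and collect the results in a `DatasetDict` keyed by the intended names. This is only a sketch: the generator functions `gen_train`, `gen_dev`, `gen_test` and the helper `build_splits` are hypothetical names, and the progress message will most likely still read `Generating train split:` for every call, since this changes only the dictionary keys and not the hardcoded split.

```python
def gen_train():
    yield {"pokemon": "bulbasaur", "type": "grass"}


def gen_dev():
    yield {"pokemon": "charmander", "type": "fire"}


def gen_test():
    yield {"pokemon": "squirtle", "type": "water"}


def build_splits(generators):
    """Build one Dataset per generator, keyed by the intended split name.

    `generators` maps a split name (e.g. "dev") to a generator function.
    This helper is illustrative and not part of the `datasets` API.
    """
    # Imported here so the generator functions above can be used
    # (and inspected) even when `datasets` is not installed.
    from datasets import Dataset, DatasetDict

    return DatasetDict(
        {name: Dataset.from_generator(fn) for name, fn in generators.items()}
    )
```

Calling `build_splits({"train": gen_train, "dev": gen_dev, "test": gen_test})` should return a `DatasetDict` whose keys are `train`, `dev`, and `test`. Newer releases of `datasets` may accept a `split` argument on `from_generator` directly; check the documentation for your installed version before relying on it.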
Thanks for reporting, @pminervini.
I agree we should give the option to define the split name.
Indeed, there is a PR that addresses precisely this issue:
I am reviewing it.
Booom! thank you guys :)
Environment info
- datasets version: 2.19.2
- huggingface_hub version: 0.23.3
- fsspec version: 2023.10.0