huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0
19.19k stars 2.68k forks

Support default config name when no builder configs #5070

Closed albertvillanova closed 2 years ago

albertvillanova commented 2 years ago

Is your feature request related to a problem? Please describe.

As discussed with @stas00, we could support defining a default config name even when no predefined list of allowed config names is set. That is, support DEFAULT_CONFIG_NAME even when BUILDER_CONFIGS is not defined.

Additional context

To support creating configs on the fly by name (rather than through kwargs), the list of allowed builder configs, BUILDER_CONFIGS, must not be set.

However, when BUILDER_CONFIGS is not set, DEFAULT_CONFIG_NAME is currently not supported.
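
To make the request concrete, here is a minimal, self-contained sketch of the behavior being asked for. It is a toy model and does not use or reproduce the actual `datasets` code: `SketchConfig`, `SketchBuilder`, `resolve_config`, and the config names "small" and "large" are all hypothetical. Only the attribute names BUILDER_CONFIGS and DEFAULT_CONFIG_NAME come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a builder config; only the name matters here."""
    name: str


class SketchBuilder:
    """Toy model of a dataset builder, not the real `datasets` class."""

    # Left empty so that configs can be created on the fly by name.
    BUILDER_CONFIGS: List[SketchConfig] = []
    DEFAULT_CONFIG_NAME: Optional[str] = None

    @classmethod
    def resolve_config(cls, name: Optional[str] = None) -> SketchConfig:
        # Requested behavior: fall back to the default name when none is given,
        # whether or not BUILDER_CONFIGS is defined.
        if name is None:
            name = cls.DEFAULT_CONFIG_NAME
        if name is None:
            raise ValueError("No config name given and DEFAULT_CONFIG_NAME is not set")
        # With predefined configs, the name must match one of them.
        if cls.BUILDER_CONFIGS:
            for config in cls.BUILDER_CONFIGS:
                if config.name == name:
                    return config
            raise ValueError(f"Unknown config name: {name!r}")
        # Without predefined configs, build one on the fly from the name.
        return SketchConfig(name=name)


class OnTheFlyBuilder(SketchBuilder):
    # Hypothetical default; BUILDER_CONFIGS stays empty on purpose.
    DEFAULT_CONFIG_NAME = "small"
```

In this sketch, `OnTheFlyBuilder.resolve_config()` falls back to the default name, while `OnTheFlyBuilder.resolve_config("large")` still creates a config on the fly from the given name.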

stas00 commented 2 years ago

Thank you for creating this feature request, Albert.

For context, this is the dataset where Albert has been helping me switch to on-the-fly split configs: https://huggingface.co/datasets/HuggingFaceM4/cm4-synthetic-testing

The attempt to switch to on-the-fly splits is here: https://huggingface.co/datasets/HuggingFaceM4/cm4-synthetic-testing/discussions/2/files

I had to revert it, though, since providing no split breaks at run time.