Open severo opened 6 months ago
And give an example of each supported feature type in the YAML config. See https://discuss.huggingface.co/t/appropriate-yaml-for-dataset-info-list-float/74418 for example: I think we currently have no reference to share to the user.
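To make the request concrete, here is a rough sketch of what such a reference might show for a few feature types, including the list-of-floats case from the forum thread. The column names (`text`, `embedding`, `label`) and the label names are hypothetical, and the exact syntax should be checked against what the `datasets` library actually writes to the README:

```yaml
dataset_info:
  features:
  # Plain scalar column
  - name: text            # hypothetical column name
    dtype: string
  # List of floats, the case asked about in the forum thread
  - name: embedding       # hypothetical column name
    sequence: float32
  # Categorical column with named classes
  - name: label           # hypothetical column name
    dtype:
      class_label:
        names:
          '0': negative
          '1': positive
```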
Hey @severo, I just had a look into this. As far as I can see, there is no longer a "More YAML tags" section in the Dataset docs. Is this correct? If so, is this issue outdated, or am I missing something?
Indeed, it has been removed in https://github.com/huggingface/datasets/pull/5470#discussion_r1088471903
The spec is here: https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1
Somewhat related: discussion about the spec: https://github.com/huggingface/dataset-viewer/issues/2639
Also: should we just redirect to the spec (https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1), or should we create a dedicated doc page for this? Adding the link would already be a good step forward.
Also: https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1 is outdated:
configs: # Optional for datasets with multiple configurations like glue.
- {config_0} # Example for glue: sst2
- {config_1} # Example for glue: cola
It does not respect the current format: https://huggingface.co/docs/hub/datasets-manual-configuration.
Ideally, it should be the reference, with more details than https://huggingface.co/docs/hub/datasets-manual-configuration, not the other way.
cc @polinaeterna for example if you want to look at it
Adding the link would already be a good step forward.
Shall I start with this and see where it leads us, @severo? Or would you suggest a different approach?
Hmmm, I think we have to improve the spec first. Then, link to it from the docs page, otherwise the link would not bring much value.
Let me know if I can help out somehow! Would be down for it. 😄
Do you want to work on a PR to improve the spec https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1 ?
The idea is to add the structure of the configs: field, to match https://huggingface.co/docs/hub/datasets-manual-configuration at least (config_name, data_files, etc). Some more fields can be passed, if I'm not wrong (it's defined in https://github.com/huggingface/datasets, but @polinaeterna knows these details better than I do)
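For reference, a minimal sketch of the structure the spec would need to describe, following the format on the manual configuration page. The config name and file paths are hypothetical, and only config_name and data_files are shown, not any additional fields:

```yaml
configs:
- config_name: default        # hypothetical config name
  data_files:
  - split: train
    path: "data/train.csv"    # hypothetical path
  - split: test
    path: "data/test.csv"     # hypothetical path
```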
I would love to! Will open a PR for discussion.
Now that the spec is improved, shall I open a PR to link the YAML configuration page?
Link to https://huggingface.co/docs/datasets/v2.7.1/en/dataset_card#more-yaml-tags from https://huggingface.co/docs/hub/datasets-manual-configuration, to complement with all the possible values in README's YAML