huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Support features in metadata configs #7182

Closed: albertvillanova closed this pull request 1 month ago

albertvillanova commented 2 months ago

Add support for features in metadata configs, for example:

configs:
  - config_name: default
    features:
      - name: id
        dtype: int64
      - name: name
        dtype: string
      - name: score
        dtype: float64

This makes it possible to skip data type inference.
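To illustrate what declaring dtypes buys over inference, here is a minimal stdlib-only sketch that reads CSV text and casts each column to the dtype declared in a `features` list shaped like the one above. The `DTYPE_CASTS` table and the `cast_rows` helper are hypothetical and exist only for this illustration. The `datasets` library itself represents this schema with `datasets.Features` and `datasets.Value`, not with this code.

```python
import csv
import io

# Hypothetical mapping from the dtype strings used in the YAML config
# to plain Python constructors (illustration only).
DTYPE_CASTS = {"int64": int, "string": str, "float64": float}

# The `features` list from the config above, written as Python data.
FEATURES = [
    {"name": "id", "dtype": "int64"},
    {"name": "name", "dtype": "string"},
    {"name": "score", "dtype": "float64"},
]


def cast_rows(csv_text, features):
    """Read CSV text and cast every column to its declared dtype, with no inference."""
    casts = {f["name"]: DTYPE_CASTS[f["dtype"]] for f in features}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{col: casts[col](value) for col, value in row.items()} for row in reader]


rows = cast_rows("id,name,score\n1,alice,3\n2,bob,4.5\n", FEATURES)
print(rows)
```

Because `score` is declared as `float64`, the value `3` in the first row becomes the float `3.0`. A reader that guessed types from the data alone could instead treat that first value as an integer.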

Currently, this information can be passed in the dataset_info field rather than in configs, but this is not intuitive and is not properly documented.
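The difference between the two locations can be sketched as a lookup over the parsed metadata. The helper `resolve_features` below is hypothetical. Its precedence (a `features` entry under `configs` wins, with `dataset_info` as the fallback) is an assumption made for illustration, not the documented behavior of `datasets`. It also assumes `dataset_info` is a single mapping.

```python
# Hypothetical helper: find the features for one config in parsed README metadata.
# ASSUMPTION: `configs` takes precedence over `dataset_info`; this is not
# confirmed library behavior.
def resolve_features(metadata, config_name="default"):
    # First look for a matching config entry that declares features directly.
    for config in metadata.get("configs", []):
        if config.get("config_name") == config_name and "features" in config:
            return config["features"]
    # Fall back to dataset_info, assumed here to be a single mapping.
    return metadata.get("dataset_info", {}).get("features")


FEATURES = [{"name": "id", "dtype": "int64"}]

# Features declared in configs (the style this PR proposes).
in_configs = {"configs": [{"config_name": "default", "features": FEATURES}]}
# Features declared only in dataset_info (the current style).
in_info = {"configs": [{"config_name": "default"}], "dataset_info": {"features": FEATURES}}

print(resolve_features(in_configs))
print(resolve_features(in_info))
```

Both metadata layouts resolve to the same feature list. The first keeps the schema next to the config it describes.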

TODO:

HuggingFaceDocBuilderDev commented 2 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

albertvillanova commented 2 months ago

The CI issue is unrelated: