huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Specifying datatype when adding a column to a dataset. #7142

Open varadhbhatnagar opened 1 week ago

varadhbhatnagar commented 1 week ago

Feature request

There should be a way to specify the datatype of the new column when calling Dataset.add_column().

Motivation

Currently, to give a new column a custom datatype, we have to call Dataset.add_column() and then Dataset.cast_column(), which is slow for large datasets. Another workaround is to pass a NumPy array of the desired dtype to Dataset.add_column().

IMO this functionality should be natively supported.

https://discuss.huggingface.co/t/add-column-with-a-particular-type-in-datasets/95674

Your contribution

I can submit a PR for this.

varadhbhatnagar commented 1 week ago

self-assign