huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Specifying datatype when adding a column to a dataset. #7142

Closed: varadhbhatnagar closed this issue 2 months ago

varadhbhatnagar commented 2 months ago

Feature request

There should be a way to specify the datatype of the new column when calling Dataset.add_column().

Motivation

To give a new column a custom datatype, we currently have to call Dataset.add_column() followed by Dataset.cast_column(), which is slow for large datasets. Another workaround is to pass a numpy.array() of the desired dtype to Dataset.add_column().

IMO this functionality should be natively supported.

https://discuss.huggingface.co/t/add-column-with-a-particular-type-in-datasets/95674

Your contribution

I can submit a PR for this.

varadhbhatnagar commented 2 months ago

self-assign