Feature request
There should be a way to specify the datatype of a column in datasets.add_column().
Motivation
To specify a custom datatype, we have to use datasets.add_column() followed by datasets.cast_column(), which is slow for large datasets. Another workaround is to pass a numpy.array() of the desired type to the datasets.add_column() function.
IMO this functionality should be natively supported.
https://discuss.huggingface.co/t/add-column-with-a-particular-type-in-datasets/95674
Your contribution
I can submit a PR for this.