huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Make shape verification use ArrayXD instead of nested lists for map #3484

Open tshu-w opened 2 years ago

tshu-w commented 2 years ago

As described in https://github.com/huggingface/datasets/issues/2005#issuecomment-793716753 and mentioned by @mariosasko in the image feature example, IMO making shape verification in map use ArrayXD instead of nested lists would save users an unnecessary cast. I also noticed that datasets special-cases input_ids and attention_mask, which would no longer be necessary once this feature is added.

mariosasko commented 2 years ago

Hi!

Yes, this makes sense for numeric values, but first I have to finish https://github.com/huggingface/datasets/pull/3336 because currently ArrayXD only allows the first dimension to be dynamic.
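
To make the constraint above concrete, here is a minimal pure-Python sketch of what verifying a nested list against a declared array shape could look like when only the first dimension may be dynamic. This is not the datasets implementation: the helper names `nested_shape` and `matches_declared` are hypothetical, and using `None` to mark the dynamic dimension is an assumption made for illustration.

```python
from typing import Optional, Sequence, Tuple


def nested_shape(value) -> Tuple[int, ...]:
    """Return the shape of a rectangular nested list (hypothetical helper).

    Raises ValueError if sibling sub-lists have different shapes (ragged input).
    """
    if not isinstance(value, (list, tuple)):
        return ()  # a scalar contributes no dimension
    if len(value) == 0:
        return (0,)
    inner = [nested_shape(v) for v in value]
    if any(shape != inner[0] for shape in inner):
        raise ValueError("ragged nested list")
    return (len(value),) + inner[0]


def matches_declared(value, declared: Sequence[Optional[int]]) -> bool:
    """Check a nested list against a declared shape (hypothetical helper).

    Assumption: None marks a dynamic dimension, and only the first
    dimension is allowed to be dynamic.
    """
    if any(dim is None for dim in declared[1:]):
        raise ValueError("only the first dimension may be dynamic")
    try:
        actual = nested_shape(value)
    except ValueError:
        return False  # ragged input cannot match a fixed inner shape
    if len(actual) != len(declared):
        return False
    return all(d is None or d == a for d, a in zip(declared, actual))


rows = [[1, 2, 3], [4, 5, 6]]
print(nested_shape(rows))                   # (2, 3)
print(matches_declared(rows, (None, 3)))    # True
print(matches_declared(rows, (None, 4)))    # False
```

In this sketch a declared shape such as `(None, 3)` accepts any number of rows of width 3, while a shape with a dynamic inner dimension is rejected outright, which mirrors the limitation described in the comment.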