mindspore-ai / mindspore

MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
https://gitee.com/mindspore/mindspore
Apache License 2.0
4.31k stars 709 forks

fail to remove dataset feature from huggingface (unable to align with DatasetDict.map parameters remove_columns in transformers) #307

Open EdwinWang37 opened 1 month ago

EdwinWang37 commented 1 month ago

Environment

MindSpore 2.3.1

Problems

https://github.com/mindspore-ai/mindspore/blob/bd3c5dd1236bb5f2199b7f5ac2f2e6452879128f/mindspore/python/mindspore/dataset/engine/datasets.py#L852

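The map method at the line linked above has no parameter equivalent to remove_columns in DatasetDict.map (from the Hugging Face datasets library, commonly used together with transformers), so the two cannot be aligned.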

Experimental data screenshot

In the screenshot below, the upper dataset has no "text" column, because it is easily removed with DatasetDict.map(remove_columns="text"). For the lower dataset, I could not find a similarly easy way to do this in MindSpore, which may increase development cost for programmers who have to work around it.

[image]

Requirement

If there is a similar function that does this, please tell me; I would appreciate it. If not, would it be possible to add such a parameter so this can be done easily?

wyq-Git commented 3 weeks ago

Please try MindNLP: https://github.com/mindspore-lab/mindnlp
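
Within MindSpore itself, one possible workaround is to select the columns you want to keep with Dataset.project, rather than naming the ones to remove. The sketch below is only an illustration under that assumption, not an official equivalent of remove_columns: drop_columns and demo are hypothetical helper names, and the "text"/"label" data is made up. It relies only on get_col_names() and project(), which the MindSpore dataset API provides.

```python
def drop_columns(dataset, remove_columns):
    """Return `dataset` restricted to the columns NOT listed in remove_columns.

    Hypothetical helper (not part of MindSpore). `dataset` is expected to be a
    mindspore.dataset object, i.e. anything with get_col_names() and project().
    """
    # Accept a single column name as well as a list, like remove_columns does.
    if isinstance(remove_columns, str):
        remove_columns = [remove_columns]

    existing = dataset.get_col_names()
    missing = [c for c in remove_columns if c not in existing]
    if missing:
        raise ValueError(f"columns not in dataset: {missing}")

    # Keep the original column order for everything that is not removed.
    keep = [c for c in existing if c not in remove_columns]
    if not keep:
        raise ValueError("cannot remove every column")

    # project() keeps only the listed columns and discards the rest.
    return dataset.project(keep)


def demo():
    """Usage sketch; needs MindSpore installed. The data here is invented."""
    import mindspore.dataset as ds

    data = {"text": ["good movie", "bad movie"], "label": [1, 0]}
    dataset = ds.NumpySlicesDataset(data, shuffle=False)
    dataset = drop_columns(dataset, "text")
    print(dataset.get_col_names())
```

Calling demo() in an environment with MindSpore installed should list only the remaining column names. If the tokenizing step is itself a Dataset.map call, listing fewer names in output_columns than in input_columns may achieve the same thing in one step, but I have not verified that against 2.3.1.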