huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Adding a column with a dict structure when mapping leads to the wrong key order #7247

Open chchch0109 opened 1 month ago

chchch0109 commented 1 month ago

Describe the bug

In the map() function, I want to add a new column with a dict structure.

def map_fn(example):
  example['text'] = {'user': ..., 'assistant': ...}
  return example

However, this produces the wrong key order in the dataset: {'assistant': ..., 'user': ...}. As a result, I can't concatenate two datasets, because their feature structures differ. Here is a minimal reproducible example. This seems to be an issue in the low-level pyarrow library rather than in datasets; however, I think datasets should allow concatenating two datasets that actually have the same structure.
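To illustrate what "the same structure" means here, the following is a minimal plain-Python sketch, not the datasets or pyarrow API. The helper `align_field_order` and the `reference` template are hypothetical names introduced only for this illustration. The sketch reorders the keys of a (possibly nested) dict to follow a reference order, so that two rows with the same fields end up with the same field order. Whether applying such a helper inside map_fn changes the order that datasets finally stores is not confirmed here.

```python
def align_field_order(value, template):
    """Return a copy of `value` whose dict keys follow the key order of `template`.

    Hypothetical helper for illustration only; it is not part of datasets or pyarrow.
    Non-dict values are returned unchanged.
    """
    if not isinstance(value, dict) or not isinstance(template, dict):
        return value
    if set(value) != set(template):
        # The two structures really differ, not just in field order.
        raise ValueError(f"field names differ: {sorted(value)} vs {sorted(template)}")
    # Rebuild the dict by iterating over the template, so its key order wins.
    return {key: align_field_order(value[key], template[key]) for key in template}


# Assumed reference order for the 'text' column: user first, then assistant.
reference = {"user": "", "assistant": ""}
row = {"assistant": "Hello!", "user": "Hi"}

aligned = align_field_order(row, reference)
print(list(row))      # ['assistant', 'user']
print(list(aligned))  # ['user', 'assistant']
```

Plain Python dicts compare equal regardless of key order, so `aligned == row` still holds; only the field order differs, which is the kind of difference described above.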

Steps to reproduce the bug

Here is a minimal reproducible example

Expected behavior

The two datasets can be concatenated.

Environment info

N/A