PR type

[x] Bug Fix
[ ] New Feature
[ ] Document Updates
[ ] More Models or Datasets Support

PR information

Dataset concatenation may raise the following error:

_check_if_features_can_be_aligned
    raise ValueError(
ValueError: The features can't be aligned because the key history of features {'system': Value(dtype='null', id=None), 'history': Sequence(feature=Sequence(feature=Value(dtype='string', id=None), length=-1, id=None), length=-1, id=None), 'query': Value(dtype='string', id=None), 'response': Value(dtype='string', id=None)} has unexpected type - Sequence(feature=Sequence(feature=Value(dtype='string', id=None), length=-1, id=None), length=-1, id=None) (expected either Sequence(feature=Value(dtype='null', id=None), length=-1, id=None) or Value("null").

This happens because one dataset contains only empty or None values in a column such as history, while another dataset contains normal history values, so arrow_dataset treats the two columns as different types.

How to solve:

Reduce the columns after each dataset is instantiated and before the concatenation.
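The idea can be illustrated with a minimal sketch in plain Python. It assumes that "reducing" a column means dropping an optional column (here `system` or `history`) when every row in that dataset holds None or an empty value, so the column no longer carries a conflicting inferred type. The helper names `is_empty`, `columns_to_drop`, and `reduce_columns` are hypothetical and are not taken from this PR's diff.

```python
from typing import Any, Dict, Iterable, List


def is_empty(value: Any) -> bool:
    """Treat None and empty strings/lists as 'no value'."""
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def columns_to_drop(rows: List[Dict[str, Any]], candidates: Iterable[str]) -> List[str]:
    """Return the candidate columns that hold no real value in any row."""
    return [
        name
        for name in candidates
        if all(is_empty(row.get(name)) for row in rows)
    ]


def reduce_columns(
    rows: List[Dict[str, Any]],
    candidates: Iterable[str] = ("system", "history"),  # assumed optional columns
) -> List[Dict[str, Any]]:
    """Drop the all-empty candidate columns from every row of one dataset."""
    drop = set(columns_to_drop(rows, candidates))
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


# One dataset with only empty/None history, one with real history values.
empty_history = [
    {"system": None, "history": [], "query": "q1", "response": "r1"},
    {"system": None, "history": None, "query": "q2", "response": "r2"},
]
real_history = [
    {"system": None, "history": [["q0", "r0"]], "query": "q3", "response": "r3"},
]

reduced_a = reduce_columns(empty_history)  # loses both system and history
reduced_b = reduce_columns(real_history)   # loses system, keeps history
```

With a Hugging Face `Dataset`, the names returned by `columns_to_drop` could presumably be passed to `Dataset.remove_columns` on each dataset before calling `concatenate_datasets`; whether this matches the exact change in this PR is an assumption.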

Experiment results

Paste your experiment result here(if needed).

modelscope / swift

Fix dataset concatenation #1193