Open KennethEnevoldsen opened 3 years ago
We could do something. There is a part of .info which is split-specific (cache files, split instructions), but maybe it could be made to work.
Yes, this was kinda the idea I was going for. DatasetDict.info would be the shared info among the datasets (maybe even some info on how they differ).
Currently, only Dataset contains .info or .features, but many datasets contain standard splits (train, test), so the underlying information is the same (or at least should be) across the datasets. For instance:
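A minimal sketch of the idea, under stated assumptions: shared_features is a hypothetical helper, not part of the datasets API, and the split objects below are plain stand-ins so the snippet runs without the library installed. A DatasetDict has the same shape (a mapping from split name to an object exposing .features), so the same check should carry over to it.

```python
from types import SimpleNamespace


def shared_features(splits):
    """Return the features common to every split, or raise if they differ.

    `splits` is any mapping of split name -> object with a `.features`
    attribute. Hypothetical helper, not part of the `datasets` API.
    """
    if not splits:
        raise ValueError("no splits to inspect")
    items = iter(splits.items())
    first_name, first = next(items)
    for name, split in items:
        # Splits of the same dataset are expected to agree on their schema.
        if split.features != first.features:
            raise ValueError(
                f"split {name!r} has different features than {first_name!r}"
            )
    return first.features


# Stand-ins for Dataset objects (assumption: only `.features` is needed here).
features = {"text": "string", "label": "int64"}
splits = {
    "train": SimpleNamespace(features=dict(features)),
    "test": SimpleNamespace(features=dict(features)),
}

print(shared_features(splits))
```

Raising on a mismatch rather than merging keeps the multimodal case explicit: a dict holding genuinely different datasets simply has no shared features to report.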
I could imagine that this wouldn't work for dataset dicts which hold entirely different datasets (multimodal datasets), but it seems odd that splits of the same dataset are treated the same as what are essentially different datasets.
Intuitively, it would also make sense that if a dataset is loaded via load_dataset, it has a common .info which covers the entire dataset.
It is entirely possible that I am missing another perspective.