Closed: timlac closed this issue 10 months ago.
The expression "train+test" concatenates the splits into a single dataset.
To get the individual splits as separate datasets, pass a list of split expressions instead:
from datasets import load_dataset
train_ds, test_ds = load_dataset("<dataset_name>", split=["train", "test"])
train_10pct_ds, test_10pct_ds = load_dataset("<dataset_name>", split=["train[:10%]", "test[:10%]"])
Describe the bug
According to the documentation, it should be possible to run the following command:
train_test_ds = datasets.load_dataset("bookcorpus", split="train+test")
to load the train and test sets from the dataset.
However, executing the equivalent code:
speech_commands_v1 = load_dataset("superb", "ks", split="train+test")
only yields the following output:
By contrast, loading the dataset without the split argument yields:
Thus, the API seems to be broken in this regard.
This is a bit annoying, since I want to be able to use the split argument with
split="train[:10%]+test[:10%]"
to have a smaller dataset to work with when validating that my model works correctly.
Steps to reproduce the bug
from datasets import load_dataset
speech_commands_v1 = load_dataset("superb", "ks", split="train+test")
Expected behavior
Environment info