If a dataset is saved with `ds.save_to_disk()`, `load_dataset()` fails to load it from the local directory and raises errors. This can be solved either by providing a custom loading script or by using the `load_from_disk` method, as proposed in this PR. The dataset is stored in the `.arrow` format, with the splits stored in separate folders (e.g. `ds.save_to_disk(os.path.join(path, dataset_name))`).
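A minimal sketch of the `load_from_disk` approach follows: it chooses between `load_from_disk` and `load_dataset` depending on whether a local directory looks like `save_to_disk()` output. The helpers `is_saved_to_disk` and `load_local_dataset` are hypothetical names for illustration, not part of `datasets` or of this PR. The detection heuristic (a top-level `dataset_dict.json` for a `DatasetDict`, or `state.json` for a single `Dataset`) is an assumption based on the layout recent `datasets` versions write and may differ between versions.

```python
import os


def is_saved_to_disk(path: str) -> bool:
    """Heuristic (assumed layout): save_to_disk() writes dataset_dict.json
    for a DatasetDict, or state.json for a single Dataset, at the top level."""
    return os.path.isfile(os.path.join(path, "dataset_dict.json")) or os.path.isfile(
        os.path.join(path, "state.json")
    )


def load_local_dataset(path: str) -> "Dataset | DatasetDict":
    """Hypothetical loader: use load_from_disk() for save_to_disk() output,
    otherwise fall back to load_dataset()."""
    # Imported lazily so is_saved_to_disk() works without `datasets` installed.
    from datasets import load_dataset, load_from_disk

    if is_saved_to_disk(path):
        # Arrow files plus JSON metadata, one sub-folder per split for a DatasetDict.
        return load_from_disk(path)
    # Other local data, e.g. a directory of CSV/JSON files.
    return load_dataset(path)
```

For example, a dataset saved with `ds.save_to_disk(os.path.join(path, dataset_name))` would be reloaded with `load_local_dataset(os.path.join(path, dataset_name))`.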
To start training later, adapt the `config` files with: