Open jako5 opened 4 months ago
From what I can gather, for the current codebase to work with the current dataset, the file structure has to look like this for the synthetic dataset:
and the splits being extracted from the 60GB dataset, renamed, and placed in the data folder like this:
Hi everybody, there seem to be multiple issues that are unfortunately preventing me from running any tests with the repo.
... as an argument, while the preprocessing script itself
...does not allow this argument.
Furthermore, the current version of the test_set_release dataset does not seem to contain any split_train.txt (or train_list.txt) files, which are required to split the datasets during preprocessing. I really hope you can push the required updates, as it would be great to do some tests with this :)
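For anyone who wants to confirm the missing split files on their own download, here is a minimal sketch of a check. Only the two file names come from the report above; the function name `find_split_files`, the recursive search, and the `scene_0` folder in the demo are assumptions made for illustration and are not part of the repo.

```python
import tempfile
from pathlib import Path

# File names taken from the report above; everything else is illustrative.
SPLIT_NAMES = ("split_train.txt", "train_list.txt")


def find_split_files(root):
    """Return the sorted paths of all split files found anywhere under root."""
    return sorted(p for p in Path(root).rglob("*.txt") if p.name in SPLIT_NAMES)


if __name__ == "__main__":
    # Demo on a throwaway directory; point this at the extracted dataset instead.
    with tempfile.TemporaryDirectory() as tmp:
        print(find_split_files(tmp))  # nothing found in an empty directory
        scene = Path(tmp) / "scene_0"  # hypothetical sub-folder name
        scene.mkdir()
        (scene / "split_train.txt").touch()
        print([p.name for p in find_split_files(tmp)])
```

If the function returns an empty list for the extracted test_set_release folder, the split files are indeed absent from that copy of the dataset.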