Added a --num_files argument to main.py. With 1 (the default), all datapoints are saved object-wise to a single .pkl file; with -1, one .pkl file is created per object; with n > 1, the datapoints are saved to n files of almost equal size (the last one can differ in size).
This makes the dataset compatible with a custom PyTorch Dataset and the PyTorch library Generator (see https://stackoverflow.com/questions/54571377/how-to-create-a-custom-pytorch-dataset-when-the-order-and-the-total-number-of-tr/54572327#54572327).
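As an illustration of that compatibility, here is a minimal sketch of a map-style dataset that loads one pickled object per file on access, which matches the --num_files -1 layout. The class name `PickledRecordDataset` and its `split_dir` parameter are hypothetical and not part of main.py; the fallback to `object` is only there so the sketch runs without PyTorch installed.

```python
import pickle
from pathlib import Path

try:
    from torch.utils.data import Dataset
except ImportError:
    # Assumption: fall back to a plain class so the sketch runs without PyTorch.
    Dataset = object


class PickledRecordDataset(Dataset):
    """Hypothetical map-style dataset: one item per .pkl file, loaded lazily."""

    def __init__(self, split_dir):
        # Paths are sorted lexicographically, so "_10" sorts before "_2".
        self.paths = sorted(Path(split_dir).glob("*.pkl"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # Only the requested file is read, so memory use stays small.
        with self.paths[index].open("rb") as fh:
            return pickle.load(fh)
```

Because each item is read only when requested, the full dataset never has to fit in memory at once.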
The files are saved to the data_split directory ("train" or "dev"), which is created if it does not already exist. Filenames are unchanged except for the batch number appended at the end (e.g. train/train_record_50.pkl).
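The behaviour described above could be sketched roughly as follows. This is not the code from main.py: the function names `split_into_chunks` and `save_chunks`, the `out_root` parameter, the zero-based batch numbering, and the choice to let the last file absorb the remainder are all assumptions made for illustration.

```python
import pickle
from pathlib import Path


def split_into_chunks(objects, num_files):
    """Split objects following the --num_files semantics (hypothetical helper)."""
    objects = list(objects)
    if num_files == 1:
        return [objects]                      # everything in one file
    if num_files == -1:
        return [[obj] for obj in objects]     # one file per object
    if num_files < 1 or num_files > len(objects):
        raise ValueError("num_files must be -1 or between 1 and len(objects)")
    # Assumption: the first n-1 chunks are equal and the last takes the remainder.
    size = len(objects) // num_files
    chunks = [objects[i * size:(i + 1) * size] for i in range(num_files - 1)]
    chunks.append(objects[(num_files - 1) * size:])
    return chunks


def save_chunks(objects, data_split, num_files=1, out_root="."):
    """Write each chunk to <out_root>/<data_split>/<data_split>_record_<k>.pkl."""
    out_dir = Path(out_root) / data_split
    out_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    paths = []
    for batch_number, chunk in enumerate(split_into_chunks(objects, num_files)):
        path = out_dir / f"{data_split}_record_{batch_number}.pkl"
        with path.open("wb") as fh:
            pickle.dump(chunk, fh)
        paths.append(path)
    return paths
```

For example, `save_chunks(records, "train", num_files=4)` with ten records would, under these assumptions, write four files into train/ holding 2, 2, 2, and 4 records.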