hotpotqa / hotpot

Apache License 2.0

Saving records to multiple .pkl files #38

Open arinaruck opened 4 years ago

arinaruck commented 4 years ago

Adds a --num_files argument to main.py. With 1 (the default value), all datapoints are saved object-wise to a single .pkl file; with -1, one .pkl file is created per object; with n > 1, the datapoints are saved to n almost equally sized files (the last one can differ in size). This makes the dataset compatible with a custom PyTorch Dataset and the library PyTorch Generator (https://stackoverflow.com/questions/54571377/how-to-create-a-custom-pytorch-dataset-when-the-order-and-the-total-number-of-tr/54572327#54572327). The files are saved to the data_split ("train" or "dev") directory, which is created if it does not already exist, and the filenames stay the same except for the batch number appended at the end (e.g. train/train_record_50.pkl).
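
For reviewers, here is a minimal sketch of the splitting behaviour described above. It is not the code from this PR. The helper name save_records, its parameters, the 0-based batch index, the unsuffixed name for the single-file case, and the choice to let the last file absorb the remainder are all assumptions made for illustration.

```python
import os
import pickle


def save_records(records, data_split, num_files=1):
    """Hypothetical helper: pickle `records` into files under `data_split`/.

    num_files == 1  -> one file holding every record
    num_files == -1 -> one file per record
    num_files == n  -> n files; the last one absorbs the remainder
    """
    # Create the split directory ("train" or "dev") if it is not there yet.
    os.makedirs(data_split, exist_ok=True)
    # Assumed naming scheme: <split>/<split>_record_<batch>.pkl
    base_name = os.path.basename(os.path.normpath(data_split)) + "_record"

    if num_files == -1:
        num_files = len(records)
    if num_files < 1 or num_files > len(records):
        raise ValueError("num_files must be -1 or between 1 and len(records)")

    if num_files == 1:
        # Assumption: the single-file case keeps the name without a suffix.
        path = os.path.join(data_split, base_name + ".pkl")
        with open(path, "wb") as f:
            pickle.dump(records, f)
        return [path]

    size = len(records) // num_files
    paths = []
    for i in range(num_files):
        start = i * size
        # The last file takes everything that is left over.
        end = (i + 1) * size if i < num_files - 1 else len(records)
        path = os.path.join(data_split, f"{base_name}_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(records[start:end], f)
        paths.append(path)
    return paths
```

With one record per file (num_files=-1), a custom PyTorch Dataset could load a single .pkl file per index instead of keeping the whole split in memory. In this sketch each file still holds a list, even when that list contains one record.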