aciborowska closed 4 years ago
Done.
future/src/data_generation/data_generator.py now contains only the code that processes the GitHub dataset; Rao's old code has been removed. future/src/data_generation/data_generator.py invokes compute_ob.py to produce utility_data.tsv. I uploaded a dummy 1K dataset to Google Drive and am still running the script to generate github_20K. Once it's done, I'll upload the bigger dataset.
Computing OB between posts and answers is a time-consuming process. It should be done only once, when we create the dataset, so that we don't repeat that step every time we run the model.
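A minimal sketch of the compute-once idea, not the actual code in compute_ob.py. The function names `compute_ob_score` and `load_or_build` are hypothetical, the word-set overlap is only a placeholder for the real OB computation, and the three-column TSV layout is an assumption about what utility_data.tsv could look like:

```python
import csv
import os
import tempfile


def compute_ob_score(post, answer):
    # Hypothetical placeholder for the real OB computation in compute_ob.py.
    # Here: Jaccard overlap between the word sets of the two texts.
    a, b = set(post.lower().split()), set(answer.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def load_or_build(pairs, cache_path):
    """Return (post_id, answer_id, score) rows.

    Scores are computed only when cache_path does not exist yet;
    later calls read the stored TSV instead of recomputing.
    """
    if os.path.exists(cache_path):
        with open(cache_path, newline="") as f:
            return [(r[0], r[1], float(r[2])) for r in csv.reader(f, delimiter="\t")]

    rows = [(pid, aid, compute_ob_score(post, answer))
            for pid, aid, post, answer in pairs]
    with open(cache_path, "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(rows)
    return rows


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "utility_data.tsv")
    pairs = [("p1", "a1", "how to fix bug", "fix the bug now")]
    print(load_or_build(pairs, path))  # computes and writes the file
    print(load_or_build([], path))     # reads the cached file, no recomputation
```

With this shape, the model code only ever reads the TSV, so the expensive step runs during dataset creation and never during training or evaluation.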
To do: