[CLOSED] Fix target_train_data_fraction overriding pretrain_data_fraction

Issue by pyeres, Thursday Apr 16, 2020 at 04:17 GMT. Originally opened as https://github.com/nyu-mll/jiant/pull/1070

This PR addresses the bug described in issue #1066 ("Results with pretrain_data_fraction args").

This PR...

- Adds a new private field to the Task object (_instance_generators) to store instance generators with the appropriate phase-specific data fraction settings.
- Adds a getter (get_instance_generator) and a setter (set_instance_generator) to Task for accessing _instance_generators.
- Updates preprocessing, training, and evaluation code to use the new getter and setter.
- Updates a test that uses evaluation logic to use the new getter and setter.
- Adds tests for Task checking that an exception is raised when the training instance generator is requested without specifying the phase (the phase determines which data fraction applies).
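
The mechanism above can be sketched roughly as follows. This is a minimal illustration, not jiant's code: only the names _instance_generators, get_instance_generator and set_instance_generator come from this PR. Their signatures, the (split, phase) key, the phase names "pretrain" and "target_train", the simplified Task class, and the take_fraction helper are all hypothetical, and instance "generators" are modeled as plain lists.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in: an instance "generator" is modeled as a list of dicts.
Instances = List[dict]


def take_fraction(instances: Instances, fraction: float) -> Instances:
    """Hypothetical helper: keep the first `fraction` of the instances.
    (A real pipeline might sample randomly instead.)"""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"fraction must be in [0, 1], got {fraction}")
    return instances[: int(len(instances) * fraction)]


class Task:
    """Simplified Task that stores one instance source per (split, phase)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Keyed by (split, phase). Only the train split is phase-specific,
        # so val/test entries are stored under phase=None.
        self._instance_generators: Dict[Tuple[str, Optional[str]], Instances] = {}

    @staticmethod
    def _key(split: str, phase: Optional[str]) -> Tuple[str, Optional[str]]:
        if split == "train":
            # The phase decides which data fraction applies, so it is required.
            if phase is None:
                raise ValueError("phase is required for the train split")
            return (split, phase)
        # Non-train splits are shared across phases; any given phase is ignored.
        return (split, None)

    def set_instance_generator(
        self, split: str, generator: Instances, phase: Optional[str] = None
    ) -> None:
        self._instance_generators[self._key(split, phase)] = generator

    def get_instance_generator(
        self, split: str, phase: Optional[str] = None
    ) -> Instances:
        return self._instance_generators[self._key(split, phase)]


if __name__ == "__main__":
    train = [{"id": i} for i in range(100)]
    task = Task("mnli")
    # Each phase gets its own subsample, so neither overwrites the other.
    task.set_instance_generator("train", take_fraction(train, 0.1), phase="pretrain")
    task.set_instance_generator("train", take_fraction(train, 0.5), phase="target_train")
    print(len(task.get_instance_generator("train", phase="pretrain")))      # 10
    print(len(task.get_instance_generator("train", phase="target_train")))  # 50
```

In this sketch, keying the train split on the phase is what keeps the target-training fraction from replacing the pretraining subsample: with a single slot per split, whichever phase was set last would win. Requiring the phase for the train split turns an ambiguous lookup into an explicit error.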

Validation: the config provided in issue #1066 was run with these code changes. Results logged for the task (MNLI) trained with a pretraining data fraction were more in line with expectations: after 25k steps, training accuracy was ~100%, while validation and test accuracy were ~75%.

pyeres included the following code: https://github.com/nyu-mll/jiant/pull/1070/commits

nyu-mll / jiant-v1-legacy
