Eval tasks not fully independent

Issue by iftenney, Thursday Jul 12, 2018 at 02:03 GMT. Originally opened as https://github.com/nyu-mll/jiant/issues/140

Running demo.conf with eval_tasks = "sts-b,cola" produces different results on sts-b than running with eval_tasks = "sts-b".

Example commands:

python main.py -c config/demo.conf -o 'exp_name = demo-base'  # evals on sts-b only
python main.py -c config/demo.conf -o 'exp_name = demo-plus, eval_tasks = "sts-b,cola"'

Results are in /nfs/jsalt/home/iftenney/eval_diff, run at commit 1741812b

demo-base gets sts-b_spearmanr: 0.685, while demo-plus gets sts-b_spearmanr: 0.679. Not a large difference, but we should figure out why there's any interaction at all to rule out anything pernicious.

I suspect this is due to RNG seeding causing different initialization (or data feeding?) when multiple models are initialized. We can test this by re-seeding the RNG (deterministically by task) before initializing the model for each task, instead of using the global seed.
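A rough sketch of what per-task re-seeding could look like, to make the proposed test concrete. This is not jiant code: `task_seed`, `seed_for_task`, and `init_tasks` are hypothetical names, the random draws stand in for model initialization, and only Python's `random` module is seeded here. A real run would presumably also need to seed numpy and torch with the same derived value.

```python
import hashlib
import random


def task_seed(global_seed: int, task_name: str) -> int:
    """Derive a stable 32-bit seed from the global seed and the task name.

    hashlib is used rather than the built-in hash(), because hash() on
    strings is salted per process and would not be reproducible across runs.
    """
    digest = hashlib.sha256(f"{global_seed}:{task_name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def seed_for_task(global_seed: int, task_name: str) -> int:
    """Re-seed the RNG for one task and return the seed that was used."""
    seed = task_seed(global_seed, task_name)
    random.seed(seed)
    # Assumption: the real training code would also call something like
    # np.random.seed(seed) and torch.manual_seed(seed) at this point.
    return seed


def init_tasks(task_names, global_seed=1234):
    """Hypothetical stand-in for building one model per eval task."""
    weights = {}
    for name in task_names:
        seed_for_task(global_seed, name)
        # Placeholder for model initialization: a few random draws.
        weights[name] = [random.random() for _ in range(3)]
    return weights


base = init_tasks(["sts-b"])
plus = init_tasks(["sts-b", "cola"])
# With per-task seeding, sts-b's "initialization" no longer depends on
# which other tasks are present or on the order they are built in.
print(base["sts-b"] == plus["sts-b"])
```

If sts-b still differs between the two runs after a change like this, the interaction is probably coming from somewhere other than RNG state (shared vocabulary or data ordering, for example).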
