migalkin / StarE

EMNLP 2020: Message Passing for Hyper-Relational Knowledge Graphs
MIT License

Training/validation/testing details #8

Closed: HekpoMaH closed this issue 3 years ago

HekpoMaH commented 3 years ago

First of all congratulations on being accepted to EMNLP!

I'm a GNN and hyperGNN learning enthusiast. I've been playing around with the code, but it remains unclear to me how you pick your best model. From what I understood, I suspect it has something to do with the USE_TEST flag, but it's still unclear to me how exactly to use it. Currently, if I e.g. run for 10 epochs, validating every five, with USE_TEST: True, it will validate twice on the test data, right? (My reasoning is based on lines 298 and below, combined with loops/evaluation.py.)

Now, imagine I want to run the training, save the best model based on validation data (to do this I set SAVE: True), and then run on the test set to obtain test-set performance. (Alternatively, it's OK for me to run for X epochs and then evaluate on the test data right afterwards.) If I set USE_TEST: False, it does not seem that it will evaluate on the test set at the end (only on what is put in eval_vl_data), whereas if I set the flag to True, eval_vl_data is set to contain test_data and it will thus validate on test data every 5 epochs or so.

What is the correct set of commands I should be running? Do I need to modify the code?

P.S. ev_tr_data never seems to be used outside of lines 298-305

migalkin commented 3 years ago

Hi!

it remains unclear to me how you pick your best model

In short, it's designed so that you run a set of hyperparameter optimization experiments on the validation data with USE_TEST: False, and once you have found the best hyperparameters, you run one final experiment on the test data with the USE_TEST: True flag.
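For concreteness, a minimal sketch of what the flag effectively controls (illustrative only, not the actual run.py code; eval_vl_data, test_data, and valid_data mirror the names mentioned in this thread):

```python
# Illustrative sketch only -- not the actual StarE run.py code.
# USE_TEST controls which split is placed into eval_vl_data, i.e. which split
# the periodic evaluation during training is run against.
if config['USE_TEST']:
    eval_vl_data = test_data    # final run: periodic evaluation uses the test split
else:
    eval_vl_data = valid_data   # hyperparameter search: evaluation uses the validation split
```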

Currently, if I e.g. run for 10 epochs, validating every five, with USE_TEST: True, it will validate twice on the test data, right?

Right!

I want to run the training, save the best model ... based on validation data and then run on the test set

The SAVE flag is rather intended for analyzing the model's internals and weights after it has been trained and evaluated on the test set, i.e., after you have performed all of the hyperparameter optimization. As of now, there is no functionality to save the best model and load it for the test run.
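If you want that behaviour, you would have to add it yourself. A minimal sketch of the usual pattern, assuming a standard PyTorch setup; model, train_one_epoch, evaluate, and the data names are placeholders, not functions from the StarE codebase:

```python
import torch

# Hypothetical "save best on validation, reload for test" loop (placeholder names,
# not part of StarE): keep the checkpoint with the highest validation MRR.
best_mrr, best_path = 0.0, "best_model.pt"

for epoch in range(num_epochs):
    train_one_epoch(model, train_data)
    if (epoch + 1) % eval_every == 0:
        val_mrr = evaluate(model, valid_data)["mrr"]
        if val_mrr > best_mrr:
            best_mrr = val_mrr
            torch.save(model.state_dict(), best_path)

# After training: reload the best validation checkpoint and evaluate once on test.
model.load_state_dict(torch.load(best_path))
test_metrics = evaluate(model, test_data)
```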

ev_tr_data never seems to be used outside of lines 298-305

This is a remnant from debugging, used to evaluate how much a model overfits on a subset of the training data; it is used in the training loop here: https://github.com/migalkin/StarE/blob/f40a5ee082d61851477e9870c21e991c7d91deb3/loops/loops.py#L137

Right now the CLI param for it is disabled in run.py, but you can uncomment it and play around: https://github.com/migalkin/StarE/blob/f40a5ee082d61851477e9870c21e991c7d91deb3/run.py#L56 Training sets are usually large compared to validation/test, so evaluating on them might be a long process :)
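One cheap way to keep that check fast is to evaluate on a small random subset of the training statements rather than the whole split; a sketch with placeholder names (train_statements and evaluate are not StarE identifiers):

```python
import random

# Illustrative sketch (placeholder names): score only a small random sample of
# the training statements, so the overfitting check stays cheap on large splits.
subset_size = 5000
train_subset = random.sample(train_statements, min(subset_size, len(train_statements)))
train_metrics = evaluate(model, train_subset)  # compare against the validation metrics
```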

HekpoMaH commented 3 years ago

Thanks for the detailed explanation. So, if I understand correctly, it is always the model at the end of training that is used for testing. That is, you choose the best hyperparameters (including the number of epochs) on the validation dataset, run once on the test dataset, and the results reported at the end are those of the final model?

migalkin commented 3 years ago

Yes, this is correct. We used wandb to track the validation experiments, selected the best-performing hyperparameters, and ran one final experiment on the test set with those hyperparameters.
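For anyone reproducing this protocol, the tracking side can be as simple as logging the validation metric per run; a minimal sketch with wandb (the project name, config keys, and the train_one_epoch/evaluate helpers are placeholders, not the setup used for the paper):

```python
import wandb

# Minimal tracking sketch (placeholder names): one wandb run per hyperparameter
# configuration, logging the validation MRR so the best config can be picked later.
wandb.init(project="stare-hyperparam-search", config=config)

for epoch in range(config["EPOCHS"]):
    train_loss = train_one_epoch(model, train_data)
    val_mrr = evaluate(model, valid_data)["mrr"]
    wandb.log({"epoch": epoch, "train_loss": train_loss, "val_mrr": val_mrr})
```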