Hi!
it remains unclear to me how you pick your best model
In simple words, it's designed so that you run a bunch of hyperparameter optimization experiments on the validation set with USE_TEST: False, and once you have the best hyperparams, you run one final experiment on the test set with the USE_TEST: True flag.
Currently, if I e.g. run it for 10 epochs, validating every five, with USE_TEST: True, it will validate 2 times on the test data, right?
Right!
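For illustration, here is a minimal sketch of that two-phase protocol; run_experiment is a hypothetical stand-in for a full training run, and only the USE_TEST key mirrors the repo's config:

```python
# Hypothetical sketch of the two-phase protocol; run_experiment stands in for a
# full training run, and only the USE_TEST key mirrors the repo's config.

def run_experiment(config: dict) -> float:
    # Placeholder: train the model, then evaluate on the validation set when
    # config["USE_TEST"] is False, or on the test set when it is True.
    return 0.0  # return the evaluation metric, e.g. MRR

# Phase 1: hyperparameter search, never touching the test set.
search_space = [{"LEARNING_RATE": 1e-3}, {"LEARNING_RATE": 1e-4}]
scores = [(run_experiment({**hp, "USE_TEST": False}), hp) for hp in search_space]

# Phase 2: a single final run on the test set with the best hyperparameters.
best_hp = max(scores, key=lambda s: s[0])[1]
test_score = run_experiment({**best_hp, "USE_TEST": True})
```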
I want to run the training, save the best model ... based on validation data and then run on the test set
The SAVE flag is rather for analyzing the model guts and weights after it has been trained and evaluated on the test set, i.e., after you have performed all the hyperparameter optimization. As of now, there is no functionality to save the best model and load it for the test run.
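If you need that behaviour, a minimal sketch of what it could look like in plain PyTorch (this helper is not part of the repo; all names and the checkpoint path are illustrative):

```python
import torch

def train_with_best_checkpoint(model, optimizer, train_one_epoch, evaluate,
                               train_data, valid_data, test_data,
                               num_epochs, eval_every, path="best_model.pt"):
    """Hypothetical helper (not in the repo): keep the checkpoint with the best
    validation metric, then reload it once for the final test evaluation."""
    best_val = float("-inf")
    for epoch in range(num_epochs):
        train_one_epoch(model, optimizer, train_data)
        if (epoch + 1) % eval_every == 0:
            val_metric = evaluate(model, valid_data)   # e.g. validation MRR
            if val_metric > best_val:
                best_val = val_metric
                torch.save(model.state_dict(), path)   # best weights so far
    model.load_state_dict(torch.load(path))            # restore the best model
    return evaluate(model, test_data)                  # single test evaluation
```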
ev_tr_data seems to be never used outside of lines 298-305
This is a remnant from debugging, used to evaluate how a model overfits on a subset of the training data; it is used in the training loop: https://github.com/migalkin/StarE/blob/f40a5ee082d61851477e9870c21e991c7d91deb3/loops/loops.py#L137
Right now the CLI param for it is disabled in run.py, but you can uncomment it and play around: https://github.com/migalkin/StarE/blob/f40a5ee082d61851477e9870c21e991c7d91deb3/run.py#L56 Usually training sets are large compared to validation/test, so it might be a long process :)
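For reference, the idea behind that evaluation is simply to score the model on a small random slice of the training set; here is a hedged sketch with made-up names, not the repo's code:

```python
import random

def sample_train_eval_subset(train_data, fraction=0.05, seed=42):
    # Hypothetical sketch: draw a small random subset of the training statements
    # so that evaluating on it stays cheap compared to a full training-set pass.
    rng = random.Random(seed)
    k = max(1, int(len(train_data) * fraction))
    return rng.sample(list(train_data), k)

# Comparing the metric on this subset with the validation metric gives a rough
# signal of how strongly the model overfits the training data.
```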
Thanks for the detailed explanation. So, if I understand correctly, it is the model at the end of training (always) that is picked for testing. E.g. you decide the best hyper-params (incl. how many epochs to run for) on the validation dataset, then you run once on the test dataset, and the results reported at the end are for the last model?
First of all, congratulations on being accepted to EMNLP!
I'm a GNN and hyperGNN learning enthusiast. I've been playing around with the code, but it remains unclear to me how you pick your best model. From what I understood, I suspect it has something to do with the USE_TEST flag, but it's still unclear to me how exactly to use it. Currently, if I e.g. run it for 10 epochs, validating every five, with USE_TEST: True, it will validate 2 times on the test data, right? (My reasoning is based on lines 298 and below, combined with loops/evaluation.py.)

Now, imagine I want to run the training, save the best model (to do this I set SAVE: True) based on validation data, and then run on the test set to obtain test set performance. (Alternatively, it's OK for me to run for X epochs and then evaluate on the test set right after.) If I set USE_TEST: False, it doesn't seem it is going to evaluate on the test set in the end (which is put in eval_vl_data), whereas if I set the flag to True, eval_vl_data is set to contain test_data and thus it will validate on the test data every 5 epochs or so.

What is the correct set of commands I should be running? Do I need to modify the code?
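For context, here is my current reading of that selection logic, paraphrased (not the exact repo code):

```python
def pick_eval_split(valid_data, test_data, use_test: bool):
    # Paraphrase of my understanding of lines 298 and below: with USE_TEST: True
    # the periodic "validation" is actually run on the test split.
    return test_data if use_test else valid_data
```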
P.S. ev_tr_data seems to be never used outside of lines 298-305.