Hi, thanks for raising this issue. Before you start to write code, let us have a discussion to see if the PR is necessary. The following are my thoughts for discussion:
1. You can pass `demo_mode=True` (ref). This will tell the trainer to only evaluate the policy on the test task, and no training is done.
2. I personally don't like the idea of early stopping. The trainer saves the model snapshot with the best validation score, and the mapping between this model and the training iteration can be traced in the training log.
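For context, a minimal sketch of this usage pattern, loosely based on the evojax examples (the task, policy, and solver choices here are illustrative, and constructor arguments may differ between evojax versions):

```python
# Sketch of the two usage modes discussed above, loosely based on the evojax
# examples; argument names may differ between versions.
from evojax import Trainer
from evojax.algo import PGPE
from evojax.policy import MLPPolicy
from evojax.task.cartpole import CartPoleSwingUp

train_task = CartPoleSwingUp(test=False)
test_task = CartPoleSwingUp(test=True)
policy = MLPPolicy(
    input_dim=train_task.obs_shape[0],
    hidden_dims=[64, 64],
    output_dim=train_task.act_shape[0],
)
solver = PGPE(pop_size=64, param_size=policy.num_params, seed=42)

trainer = Trainer(
    policy=policy,
    solver=solver,
    train_task=train_task,
    test_task=test_task,
    max_iter=1000,
    log_dir="./log",
)

trainer.run()                # train, periodically validating on test_task
trainer.run(demo_mode=True)  # evaluation only: no training, just test the saved policy
```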
Hey! Thanks for the quick response.
I agree with your points, it's definitely possible to keep the current interface. This usage pattern should be documented somewhere, though.
Re: early stopping - it can be necessary in some scenarios. The model is indeed being saved after every iteration, but the `trainer.run` method doesn't return the information about the best model. You can find it in the logs, sure, but there has to be a programmatic way to do it. Otherwise it's impossible to load the best model automatically. Maybe a solution here is to log the model based on `val_score`, not on `train_score`. Then the `models/best.npz` model would mean "model with best validation score". What do you think about it?
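As an illustration of what "loading the best model automatically" could look like with the current snapshots, here is a hedged sketch; the `models/best.npz` path comes from this discussion, while the key lookup inside the archive is an assumption:

```python
# Hypothetical sketch: load the snapshot the trainer writes so the best
# parameters can be used programmatically. The path comes from the discussion
# above; the array key inside the archive is an assumption.
import numpy as np

with np.load("./log/models/best.npz") as archive:
    best_params = archive[archive.files[0]]  # grab the first stored array

print("Best parameters shape:", best_params.shape)
```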
> Maybe a solution here is to log the model based on `val_score`, not on `train_score`. Then the `models/best.npz` model would mean "model with best validation score". What do you think about it?
I made sure the best model (and its log) is based on the validation score (related source code); can you double check this part?
Oh, you are right, sorry for the confusion.
@lerrytang so what about early stopping? Can we add early stopping patience and threshold parameters to the trainer? E.g. stop the training loop if the test score doesn't improve by more than the threshold within the last patience iterations.
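To make the request concrete, a minimal sketch of such a stopping rule; `patience` and `threshold` are hypothetical parameters here, not existing Trainer arguments:

```python
# Hedged sketch of a patience/threshold stopping rule. "patience" and
# "threshold" are hypothetical parameters, not existing evojax arguments.
from typing import Sequence


def should_stop(scores: Sequence[float], patience: int = 20, threshold: float = 0.0) -> bool:
    """Return True when the best score of the last `patience` iterations
    did not improve on the earlier best by more than `threshold`."""
    if len(scores) <= patience:
        return False
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent - best_before <= threshold
```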
While it is common to use early stopping in supervised learning problems, early stopping is misleading when solving tasks with neuroevolution (from my experience). For example, one often sees the learning curve (test scores) dip for quite a long time before rising up when training a locomotion controller. You may think we can put a knob on how many iterations we should tolerate before we see any progress, but I don't think this extra hyper-parameter is worth the trouble.
So basically you are saying it's always better to run the ES for a lot of iterations? I'll do some reward/iter plotting for my problem that involves timeseries (which means my train and validation data might come from different distributions). It's probably very data-dependent. Will tell you the results once we add the custom log_fn :)
Currently I see two ways of using the `Trainer.test_task`:

1. Evaluation only, without any training. Right now it's impossible to pass only a `test_task` to the trainer, because the `train_task` is non-optional. Looks like there should be a way to do this with evojax.
2. Validation during training. Shouldn't `trainer.run` return the best model score and not the last model score?

I propose the following (high level) logic:
Probably early stopping would be pretty necessary for the `trainer.fit` method. Currently there is no way to determine when to do it, or even which model iteration has the best result. I'm willing to implement this logic in a PR.
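For illustration only, one shape such a return value could take; `TrainResult` and its fields are hypothetical names, not part of evojax:

```python
# Purely hypothetical interface sketch of a richer return value for
# trainer.run / trainer.fit; none of these names exist in evojax.
from dataclasses import dataclass


@dataclass
class TrainResult:
    best_score: float      # best validation (test_task) score observed
    best_iteration: int    # iteration at which the best score was reached
    best_model_path: str   # e.g. path to the saved best.npz snapshot


# Intended usage (hypothetical):
# result = trainer.run()
# print(result.best_score, result.best_iteration)
```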