google / evojax


Some proposals about the `Trainer` logic #10

Closed danielgafni closed 2 years ago

danielgafni commented 2 years ago

Currently I see two ways of using the Trainer.test_task:

  1. The test_task of the trainer is used for validation. The actual test set is held out and not seen during training or validation. In this case, how do I run the actual test? I can't pass just the test_task to the trainer, because the train_task is non-optional. It looks like there should be a way to do this with evojax.
  2. The test_task of the trainer is used for the actual test, and no validation is used at all. In this case, why does trainer.run return the best model score and not the last model score?

I propose the following (high-level) logic:

best_val_reward = trainer.fit(train_task: VectorizedTask, val_task: Optional[VectorizedTask] = None)  # maybe the user doesn't want validation (e.g. train on latest data without early stopping)
test_reward = trainer.test(test_task: VectorizedTask, checkpoint="best|last|path")  # specify which checkpoint to use for testing

Early stopping would probably be necessary for the trainer.fit method. Currently there is no way to determine when to stop the training, or even which model iteration produced the best result.

I'm willing to implement this logic in a PR.

lerrytang commented 2 years ago

Hi, thanks for raising this issue. Before you start writing code, let's have a discussion to see if the PR is necessary. Here are my thoughts:

  1. Your argument makes sense. How about this solution? During training, we pass the training task and the validation task to the trainer. For tests, we pass the test task as both the train and the test tasks and also set demo_mode=True (ref). This tells the trainer to only evaluate the policy on the test task; no training is done (see the sketch after this list).
  2. If the solution in 1 seems reasonable to you, then in tests the best model is equivalent to the last model, and the problem is solved.
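
In code, the test-only run would look roughly like this (just a sketch following the pattern in the example scripts; the policy/solver/task objects and log_dir are placeholders for your own):

```python
# Sketch of the test-only pattern described above. It assumes the
# Trainer.run(demo_mode=...) usage from the evojax example scripts;
# the arguments are placeholders for the user's own objects/paths.
from evojax import Trainer


def evaluate_on_test_task(policy, solver, test_task, log_dir):
    trainer = Trainer(
        policy=policy,
        solver=solver,
        train_task=test_task,  # the test task is passed as the "train" task...
        test_task=test_task,   # ...and as the test task
        log_dir=log_dir,
    )
    trainer.model_dir = log_dir         # directory containing the saved snapshot
    return trainer.run(demo_mode=True)  # evaluation only, no training is done
```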

I personally don't like the idea of early stopping. Besides, the trainer already saves the model snapshot with the best validation score, and the mapping between this model and the training iteration can be traced in the training log.

danielgafni commented 2 years ago

Hey! Thanks for the quick response.

I agree with your points, it's definitely possible to keep the current interface. This usage pattern should be documented somewhere though.

re: early stopping - it can be necessary in some scenarios. The model is indeed saved after every iteration, but the trainer.run method doesn't return any information about the best model. You can find it in the logs, sure, but there should be a programmatic way to get it; otherwise it's impossible to load the best model automatically. Maybe a solution here is to log the model based on val_score, not on train_score. Then the models/best.npz model would mean "model with best validation score". What do you think about it?
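
For illustration, this is the kind of programmatic access I have in mind (just a sketch; it assumes the best snapshot is a regular .npz archive written somewhere under the trainer's output directory):

```python
# Sketch: load the best snapshot programmatically instead of reading the
# training log. The path below is an assumption about where the trainer
# writes its files, not a documented evojax location.
import os

import numpy as np

log_dir = "./log"
snapshot = np.load(os.path.join(log_dir, "best.npz"))
print(snapshot.files)  # the arrays stored in the snapshot, e.g. the policy parameters
```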

lerrytang commented 2 years ago

> Maybe a solution here is to log the model based on val_score, not on train_score. Then the models/best.npz model would mean "model with best validation score". What do you think about it?

I made sure the best model (and its log) was based on the validation score (related source code); can you double-check this part?

danielgafni commented 2 years ago

Oh, you are right, sorry for the confusion.

danielgafni commented 2 years ago

@lerrytang so what about early stopping? Can we add early-stopping patience and threshold parameters to the trainer? E.g. stop the training loop if the test score doesn't improve by more than the threshold over the last `patience` iterations. A rough sketch of what I mean is below.
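
Something along these lines (purely illustrative; `patience`, `min_delta`, and the helper callables are hypothetical names, not existing trainer parameters):

```python
# Illustrative early-stopping loop; the parameter names and helper callables
# are hypothetical, not part of the current evojax Trainer.
from typing import Callable


def run_with_early_stopping(
    train_step: Callable[[], None],   # one ES iteration (ask/tell + update)
    evaluate: Callable[[], float],    # periodic evaluation on the validation task
    max_iter: int = 1000,
    patience: int = 20,               # how many stale evaluations to tolerate
    min_delta: float = 1e-3,          # minimum improvement that counts as progress
) -> float:
    best_score = float("-inf")
    stale = 0
    for _ in range(max_iter):
        train_step()
        score = evaluate()
        if score > best_score + min_delta:
            best_score, stale = score, 0  # improvement: reset the patience counter
        else:
            stale += 1
            if stale >= patience:
                break  # no sufficient improvement for `patience` evaluations
    return best_score
```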

lerrytang commented 2 years ago

While it is common to use early stopping in supervised learning problems, early stopping is misleading when solving tasks with neuroevolution (from my experience). For example, one often sees the learning curve (test scores) dip for quite a long time before rising when training a locomotion controller. You may think we can put a knob on how many iterations we should tolerate before we see any progress, but I don't think this extra hyper-parameter is worth the trouble.

danielgafni commented 2 years ago

So basically you are saying it's always better to run the ES for a lot of iterations? I'll do some reward/iteration plotting for my problem, which involves time series (meaning my train and validation data might come from different distributions). It's probably very data-dependent. I'll tell you the results once we add the custom log_fn :)