KornbergFresnel closed this 2 years ago
Actually, I have some personal thoughts about the evaluation worker, which should work quite differently from normal rollout workers.
First, the evaluation worker should not sample data asynchronously, since that wastes computing resources. Instead, it should wait for the training manager to send a signal along with the policy parameters to be evaluated. It may also need to maintain a local parameter buffer to handle new signals that arrive while an evaluation is still in progress.
Second, the information received from the training manager should include the corresponding training epoch number, so that the evaluation worker can log evaluation metrics against the training epoch rather than the evaluator's local sample epoch.
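To make the two points concrete, here is a rough sketch of what I have in mind. It is not MALib's actual API: `EvalRequest`, `EvaluationWorker`, `submit`, `stop`, and `fake_evaluate` are hypothetical names, and a plain `queue.Queue` plus a thread stands in for whatever transport the training manager really uses.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class EvalRequest:
    """Hypothetical message sent by the training manager."""
    train_epoch: int    # training epoch the parameters belong to
    policy_params: Any  # policy parameters to be evaluated


class EvaluationWorker:
    """Hypothetical worker that evaluates only when asked to."""

    def __init__(self, evaluate_fn: Callable[[Any], Dict[str, float]]):
        self._evaluate_fn = evaluate_fn
        # Local parameter buffer: requests that arrive while an
        # evaluation is running wait here instead of being dropped.
        self._buffer: "queue.Queue[Optional[EvalRequest]]" = queue.Queue()
        # Metrics are keyed by the *training* epoch, not a local counter.
        self.logs: List[Tuple[int, Dict[str, float]]] = []

    def submit(self, request: EvalRequest) -> None:
        """Called by the training manager; returns immediately."""
        self._buffer.put(request)

    def stop(self) -> None:
        """Ask the worker to exit after draining earlier requests."""
        self._buffer.put(None)

    def run(self) -> None:
        while True:
            # Blocks until a signal arrives, so nothing is sampled while idle.
            request = self._buffer.get()
            if request is None:
                break
            metrics = self._evaluate_fn(request.policy_params)
            self.logs.append((request.train_epoch, metrics))


def fake_evaluate(params: List[float]) -> Dict[str, float]:
    """Stand-in for real evaluation rollouts (assumption)."""
    return {"mean_reward": float(sum(params))}


worker = EvaluationWorker(fake_evaluate)
thread = threading.Thread(target=worker.run)
thread.start()
worker.submit(EvalRequest(train_epoch=10, policy_params=[1.0, 2.0]))
worker.submit(EvalRequest(train_epoch=20, policy_params=[3.0, 4.0]))
worker.stop()
thread.join()
print(worker.logs)
```

A FIFO buffer evaluates every set of parameters it receives. If only the most recent parameters matter, the buffer could instead keep just the latest request, at the cost of skipping some epochs.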
Thank you!
The changes from PR #12 added a new policy evaluation feature, but it has since been neglected and needs polishing.
settings https://github.com/sjtu-marl/malib/blob/9efda1b6c9db555f1360aa9759a717349c1fe32d/malib/settings.py#L95
parameter https://github.com/sjtu-marl/malib/blob/9efda1b6c9db555f1360aa9759a717349c1fe32d/malib/rollout/rollout_worker.py#L38
some related logic https://github.com/sjtu-marl/malib/blob/9efda1b6c9db555f1360aa9759a717349c1fe32d/malib/rollout/base_worker.py#L162 https://github.com/sjtu-marl/malib/blob/9efda1b6c9db555f1360aa9759a717349c1fe32d/malib/rollout/base_worker.py#L104 https://github.com/sjtu-marl/malib/blob/9efda1b6c9db555f1360aa9759a717349c1fe32d/malib/rollout/base_worker.py#L315