sjtu-marl / malib

A parallel framework for population-based multi-agent reinforcement learning.
https://malib.io
MIT License

Ugly implementation of evaluation control #17

Closed by KornbergFresnel 2 years ago

KornbergFresnel commented 3 years ago

The changes from PR #12 added a new feature for policy evaluation, but it was left unpolished. The following places need cleanup:

settings https://github.com/sjtu-marl/malib/blob/9efda1b6c9db555f1360aa9759a717349c1fe32d/malib/settings.py#L95

parameter https://github.com/sjtu-marl/malib/blob/9efda1b6c9db555f1360aa9759a717349c1fe32d/malib/rollout/rollout_worker.py#L38

some related logic
https://github.com/sjtu-marl/malib/blob/9efda1b6c9db555f1360aa9759a717349c1fe32d/malib/rollout/base_worker.py#L162
https://github.com/sjtu-marl/malib/blob/9efda1b6c9db555f1360aa9759a717349c1fe32d/malib/rollout/base_worker.py#L104
https://github.com/sjtu-marl/malib/blob/9efda1b6c9db555f1360aa9759a717349c1fe32d/malib/rollout/base_worker.py#L315

zbzhu99 commented 3 years ago

Actually, I have some thoughts about the evaluation worker, which should work quite differently from a normal rollout worker.

First, the evaluation worker should not sample data in an asynchronous manner, since that wastes computing resources. Instead, it should wait for the training manager to send signals along with the policy parameters to be evaluated. It may also need to maintain a local parameter buffer to handle new signals that arrive while an evaluation is already running.
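
A minimal sketch of what such a signal-driven worker could look like; all names here (`EvaluationWorker`, `EvalRequest`, `submit`) are hypothetical, not malib's actual API:

```python
import queue
import time
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class EvalRequest:
    """Hypothetical payload pushed by the training manager."""

    epoch: int  # training epoch the parameters belong to
    parameters: Dict[str, Any]  # policy parameters to evaluate


class EvaluationWorker:
    """Signal-driven evaluation worker (sketch).

    Unlike a rollout worker, it stays idle until the training manager
    pushes an EvalRequest; requests arriving mid-evaluation queue up
    in a local buffer instead of being dropped.
    """

    def __init__(self) -> None:
        self._buffer: "queue.Queue[EvalRequest]" = queue.Queue()

    def submit(self, request: EvalRequest) -> None:
        # Called by the training manager; never blocks the trainer.
        self._buffer.put(request)

    def run(self) -> None:
        while True:
            # Block until a signal arrives; no speculative sampling,
            # so no compute is wasted between evaluations.
            request = self._buffer.get()
            metrics = self._evaluate(request.parameters)
            self._log(request.epoch, metrics)

    def _evaluate(self, parameters: Dict[str, Any]) -> Dict[str, float]:
        # Placeholder for the actual evaluation rollouts.
        time.sleep(0.1)
        return {"episode_reward_mean": 0.0}

    def _log(self, epoch: int, metrics: Dict[str, float]) -> None:
        # Key metrics by the *training* epoch, not a local counter.
        print(f"[eval @ training epoch {epoch}] {metrics}")
```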

Second, the signal from the training manager should include the corresponding (training) epoch number, so the evaluation worker can log the evaluation metrics against the training epoch rather than its own local sample epoch.
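
Building on the hypothetical sketch above, the manager side could stamp each request with its own epoch, so the logged metrics line up with the training timeline even if evaluation lags behind:

```python
import threading
import time

worker = EvaluationWorker()
threading.Thread(target=worker.run, daemon=True).start()

# The training manager tags each request with its own epoch number,
# so metrics are logged against the training timeline.
for epoch in range(3):
    params = {"policy_0": f"weights@epoch{epoch}"}  # stand-in for real tensors
    worker.submit(EvalRequest(epoch=epoch, parameters=params))

time.sleep(1)  # give the worker time to drain its buffer
```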

Thank you!