opendilab / DI-engine

OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.
https://di-engine-docs.readthedocs.io
Apache License 2.0

League Evaluation Metric #18

Closed. zxzzz0 closed this issue 3 years ago.

zxzzz0 commented 3 years ago

Added this issue as suggested by @PaParaZz1.

TrueSkill is a rating system developed by Microsoft for game matchmaking. Elo estimates only an agent's strength. TrueSkill tracks both the strength and how certain that estimate is, which this issue calls stability. Each player starts with mu = 25.000 and sigma = 8.333 (that is, 25/3). mu is the estimated strength, and sigma is the uncertainty of that estimate, so a smaller sigma means a more stable, well-established rating. After each match result is observed, mu and sigma are updated through the TrueSkill API. An agent's final score can then be defined as mu - 3 * sigma. This is a conservative estimate that accounts for both strength and stability.
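The following is a minimal sketch of the two-player update described above. It is written in plain Python so it runs without extra dependencies. It assumes the usual TrueSkill defaults: beta = sigma0 / 2 for performance noise and tau = sigma0 / 100 for dynamics noise. As a simplification, it ignores draws by using a zero draw margin. Its numbers will therefore differ slightly from a full implementation such as the open-source `trueskill` Python package. `Rating`, `rate_1vs1` and `score` are illustrative names only. They are not DI-engine's API.

```python
import math
from dataclasses import dataclass

MU0 = 25.0
SIGMA0 = MU0 / 3     # 8.333..., initial uncertainty
BETA = SIGMA0 / 2    # assumed default: per-match performance noise
TAU = SIGMA0 / 100   # assumed default: dynamics noise added before each match


@dataclass(frozen=True)
class Rating:
    mu: float = MU0        # estimated strength
    sigma: float = SIGMA0  # uncertainty of the strength estimate

    def score(self) -> float:
        """Conservative skill estimate used for ranking: mu - 3 * sigma."""
        return self.mu - 3 * self.sigma


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def rate_1vs1(winner: Rating, loser: Rating) -> tuple:
    """Return updated (winner, loser) ratings after `winner` beats `loser`.

    Simplified TrueSkill update for two players with no draw margin.
    """
    # Inflate both variances by the dynamics noise so ratings can keep moving.
    var_w = winner.sigma ** 2 + TAU ** 2
    var_l = loser.sigma ** 2 + TAU ** 2
    # Total uncertainty of the performance difference.
    c = math.sqrt(2 * BETA ** 2 + var_w + var_l)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)  # mean correction: large when the upset was unexpected
    w = v * (v + t)        # variance correction, always in (0, 1)
    new_winner = Rating(
        winner.mu + var_w / c * v,
        math.sqrt(var_w * (1 - var_w / c ** 2 * w)),
    )
    new_loser = Rating(
        loser.mu - var_l / c * v,
        math.sqrt(var_l * (1 - var_l / c ** 2 * w)),
    )
    return new_winner, new_loser


if __name__ == "__main__":
    a, b = Rating(), Rating()
    a, b = rate_1vs1(a, b)  # agent a beats agent b
    print(f"a: mu={a.mu:.3f} sigma={a.sigma:.3f} score={a.score():.3f}")
    print(f"b: mu={b.mu:.3f} sigma={b.sigma:.3f} score={b.score():.3f}")
```

After a single win between two fresh agents, the winner's mu rises and the loser's mu falls by the same amount. Both sigmas shrink because the match added information. Ranking a league by `score()` rather than by `mu` penalizes agents whose ratings are still uncertain.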

This metric is currently missing from the league demo, and it would be useful to add it.

PaParaZz1 commented 3 years ago

TrueSkill demo result in TensorBoard: [screenshot, 2021-08-10 3:23:42 PM]

zxzzz0 commented 3 years ago

This issue has been solved in PR #22.