League Evaluation Metric

zxzzz0 commented 3 years ago

Added this issue as suggested by @PaParaZz1.

[x] I have marked all applicable categories:
- [ ] exception-raising bug
- [ ] RL algorithm bug
- [ ] system worker bug
- [ ] system utils bug
- [ ] code design/refactor
- [ ] documentation request
- [x] new feature request
[x] I have visited the readme and [doc]()
[x] I have searched through the issue tracker and pr tracker
[x] I have mentioned version numbers, operating system and environment, where applicable: N/A

TrueSkill is a rating system developed by Microsoft for game matchmaking. Unlike Elo, which tracks only a single strength estimate per agent, TrueSkill models each agent's skill with two values, mu and sigma. Each player starts with mu=25.000 and sigma=8.333. mu measures strength, while sigma measures the uncertainty of that estimate and so acts as a measure of stability. After each match, mu and sigma are updated from the match payoffs through the TrueSkill API. An agent's final score can then be defined as mu - 3 * sigma, which takes both strength and stability into account.
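To make the update and the mu - 3 * sigma score concrete, here is a minimal sketch of a two-player TrueSkill update in pure Python. It is an illustration under stated assumptions, not the code from the league demo or the API of any TrueSkill package: draws are ignored, `beta = sigma / 2` and `tau = sigma / 100` are assumed parameter values, and the names `Rating`, `rate_1vs1`, and `conservative_score` are hypothetical helpers defined only for this example.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist

# Initial rating from the issue text: mu=25.000, sigma=8.333 (= 25 / 3).
MU0 = 25.0
SIGMA0 = MU0 / 3
# Assumed parameters for this sketch (not taken from the issue):
BETA = SIGMA0 / 2    # performance noise per match
TAU = SIGMA0 / 100   # small uncertainty added before every update

_STD_NORMAL = NormalDist()  # standard normal, used for pdf and cdf


@dataclass(frozen=True)
class Rating:
    mu: float = MU0
    sigma: float = SIGMA0


def rate_1vs1(winner: Rating, loser: Rating) -> tuple[Rating, Rating]:
    """Return updated (winner, loser) ratings after a decisive match.

    Simplified: two players, no draws (draw margin fixed at 0).
    """
    # Inflate the variances slightly so sigma never collapses to zero.
    var_w = winner.sigma ** 2 + TAU ** 2
    var_l = loser.sigma ** 2 + TAU ** 2

    # Total standard deviation of the performance difference.
    c = sqrt(2 * BETA ** 2 + var_w + var_l)
    t = (winner.mu - loser.mu) / c

    # Correction terms of the truncated Gaussian for a win.
    # Note: cdf(t) can underflow for extremely lopsided upsets.
    v = _STD_NORMAL.pdf(t) / _STD_NORMAL.cdf(t)
    w = v * (v + t)

    new_winner = Rating(
        mu=winner.mu + var_w / c * v,
        sigma=sqrt(var_w * (1 - var_w / c ** 2 * w)),
    )
    new_loser = Rating(
        mu=loser.mu - var_l / c * v,
        sigma=sqrt(var_l * (1 - var_l / c ** 2 * w)),
    )
    return new_winner, new_loser


def conservative_score(rating: Rating) -> float:
    """Score proposed in the issue: mu - 3 * sigma."""
    return rating.mu - 3 * rating.sigma


if __name__ == "__main__":
    a, b = rate_1vs1(Rating(), Rating())
    print("winner:", a, "score:", conservative_score(a))
    print("loser: ", b, "score:", conservative_score(b))
```

With the initial values above, a fresh agent's score is 25 - 3 * 8.333, which is approximately 0, so an agent has to both win (raise mu) and play enough matches (shrink sigma) to move up the ranking.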

This metric is currently missing from the league demo, and it would be good to add it.

PaParaZz1 commented 3 years ago

TrueSkill demo result in TensorBoard: [screenshot: Screen Shot 2021-08-10 at 3 23 42 PM]

zxzzz0 commented 3 years ago

This issue has been solved in PR #22.

opendilab / DI-engine

League Evaluation Metric #18