Implement ELO rating function

We should have a function which receives as arguments:

a list of model names

a "number of games" parameter (needs looking into: are we randomly pitting "players" against each other, or rather going through all possible games?)

It should return a dictionary whose keys are model names and whose values are ELO ratings.

After every game, the winning player takes points from the losing one (https://en.wikipedia.org/wiki/Elo_rating_system).
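A minimal sketch of what such a function could look like, assuming games are played between randomly sampled pairs (one of the two options raised above, not a settled choice). The names `elo_ratings`, `play_game`, `k`, `initial` and `seed` are hypothetical, and `play_game` stands in for whatever actually runs a game between two models:

```python
import random


def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_ratings(model_names, num_games, play_game, k=32, initial=1000.0, seed=None):
    """Return {model_name: rating} after `num_games` randomly paired games.

    `play_game(a, b)` is a hypothetical callback returning A's score:
    1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    """
    rng = random.Random(seed)
    ratings = {name: initial for name in model_names}
    for _ in range(num_games):
        # Assumption: opponents are drawn uniformly at random each game.
        a, b = rng.sample(model_names, 2)
        score_a = play_game(a, b)
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        # Whatever A gains, B loses, so the total rating pool is conserved.
        ratings[a] += delta
        ratings[b] -= delta
    return ratings
```

Going through all possible games instead would only change the pairing step, for example by iterating over `itertools.combinations(model_names, 2)` rather than sampling.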

This part on the wiki page also seems relevant for implementation:

An example may help to clarify: Suppose player A has a rating of 1613...
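For reference, here is a rough sketch of how one player's rating could be updated after a series of games, which is the kind of calculation a worked example like this walks through. The function name `update_after_series`, the K-factor of 32 and the opponent ratings are assumptions made up for illustration; they are not the figures from the wiki example.

```python
def expected_score(rating, opponent_rating):
    """Expected score against one opponent (standard Elo logistic curve)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


def update_after_series(rating, results, k=32):
    """Update one rating from a list of (opponent_rating, score) pairs.

    score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    """
    expected = sum(expected_score(rating, opp) for opp, _ in results)
    actual = sum(score for _, score in results)
    # The rating moves in proportion to how far actual results beat expectation.
    return rating + k * (actual - expected)


# Hypothetical data: one win and one loss against equally rated opponents.
print(update_after_series(1500, [(1500, 1.0), (1500, 0.0)]))  # 1500.0
```

A player who scores exactly as expected keeps the same rating, which is why the win and the loss above cancel out.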
