Closed by lenhoanglnh, 1 year ago
A draft implementation of individual scores based on BBT is in #1531.
Remaining tasks:

- Validate the value of W. At first glance, the distribution of individual raw scores with BBT is wider, so W should probably be reduced.
- Validate the value of alpha.
By evaluating the prior weight with the approach described in "Tournesol's algorithms / Section 3.5" on the public dataset and the main criterion "largelyrecommended", I get $\alpha^{contributor}_{prior} = 0.23$.
So I would suggest keeping one significant digit and using $\alpha = 0.2$, which is consistent with our intention to reduce the value of alpha (currently set to $\alpha = 1.0$ in the non-BBT algorithm).
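To give some intuition for what the prior weight does, here is a minimal sketch assuming a ridge-style shrinkage, where alpha acts like extra "pseudo-comparisons" voting for a zero score. The function and formula are illustrative assumptions, not Tournesol's actual implementation:

```python
# Illustrative sketch (not Tournesol's actual code) of how the prior weight
# alpha shrinks a contributor's raw score toward 0.

def shrunk_score(raw_score: float, n_comparisons: float, alpha: float) -> float:
    # Hypothetical ridge-style estimate: the more comparisons back a score,
    # the less the prior pulls it toward 0.
    return raw_score * n_comparisons / (n_comparisons + alpha)

# Reducing alpha from 1.0 to the proposed 0.2 weakens the shrinkage,
# especially for scores backed by few comparisons:
weak_prior = shrunk_score(1.0, 2, 0.2)    # closer to the raw score
strong_prior = shrunk_score(1.0, 2, 1.0)  # pulled harder toward 0
```

Under this reading, lowering alpha lets raw scores backed by few comparisons keep more of their magnitude, which matters if BBT already produces a wider raw-score distribution.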
Currently, and perhaps surprisingly, the algorithm that transforms a contributor's comparisons into their scores fails to guarantee a desirable monotonicity property: if a contributor pushes their comparison between A and B further towards A, then the score of A should increase and that of B should decrease. https://arxiv.org/pdf/2211.01179.pdf
Essentially, this property fails because, when the contributor says A is vastly better than B, the corresponding spring has a very small stiffness, which can make the comparison nearly meaningless to the algorithm.
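The failure can be reproduced in a toy two-item model. The sketch below assumes an L2 prior and a stiffness that vanishes for extreme comparisons; the particular stiffness function `1 - (r/r_max)^2` and the rest length `t = r` are made-up assumptions for illustration, not the actual spring parameters:

```python
# Toy two-item spring model showing the monotonicity failure. The stiffness
# function and rest length below are illustrative assumptions only.

def individual_score(r: float, alpha: float = 1.0, r_max: float = 10.0) -> float:
    """Closed-form minimizer of alpha*(a^2 + b^2) + k*(a - b - t)^2,
    which by symmetry satisfies b = -a and a = k*t / (alpha + 2*k)."""
    k = max(0.0, 1.0 - (r / r_max) ** 2)  # stiffness vanishes as r -> r_max
    t = r                                 # rest length grows with the comparison
    return k * t / (alpha + 2.0 * k)

# Pushing the comparison toward A at first raises A's score...
monotone = individual_score(5.0) > individual_score(2.0)        # True
# ...but near the extreme, the vanishing stiffness reverses the trend:
still_monotone = individual_score(9.9) > individual_score(5.0)  # False
```

In this toy model, as r approaches r_max the stiffness k tends to 0, so A's score collapses back toward 0 even though the contributor expressed the strongest possible preference for A.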
This annoying behavior persists because the algorithm demands a quadratic loss, so that it can be minimized by a straightforward matrix inversion. We thereby avoid iterative optimization, with its related issues such as nondeterminism and the lack of a guarantee of small distance to the minimum (in the parameter space).
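The "matrix inversion" point can be made concrete: with a quadratic spring loss, the first-order conditions are linear, so the minimizing scores solve one linear system exactly and deterministically. The data layout and spring parameters below are made up for illustration:

```python
import numpy as np

# Sketch: with a quadratic (spring) loss, the minimizing scores solve a
# linear system exactly, so no iterative optimizer is needed.

def solve_scores(n: int, springs, alpha: float = 1.0) -> np.ndarray:
    """Minimize alpha*||theta||^2 + sum stiffness*(theta_i - theta_j - rest)^2
    over theta, where springs is a list of (i, j, stiffness, rest) tuples."""
    A = alpha * np.eye(n)
    b = np.zeros(n)
    for i, j, k, rest in springs:
        # Each spring contributes a Laplacian-like block to the system.
        A[i, i] += k
        A[j, j] += k
        A[i, j] -= k
        A[j, i] -= k
        b[i] += k * rest
        b[j] -= k * rest
    return np.linalg.solve(A, b)  # exact, deterministic minimizer

# Chain of two identical springs: the solution is symmetric around 0.
scores = solve_scores(3, [(0, 1, 1.0, 2.0), (1, 2, 1.0, 2.0)])
```

The exactness and determinism are precisely what would be given up by moving to a non-quadratic loss that requires iterative optimization.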
I have previously proposed an alternative based on "Binomial Bradley-Terry", which I think satisfies monotonicity (to be verified). Moving towards this would provide better properties, but would require greater computational care to perform the optimization. On the other hand, doing optimization would make it significantly easier to include other kinds of data, e.g. likes/dislikes or direct scoring.
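To illustrate why a Bradley-Terry-style loss restores monotonicity, here is a hedged sketch. The exact loss, the mapping of a comparison r in [-1, 1] to a win probability, and the prior weight are all assumptions for illustration, not the finalized algorithm:

```python
import math

# Hedged sketch of a "Binomial Bradley-Terry"-style individual loss.
# All modeling choices below are illustrative assumptions.

def bbt_loss(theta_a: float, theta_b: float, r: float, alpha: float = 0.2) -> float:
    p = (r + 1.0) / 2.0  # "A wins" probability encoded by the comparison
    d = theta_a - theta_b
    # Negative log-likelihood of p under a sigmoid(d) model,
    # plus an L2 prior of weight alpha on each score.
    nll = p * math.log1p(math.exp(-d)) + (1.0 - p) * math.log1p(math.exp(d))
    return nll + alpha * (theta_a ** 2 + theta_b ** 2) / 2.0

def fit(r: float, lr: float = 0.5, steps: int = 2000):
    """Minimize the loss by plain gradient descent. Numerical gradients
    keep the sketch short; the loss is convex, so the result is stable."""
    a = b = 0.0
    eps = 1e-6
    for _ in range(steps):
        ga = (bbt_loss(a + eps, b, r) - bbt_loss(a - eps, b, r)) / (2 * eps)
        gb = (bbt_loss(a, b + eps, r) - bbt_loss(a, b - eps, r)) / (2 * eps)
        a -= lr * ga
        b -= lr * gb
    return a, b
```

Unlike the quadratic spring loss, pushing the comparison further toward A keeps increasing A's fitted score in this sketch (e.g. `fit(0.9)[0] > fit(0.5)[0] > 0`), since nothing analogous to a vanishing stiffness discounts extreme comparisons.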
The next step is for Lê to clarify the new algorithms.