Closed by lenhoanglnh, 1 year ago
A draft implementation of individual scores based on BBT is in #1531.
Remaining tasks:

- Validate the value of W. At first glance, the distribution of individual raw scores with BBT is wider, so W should probably be reduced.
- Validate the value of alpha.
By evaluating the prior weight with the approach described in "Tournesol's algorithms / Section 3.5" on the public dataset and the main criterion "largelyrecommended", I get $\alpha^{contributor}_{prior} = 0.23$.
So I would suggest keeping one significant digit and using $\alpha = 0.2$, which is consistent with our intention to reduce the value of alpha (currently set to $\alpha = 1.0$ in the non-BBT algorithm).
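To give some intuition for what the prior weight does, here is a minimal sketch assuming a ridge-style shrinkage, where alpha acts like extra "pseudo-comparisons" voting for a zero score. The function and formula are illustrative assumptions, not Tournesol's actual implementation:

```python
# Illustrative sketch (not Tournesol's actual code) of how the prior weight
# alpha shrinks a contributor's raw score toward 0.

def shrunk_score(raw_score: float, n_comparisons: float, alpha: float) -> float:
    # Hypothetical ridge-style estimate: the more comparisons back a score,
    # the less the prior pulls it toward 0.
    return raw_score * n_comparisons / (n_comparisons + alpha)

# Reducing alpha from 1.0 to the proposed 0.2 weakens the shrinkage,
# especially for scores backed by few comparisons:
weak_prior = shrunk_score(1.0, 2, 0.2)    # closer to the raw score
strong_prior = shrunk_score(1.0, 2, 1.0)  # pulled harder toward 0
```

Under this reading, lowering alpha lets raw scores backed by few comparisons keep more of their magnitude, which matters if BBT already produces a wider raw-score distribution.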
Currently, and perhaps surprisingly, the algorithm that transforms a contributor's comparisons into their scores fails to guarantee a desirable monotonicity property: if a contributor pushes their comparison between A and B further towards A, then the score of A should increase and that of B should decrease. https://arxiv.org/pdf/2211.01179.pdf
Essentially, this property fails because, when the contributor says A is vastly better than B, the corresponding spring has a very small stiffness, which can make the comparison nearly meaningless to the algorithm.
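The failure can be reproduced in a toy two-item model. The sketch below assumes an L2 prior and a stiffness that vanishes for extreme comparisons; the particular stiffness function `1 - (r/r_max)^2` and the rest length `t = r` are made-up assumptions for illustration, not the actual spring parameters:

```python
# Toy two-item spring model showing the monotonicity failure. The stiffness
# function and rest length below are illustrative assumptions only.

def individual_score(r: float, alpha: float = 1.0, r_max: float = 10.0) -> float:
    """Closed-form minimizer of alpha*(a^2 + b^2) + k*(a - b - t)^2,
    which by symmetry satisfies b = -a and a = k*t / (alpha + 2*k)."""
    k = max(0.0, 1.0 - (r / r_max) ** 2)  # stiffness vanishes as r -> r_max
    t = r                                 # rest length grows with the comparison
    return k * t / (alpha + 2.0 * k)

# Pushing the comparison toward A at first raises A's score...
monotone = individual_score(5.0) > individual_score(2.0)        # True
# ...but near the extreme, the vanishing stiffness reverses the trend:
still_monotone = individual_score(9.9) > individual_score(5.0)  # False
```

In this toy model, as r approaches r_max the stiffness k tends to 0, so A's score collapses back toward 0 even though the contributor expressed the strongest possible preference for A.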
This annoying behavior persists because the algorithm demands a quadratic loss, so that it can be minimized by a straightforward matrix inversion. We thereby avoid iterative optimization, with its related issues such as nondeterminism and the lack of a guarantee of small distance to the minimum (in the parameter space).
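The "matrix inversion" point can be made concrete: with a quadratic spring loss, the first-order conditions are linear, so the minimizing scores solve one linear system exactly and deterministically. The data layout and spring parameters below are made up for illustration:

```python
import numpy as np

# Sketch: with a quadratic (spring) loss, the minimizing scores solve a
# linear system exactly, so no iterative optimizer is needed.

def solve_scores(n: int, springs, alpha: float = 1.0) -> np.ndarray:
    """Minimize alpha*||theta||^2 + sum stiffness*(theta_i - theta_j - rest)^2
    over theta, where springs is a list of (i, j, stiffness, rest) tuples."""
    A = alpha * np.eye(n)
    b = np.zeros(n)
    for i, j, k, rest in springs:
        # Each spring contributes a Laplacian-like block to the system.
        A[i, i] += k
        A[j, j] += k
        A[i, j] -= k
        A[j, i] -= k
        b[i] += k * rest
        b[j] -= k * rest
    return np.linalg.solve(A, b)  # exact, deterministic minimizer

# Chain of two identical springs: the solution is symmetric around 0.
scores = solve_scores(3, [(0, 1, 1.0, 2.0), (1, 2, 1.0, 2.0)])
```

The exactness and determinism are precisely what would be given up by moving to a non-quadratic loss that requires iterative optimization.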
I have previously proposed an alternative based on "Binomial Bradley-Terry", which I think satisfies monotonicity (to be verified). Moving towards this would provide better properties, but would require greater computational care to perform the optimization. On the other hand, doing optimization would make it significantly easier to include other kinds of data, e.g. likes/dislikes or direct scoring.
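To illustrate why a Bradley-Terry-style loss restores monotonicity, here is a hedged sketch. The exact loss, the mapping of a comparison r in [-1, 1] to a win probability, and the prior weight are all assumptions for illustration, not the finalized algorithm:

```python
import math

# Hedged sketch of a "Binomial Bradley-Terry"-style individual loss.
# All modeling choices below are illustrative assumptions.

def bbt_loss(theta_a: float, theta_b: float, r: float, alpha: float = 0.2) -> float:
    p = (r + 1.0) / 2.0  # "A wins" probability encoded by the comparison
    d = theta_a - theta_b
    # Negative log-likelihood of p under a sigmoid(d) model,
    # plus an L2 prior of weight alpha on each score.
    nll = p * math.log1p(math.exp(-d)) + (1.0 - p) * math.log1p(math.exp(d))
    return nll + alpha * (theta_a ** 2 + theta_b ** 2) / 2.0

def fit(r: float, lr: float = 0.5, steps: int = 2000):
    """Minimize the loss by plain gradient descent. Numerical gradients
    keep the sketch short; the loss is convex, so the result is stable."""
    a = b = 0.0
    eps = 1e-6
    for _ in range(steps):
        ga = (bbt_loss(a + eps, b, r) - bbt_loss(a - eps, b, r)) / (2 * eps)
        gb = (bbt_loss(a, b + eps, r) - bbt_loss(a, b - eps, r)) / (2 * eps)
        a -= lr * ga
        b -= lr * gb
    return a, b
```

Unlike the quadratic spring loss, pushing the comparison further toward A keeps increasing A's fitted score in this sketch (e.g. `fit(0.9)[0] > fit(0.5)[0] > 0`), since nothing analogous to a vanishing stiffness discounts extreme comparisons.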
The next step is for Lê to clarify the new algorithms.