official-stockfish / fishtest

The Stockfish testing framework
https://tests.stockfishchess.org/tests

Displaying ELO with LLR statistics #514

Open ghost opened 4 years ago

ghost commented 4 years ago

I find it strange that Elo (for SPRT tests) is only displayed on the live Elo page. Could it be added to the stat box in place of -2.94,2.94 (which is the same for every single test)? Example:

LLR: -0.75 (+0.22 ELO) {-1.00,3.00}
Total: 19537 W: 3718 L: 3683 D: 12136
Ptnml(0-2): 263, 2205, 4818, 2181, 292
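The Elo figure asked for here can be derived from the Total line alone. A minimal sketch, assuming the standard logistic Elo model with a draw counted as half a point; fishtest's live Elo page may compute its estimate differently (for example from the Ptnml pair counts), and the function names are hypothetical:

```python
import math


def elo_from_score(score):
    """Logistic Elo difference corresponding to an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_from_wld(wins, losses, draws):
    """Point estimate of the Elo difference; a draw counts as half a point."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return elo_from_score(score)


# Totals from the stat box above (W, L, D).
print(f"{elo_from_wld(3718, 3683, 12136):+.2f} Elo")
```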

LLR itself could also be displayed on a -100% to +100% scale, if the bounds are needed.
Example:

LLR: -21.4% (+0.22 ELO) {-1.00,3.00}
Total: 19537 W: 3718 L: 3683 D: 12136
Ptnml(0-2): 263, 2205, 4818, 2181, 292

MJZ1977 commented 4 years ago

This proposal would be useful because it avoids having to click through to the live page each time.

vondele commented 4 years ago

But it would only make sense if the Elo interval is displayed too. Looking at the Elo estimate without its error bounds is misleading.

ghost commented 4 years ago

@vondele what about this form?

LLR: -21.4% (-3.13 | +0.22 ELO | 2.31) {-1.00,3.00}
Total: 19537 W: 3718 L: 3683 D: 12136
Ptnml(0-2): 263, 2205, 4818, 2181, 292
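An interval like the one in this mock-up can be computed from the Ptnml line, which counts game pairs scoring 0, 0.5, 1, 1.5 and 2 points. A minimal sketch, assuming a normal approximation over pairs and a 95% level (z = 1.96); this is not necessarily how fishtest's live Elo page builds its interval, and the function names are hypothetical:

```python
import math


def elo_from_score(score):
    """Logistic Elo difference corresponding to an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(ptnml, z=1.96):
    """Return (lower, estimate, upper) Elo from pentanomial pair counts.

    ptnml[i] is the number of game pairs that scored i/2 points out of 2,
    so the per-pair score normalised to [0, 1] is 0, 0.25, 0.5, 0.75, 1.
    """
    scores = [0.0, 0.25, 0.5, 0.75, 1.0]
    pairs = sum(ptnml)
    mean = sum(n * s for n, s in zip(ptnml, scores)) / pairs
    var = sum(n * (s - mean) ** 2 for n, s in zip(ptnml, scores)) / pairs
    stderr = math.sqrt(var / pairs)  # standard error of the mean pair score
    return (elo_from_score(mean - z * stderr),
            elo_from_score(mean),
            elo_from_score(mean + z * stderr))


lo, elo, hi = elo_interval([263, 2205, 4818, 2181, 292])
print(f"({lo:+.2f} | {elo:+.2f} ELO | {hi:+.2f})")
```

Using pairs rather than single games matches the pentanomial statistics already shown in the stat box; the trinomial W/L/D counts would give a similar estimate with slightly different error bars.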

vondele commented 4 years ago

Yes, some variant is possible, but LLR in % is strange: it is a log-likelihood ratio, so just a number IMO. I agree with removing our -2.94, 2.94 bounds. I might look into what is possible later this week.
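The fixed -2.94, 2.94 pair are Wald's SPRT stopping thresholds on the LLR. They depend only on the test's error rates, not on the test itself, which is why they are identical for every test. A sketch assuming alpha = beta = 0.05, which reproduces those values (ln 19 ≈ 2.944); the function name is hypothetical:

```python
import math


def sprt_llr_bounds(alpha=0.05, beta=0.05):
    """Wald's SPRT stopping bounds on the log-likelihood ratio.

    The test stops and fails once LLR <= lower,
    and stops and passes once LLR >= upper.
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


lower, upper = sprt_llr_bounds()
print(f"({lower:.2f}, {upper:.2f})")
```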

ghost commented 4 years ago

@vondele It's a "percent to completion": -100% means the test failed, +100% means it passed. When you see 0.62 without the (-2.94, 2.94) bounds, you have no idea whether it's close to completion or not (you have to compute 0.62/2.94 yourself). Also, if the bounds change in the future, the percentage display will remain accurate, because it maps any bounds onto the same -100% to +100% range.
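The "percent to completion" idea amounts to dividing the LLR by the bound it is heading towards. A minimal sketch; the function name and the clamping are assumptions, not fishtest code. Positive and negative LLRs are scaled separately so that asymmetric bounds would also map onto -100% to +100%:

```python
def llr_percent(llr, lower=-2.94, upper=2.94):
    """Express an SPRT log-likelihood ratio as progress towards a decision.

    -100% means the LLR has reached the lower bound (test fails),
    +100% means it has reached the upper bound (test passes).
    """
    if llr >= 0:
        pct = 100.0 * llr / upper
    else:
        pct = 100.0 * llr / -lower
    # The final LLR can overshoot a bound slightly, so clamp to the display range.
    return max(-100.0, min(100.0, pct))


print(f"{llr_percent(0.62):+.1f}%")
```

For an LLR of 0.62 this gives about +21.1%, which is the 0.62/2.94 division described above.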