open-spaced-repetition / srs-benchmark

A benchmark for spaced repetition schedulers/algorithms
https://github.com/open-spaced-repetition/fsrs4anki/wiki

Significant update regarding IQR ranges #78

Closed RlzHi closed 6 months ago

RlzHi commented 6 months ago

With a low lookahead, i.e. frequent optimisation, pretrain does great even at n=8 (where n is the number of reviews trained on). The issue is that we can't easily plot the IQR of this, or rather its meaning is limited. For example, suppose we were plotting accuracy instead of loss and used lookahead=1, where lookahead is the size of the testing data. This is the ideal lookahead, since it measures how well optimised FSRS performs immediately after optimisation. We would then have an IQR between 0 and 1, because every accuracy is either 0% or 100%.
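To make the lookahead=1 point concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: `simulate_accuracies`, `iqr`, the 50% hit rate and the 1000 simulated users are hypothetical and are not taken from the benchmark code or its results.

```python
import random
import statistics


def simulate_accuracies(lookahead, n_users=1000, p_correct=0.5, seed=0):
    """Per-user accuracy over a test set of `lookahead` reviews.

    Hypothetical setup: each test review is predicted correctly with
    probability `p_correct`, independently of the others.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_users):
        hits = sum(rng.random() < p_correct for _ in range(lookahead))
        accuracies.append(hits / lookahead)
    return accuracies


def iqr(values):
    """Return (Q1, Q3) of the given values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q1, q3


for lookahead in (1, 10):
    accs = simulate_accuracies(lookahead)
    print(lookahead, sorted(set(accs)), iqr(accs))
```

With a single test review per user, the only possible accuracies are 0.0 and 1.0, so the quartiles can only land on those two values and the IQR says very little. With ten test reviews, intermediate accuracies appear and the IQR starts to carry information.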

To summarise: the analyses suggest pretrain works great on all n. It's just hard to show this through IQR because a larger test dataset means less accurate losses. Lookahead 10 shows it quite well, and mean log loss vs mean RMSE shows it even better (log loss penalises being very wrong more than RMSE does).
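A rough illustration of the log loss vs RMSE remark, for a single confidently wrong prediction. This is only a sketch: `log_loss` and `squared_error` are hypothetical helpers written for this comment, and per-prediction squared error is used as a stand-in, since the benchmark's own RMSE may be computed differently (e.g. over bins).

```python
import math


def log_loss(p, y):
    """Binary log loss for predicted recall probability p and outcome y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def squared_error(p, y):
    """Squared error for the same prediction; never exceeds 1 for p in [0, 1]."""
    return (p - y) ** 2


# The card was forgotten (y = 0) but recall was predicted with high confidence.
for p in (0.9, 0.99, 0.999):
    print(p, round(log_loss(p, 0), 3), round(squared_error(p, 0), 3))
```

As the wrong prediction gets more confident, the squared error levels off just below 1, while the log loss keeps growing without bound. That is why a handful of very wrong predictions moves mean log loss much more than it moves an RMSE-style metric.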