open-spaced-repetition / srs-benchmark

A benchmark for spaced repetition schedulers/algorithms
https://github.com/open-spaced-repetition/fsrs4anki/wiki

Feat/exp initial difficulty [WIP] #91

Closed by L-M-Sherlock 3 months ago

L-M-Sherlock commented 5 months ago

Weighted average by reviews:

| Algorithm | Log Loss | RMSE (bins) | Parameters |
| --- | --- | --- | --- |
| FSRS-4.5 | 0.3252±0.1514 | 0.0533±0.0334 | 17 |
| FSRS-4.5 + exp init d | 0.3250±0.1515 | 0.0530±0.0333 | 17 |

Weighted average by log(reviews):

| Algorithm | Log Loss | RMSE (bins) | Parameters |
| --- | --- | --- | --- |
| FSRS-4.5 | 0.3485±0.1701 | 0.0733±0.0474 | 17 |
| FSRS-4.5 + exp init d | 0.3483±0.1702 | 0.0729±0.0474 | 17 |

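RMSE (bins) improved by ~0.6% (weighted by reviews) and ~0.5% (weighted by log(reviews)). Log Loss changed by less than 0.1% in both cases.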
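For reference, the relative changes can be recomputed from the table values above. This is only a small sketch: the dictionary layout and the helper name `relative_improvement` are made up for illustration and are not part of the benchmark code.

```python
# Values copied from the two tables above as (baseline, new) pairs.
# The dictionary layout is an illustrative assumption, not benchmark output.
results = {
    "reviews": {"log_loss": (0.3252, 0.3250), "rmse_bins": (0.0533, 0.0530)},
    "log(reviews)": {"log_loss": (0.3485, 0.3483), "rmse_bins": (0.0733, 0.0729)},
}


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage reduction of `new` relative to `baseline` (positive = better)."""
    return (baseline - new) / baseline * 100.0


for weighting, metrics in results.items():
    for metric, (baseline, new) in metrics.items():
        change = relative_improvement(baseline, new)
        print(f"{weighting:>12}  {metric:<10} {change:.2f}%")
```

Both RMSE changes come out slightly above 0.5%, while the Log Loss changes are roughly ten times smaller.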

user1823 commented 5 months ago

I am not an expert in statistics, but is this actually an improvement? When the uncertainty is already in the second decimal place of the RMSE, does it make sense to compare the third and fourth decimal places?

@Expertium, can you confirm?

Expertium commented 5 months ago

We would need to run a statistical significance test. @L-M-Sherlock could you please run my logp_wilcox (from significance_table.py) on the baseline values of RMSE and the new values? Like this: log_p_value = logp_wilcox(baseline_RMSE, new_RMSE)[0]
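To illustrate the kind of test being asked for, here is a minimal sketch of a paired Wilcoxon signed-rank test using the normal approximation. It is not `logp_wilcox` from `significance_table.py`, whose internals are not shown in this thread; the function `wilcoxon_signed_rank`, the user count, and the per-user data below are all hypothetical and synthetic. The point is that the test works on per-user differences, so a small but consistent gain can be detected even when the spread of RMSE across users is much larger than the gain.

```python
import math
import random


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences share their average rank. No tie or continuity
    correction is applied, so this is only suitable for large samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, p_value


# Synthetic per-user RMSE values (NOT benchmark data): wide spread across
# users, plus a tiny average improvement for the new variant.
rng = random.Random(0)
baseline_rmse = [abs(rng.gauss(0.0533, 0.0334)) for _ in range(2000)]
new_rmse = [b - rng.gauss(0.0003, 0.002) for b in baseline_rmse]

w_plus, p_value = wilcoxon_signed_rank(baseline_rmse, new_rmse)
print(f"W+ = {w_plus:.1f}, p = {p_value:.3g}")
```

With real data, the two lists would hold one RMSE value per user for each algorithm, in the same user order.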

L-M-Sherlock commented 5 months ago

[image: result of the requested significance test, not preserved]

Expertium commented 5 months ago

Yep, that's definitely significant. Well, statistically significant, but not practically, since the effect is only about 0.5%.

Expertium commented 4 months ago

As I said here, this is such a minor improvement that even if it's statistically significant, I don't think it's worth implementing. You would need 20 such small improvements to get to the point where the new version is noticeably better than FSRS-4.5.
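As a rough check on the "20 such small improvements" figure, here is a quick sketch. It assumes each change cuts RMSE by the same ~0.5% and that the gains compound multiplicatively. Neither assumption is stated in the thread, and the variable names are only illustrative.

```python
# Assumption: every change reduces RMSE by the same relative amount and the
# reductions compound multiplicatively. This is a simplification.
per_step_gain = 0.005  # ~0.5% relative RMSE reduction, as reported in this thread
steps = 20

remaining = (1 - per_step_gain) ** steps  # fraction of the original RMSE left
total_reduction = 1 - remaining

baseline_rmse = 0.0533  # FSRS-4.5, weighted by reviews (from the table above)
print(f"total reduction after {steps} steps: {total_reduction:.1%}")
print(f"RMSE: {baseline_rmse:.4f} -> {baseline_rmse * remaining:.4f}")
```

Under these assumptions, twenty such changes add up to a reduction of a bit under 10%.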