Feat/exp initial difficulty [WIP]

open-spaced-repetition / srs-benchmark

A benchmark for spaced repetition schedulers/algorithms

https://github.com/open-spaced-repetition/fsrs4anki/wiki


Feat/exp initial difficulty [WIP] #91

Closed: L-M-Sherlock closed this pull request 3 months ago.

L-M-Sherlock commented 5 months ago

Weighted average by reviews:

| Algorithm | Log Loss | RMSE (bins) | Parameters |
| --- | --- | --- | --- |
| FSRS-4.5 | 0.3252±0.1514 | 0.0533±0.0334 | 17 |
| FSRS-4.5 + exp init d | 0.3250±0.1515 | 0.0530±0.0333 | 17 |

Weighted average by log(reviews):

| Algorithm | Log Loss | RMSE (bins) | Parameters |
| --- | --- | --- | --- |
| FSRS-4.5 | 0.3485±0.1701 | 0.0733±0.0474 | 17 |
| FSRS-4.5 + exp init d | 0.3483±0.1702 | 0.0729±0.0474 | 17 |

RMSE (bins) improved by ~0.6% (weighted by reviews) and ~0.5% (weighted by log(reviews)).
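A minimal Python sketch of the two weighting schemes and of how the percentages follow from the RMSE means. It assumes each collection contributes one metric value plus its review count, and that log(reviews) means the natural log; the per-collection numbers are invented for illustration, and only the four RMSE means come from the tables above.

```python
import math


def weighted_mean(values, weights):
    """Weighted arithmetic mean of per-collection metric values."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def relative_improvement(baseline, new):
    """Fractional reduction of a lower-is-better metric such as RMSE."""
    return (baseline - new) / baseline


# Hypothetical per-collection RMSE values and review counts (illustration only).
rmse = [0.04, 0.06, 0.09]
reviews = [100_000, 10_000, 1_000]

by_reviews = weighted_mean(rmse, reviews)
# Assumption: "log(reviews)" is the natural log of each collection's review count.
by_log_reviews = weighted_mean(rmse, [math.log(n) for n in reviews])

# RMSE (bins) means from the two tables: FSRS-4.5 vs FSRS-4.5 + exp init d.
gain_reviews = relative_improvement(0.0533, 0.0530)
gain_log_reviews = relative_improvement(0.0733, 0.0729)
print(f"{gain_reviews:.2%}, {gain_log_reviews:.2%}")  # prints: 0.56%, 0.55%
```

With this toy data, weighting by log(reviews) gives the smaller collections more say, pulling the average toward their higher RMSE.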

user1823 commented 5 months ago

I am not an expert in statistics, but is this actually an improvement? When the uncertainty in the RMSE is already in the second decimal place, does it make sense to compare the third and fourth decimal places?

@Expertium, can you confirm?

Expertium commented 5 months ago

We would need to run a statistical significance test. @L-M-Sherlock, could you please run my `logp_wilcox` (from `significance_table.py`) on the baseline RMSE values and the new RMSE values? Like this: `log_p_value = logp_wilcox(baseline_RMSE, new_RMSE)[0]`

L-M-Sherlock commented 5 months ago

Expertium commented 5 months ago

Yep, that's definitely significant. Statistically, that is, but not practically, since the effect is only about 0.5%.

Expertium commented 4 months ago

As I said here, this improvement is so minor that, even if it's statistically significant, I don't think it's worth implementing. You would need 20 improvements of this size before a new version is noticeably better than FSRS-4.5.
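As a rough check of the "20 such improvements" estimate, assuming each improvement cuts RMSE by a further 0.5% of its current value (a multiplicative model chosen here for illustration), the combined reduction after 20 steps comes to about 9.5%, slightly less than the 10% a simple sum would suggest.

```python
def cumulative_reduction(step, count):
    """Total relative reduction after `count` successive reductions of `step` each."""
    return 1 - (1 - step) ** count


total = cumulative_reduction(0.005, 20)
print(f"{total:.1%}")  # prints: 9.5%
```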