open-spaced-repetition / fsrs4anki

A modern Anki custom scheduling implementation based on the Free Spaced Repetition Scheduler algorithm
https://github.com/open-spaced-repetition/fsrs4anki/wiki
MIT License

FSRS likes to optimize parameter 7 to 0.0 #695

Open · Gilfaro opened this issue 2 days ago

Gilfaro commented 2 days ago

FSRS-5, even more than FSRS-4.5, likes to optimize parameter 7 to 0.0, which disables difficulty decay (mean reversion) entirely. Even manually changing the parameter after optimization results in better log loss and RMSE(bins). This probably needs more testing in the benchmark, but setting the minimum at about 0.0100 would fix the issue and improve the fit to the data.
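
For context, here is a minimal sketch of the FSRS-5 difficulty update, paraphrased rather than copied from the optimizer source (grades: 1 = Again, 2 = Hard, 3 = Good, 4 = Easy; `w` is the 19-element parameter vector). It shows why w[7] = 0 freezes difficulty on Good: the grade term vanishes for Good, leaving only the mean-reversion term that w[7] controls.

```python
import math

# Sketch of the FSRS-5 difficulty update (paraphrased; not the exact
# fsrs-optimizer source). Grades: 1=Again, 2=Hard, 3=Good, 4=Easy.

def init_difficulty(w, grade):
    # D0(G) = w4 - e^(w5 * (G - 1)) + 1
    return w[4] - math.exp(w[5] * (grade - 1)) + 1

def next_difficulty(w, d, grade):
    delta = -w[6] * (grade - 3)          # zero when the user presses Good
    d_damped = d + delta * (10 - d) / 9  # linear damping near the D = 10 ceiling
    # Mean reversion toward D0(Easy), controlled by w[7]. With w[7] = 0
    # this line is a no-op, so a Good press can never change difficulty.
    d_next = w[7] * init_difficulty(w, 4) + (1 - w[7]) * d_damped
    return min(max(d_next, 1.0), 10.0)   # difficulty is clamped to [1, 10]
```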

L-M-Sherlock commented 2 days ago

> Even manually changing the parameter after optimization results in better log loss and RMSE(bins).

Could you reproduce it in 50%+ cases?

Expertium commented 2 days ago

I think it's better to benchmark it on Anki 20k.

Gilfaro commented 2 days ago

I have currently tested it on 5 decks in the Anki beta 2, and it improved the results for all of them. Here is a sample run from a small benchmark where the numbers are nearly the same, but from an end-user perspective it is much better if difficulty can still be modified by pressing Good/Again. Sometimes values much higher than the clamp are more optimal, but once the optimizer decides on 0 it seems to be stuck in that local minimum.

Model: FSRS-5
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3737±0.1183
FSRS-5 RMSE(bins) (mean±std): 0.0496±0.0188
FSRS-5 AUC (mean±std): 0.7064±0.0897

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3660±0.1512
FSRS-5 RMSE(bins) (mean±std): 0.0647±0.0245
FSRS-5 AUC (mean±std): 0.6741±0.1350

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3641±0.1547
FSRS-5 RMSE(bins) (mean±std): 0.0671±0.0247
FSRS-5 AUC (mean±std): 0.6668±0.1387

parameters: [0.32095, 1.0439, 2.5808, 15.4029, 7.20385, 0.5284, 1.02275, 0.02555, 1.63765, 0.225, 1.04685, 1.9363, 0.0724, 0.27135, 2.214, 0.34025, 3.1823, 0.6103, 0.73345]

Model: FSRS-5-clamp7
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3738±0.1183
FSRS-5 RMSE(bins) (mean±std): 0.0499±0.0185
FSRS-5 AUC (mean±std): 0.7062±0.0898

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3658±0.1511
FSRS-5 RMSE(bins) (mean±std): 0.0647±0.0243
FSRS-5 AUC (mean±std): 0.6742±0.1350

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3639±0.1545
FSRS-5 RMSE(bins) (mean±std): 0.0671±0.0245
FSRS-5 AUC (mean±std): 0.6669±0.1387

parameters: [0.3195, 1.04455, 2.5806, 15.4029, 7.208, 0.47475, 1.03605, 0.0252, 1.62875, 0.22925, 1.02125, 1.9363, 0.07145, 0.2694, 2.2139, 0.3364, 3.18255, 0.61735, 0.73345]
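
For readers unfamiliar with the metric in these results: RMSE(bins) measures calibration by bucketing reviews and comparing the mean predicted recall with the observed recall rate in each bucket. A simplified sketch of the idea, assuming equal-width probability bins (the real benchmark's binning scheme is more involved, so treat this as illustrative only):

```python
import numpy as np

def rmse_bins(predictions, outcomes, n_bins=20):
    # predictions: predicted recall probabilities in [0, 1]
    # outcomes: 1 if the card was recalled, 0 otherwise
    predictions = np.asarray(predictions)
    outcomes = np.asarray(outcomes, dtype=float)
    bins = np.minimum((predictions * n_bins).astype(int), n_bins - 1)
    se, total = 0.0, 0
    for b in range(n_bins):
        mask = bins == b
        n = mask.sum()
        if n == 0:
            continue
        # gap between mean prediction and observed recall rate in this bin
        gap = predictions[mask].mean() - outcomes[mask].mean()
        se += n * gap * gap
        total += n
    return (se / total) ** 0.5  # count-weighted RMSE over the bins
```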

Expertium commented 2 days ago

@L-M-Sherlock we've talked about this before, and we have run these kinds of benchmarks before. The conclusion here is the same: clamping barely affects the metrics. So it just comes down to preference: do we want D to always change when the user presses Good, even if only by a small amount? I'd say yes.
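
To make the trade-off concrete, here is a usage example reusing `next_difficulty()` from the sketch near the top of the thread, with the unclamped parameter vector from the benchmark run above (w[7] = 0.02555) against the same vector with w[7] zeroed out:

```python
# Parameter vector from the unclamped FSRS-5 run above; w[7] = 0.02555.
w = [0.32095, 1.0439, 2.5808, 15.4029, 7.20385, 0.5284, 1.02275, 0.02555,
     1.63765, 0.225, 1.04685, 1.9363, 0.0724, 0.27135, 2.214, 0.34025,
     3.1823, 0.6103, 0.73345]

d = 9.0
print(next_difficulty(w, d, 3))   # ~8.85: Good still eases a hard card

w0 = list(w)
w0[7] = 0.0                       # simulate the collapsed parameter
print(next_difficulty(w0, d, 3))  # 9.0 exactly: with w[7] = 0, D is frozen
```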

user1823 commented 8 hours ago

In principle, I like the suggestion, but experience suggests otherwise.

In https://github.com/open-spaced-repetition/fsrs4anki/commit/20d2dae96c33ab05c79cf65e4e2cc67d55251513, L-M-Sherlock used 0.05 as the minimum value for this parameter. (It was called w[5] at that time.)

But using 0.05 as the minimum limit with my collection not only made the RMSE worse, but also increased the workload (parameters calculated with the new limit on exactly the same collection gave me a backlog of 900 extra cards). I first reported this in https://github.com/open-spaced-repetition/fsrs4anki/issues/342#issuecomment-1633451433

L-M-Sherlock gave this explanation for the issue:

> One rational explanation for your case is: you don't have ease hell, but w[5] assumes you do. So w[5] will decrease the difficulty in the long term. Then w[4] would increase to counteract or even override it, which induces the extra workload.

To fix the issue, he decreased the lower limit back to 0.

I advised using a small but non-zero lower limit (such as 0.0003).

His response was that such a low value won't result in any appreciable mean reversion and, thus, is no better than using 0.

The difference between 0.0003 and 0 is pretty small. Suppose the initial difficulty is 5 and the current difficulty is 10. If you always press Good, here is the subsequent difficulty:

5 × 0.0003 + 10 × (1 − 0.0003) = 9.9985
5 × 0.0003 + 9.9985 × (1 − 0.0003) ≈ 9.9970
5 × 0.0003 + 9.9970 × (1 − 0.0003) ≈ 9.9955
...
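
Following that recurrence, the gap between the current difficulty and the reversion target shrinks by a factor of (1 − w7) per review, which makes it easy to quantify how inert 0.0003 really is (the arithmetic below is illustrative, not from the thread):

```python
import math

# The gap D - D0 shrinks by (1 - w7) per Good review, so the number of
# reviews needed to close half the gap is ln(2) / -ln(1 - w7).
for w7 in (0.0003, 0.01, 0.05):
    half_life = math.log(2) / -math.log(1 - w7)
    print(f"w7 = {w7}: ~{half_life:.0f} Good reviews to halve the gap")

# w7 = 0.0003 -> ~2310 reviews; w7 = 0.01 -> ~69; w7 = 0.05 -> ~14.
# At 0.0003 the reversion is imperceptible within any realistic review
# history, which is why it is "no better than using 0".
```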

Gilfaro commented 2 hours ago

A value of 0.05 is way too high. In my N=1 case and in the small benchmark, a value of about 0.01 (or close to it) lets the difficulty scale and either improves RMSE or barely changes it. With the new FSRS-5 in the 24.10 beta this is even worse: as soon as the optimizer ends up at 0.0 (or close to it) on w7, changing it to 0.01 or even much higher improves all the metrics, by up to 20% in the best case.

Expertium commented 2 hours ago

> it improves all the metrics, by up to 20% in the best case

That's extremely weird, considering that the benchmarks show that the difference in RMSE between clamped and unclamped w7 is <1%.

brishtibheja commented 1 hour ago

My noob idea: when that parameter optimizes to 0.0, run the optimization both clamped and unclamped and keep whichever parameters are better. But of course, you can deliberate on a better solution.
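
For what it's worth, the idea is easy to express. A sketch under assumed interfaces: `optimize(revlog, w7_floor)` and `evaluate(revlog, params)` (returning log loss) are hypothetical stand-ins for the real optimizer and evaluator entry points:

```python
def pick_parameters(revlog):
    # optimize() and evaluate() are hypothetical stand-ins; w7_floor is
    # the lower clamp applied to w[7] during training.
    unclamped = optimize(revlog, w7_floor=0.0)
    if unclamped[7] > 0.0:
        return unclamped            # w[7] didn't collapse; nothing to do
    clamped = optimize(revlog, w7_floor=0.01)
    # w[7] hit zero: keep whichever fit scores better on the same data.
    return min(unclamped, clamped, key=lambda p: evaluate(revlog, p))
```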

Gilfaro commented 1 hour ago

@Expertium It is different: in the benchmark case the clamping occurs on a per-batch basis, while in my manual case the clamping is done only once, at the end of optimization.
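
To spell out the distinction (a hypothetical PyTorch-style training loop, not the actual fsrs-optimizer code): clamping after every batch acts like a projected-gradient constraint that the optimizer has to work around for the whole run, whereas clamping once at the end merely projects the finished solution, which can land at a quite different point of the loss surface.

```python
import torch

W7_FLOOR = 0.01

def train(model, batches, optimizer, clamp_per_batch=True):
    # model.w is assumed to be a torch.nn.Parameter holding the 19 weights.
    for batch in batches:
        loss = model.loss(batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if clamp_per_batch:
            with torch.no_grad():
                model.w[7].clamp_(min=W7_FLOOR)  # constraint enforced mid-run
    if not clamp_per_batch:
        with torch.no_grad():
            model.w[7].clamp_(min=W7_FLOOR)      # one-shot projection at the end
    return model
```

If that reading is right, it would be one plausible reason the per-batch-clamped benchmark and the manual post-hoc edit disagree so sharply.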

Expertium commented 53 minutes ago

I don't know what you mean.