Open Gilfaro opened 2 days ago
Even manually changing the parameter after optimization results in better log loss and rmse(bins).
Could you reproduce it in 50%+ cases?
I think it's better to benchmark it on Anki 20k.
So far I have tested it on 5 decks in Anki beta 2, and it improved the fit for all of them. Here is a sample run from a small benchmark where the numbers are nearly the same, but from an end-user perspective it is much better if difficulty can be changed by pressing Good/Again. Sometimes values much higher than the clamp value are more optimal, but once the optimizer settles on 0 it seems to get stuck in that local minimum.
Model: FSRS-5
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3737±0.1183
FSRS-5 RMSE(bins) (mean±std): 0.0496±0.0188
FSRS-5 AUC (mean±std): 0.7064±0.0897
Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3660±0.1512
FSRS-5 RMSE(bins) (mean±std): 0.0647±0.0245
FSRS-5 AUC (mean±std): 0.6741±0.1350
Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3641±0.1547
FSRS-5 RMSE(bins) (mean±std): 0.0671±0.0247
FSRS-5 AUC (mean±std): 0.6668±0.1387
parameters: [0.32095, 1.0439, 2.5808, 15.4029, 7.20385, 0.5284, 1.02275, 0.02555, 1.63765, 0.225, 1.04685, 1.9363, 0.0724, 0.27135, 2.214, 0.34025, 3.1823, 0.6103, 0.73345]
Model: FSRS-5-clamp7
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3738±0.1183
FSRS-5 RMSE(bins) (mean±std): 0.0499±0.0185
FSRS-5 AUC (mean±std): 0.7062±0.0898
Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3658±0.1511
FSRS-5 RMSE(bins) (mean±std): 0.0647±0.0243
FSRS-5 AUC (mean±std): 0.6742±0.1350
Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3639±0.1545
FSRS-5 RMSE(bins) (mean±std): 0.0671±0.0245
FSRS-5 AUC (mean±std): 0.6669±0.1387
parameters: [0.3195, 1.04455, 2.5806, 15.4029, 7.208, 0.47475, 1.03605, 0.0252, 1.62875, 0.22925, 1.02125, 1.9363, 0.07145, 0.2694, 2.2139, 0.3364, 3.18255, 0.61735, 0.73345]
@L-M-Sherlock we've talked about this before, and have run these kinds of benchmarks before. The conclusion here is the same - clamping barely affects the metrics. So it just comes down to preference - do we want D to always change when the user presses Good, even if it's a small change? I'd say yes.
In principle, I like the suggestion, but experience suggests otherwise.
In https://github.com/open-spaced-repetition/fsrs4anki/commit/20d2dae96c33ab05c79cf65e4e2cc67d55251513, L-M-Sherlock used 0.05 as the minimum value for this parameter. (It was called w[5] at that time.)
But using 0.05 as the lower limit with my collection not only made RMSE worse, it also increased the workload (parameters calculated with the new limit on exactly the same collection gave me a backlog of 900 extra cards). I first reported this in https://github.com/open-spaced-repetition/fsrs4anki/issues/342#issuecomment-1633451433
L-M-Sherlock gave this explanation for the issue:
One rational explanation for your case is that you don't have ease hell, but w[5] assumes you do. So w[5] will decrease the difficulty in the long term. Then w[4] would increase to counteract or even override it, which induces the workload.
To fix the issue, he decreased the lower limit back to 0.
I advised using a small but non-zero lower limit (such as 0.0003)
His response was that such a low value won't result in any appreciable mean reversion and, thus, is no better than using 0.
The difference between 0.0003 and 0 is pretty small. Suppose the initial difficulty is 5 and the current difficulty is 10. If you always press Good, here are the subsequent difficulties:
5 * 0.0003 + 10 * (1 - 0.0003) = 9.9985
5 * 0.0003 + 9.9985 * (1 - 0.0003) = 9.997
5 * 0.0003 + 9.997 * (1 - 0.0003) = 9.9955
...
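To make the scale of this concrete, here is a minimal Python sketch that repeats the same arithmetic for a few candidate lower limits. It assumes only the Good-press update shown in the lines above (new D = target * w7 + D * (1 - w7), with a target of 5); the function names `press_good` and `presses_to_halve_gap` are made up for illustration and are not part of any FSRS package.

```python
import math


def press_good(difficulty: float, w7: float, target: float = 5.0) -> float:
    """One mean-reversion step toward `target` (assumed form, Good press only)."""
    return w7 * target + (1.0 - w7) * difficulty


def presses_to_halve_gap(w7: float) -> float:
    """Consecutive Good presses needed to halve the gap between D and target.

    The gap shrinks by a factor of (1 - w7) per press, so solve (1 - w7)**n = 0.5.
    Assumes 0 <= w7 < 1.
    """
    if w7 <= 0.0:
        return math.inf  # with w7 = 0 the difficulty never moves
    return math.log(0.5) / math.log(1.0 - w7)


for w7 in (0.0, 0.0003, 0.01, 0.05):
    d = 10.0
    for _ in range(100):
        d = press_good(d, w7)
    print(f"w7={w7}: D after 100 Good presses = {d:.4f}, "
          f"presses to halve the gap = {presses_to_halve_gap(w7):.0f}")
```

Under these assumptions, 0.0003 needs a few thousand Good presses to close half the gap, while 0.01 needs several dozen, which is the practical difference between the limits being discussed.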
A value of 0.05 is way too high. In my N=1 case and in the small benchmark, a value of about 0.01 makes the difficulty scale and either improves RMSE or barely changes it. In the new 24.10 beta with the new FSRS-5 this is even worse: as soon as the optimizer ends with w7 at or close to 0.0, changing it to 0.01 or much higher improves all metrics by a lot, in the best case by 20%.
it improves all metrics by a lot in best case 20%
That's extremely weird, considering that the benchmarks show that the difference in RMSE between clamped w7 and unclamped is <1%
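As a rough sanity check, the relative differences can be computed from the small 10-user run posted earlier in this thread (the "Weighted average by reviews" rows for FSRS-5 and FSRS-5-clamp7). This only covers that one small run, not the larger benchmarks referred to above, and the helper name `relative_change` is made up for illustration.

```python
# Means copied from the "Weighted average by reviews" rows posted in this thread.
baseline = {"LogLoss": 0.3737, "RMSE(bins)": 0.0496, "AUC": 0.7064}
clamped = {"LogLoss": 0.3738, "RMSE(bins)": 0.0499, "AUC": 0.7062}


def relative_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


changes = {name: relative_change(baseline[name], clamped[name]) for name in baseline}
for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

All three changes come out below 1% in magnitude for this run, which is far from a 20% improvement.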
My noob idea is to run the optimization both clamped and unclamped when that parameter comes out as 0.0 and keep whichever parameters are better, but of course you can deliberate on a better solution.
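A minimal sketch of that idea, assuming a hypothetical `optimize(lower_bound)` callable that returns a parameter list and a hypothetical `loss(params)` callable (for example log loss on the user's reviews). Neither is the real optimizer API; the toy stand-ins at the bottom exist only so the snippet runs.

```python
from typing import Callable, Sequence

Params = Sequence[float]


def pick_better(
    optimize: Callable[[float], Params],  # hypothetical: takes the w7 lower bound
    loss: Callable[[Params], float],      # hypothetical: lower is better
    w7_index: int = 7,
    floor: float = 0.01,
    eps: float = 1e-4,
) -> list:
    """Optimize with lower bound 0; if w7 collapses to ~0, retry with `floor`
    as the lower bound and keep whichever parameter set has the lower loss."""
    unclamped = list(optimize(0.0))
    if unclamped[w7_index] > eps:
        return unclamped  # w7 did not collapse, nothing to compare
    clamped = list(optimize(floor))
    return clamped if loss(clamped) < loss(unclamped) else unclamped


# Toy stand-ins, NOT the real optimizer: w7 always sticks to its lower bound,
# and the loss pretends the best value of w7 is 0.02.
def toy_optimize(lower_bound: float) -> list:
    params = [0.5] * 19
    params[7] = max(lower_bound, 0.0)
    return params


def toy_loss(params: Params) -> float:
    return (params[7] - 0.02) ** 2


best = pick_better(toy_optimize, toy_loss)
print(best[7])
```

The second optimization run only happens when w7 ends up near 0, so the extra cost is limited to the collections affected by this issue. Which data the loss is computed on is left open here.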
@Expertium It is different: in the benchmark case the clamping occurs on a per-batch basis, while in my manual case the clamping is done only once, at the end.
I don't know what you mean
FSRS-5, even more than FSRS-4.5, likes to optimize parameter 7 to 0, which also disables difficulty decay. Even manually changing the parameter after optimization results in better log loss and RMSE(bins). This probably needs more testing in the benchmark, but setting the minimum at about 0.0100 would fix this issue and improve the fit to the data.