open-spaced-repetition / fsrs4anki

A modern Anki custom scheduling based on the Free Spaced Repetition Scheduler algorithm
https://github.com/open-spaced-repetition/fsrs4anki/wiki
MIT License

Suggestion for Adjusting Difficulty Score to Use an Asymptote at 10 #697

Closed richard-lacasse closed 1 month ago

richard-lacasse commented 1 month ago

I've noticed that my cards very quickly reach the maximum Difficulty score of 10. This might be partly due to my specific parameters, but I suspect others experience a similar pattern, perhaps to a lesser extent.

When I mark a card as Again or Hard, it increases the Difficulty score significantly. However, when I mark a card as Good, it barely changes. While this is likely intentional, after getting a card wrong about half a dozen times, its Difficulty score becomes stuck at 10. Even if I subsequently get the card right 100 times, getting it wrong just once pushes it back to the maximum Difficulty.

The issue with this is that sorting cards by Difficulty Ascending becomes less useful because many cards are tied at $D = 10$. As a result, I lose the granularity that this sorting method could provide. I think this could be addressed by using an asymptote instead of a hard cap at 10, while keeping the original equation for Difficulty ($D$) the same.

Currently, when $D$ exceeds 10, I believe the algorithm uses something like:

$$D' = \min(D', 10)$$

This means any information beyond 10 is lost each time it's capped. Instead, we could use an asymptote with a damping factor. For example:

1. Linear Damping Factor

Let:

$$\Delta D = D' - D$$

Then update $D'$ using:

$$D' = D + \Delta D \cdot \left(1 - \frac{D}{10}\right)$$

In this equation, as $D$ approaches 10, the factor $1 - \frac{D}{10}$ decreases, reducing the increment $\Delta D$ and causing $D'$ to approach 10 asymptotically.
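
As a quick illustration, here is a minimal sketch (not FSRS code; the starting value of 5.0 and the raw increment $\Delta D = 2.0$ are made up) showing how repeated increases creep toward 10 instead of being clipped there:

    def damped_update(d: float, delta_d: float) -> float:
        # Proposed update: D' = D + delta_D * (1 - D / 10)
        return d + delta_d * (1 - d / 10)

    d = 5.0
    for _ in range(8):            # eight consecutive raw increments of 2.0
        d = damped_update(d, 2.0)
        print(round(d, 4))        # 6.0, 6.8, 7.44, 7.952, ... approaches 10 asymptotically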

2. Exponential Damping Factor

Alternatively, you could parameterize the damping factor, similar to how it's done in the Stability equation:

$$D' = D + \Delta D \cdot \left(1 - e^{-w (10 - D)}\right)$$

Here, $w$ would be a new weight: a positive constant that controls the rate of damping. As $D$ approaches 10, the factor $1 - e^{-w (10 - D)}$ decreases toward zero, causing $D'$ to approach 10 asymptotically.
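
The same toy example with the exponential variant (again only a sketch; $w = 0.3$ is an arbitrary illustrative value, not a fitted FSRS parameter):

    import math

    def damped_update_exp(d: float, delta_d: float, w: float = 0.3) -> float:
        # Proposed update: D' = D + delta_D * (1 - exp(-w * (10 - D)))
        return d + delta_d * (1 - math.exp(-w * (10 - d)))

    d = 5.0
    for _ in range(8):                 # eight consecutive raw increments of 2.0
        d = damped_update_exp(d, 2.0)
        print(round(d, 4))             # increments shrink as D approaches 10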

Implementing an asymptote at 10 could preserve the usefulness of sorting by Difficulty and prevent the loss of information due to capping. It might be worth exploring this adjustment in the next version of FSRS to see if it improves the algorithm.

Expertium commented 1 month ago

I'll benchmark it

user1823 commented 1 month ago

This change will be useful for sorting.

However, I don't think that it would improve the algorithm, because there won't be any significant change in the value of 11 - D.

I would expect a greater improvement if we simultaneously updated the formula that uses D for calculating S. (But I don't know what changes should be made to that formula.)

Expertium commented 1 month ago

I interpreted his suggestion as "Add this to the main formula". And surprisingly, it (kind of) worked!

I am shocked. I have tried so many changes to the D formula, so the fact that this helps even by 1% is surprising. @L-M-Sherlock we can improve RMSE by 1% (relatively) by doing this:

    def next_d(self, state: Tensor, rating: Tensor) -> Tensor:
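        # delta_d is positive for Again/Hard, zero for Good, and negative for Easy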
        delta_d = - self.w[6] * (rating - 3)
        new_d = state[:, 1] + delta_d * (1 - (state[:, 1] / 10))  # linear damping
        new_d = self.mean_reversion(self.init_d(4), new_d)
        return new_d

It's not much, but it's "free" - no new parameters are needed

L-M-Sherlock commented 1 month ago

OK. I will benchmark it on 20k collections tomorrow. If the result is stable, I will add it to FSRS-5.

L-M-Sherlock commented 1 month ago

One problem: when D = 10, 1 - D/10 = 0, so rating Easy will not decrease D.

Expertium commented 1 month ago

Dang. I'll modify it and benchmark again

L-M-Sherlock commented 1 month ago

    def linear_damping_one_way(self, delta_d: Tensor, old_d: Tensor) -> Tensor:
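        # Damp only increases in D; decreases (delta_d <= 0) pass through unchanged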
        return torch.where(delta_d <= 0, delta_d, delta_d * (10 - old_d) / 9)

    def next_d(self, state: Tensor, rating: Tensor) -> Tensor:
        delta_d = -self.w[6] * (rating - 3)
        new_d = state[:, 1] + self.linear_damping_one_way(delta_d, state[:, 1])
        new_d = self.mean_reversion(self.init_d(4), new_d)
        return new_d

I will benchmark this one.

Expertium commented 1 month ago

Ok, looks good. Why are you dividing by 9, though? Oh, ok, I get it: since D is at least 1, 10 - D is at most 9, so dividing by 9 scales the damping factor to the [0, 1] range.

L-M-Sherlock commented 1 month ago

Weird. The variant is worse than FSRS-5 in my initial test with the top 352 collections:

Model: FSRS-5-dev
Total number of users: 352
Total number of reviews: 10744330
Weighted average by reviews:
FSRS-5-dev-1 LogLoss (mean±std): 0.2939±0.1591
FSRS-5-dev-1 RMSE(bins) (mean±std): 0.0495±0.0335
FSRS-5-dev-1 AUC (mean±std): 0.7001±0.0771

Weighted average by log(reviews):
FSRS-5-dev LogLoss (mean±std): 0.3348±0.1636
FSRS-5-dev RMSE(bins) (mean±std): 0.0681±0.0416
FSRS-5-dev AUC (mean±std): 0.6950±0.0892

Weighted average by users:
FSRS-5-dev LogLoss (mean±std): 0.3380±0.1646
FSRS-5-dev RMSE(bins) (mean±std): 0.0706±0.0426
FSRS-5-dev AUC (mean±std): 0.6942±0.0914

Model: FSRS-5
Total number of users: 352
Total number of reviews: 10744330
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.2938±0.1587
FSRS-5 RMSE(bins) (mean±std): 0.0493±0.0332
FSRS-5 AUC (mean±std): 0.6989±0.0813

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3344±0.1629
FSRS-5 RMSE(bins) (mean±std): 0.0677±0.0411
FSRS-5 AUC (mean±std): 0.6949±0.0905

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3375±0.1639
FSRS-5 RMSE(bins) (mean±std): 0.0701±0.0421
FSRS-5 AUC (mean±std): 0.6941±0.0927

L-M-Sherlock commented 1 month ago

I'm benchmarking the two-way method. And it performs better:

Model: FSRS-5-dev
Total number of users: 118
Total number of reviews: 3151255
Weighted average by reviews:
FSRS-5-dev LogLoss (mean±std): 0.3465±0.1428
FSRS-5-dev RMSE(bins) (mean±std): 0.0557±0.0322
FSRS-5-dev AUC (mean±std): 0.6978±0.0639

Weighted average by log(reviews):
FSRS-5-dev LogLoss (mean±std): 0.3624±0.1551
FSRS-5-dev RMSE(bins) (mean±std): 0.0726±0.0395
FSRS-5-dev AUC (mean±std): 0.6995±0.0924

Weighted average by users:
FSRS-5-dev LogLoss (mean±std): 0.3631±0.1570
FSRS-5-dev RMSE(bins) (mean±std): 0.0747±0.0398
FSRS-5-dev AUC (mean±std): 0.6987±0.0955

Model: FSRS-5
Total number of users: 118
Total number of reviews: 3151255
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3474±0.1433
FSRS-5 RMSE(bins) (mean±std): 0.0563±0.0322
FSRS-5 AUC (mean±std): 0.6952±0.0648

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3629±0.1555
FSRS-5 RMSE(bins) (mean±std): 0.0728±0.0394
FSRS-5 AUC (mean±std): 0.6977±0.0942

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3637±0.1573
FSRS-5 RMSE(bins) (mean±std): 0.0749±0.0397
FSRS-5 AUC (mean±std): 0.6969±0.0975

So... should we implement it?

Expertium commented 1 month ago

The two-way is the one where Easy (or any other grade) doesn't change difficulty if it's already at 10? No, let's not. Users will definitely complain.

L-M-Sherlock commented 1 month ago

Yeah. But it improves RMSE by 1% (relatively) across 8493 collections. It seems to suggest that ease hell is a myth, similar to #695.

Expertium commented 1 month ago

I still think it would increase the number of complaints from users. @user1823 thoughts?

user1823 commented 1 month ago

Not decreasing D when the user presses Easy seems wrong. But the RMSE is decreasing. There are two scenarios that could explain this:

In either case, I think that implementing this would be worth it.

However, it would be great if we could think of some modification that prevents this problem while still producing a similar decrease in RMSE.

Expertium commented 1 month ago

I'm afraid that "I pressed Easy and D is still at 100%" will outweigh any algorithmic benefits

user1823 commented 1 month ago

If @richard-lacasse has an idea to solve this problem, then it would be great.

Otherwise, I think that we should implement this for the reason I mentioned here.

I'm afraid that "I pressed Easy and D is still at 100%" will outweigh any algorithmic benefits

To those users, say "just trust the algorithm".

Moreover, this issue is really not that important because with the proposed change, having a card with D = 10 is difficult.

Even currently, I have only 171 cards out of 15000 cards that have D exactly equal to 10, even though 30% of my collection has D between 9 and 10.

richard-lacasse commented 1 month ago

It's an asymptote; the Difficulty should never reach 10 exactly, that's the point. You'd have to hit Again so many times that the gap to 10 becomes smaller than machine epsilon, and I'm not sure that's even possible.

Expertium commented 1 month ago

Still, suppose that D=9.9. In that case difficulty will barely change if the user presses Easy.

richard-lacasse commented 1 month ago

But it will change a lot relative to the other cards in that range. If you want the changes to be visible to users, display Difficulty on a log scale or something. I already have 99% of my cards showing up in the 90%-100% range, so the graph is already pretty useless.

richard-lacasse commented 1 month ago

Moreover, this issue is really not that important because with the proposed change, having a card with D = 10 is difficult.

Even currently, I have only 171 cards out of 15000 cards that have D exactly equal to 10, even though 30% of my collection has D between 9 and 10.

Maybe I'm just weird and this isn't a problem for other people.

[Screenshot: Difficulty distribution]

This is what my D distribution looks like. When I do a search, 2650/8888 cards fall into prop:d=1, which is maxed out.

richard-lacasse commented 1 month ago

Still, suppose that D=9.9. In that case difficulty will barely change if the user presses Easy.

One issue I can see is that even though it will be fine for sorting purposes, D still affects S, so Stability won't be affected as much by the Easy button once D gets close to 10. But if it's performing better in the simulations anyway...

But, it will take much longer to get near 9.9 because of the asymptote. Every change that's closer to 10 is smaller, so there will have to be a lot of Again button presses to even get near that.

Expertium commented 1 month ago

Alright, fine. @L-M-Sherlock it's up to you. Implement the two-way version if you want

richard-lacasse commented 1 month ago

Are people ever hitting Easy on high difficulty cards anyway? I know I never do. Those are the cards that have given me the most trouble.

user1823 commented 1 month ago

But, it will take much longer to get near 9.9 because of the asymptote. Every change that's closer to 10 is smaller, so there will have to be a lot of Again button presses to even get near that.

I did an experiment to confirm that.

With my parameters and rating_sequence 3, 1, 1, 1, 1, 1, 1, 1, I got the following successive values of D.

Current formula: 5.3624, 9.0620, 10, 10, 10, 10, 10, 10

Proposed formula: 5.3624, 7.0779, 8.1582, 8.8385, 9.2669, 9.5366, 9.7065, 9.8134

So, even though the current formula makes D equal to 10 after just two Again presses, the proposed formula takes D to 9.8 even after 7 Again button presses.
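
For anyone who wants to reproduce the shape of this experiment, here is a rough standalone sketch (not the actual optimizer code; w4-w7 below are made-up illustrative values, so the exact numbers will differ from mine):

    import math

    w4, w5, w6, w7 = 7.2, 0.53, 1.46, 0.005    # illustrative values, not fitted parameters

    def init_d(rating: int) -> float:
        # FSRS-5 initial difficulty: D0(G) = w4 - e^(w5 * (G - 1)) + 1
        return w4 - math.exp(w5 * (rating - 1)) + 1

    def next_d(d: float, rating: int, damped: bool) -> float:
        delta_d = -w6 * (rating - 3)
        if damped:
            delta_d *= (10 - d) / 9             # proposed linear damping
        d = d + delta_d
        d = w7 * init_d(4) + (1 - w7) * d       # mean reversion toward D0(4)
        return min(max(d, 1.0), 10.0)           # clamp to [1, 10]

    for damped in (False, True):
        d = init_d(3)                           # first rating: Good
        history = [round(d, 4)]
        for _ in range(7):                      # then seven consecutive Again presses
            d = next_d(d, 1, damped)
            history.append(round(d, 4))
        print("damped" if damped else "capped", history)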

L-M-Sherlock commented 1 month ago

Weighted by number of reviews

| Model | Parameters | LogLoss | RMSE (bins) | AUC |
| --- | --- | --- | --- | --- |
| FSRS-5-Linear Dampening | 19 | 0.319±0.0053 | 0.049±0.0010 | 0.702±0.0028 |
| FSRS-5 | 19 | 0.320±0.0052 | 0.050±0.0010 | 0.700±0.0028 |

The new median parameters: [0.40255, 1.18385, 3.173, 15.69105, 7.1949, 0.5345, 1.4604, 0.0046, 1.54575, 0.1192, 1.01925, 1.9395, 0.11, 0.29605, 2.2698, 0.2315, 2.9898, 0.51655, 0.6621]

The old median parameters: [0.4043, 1.18225, 3.13915, 15.6764, 7.2285, 0.4904, 1.0644, 0.0239, 1.6253, 0.1423, 1.0983, 1.95, 0.1014, 0.2998, 2.2834, 0.2407, 2.967, 0.5045, 0.66445]

brishtibheja commented 1 month ago

Maybe I'm just weird and this isn't a problem for other people.

@richard-lacasse I have known other users with this issue, so it's not a problem specific to you. I like your idea, and I don't think anyone would realistically reach a high D value, then press Easy and fret about how the DSR values changed.

L-M-Sherlock commented 1 month ago

It is worth noting that w[7] is reduced from 0.0239 to 0.0046. And w[9] is reduced from 0.1423 to 0.1192.

DerIshmaelite commented 1 month ago

Moreover, this issue is really not that important because with the proposed change, having a card with D = 10 is difficult. Even currently, I have only 171 cards out of 15000 cards that have D exactly equal to 10, even though 30% of my collection has D between 9 and 10.

Maybe I'm just weird and this isn't a problem for other people.


This is what my D distribution looks like. When I do a search, 2650/8888 cards fall into prop:d=1, which is maxed out.

@richard-lacasse Nope, you are pretty normal. In the current update, my new decks clock in at pretty ungodly Difficulty values.

richard-lacasse commented 1 month ago

the proposed formula takes D to 9.8 even after 7 Again button presses.

Can you run it to see how long it takes to reach 10 exactly? Mathematically that's impossible, but at some point the gap will drop below machine epsilon and D will probably round up to 10.

user1823 commented 1 month ago

Assuming that I wrote the code correctly, the D didn't reach 9.995 even after 2000 consecutive Again presses.

L-M-Sherlock commented 1 month ago

It is worth noting that w[7] is reduced from 0.0239 to 0.0046.

Will it cause more complaints?

user1823 commented 1 month ago

It is worth noting that w[7] is reduced from 0.0239 to 0.0046.

Will it cause more complaints?

I don't think so. If D increases slowly on pressing Again, then it should also decrease slowly on pressing Good.

L-M-Sherlock commented 1 month ago

OK, I will implement it in FSRS-rs and benchmark it.

Expertium commented 1 month ago

You need to implement it both in the Rust version and in the Python version, otherwise the comparison won't be fair.

L-M-Sherlock commented 1 month ago

You need to implement it both in the Rust version and in the Python version, otherwise the comparison won't be fair.

I just haven't pushed the updated python version to the repo. The benchmark result above is based on the Python version.

L-M-Sherlock commented 1 month ago

[Screenshot: difficulty distribution]

With the Linear Dampening, I have more cards whose difficulty falls in the 10th bin...

My previous parameters: [1.0962, 1.543, 7.8692, 12.0038, 8.1849, 0.5031, 0.6852, 0.001, 1.3281, 0.0796, 0.8666, 2.5417, 0.0128, 0.2952, 0.7547, 0.0001, 3.2912, 0.1097, 0.6747]

My current parameters: [1.114, 1.601, 8.0182, 12.2391, 8.0261, 0.5454, 3.0807, 0.0026, 1.255, 0.0208, 0.7996, 2.5398, 0.0122, 0.3219, 0.3941, 0.0001, 3.6492, 0.102, 0.6778]

The w[6] increases significantly from 0.68 to 3.08.


The difficulty increases quickly, too.

However, it improves the metric:

[Screenshot: evaluation results]

By the way, the model version will be updated to FSRS-5.5 if we introduce this change because I have released FSRS-5 in some packages.

L-M-Sherlock commented 1 month ago

Bad news: the change makes FSRS-rs worse. I don't know why. I need to check the consistency between the Python version and the Rust version again.

Update: OK, I found a serious problem and fixed it:

Expertium commented 1 month ago

By the way, the model version will be updated to FSRS-5.5 if we introduce this change because I have released FSRS-5 in some packages.

Does this mean that the benchmark will have FSRS-5 and FSRS-5.5? Also, maybe call it FSRS-5.1 and not FSRS-5.5? It's a pretty small change, after all.

L-M-Sherlock commented 1 month ago

Also, maybe call it FSRS-5.1 and not FSRS-5.5? It's a pretty small change, after all.

It's not a small change because some parameters have been changed significantly. Actually, I guess we need to do something like:

user1823 commented 1 month ago

some parameters have been changed significantly.

I agree. For example,

My parameters with current main: 1.0588, 4.7011, 31.5471, 79.2369, 7.561, 0.9454, 3.0717, 0.0, 1.8531, 0.3046, 1.3861, 1.6149, 0.0357, 0.4327, 1.8443, 0.0003, 6.0, 0.9915, 0.6734

My parameters with linear damping: 1.0652, 4.7012, 30.4195, 79.3164, 7.5425, 1.0393, 3.8364, 0.0, 1.8143, 0.292, 1.3124, 1.5111, 0.0276, 0.4369, 1.7987, 0.0003, 6.0, 1.013, 0.6778

So, there is a 25% increase in w6 and a 23% decrease in w12. The other parameters have changed slightly.

RMSE with old parameters evaluated using old optimizer: 0.0140
RMSE with new parameters evaluated using new optimizer: 0.0138
RMSE with old parameters evaluated using new optimizer: 0.0156

This means that if the user doesn't optimize their parameters after updating, their RMSE will increase. (This is true for almost any change to the algorithm, though.)

Actually, I guess we need to do something like:

Unfortunately, finding a mathematical relation between the old and new parameters will be very difficult in this case.

Expertium commented 1 month ago

It's not a small change because some parameters have been changed significantly.

But the impact on metrics is small, around 1%. I think FSRS-5.1 makes more sense than FSRS-5.5 in this case

L-M-Sherlock commented 1 month ago

Another problem: what if the initial difficulty is 10? In that case, the difficulty is immutable (if w[7] is 0): the damping factor is zero at D = 10, and mean reversion contributes nothing when w[7] = 0.

Fine. Now we have a reason to set a non-zero lower limit for w[7].

Gilfaro commented 1 month ago

Just as discussed in the other topic about w7 (https://github.com/open-spaced-repetition/fsrs4anki/issues/695#issuecomment-2426490983), this change has shifted the optimal value of w7 further into the negative, which means the optimizer wants to push all cards toward D = 10. And since the clipped value becomes the modal one, for most people all cards will want to sit at max D all of the time.

Expertium commented 1 month ago

Then I guess we shouldn't implement damping after all.

Gilfaro commented 1 month ago

The problem is not with the damping itself; the optimizer prefers D = 10 with the current setup, so damping D just means the optimizer compensates by decreasing w7 even further until it clips at whatever lower limit you set. The best solution would be to investigate why and change the formulas so that the optimizer doesn't go down that path.

Expertium commented 1 month ago

I've experimented with dozens of changes to the D formula and nothing worked. At this point I'm just waiting for someone smarter than me to completely re-define D.

L-M-Sherlock commented 1 month ago

A conservative conversion is new w[6] = old w[6] + 0.5.

richard-lacasse commented 1 month ago

I've experimented with dozens of changes to the D formula and nothing worked. At this point I'm just waiting for someone smarter than me to completely re-define D.

@Expertium I know you've tried R as a parameter in the Difficulty equation, but did you also take R out of the parameter list in Stability? D is a parameter in Stability, so if R is a parameter in both, that might explain why adding R to Difficulty didn't help anything.

Expertium commented 1 month ago

R affecting the increase in Stability is one of the most important principles of the DSR model. Without it, we can't model the spacing effect: R depends on the interval length, so removing R from the formula for S would mean that S doesn't depend on the interval length.

richard-lacasse commented 1 month ago

S' depends on S, which already carries the interval-length information. And if R is in the formula for Difficulty and D is in the formula for Stability, then R still indirectly affects the increase in Stability.

I'm spitballing here a little bit, but this is the first thing I would toy with.