open-spaced-repetition / fsrs4anki

A modern custom scheduler for Anki, based on the Free Spaced Repetition Scheduler (FSRS) algorithm
https://github.com/open-spaced-repetition/fsrs4anki/wiki
MIT License

FSRS gives a large post-lapse stability #246

Closed user1823 closed 8 months ago

user1823 commented 1 year ago

For me, the post-lapse stability calculated by FSRS is usually very high. Take the example of the following card:

[screenshot of the card's review history]

Here, the post-lapse stability is 73 days, which is not within the range of 1-4 days (as suggested by SuperMemo).

https://supermemo.guru/wiki/Post-lapse_stability says

It has been shown long ago that the length of the first post-lapse optimum interval is best correlated with the number of memory lapses recorded for the item. Even then, post-lapse interval usually oscillates in the range of 1-4 days for the default forgetting index of 10%. The correlation between lapses and the PLS is not very useful in adding to the efficiency of learning. Some competitive spaced repetition software, as well as SuperMemo in its first years, experimented with re-learning hypotheses based on ancient wisdoms of psychology, e.g. by halving intervals after a memory lapse. Current data shows clearly that this approach is harmful, as it slows down the identification of leeches. Such an approach to handling forgotten items is a form of irrational procrastination.

Note that I did my reviews on AnkiDroid and then rescheduled them using the FSRS helper.

Environment

L-M-Sherlock commented 1 year ago
  • 0.8331

Replace this item with 0.5 and reschedule the card. You will get smaller post-lapse stability.

user1823 commented 1 year ago
  • 0.8331

Replace this item with 0.5 and reschedule the card. You will get smaller post-lapse stability.

This is a workaround, not a solution. What is the point of optimization when we have to manually set the parameters?

L-M-Sherlock commented 1 year ago

So it is an issue related to the optimizer. Maybe we need to modify the formula of post-lapse stability or just add an upper limit for this weight.
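The "upper limit" idea can be sketched as one extra line in the style of the optimizer's WeightClipper (shown later in this thread). The 0.5 cap below is an illustrative assumption, not a tuned value:

```python
import torch

# Hypothetical sketch: cap the stability exponent w[11] during training,
# mimicking the optimizer's WeightClipper. The 0.5 upper bound is an
# assumed value for illustration only.
def clip_w11(w: torch.Tensor, upper: float = 0.5) -> torch.Tensor:
    w = w.clone()
    w[11] = w[11].clamp(0.0, upper)
    return w

# Parameters in the shape user1823 posted later in this thread, with the
# problematic 0.8331 in slot 11.
w = torch.tensor([1.0579, 1.6852, 5.0241, -1.2664, -1.1763, 0.0002, 1.7156,
                  -0.0903, 1.0729, 1.7057, -0.4962, 0.8331, 0.4975])
```

Any fitted w[11] above the cap (such as the 0.8331 here) would be clipped back down to it.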

L-M-Sherlock commented 1 year ago

What is the point of optimization when we have to manually set the parameters?

If FSRS is a neural network, it is impossible to set the parameters manually.

user1823 commented 1 year ago
  • 0.8331

Replace this item with 0.5 and reschedule the card. You will get smaller post-lapse stability.

By the way, even after replacing 0.8331 by 0.5, the stability for this card would decrease from 72.87 days to 11.07 days (which still doesn't fall within the range of 1-4 days).

Maybe we need to modify the formula of post-lapse stability or just add upper limit for this weight.

I think that just adding an upper limit would not be sufficient as in the above case.

If FSRS is a neural network, it is impossible to set the parameters manually.

Manually replacing the parameters in the scheduler code (just like you advised me to do) counts as setting the parameters manually, right?

In my above comment, I meant to ask what the benefit of optimization is if we are going to replace the optimized parameters with some value that we "think" is right.

L-M-Sherlock commented 1 year ago

By the way, even after replacing 0.8331 by 0.5, the stability for this card would decrease from 72.87 days to 11.07 days (which still doesn't fall within the range of 1-4 days).

Why should the stability fall within the range of 1-4 days?

user1823 commented 1 year ago

Why should the stability fall within the range of 1-4 days?

Read this:

https://supermemo.guru/wiki/Post-lapse_stability says

It has been shown long ago that the length of the first post-lapse optimum interval is best correlated with the number of memory lapses recorded for the item. Even then, post-lapse interval usually oscillates in the range of 1-4 days for the default forgetting index of 10%. The correlation between lapses and the PLS is not very useful in adding to the efficiency of learning. Some competitive spaced repetition software, as well as SuperMemo in its first years, experimented with re-learning hypotheses based on ancient wisdoms of psychology, e.g. by halving intervals after a memory lapse. Current data shows clearly that this approach is harmful, as it slows down the identification of leeches. Such an approach to handling forgotten items is a form of irrational procrastination.

L-M-Sherlock commented 1 year ago

Woz mentioned data proving that a large post-lapse stability is harmful, but he didn't publish the data. In my research, there is no such limit. I think the point is whether the post-lapse stability given by FSRS is inaccurate.

L-M-Sherlock commented 1 year ago

Could you check the file stability_for_analysis.tsv generated by the optimizer? Look at the stability of rows whose r_history ends with 1. That is the post-lapse stability calculated from your revlog.

user1823 commented 1 year ago

I don't know how to interpret that file. So, I am sharing it (and others that might be useful) here. stability_for_analysis.tsv.zip revlog_history.tsv.zip prediction.tsv.zip

Also, this time, the optimizer yielded slightly different parameters (I don't know why) for the same .apkg file. The new parameters are: [1.0579, 1.6852, 5.0241, -1.2664, -1.1763, 0.0002, 1.7156, -0.0903, 1.0729, 1.7057, -0.4962, 0.8255, 0.4975]

user1823 commented 1 year ago

The following is an example showing that the post-lapse stability calculated by FSRS helper is too large:

[screenshot of the card's review history]

In this case, after the first lapse, FSRS gave this card an interval of 20 days. When the card came for review, I could not recall it and had to press Again for the second time. This inaccurate estimation of post-lapse stability affected me in two ways:

L-M-Sherlock commented 1 year ago
[screenshot of stability_for_analysis.tsv]

For example, this row shows that the post-lapse stability is 7.8 when you press Again once, Good six times, and Again once more. It is calculated from your revlog. It is not the prediction of FSRS.

L-M-Sherlock commented 1 year ago

Maybe it is better to remove $S$ and $w_{11}$ from $S^\prime_f(D,S,R) = w_9\cdot D^{w_{10}}\cdot S^{w_{11}}\cdot e^{w_{12}\cdot(1-R)}$.

user1823 commented 1 year ago

For example, this row shows that the post-lapse stability is 7.8 when you press Again once, Good six times, and Again once more. It is calculated from your revlog. It is not the prediction of FSRS.

What do you mean to say here? Does this support the stability calculated by FSRS or does it show that the stability calculated by FSRS is larger than it should be?

I am asking this because the above example (shared by me) can't be easily compared with this because my example has 7 Good ratings (not 6).

By the way, how can the optimizer determine the actual post-lapse stability when none of my cards would have been reviewed at this interval?

user1823 commented 1 year ago

Maybe it is better to remove $S$ and $w_7$ from $S^\prime_r(D,S,R) = S\cdot(e^{w_6}\cdot (11-D)\cdot S^{w_7}\cdot(e^{w_8\cdot(1-R)}-1)+1)$.

Which S? S is used twice on the RHS of the equation.

By the way, I have no idea what should be done here to improve the calculation of the post-lapse stability.

Also, I can't comment on the suggestions you make (apart from just saying that the post-lapse intervals have increased/decreased and so, it looks worse/better).

L-M-Sherlock commented 1 year ago

Which S? S is used twice on the RHS of the equation.

The S before the lapse.

user1823 commented 1 year ago

Maybe it is better to remove $S$ and $w_7$ from $S^\prime_r(D,S,R) = S\cdot(e^{w_6}\cdot (11-D)\cdot S^{w_7}\cdot(e^{w_8\cdot(1-R)}-1)+1)$.

By the way, this equation is for recall stability and we are currently talking about post-lapse stability.

L-M-Sherlock commented 1 year ago

My fault. I've corrected it.

user1823 commented 1 year ago

Which S? S is used twice on the RHS of the equation.

The S before the lapse.

The $S$ without the power, or the $S$ raised to the power?

I mean do you want the equation to look like $S^\prime_r(D,S,R) = (e^{w_6}\cdot (11-D)\cdot S \cdot(e^{w_8\cdot(1-R)}-1)+1)$

or this: $S^\prime_r(D,S,R) = S\cdot(e^{w_6}\cdot (11-D)\cdot (e^{w_8\cdot(1-R)}-1)+1)$ ?

Edit:

My fault. I've corrected it.

After this comment of yours, I think that this question is now obsolete.

L-M-Sherlock commented 1 year ago

Before: $S^\prime_f(D,S,R) = w_9\cdot D^{w_{10}}\cdot S^{w_{11}}\cdot e^{w_{12}\cdot(1-R)}$ After: $S^\prime_f(D,R) = w_9\cdot D^{w_{10}}\cdot e^{w_{12}\cdot(1-R)}$

Then, regardless of how large the stability was before the lapse, the post-lapse stability would not be affected.
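The effect of dropping the $S^{w_{11}}$ term can be illustrated in plain Python, using the parameters user1823 posted earlier in this thread (a sketch, not the optimizer's actual code):

```python
import math

# Parameters from user1823's optimizer run; w[9]..w[12] are the
# post-lapse (forget) weights.
w = [1.0579, 1.6852, 5.0241, -1.2664, -1.1763, 0.0002, 1.7156,
     -0.0903, 1.0729, 1.7057, -0.4962, 0.8255, 0.4975]

def pls_before(D: float, S: float, R: float) -> float:
    # S'_f(D,S,R) = w9 * D^w10 * S^w11 * e^(w12*(1-R))
    return w[9] * D ** w[10] * S ** w[11] * math.exp(w[12] * (1 - R))

def pls_after(D: float, R: float) -> float:
    # S'_f(D,R) = w9 * D^w10 * e^(w12*(1-R)) -- prior stability dropped
    return w[9] * D ** w[10] * math.exp(w[12] * (1 - R))
```

With these parameters the first form grows roughly as $S^{0.83}$, which is why a card with months of prior stability gets a huge post-lapse stability; the second form is flat in $S$.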

user1823 commented 1 year ago

Before: $S^\prime_f(D,S,R) = w_9\cdot D^{w_{10}}\cdot S^{w_{11}}\cdot e^{w_{12}\cdot(1-R)}$ After: $S^\prime_f(D,R) = w_9\cdot D^{w_{10}}\cdot e^{w_{12}\cdot(1-R)}$

Then, regardless of how large the stability was before the lapse, the post-lapse stability would not be affected.

After making this change, the post-lapse stability would definitely be independent of the previous stability.

But now the question is whether it is the right way to solve this issue. Is this approach supported by theoretical considerations or by experiments?

L-M-Sherlock commented 1 year ago

But now the question is whether it is the right way to solve this issue. Is this approach supported by theoretical considerations or by experiments?

Simply because the post-lapse stability in SuperMemo is independent of the previous stability. We can implement it and test it.

user1823 commented 1 year ago

There is one problem with this approach. What if the post-lapse stability for some cards became greater than the previous stability?

L-M-Sherlock commented 1 year ago

There is one problem with this approach. What if the post-lapse stability for some cards became greater than the previous stability?

I design a new experimental formula for the post-lapse stability: https://colab.research.google.com/github/open-spaced-repetition/fsrs4anki/blob/Expt/improve-post-lapse-stability/fsrs4anki_optimizer.ipynb

I replace $S^{w_{11}}$ with $(S+\mathrm{offset})^{w_{11}} - \mathrm{offset}^{w_{11}}$ because $S^{w_{11}} > S$ when $S < 1$:

[plot comparing $S^{w_{11}}$ and the offset version]

After this improvement, the loss and $w_{11}$ both decrease. It means the model becomes more accurate and the post-lapse stability is less affected by the stability before the lapse.

Could you test it with your data?
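The replacement term can be sketched as follows; the offset value of 1.0 is an assumption for illustration (the notebook may use a different constant):

```python
import math

# Sketch of the experimental replacement for S^w11.
def power_term(S: float, w11: float) -> float:
    return S ** w11                                 # original: exceeds S when S < 1

def offset_term(S: float, w11: float, offset: float = 1.0) -> float:
    # (S + offset)^w11 - offset^w11: equals 0 at S = 0 and stays below S
    # for small S, avoiding the inflation of small stabilities.
    return (S + offset) ** w11 - offset ** w11
```

At S = 0.5 with w11 = 0.8, the power term gives about 0.57 (more than S itself), while the offset term gives about 0.38.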

user1823 commented 1 year ago

For me, this version seems to perform worse than the original one.

|  | Log Loss | RMSE | R-squared |
| --- | --- | --- | --- |
| Original Optimizer | 0.2241 | 0.0191 | 0.8728 |
| post_lapse_stability_bonus | 0.2242 | 0.0201 | 0.8593 |

w: [1.0574, 1.6845, 5.0394, -1.376, -1.2214, 0.0002, 1.7266, -0.0738, 1.0821, 1.8622, -0.3466, 0.6787, 0.5199]

Also, though the post-lapse intervals have decreased, they are still quite large.

Also, the following parts from the SuperMemo website make me think that the previous stability should not be considered in the calculation of post-lapse stability.

In the ideal case, for simple memories, forgetting results in a reset of estimated stability back to near-zero. In theory, only difficult items made of composite memories may show a substantial decrease in the costs of re-learning, however, even that does not show in data.

SuperMemo uses a separate matrix for post-lapse stabilities: PLS[] with Lapse and Retrievability dimensions. The first interval after scoring a failing grade is then determined as follows:

Int[1]:=PLS[Lapses,R]

where:

  • Int[1] - first interval (after a failing grade)
  • PLS[] - post-lapse interval matrix
  • Lapses - total number of memory lapses (failing grades) scored by the item
  • R - retrievability at the moment of the lapse

Source: https://supermemo.guru/wiki/Post-lapse_stability

What if the post-lapse stability for some cards became greater than the previous stability?

We should think of another way to solve this problem.

L-M-Sherlock commented 1 year ago

Also, the following parts from the SuperMemo website make me think that the previous stability should not be considered in the calculation of post-lapse stability.

I have tried to remove the previous stability from the calculation of post-lapse stability. It will increase the loss. You can implement it for yourself. Or wait for me to publish the branch tomorrow.

user1823 commented 1 year ago

I have tried to remove the previous stability from the calculation of post-lapse stability. It will increase the loss.

You were right. My results:

|  | Log Loss | RMSE | R-squared |
| --- | --- | --- | --- |
| Original Optimizer | 0.2241 | 0.0191 | 0.8728 |
| post_lapse_stability_bonus | 0.2242 | 0.0201 | 0.8593 |
| Independent of prev stability | 0.2306 | 0.0264 | 0.6294 |

w: [1.0574, 1.6845, 5.0451, -1.5322, -1.0032, 0.0001, 1.8611, -0.1568, 1.224, 2.7261, -0.0135, 0.2, 1.3968]

Intervals:

My code:

```python
import numpy as np
import torch
from torch import nn

init_w = [1, 1, 5, -0.5, -0.5, 0.2, 1.4, -0.12, 0.8, 2, -0.2, 0.2, 1]
'''
w[0]: initial_stability_for_again_answer
w[1]: initial_stability_step_per_rating
w[2]: initial_difficulty_for_good_answer
w[3]: initial_difficulty_step_per_rating
w[4]: next_difficulty_step_per_rating
w[5]: next_difficulty_reversion_to_mean_speed (used to avoid ease hell)
w[6]: next_stability_factor_after_success
w[7]: next_stability_stabilization_decay_after_success
w[8]: next_stability_retrievability_gain_after_success
w[9]: next_stability_factor_after_failure
w[10]: next_stability_difficulty_decay_after_failure
w[11]: next_stability_stability_gain_after_failure (unused in this variant)
w[12]: next_stability_retrievability_gain_after_failure
For more details about the parameters, please see:
https://github.com/open-spaced-repetition/fsrs4anki/wiki/Free-Spaced-Repetition-Scheduler
'''

class FSRS(nn.Module):
    def __init__(self, w):
        super(FSRS, self).__init__()
        self.w = nn.Parameter(torch.FloatTensor(w))
        self.zero = torch.FloatTensor([0.0])

    def forward(self, x, s, d):
        '''
        :param x: [review interval, review response]
        :param s: stability
        :param d: difficulty
        :return:
        '''
        if torch.equal(s, self.zero):
            # first learn, init memory states
            new_s = self.w[0] + self.w[1] * (x[1] - 1)
            new_d = self.w[2] + self.w[3] * (x[1] - 3)
            new_d = new_d.clamp(1, 10)
        else:
            r = torch.exp(np.log(0.9) * x[0] / s)
            new_d = d + self.w[4] * (x[1] - 3)
            new_d = self.mean_reversion(self.w[2], new_d)
            new_d = new_d.clamp(1, 10)
            # recall
            if x[1] > 1:
                new_s = s * (1 + torch.exp(self.w[6]) *
                             (11 - new_d) *
                             torch.pow(s, self.w[7]) *
                             (torch.exp((1 - r) * self.w[8]) - 1))
            # forget: the previous stability term is deliberately removed here
            else:
                new_s = self.w[9] * torch.pow(new_d, self.w[10]) * torch.exp((1 - r) * self.w[12])
        return new_s, new_d

    def loss(self, s, t, r):
        # negative log-likelihood, with predicted R = 0.9 ** (t / s)
        return - (r * np.log(0.9) * t / s + (1 - r) * torch.log(1 - torch.exp(np.log(0.9) * t / s)))

    def mean_reversion(self, init, current):
        return self.w[5] * init + (1 - self.w[5]) * current

class WeightClipper(object):
    def __init__(self, frequency=1):
        self.frequency = frequency

    def __call__(self, module):
        if hasattr(module, 'w'):
            w = module.w.data
            w[0] = w[0].clamp(0.1, 10)
            w[1] = w[1].clamp(0.1, 5)
            w[2] = w[2].clamp(1, 10)
            w[3] = w[3].clamp(-5, -0.1)
            w[4] = w[4].clamp(-5, -0.1)
            w[5] = w[5].clamp(0, 0.5)
            w[6] = w[6].clamp(0, 2)
            w[7] = w[7].clamp(-0.2, -0.01)
            w[8] = w[8].clamp(0.01, 1.5)
            w[10] = w[10].clamp(-2, -0.01)
            module.w.data = w

def lineToTensor(line):
    ivl = line[0].split(',')
    responses = line[1].split(',')
    tensor = torch.zeros(len(responses), 2)
    for li, response in enumerate(responses):
        tensor[li][0] = int(ivl[li])
        tensor[li][1] = int(response)
    return tensor
```
L-M-Sherlock commented 1 year ago

I guess your post-lapse stability is larger than you think. Maybe we need to reframe the issue: just ignore the post-lapse stability and requestRetention, and set a fixed post-lapse interval for your case.

L-M-Sherlock commented 1 year ago

I replaced the power function with a log function. It predicts a lower post-lapse stability than before:

https://colab.research.google.com/github/open-spaced-repetition/fsrs4anki/blob/Expt/improve-post-lapse-stability/fsrs4anki_optimizer.ipynb

[plot of the modified function]
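The thread doesn't reproduce the exact log formula, but the shape of the change can be sketched like this (an assumption, not the notebook's actual code):

```python
import math

# Assumed form of the log-based variant: the prior-stability bonus grows
# like log(S) instead of a power of S, so a large S inflates the
# post-lapse stability far less.
def power_bonus(S: float, w11: float) -> float:
    return S ** w11

def log_bonus(S: float, w11: float) -> float:
    return 1 + w11 * math.log(S)  # assumes S >= 1 in this sketch
```

Both terms agree at S = 1, but for S in the hundreds of days the log term stays in single digits while the power term does not.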
user1823 commented 1 year ago

I replaced the power function with a log function. It predicts a lower post-lapse stability than before: https://colab.research.google.com/github/open-spaced-repetition/fsrs4anki/blob/Expt/improve-post-lapse-stability/fsrs4anki_optimizer.ipynb

I tried this function. My results:

|  | Log Loss | RMSE | R-squared |
| --- | --- | --- | --- |
| Original Optimizer | 0.2241 | 0.0191 | 0.8728 |
| post_lapse_stability_bonus | 0.2242 | 0.0201 | 0.8593 |
| Independent of prev stability | 0.2306 | 0.0264 | 0.6294 |
| post_lapse_stability_bonus_log | 0.2245 | 0.0212 | 0.8439 |

So, all of these three approaches increased the log loss and RMSE.

w: [1.0574, 1.6845, 5.0521, -1.3947, -1.1878, 0.0002, 1.7234, -0.0659, 1.08, 1.8265, -0.3857, 1.2447, 0.4589]

Intervals:

Expertium commented 1 year ago

So I tested 3 different versions:

  1. The offset version (https://colab.research.google.com/github/open-spaced-repetition/fsrs4anki/blob/Expt/improve-post-lapse-stability/fsrs4anki_optimizer.ipynb)
  2. v3.17.1 where I just remove $S^{w_{11}}$.
  3. v3.17.1 where I remove $S^{w_{11}}$ and replace it with `torch.pow(new_l, -self.w[13])`, where `new_l` is the number of lapses.

None of them improved performance. [screenshot of results]
user1823 commented 1 year ago

I tried clamping new_s for post-lapse stability like this:

```python
new_s = (self.w[9] * torch.pow(new_d, self.w[10]) * torch.pow(s, self.w[11])
         * torch.exp((1 - r) * self.w[12])).clamp(0.01, 4)
```

For most decks, the difference was negligible. For one deck, RMSE went down by 13%, and for another deck, it went up by 25%. Overall, it did not improve performance.
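The clamped variant can be exercised directly. This sketch uses user1823's fitted parameters from earlier in the thread; the difficulty of 5 and retrievability of 0.9 are assumed example inputs, not values from a real card:

```python
import math
import torch

# user1823's fitted parameters; w[9]..w[12] drive the post-lapse formula.
w = [1.0579, 1.6852, 5.0241, -1.2664, -1.1763, 0.0002, 1.7156,
     -0.0903, 1.0729, 1.7057, -0.4962, 0.8255, 0.4975]

def clamped_pls(new_d: torch.Tensor, s: torch.Tensor, r: float,
                lo: float = 0.01, hi: float = 4.0) -> torch.Tensor:
    # Post-lapse stability with a hard cap, as in the clamp experiment above;
    # the 4-day ceiling mirrors SuperMemo's 1-4 day observation.
    raw = (w[9] * torch.pow(new_d, w[10]) * torch.pow(s, w[11])
           * math.exp((1 - r) * w[12]))
    return raw.clamp(lo, hi)
```

With these numbers, a card with 100 days of prior stability hits the 4-day ceiling, while a card with 1 day of prior stability stays well below it.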

I also tried replacing the power function that determines how much the previous value of S affects the new value with a log function, like this: [screenshots of the modified code and the function comparison]

But that also didn't improve performance. [screenshot of results]

Originally posted by @Expertium in https://github.com/open-spaced-repetition/fsrs4anki/issues/239#issuecomment-1544092926

user1823 commented 1 year ago

I noted that SuperMemo uses the item difficulty for calculating the stability increase (on recall) but it just uses the number of lapses for calculating the post-lapse stability.

So, perhaps, we should try using the number of lapses instead of difficulty in the post-lapse stability function.

Expertium commented 1 year ago

@user1823 I tested your idea

I noted that SuperMemo uses the item difficulty for calculating the stability increase (on recall) but it just uses the number of lapses for calculating the post-lapse stability.

So, perhaps, we should try using the number of lapses instead of difficulty in the post-lapse stability function.

I replaced D with lapses, like this:

```python
            # forget
            else:
                new_s = self.w[9] * torch.exp(self.w[10] * new_l) * torch.pow(
                    s, self.w[11]) * torch.exp((1 - r) * self.w[12])
```

It didn't improve performance. [screenshot of results]

user1823 commented 1 year ago

I tried clamping new_s for post-lapse stability like this:

```python
new_s = (self.w[9] * torch.pow(new_d, self.w[10]) * torch.pow(s, self.w[11])
         * torch.exp((1 - r) * self.w[12])).clamp(0.01, 4)
```

@Expertium, I tried this idea of yours and also implemented it in the helper add-on (though I set the maximum limit to 6 instead of 4).

I have been using it for a few days, and I found that I had actually forgotten many of the cards that became due again because of this change.

So, this means that our thinking that FSRS is giving an unduly high post-lapse stability was correct.

And despite the optimizer showing a slightly higher loss for this version, I think I will use it until we find a better solution.

Expertium commented 1 year ago

So, this means that our thinking that FSRS is giving an unduly high post-lapse stability was correct.

We thought FSRS underestimates stability, no? Look at this (my collection, v3.17.1): [calibration graph for the entire collection] Here it underestimates R (the blue line is above the orange line for most values of predicted R). And if R is underestimated, that means S is underestimated as well.

user1823 commented 1 year ago

We thought FSRS underestimates stability, no?

I don't know about stabilities for other reviews. But, in my opinion, FSRS is overestimating the post-lapse stability.

Do you remember that Woz said the post-lapse stability usually oscillates in the range of 1-4 days? But, FSRS usually gives a very high post-lapse stability. So, this means that FSRS is overestimating the post-lapse stability.

L-M-Sherlock commented 1 year ago

Here it underestimates R (the blue line is above the orange line for most values of predicted R). And if R is underestimated, that means S is underestimated as well.

The calibration is drawn from all reviews. We can draw the calibration graph for reviews whose latest rating is again.

Expertium commented 1 year ago

We can draw the calibration graph for reviews whose latest rating is again.

That could be helpful!

L-M-Sherlock commented 1 year ago
```python
plot_brier(dataset[dataset['r_history'].str.endswith('1')]['p'],
           dataset[dataset['r_history'].str.endswith('1')]['y'], bins=40)
```

Here, `dataset['r_history'].str.endswith('1')` means the latest rating is Again.
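On a toy DataFrame, the filter behaves like this (the column names follow the optimizer's output; the rows are made up for illustration):

```python
import pandas as pd

# Toy stand-in for the optimizer's dataset: r_history is the comma-joined
# rating history, p the predicted recall probability, y the actual outcome
# of the review that followed that history.
dataset = pd.DataFrame({
    "r_history": ["1,3,3", "3,3,1", "3,3,3", "1,3,1"],
    "p": [0.92, 0.80, 0.95, 0.70],
    "y": [1, 0, 1, 1],
})

# Keep only reviews whose most recent rating was Again (history ends in '1');
# their y column records the outcome of the review *after* the lapse.
post_lapse = dataset[dataset["r_history"].str.endswith("1")]
```

Only the rows ending in a lapse (`"3,3,1"` and `"1,3,1"`) survive the filter, so the calibration graph built from them evaluates the post-lapse predictions.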

Expertium commented 1 year ago

I used it in fsrs4anki_optimizer_alpha.ipynb. RMSE seems to be about the same for both. [screenshot of the calibration graphs]

I think that instead of looking at reviews where the grade was "Again", we should look at reviews that come immediately after that. In other words, we need to look at reviews where the most recent grade is "Hard", "Good" or "Easy", and the second most recent grade is "Again".

L-M-Sherlock commented 1 year ago

In other words, we need to look at reviews where the most recent grade is "Hard", "Good" or "Easy", and the second most recent grade is "Again".

But they are not post-lapse stability.

user1823 commented 1 year ago

the second most recent grade is "Again".

Not the second most recent but the third most recent, because the next review after the lapse is on the same day (with default relearning steps).

Expertium commented 1 year ago

In other words, we need to look at reviews where the most recent grade is "Hard", "Good" or "Easy", and the second most recent grade is "Again".

But they are not post-lapse stability.

Maybe I'm missing something. Here's how I understand it:

  1. FSRS predicts some S using the regular (not post-lapse) formula.
  2. User presses "Again".
  3. FSRS predicts some S using the post-lapse formula.
  4. User presses something.

So if we want to know how well the post-lapse stability is estimated, we need to look at reviews that come after the user pressed "Again".

Expertium commented 1 year ago

the second most recent grade is "Again".

Not second most recent but third most recent because the next review after the lapse is on the same day (with default relearning steps).

I'm pretty sure intraday reviews are excluded from optimization

L-M-Sherlock commented 1 year ago

4. User presses something. So if we want to know how well the post-lapse stability is estimated, we need to look at reviews that come after the user pressed "Again"

You are right. My wording was misleading. `dataset['r_history'].str.endswith('1')` means the latest rating in the history is Again. `dataset['y']` reflects the rating that comes after the user pressed "Again".

Expertium commented 1 year ago

I replaced '1' with '3' in the `y` filter:

```python
plot_brier(dataset[dataset['r_history'].str.endswith('1')]['p'],
           dataset[dataset['r_history'].str.endswith('3')]['y'], bins=40)
```

[screenshot of the resulting graph]

L-M-Sherlock commented 1 year ago

I replaced '1' with '3' in the `y` filter: `plot_brier(dataset[dataset['r_history'].str.endswith('1')]['p'], dataset[dataset['r_history'].str.endswith('3')]['y'], bins=40)`

The two filter conditions should be consistent.

Expertium commented 1 year ago

If you're saying that changing the number in `str.endswith('4')]['y']` shouldn't affect the graph, then I have bad news: it affects the graph a lot.

user1823 commented 1 year ago

the second most recent grade is "Again".

Not second most recent but third most recent because the next review after the lapse is on the same day (with default relearning steps).

I'm pretty sure intraday reviews are excluded from optimization

@L-M-Sherlock, can you confirm this?