Closed. user1823 closed this issue 8 months ago.
- 0.8331
Replace this item with 0.5 and reschedule the card. You will get smaller post-lapse stability.
- 0.8331
Replace this item with 0.5 and reschedule the card. You will get smaller post-lapse stability.
This is a workaround, not a solution. What is the point of optimization when we have to manually set the parameters?
So it is an issue related to the optimizer. Maybe we need to modify the formula of post-lapse stability or just add upper limit for this weight.
What is the point of optimization when we have to manually set the parameters?
If FSRS is a neural network, it is impossible to set the parameters manually.
- 0.8331
Replace this item with 0.5 and reschedule the card. You will get smaller post-lapse stability.
By the way, even after replacing 0.8331 by 0.5, the stability for this card would decrease from 72.87 days to 11.07 days (which still doesn't fall within the range of 1-4 days).
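To make the sensitivity concrete, here is a minimal sketch of the post-lapse stability formula $S^\prime_f = w_9\cdot D^{w_{10}}\cdot S^{w_{11}}\cdot e^{w_{12}\cdot(1-R)}$. The difficulty, retrievability, pre-lapse stability, and the other weights are hypothetical values chosen only for illustration; only $w_{11} = 0.8331$ comes from this thread:

```python
import math

def post_lapse_stability(w9, w10, w11, w12, d, s, r):
    # S'_f(D, S, R) = w9 * D^w10 * S^w11 * e^(w12 * (1 - R))
    return w9 * d ** w10 * s ** w11 * math.exp(w12 * (1 - r))

# Hypothetical card: difficulty 5, pre-lapse stability 280 days, R = 0.9 at lapse.
d, s, r = 5.0, 280.0, 0.9

s_orig = post_lapse_stability(1.07, -0.5, 0.8331, 0.5, d, s, r)  # optimized w11
s_low = post_lapse_stability(1.07, -0.5, 0.5, 0.5, d, s, r)      # w11 manually set to 0.5

# Lowering w11 shrinks the result, but with a large pre-lapse S the
# post-lapse stability can still sit far outside the 1-4 day range.
assert s_low < s_orig
assert s_low > 4
```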
Maybe we need to modify the formula of post-lapse stability or just add upper limit for this weight.
I think that just adding an upper limit would not be sufficient as in the above case.
If FSRS is a neural network, it is impossible to set the parameters manually.
Manually replacing the parameters in the scheduler code (just like you advised me to do) can be called setting the parameters manually, right?
In my above comment, I meant to ask what the benefit of optimization is if we are going to replace the optimized parameters with some value that we "think" is right.
By the way, even after replacing 0.8331 by 0.5, the stability for this card would decrease from 72.87 days to 11.07 days (which still doesn't fall within the range of 1-4 days).
Why should the stability fall within the range of 1-4 days?
Why should the stability fall within the range of 1-4 days?
Read this:
https://supermemo.guru/wiki/Post-lapse_stability says
It has been shown long ago that the length of the first post-lapse optimum interval is best correlated with the number of memory lapses recorded for the item. Even then, post-lapse interval usually oscillates in the range of 1-4 days for the default forgetting index of 10%. The correlation between lapses and the PLS is not very useful in adding to the efficiency of learning. Some competitive spaced repetition software, as well as SuperMemo in its first years, experimented with re-learning hypotheses based on ancient wisdoms of psychology, e.g. by halving intervals after a memory lapse. Current data shows clearly that this approach is harmful, as it slows down the identification of leeches. Such an approach to handling forgotten items is a form of irrational procrastination.
Woz mentioned data proving that a large post-lapse stability is harmful, but he didn't publish that data. In my research, there is no such limit. I think the point is whether the post-lapse stability given by FSRS is inaccurate.
Could you check the file `stability_for_analysis.tsv` generated by the optimizer? Mind the stability of rows whose `r_history` ends with `1`. It is the post-lapse stability calculated from your revlog.
I don't know how to interpret that file. So, I am sharing it (and others that might be useful) here: stability_for_analysis.tsv.zip, revlog_history.tsv.zip, prediction.tsv.zip
Also, this time, the optimizer yielded slightly different parameters (I don't know why) for the same .apkg file. The new parameters are: [1.0579, 1.6852, 5.0241, -1.2664, -1.1763, 0.0002, 1.7156, -0.0903, 1.0729, 1.7057, -0.4962, 0.8255, 0.4975]
The following is an example showing that the post-lapse stability calculated by FSRS helper is too large:
In this case, after the first lapse, FSRS gave this card an interval of 20 days. When the card came up for review, I could not recall it and had to press `Again` for the second time.
This inaccurate estimation of post-lapse stability affected me in two ways:
(… `Again` ratings rather than two).

For example, this row shows that the post-lapse stability is 7.8 when you press one `again`, six `good`, and one `again`. It is calculated from your revlog. It is not the prediction of FSRS.
Maybe it is better to remove $S$ and $w_{11}$ from $S^\prime_f(D,S,R) = w_9\cdot D^{w_{10}}\cdot S^{w_{11}}\cdot e^{w_{12}\cdot(1-R)}$.
For example, this row shows that the post-lapse stability is 7.8 when you press one `again`, six `good`, and one `again`. It is calculated from your revlog. It is not the prediction of FSRS.
What do you mean to say here? Does this support the stability calculated by FSRS or does it show that the stability calculated by FSRS is larger than it should be?
I am asking this because the above example (shared by me) can't be easily compared with this because my example has 7 `Good` ratings (not 6).
By the way, how can the optimizer determine the actual post-lapse stability when none of my cards would have been reviewed at this interval?
Maybe it is better to remove $S$ and $w_7$ from $S^\prime_r(D,S,R) = S\cdot(e^{w_6}\cdot(11-D)\cdot S^{w_7}\cdot(e^{w_8\cdot(1-R)}-1)+1)$.
Which S? S is used twice on the RHS of the equation.
By the way, I have no idea what should be done here to improve the calculation of the post-lapse stability.
Also, I can't comment on the suggestions you make (apart from just saying that the post-lapse intervals have increased/decreased and so, it looks worse/better).
Which S? S is used twice on the RHS of the equation.
The S before the lapse.
Maybe it is better to remove $S$ and $w_7$ from $S^\prime_r(D,S,R) = S\cdot(e^{w_6}\cdot(11-D)\cdot S^{w_7}\cdot(e^{w_8\cdot(1-R)}-1)+1)$.
By the way, this equation is for recall stability and we are currently talking about post-lapse stability.
My fault. I have corrected it.
Which S? S is used twice on the RHS of the equation.
The S before the lapse.
The S without the power or the S with the power?
I mean, do you want the equation to look like this: $S^\prime_r(D,S,R) = e^{w_6}\cdot (11-D)\cdot S^{w_7}\cdot(e^{w_8\cdot(1-R)}-1)+1$
or this: $S^\prime_r(D,S,R) = S\cdot(e^{w_6}\cdot (11-D)\cdot(e^{w_8\cdot(1-R)}-1)+1)$ ?
Edit:
My fault. I have corrected it.
After this comment of yours, I think that this question is now obsolete.
Before: $S^\prime_f(D,S,R) = w_9\cdot D^{w_{10}}\cdot S^{w_{11}}\cdot e^{w_{12}\cdot(1-R)}$ After: $S^\prime_f(D,R) = w_9\cdot D^{w_{10}}\cdot e^{w_{12}\cdot(1-R)}$
Then, regardless of how large the stability was before the lapse, the post-lapse stability would not be affected.
Before: $S^\prime_f(D,S,R) = w_9\cdot D^{w_{10}}\cdot S^{w_{11}}\cdot e^{w_{12}\cdot(1-R)}$ After: $S^\prime_f(D,R) = w_9\cdot D^{w_{10}}\cdot e^{w_{12}\cdot(1-R)}$
Then, regardless of how large the stability was before the lapse, the post-lapse stability would not be affected.
After making this change, the post-lapse stability would definitely be independent of the previous stability.
But now the question is whether it is the right way to solve this issue. Is this approach supported by theoretical considerations or by experiments?
But now the question is whether it is the right way to solve this issue. Is this approach supported by theoretical considerations or by experiments?
Just because the post-lapse stability in SuperMemo is independent of the previous stability. We can implement and test it.
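A minimal sketch of what implementing and testing this would look like at the formula level, with the $S^{w_{11}}$ term dropped (the weights and card state below are hypothetical, for illustration only):

```python
import math

def forget_stability_independent(w9, w10, w12, d, r):
    # S'_f(D, R) = w9 * D^w10 * e^(w12 * (1 - R)); the S^w11 term is dropped,
    # so the result no longer depends on the pre-lapse stability at all.
    return w9 * d ** w10 * math.exp(w12 * (1 - r))

# Hypothetical weights and card state.
s_new = forget_stability_independent(1.07, -0.5, 0.5, 5.0, 0.9)
assert 0 < s_new < 4  # a short interval regardless of the prior stability
```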
There is one problem with this approach. What if the post-lapse stability for some cards became greater than the previous stability?
There is one problem with this approach. What if the post-lapse stability for some cards became greater than the previous stability?
I designed a new experimental formula for the post-lapse stability: https://colab.research.google.com/github/open-spaced-repetition/fsrs4anki/blob/Expt/improve-post-lapse-stability/fsrs4anki_optimizer.ipynb
I replaced $S^{w_{11}}$ with $(S+\text{offset})^{w_{11}} - \text{offset}^{w_{11}}$ because $S^{w_{11}} > S$ when $S < 1$:
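The motivation can be checked numerically: for $S < 1$ and $0 < w_{11} < 1$, $S^{w_{11}}$ exceeds $S$, so a lapse could paradoxically raise stability; the offset variant removes that artifact at $S = 0$. A quick sketch (the exponent and offset values here are hypothetical):

```python
w11 = 0.8     # illustrative exponent in (0, 1)
offset = 1.0  # hypothetical offset

s = 0.5
assert s ** w11 > s  # the plain power exceeds S itself when S < 1

def bonus(s, w11, offset):
    # (S + offset)^w11 - offset^w11: equals 0 at S = 0 and grows with S
    return (s + offset) ** w11 - offset ** w11

assert bonus(0.0, w11, offset) == 0.0
assert bonus(0.5, w11, offset) < 0.5 ** w11  # smaller than the plain power for small S
```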
After this improvement, the loss and $w_{11}$ both decrease. It means the model becomes more accurate and the post-lapse stability is less affected by the stability before the lapse.
Could you test it with your data?
For me, this version seems to perform worse than the original one.
| | Log Loss | RMSE | R-squared |
|---|---|---|---|
| Original Optimizer | 0.2241 | 0.0191 | 0.8728 |
| post_lapse_stability_bonus | 0.2242 | 0.0201 | 0.8593 |
w: [1.0574, 1.6845, 5.0394, -1.376, -1.2214, 0.0002, 1.7266, -0.0738, 1.0821, 1.8622, -0.3466, 0.6787, 0.5199]
Also, though the post-lapse intervals have decreased, they are still quite large.
Also, the following parts from the SuperMemo website make me think that the previous stability should not be considered in the calculation of post-lapse stability.
In the ideal case, for simple memories, forgetting results in a reset of estimated stability back to near-zero. In theory, only difficult items made of composite memories may show a substantial decrease in the costs of re-learning, however, even that does not show in data.
SuperMemo uses a separate matrix for post-lapse stabilities: PLS[] with Lapse and Retrievability dimensions. The first interval after scoring a failing grade is then determined as follows:
Int[1]:=PLS[Lapses,R]
where:
- Int[1] - first interval (after a failing grade)
- PLS[] - post-lapse interval matrix
- Lapses - total number of memory lapses (failing grades) scored by the item
- R - retrievability at the moment of the lapse
Source: https://supermemo.guru/wiki/Post-lapse_stability
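The SuperMemo scheme quoted above amounts to a two-dimensional table lookup. A toy sketch of the idea (the matrix values and bucketing are made up; SuperMemo's actual PLS[] matrix is learned from the user's data):

```python
# Rows: number of lapses (capped at 3); columns: retrievability bucket at lapse time.
PLS = {
    (1, 'high'): 4.0, (1, 'low'): 2.0,
    (2, 'high'): 3.0, (2, 'low'): 1.5,
    (3, 'high'): 2.0, (3, 'low'): 1.0,
}

def first_interval_after_lapse(lapses, r):
    # Int[1] := PLS[Lapses, R]: only lapse count and retrievability matter;
    # the stability before the lapse plays no role.
    bucket = 'high' if r >= 0.9 else 'low'
    return PLS[(min(lapses, 3), bucket)]

# More lapses -> shorter first post-lapse interval.
assert first_interval_after_lapse(1, 0.95) > first_interval_after_lapse(3, 0.95)
```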
What if the post-lapse stability for some cards became greater than the previous stability?
We should think of another way to solve this problem.
Also, the following parts from the SuperMemo website make me think that the previous stability should not be considered in the calculation of post-lapse stability.
I have tried to remove the previous stability from the calculation of post-lapse stability. It will increase the loss. You can implement it for yourself. Or wait for me to publish the branch tomorrow.
I have tried to remove the previous stability from the calculation of post-lapse stability. It will increase the loss.
You were right. My results:
| | Log Loss | RMSE | R-squared |
|---|---|---|---|
| Original Optimizer | 0.2241 | 0.0191 | 0.8728 |
| post_lapse_stability_bonus | 0.2242 | 0.0201 | 0.8593 |
| Independent of prev stability | 0.2306 | 0.0264 | 0.6294 |
w: [1.0574, 1.6845, 5.0451, -1.5322, -1.0032, 0.0001, 1.8611, -0.1568, 1.224, 2.7261, -0.0135, 0.2, 1.3968]
Intervals:
My code:
```python
import torch
import torch.nn as nn
import numpy as np

init_w = [1, 1, 5, -0.5, -0.5, 0.2, 1.4, -0.12, 0.8, 2, -0.2, 0.2, 1]
'''
w[0]: initial_stability_for_again_answer
w[1]: initial_stability_step_per_rating
w[2]: initial_difficulty_for_good_answer
w[3]: initial_difficulty_step_per_rating
w[4]: next_difficulty_step_per_rating
w[5]: next_difficulty_reversion_to_mean_speed (used to avoid ease hell)
w[6]: next_stability_factor_after_success
w[7]: next_stability_stabilization_decay_after_success
w[8]: next_stability_retrievability_gain_after_success
w[9]: next_stability_factor_after_failure
w[10]: next_stability_difficulty_decay_after_failure
w[11]: next_stability_stability_gain_after_failure
w[12]: next_stability_retrievability_gain_after_failure
For more details about the parameters, please see:
https://github.com/open-spaced-repetition/fsrs4anki/wiki/Free-Spaced-Repetition-Scheduler
'''

class FSRS(nn.Module):
    def __init__(self, w):
        super(FSRS, self).__init__()
        self.w = nn.Parameter(torch.FloatTensor(w))
        self.zero = torch.FloatTensor([0.0])

    def forward(self, x, s, d):
        '''
        :param x: [review interval, review response]
        :param s: stability
        :param d: difficulty
        :return:
        '''
        if torch.equal(s, self.zero):
            # first learn, init memory states
            new_s = self.w[0] + self.w[1] * (x[1] - 1)
            new_d = self.w[2] + self.w[3] * (x[1] - 3)
            new_d = new_d.clamp(1, 10)
        else:
            r = torch.exp(np.log(0.9) * x[0] / s)
            new_d = d + self.w[4] * (x[1] - 3)
            new_d = self.mean_reversion(self.w[2], new_d)
            new_d = new_d.clamp(1, 10)
            # recall
            if x[1] > 1:
                new_s = s * (1 + torch.exp(self.w[6]) *
                             (11 - new_d) *
                             torch.pow(s, self.w[7]) *
                             (torch.exp((1 - r) * self.w[8]) - 1))
            # forget (the S^w11 term is removed, so this version is
            # independent of the pre-lapse stability)
            else:
                new_s = self.w[9] * torch.pow(new_d, self.w[10]) * torch.exp((1 - r) * self.w[12])
        return new_s, new_d

    def loss(self, s, t, r):
        return - (r * np.log(0.9) * t / s + (1 - r) * torch.log(1 - torch.exp(np.log(0.9) * t / s)))

    def mean_reversion(self, init, current):
        return self.w[5] * init + (1 - self.w[5]) * current

class WeightClipper(object):
    def __init__(self, frequency=1):
        self.frequency = frequency

    def __call__(self, module):
        if hasattr(module, 'w'):
            w = module.w.data
            w[0] = w[0].clamp(0.1, 10)
            w[1] = w[1].clamp(0.1, 5)
            w[2] = w[2].clamp(1, 10)
            w[3] = w[3].clamp(-5, -0.1)
            w[4] = w[4].clamp(-5, -0.1)
            w[5] = w[5].clamp(0, 0.5)
            w[6] = w[6].clamp(0, 2)
            w[7] = w[7].clamp(-0.2, -0.01)
            w[8] = w[8].clamp(0.01, 1.5)
            w[10] = w[10].clamp(-2, -0.01)
            module.w.data = w

def lineToTensor(line):
    ivl = line[0].split(',')
    responses = line[1].split(',')
    tensor = torch.zeros(len(responses), 2)
    for li, response in enumerate(responses):
        tensor[li][0] = int(ivl[li])
        tensor[li][1] = int(response)
    return tensor
```
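For reference, the update rules above can be replayed on a toy history with plain floats (the intervals and ratings below are hypothetical, and the numbers use the init_w defaults, so they are illustrative only; this rewrites the same formulas without torch, and keeps the original forget branch with the $S^{w_{11}}$ term):

```python
import math

# FSRS v3 update rules with plain floats, using the init_w defaults.
w = [1, 1, 5, -0.5, -0.5, 0.2, 1.4, -0.12, 0.8, 2, -0.2, 0.2, 1]

def init_state(rating):
    s = w[0] + w[1] * (rating - 1)
    d = min(max(w[2] + w[3] * (rating - 3), 1), 10)
    return s, d

def next_state(s, d, interval, rating):
    r = math.exp(math.log(0.9) * interval / s)  # retrievability at review time
    d = d + w[4] * (rating - 3)
    d = w[5] * w[2] + (1 - w[5]) * d            # mean reversion toward w[2]
    d = min(max(d, 1), 10)
    if rating > 1:  # recall
        s = s * (1 + math.exp(w[6]) * (11 - d) * s ** w[7]
                 * (math.exp((1 - r) * w[8]) - 1))
    else:           # forget (original formula, with the S^w11 term)
        s = w[9] * d ** w[10] * s ** w[11] * math.exp((1 - r) * w[12])
    return s, d

s, d = init_state(3)            # first review: Good
s, d = next_state(s, d, 3, 3)   # reviewed after 3 days: Good
s, d = next_state(s, d, 8, 1)   # reviewed after 8 days: Again (lapse)
# s is now the post-lapse stability under these default weights
assert 1 < s < 4
```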
I guess your post-lapse stability is larger than you think. Maybe we need to change the issue. Just ignore the post-lapse stability and `requestRetention`, and set a fixed post-lapse interval for your case.
I replaced the power function with a log function. It will predict lower post-lapse stability than before:
I replaced the power function with a log function. It will predict lower post-lapse stability than before: https://colab.research.google.com/github/open-spaced-repetition/fsrs4anki/blob/Expt/improve-post-lapse-stability/fsrs4anki_optimizer.ipynb
I tried this function. My results:
| | Log Loss | RMSE | R-squared |
|---|---|---|---|
| Original Optimizer | 0.2241 | 0.0191 | 0.8728 |
| post_lapse_stability_bonus | 0.2242 | 0.0201 | 0.8593 |
| Independent of prev stability | 0.2306 | 0.0264 | 0.6294 |
| post_lapse_stability_bonus_log | 0.2245 | 0.0212 | 0.8439 |
So, all of these three approaches increased the log loss and RMSE.
w: [1.0574, 1.6845, 5.0521, -1.3947, -1.1878, 0.0002, 1.7234, -0.0659, 1.08, 1.8265, -0.3857, 1.2447, 0.4589]
Intervals:
So I tested 3 different versions:
`torch.pow(new_l, -self.w[13])`, where `new_l` is the number of lapses.
None of them improved performance.

I tried clamping new_s for post-lapse stability like this:
```python
new_s = (self.w[9] * torch.pow(new_d, self.w[10]) * torch.pow(s, self.w[11]) * torch.exp((1 - r) * self.w[12])).clamp(0.01, 4)
```
For most decks, the difference was negligible. For one deck, RMSE went down by 13%, and for another deck, it went up by 25%. Overall, it did not improve performance.
I also tried replacing the power function that determines how much the previous value of S affects the new value with a log function, like this:
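One illustrative way to bring a logarithm of S into the post-lapse formula (a hypothetical form written here for concreteness; the exact snippet tested is not reproduced in this thread):

```python
import math

def post_lapse_stability_log(w9, w10, w11, w12, d, s, r):
    # log(1 + w11 * S) replaces S^w11, so a large pre-lapse S contributes
    # only logarithmically to the post-lapse stability.
    return w9 * d ** w10 * math.log(1 + w11 * s) * math.exp(w12 * (1 - r))

# Hypothetical weights; compare two pre-lapse stabilities.
small = post_lapse_stability_log(1.07, -0.5, 0.2, 0.5, 5.0, 10.0, 0.9)
large = post_lapse_stability_log(1.07, -0.5, 0.2, 0.5, 5.0, 100.0, 0.9)
assert large / small < 10  # sublinear growth in the pre-lapse stability
```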
But that also didn't improve performance.
Originally posted by @Expertium in https://github.com/open-spaced-repetition/fsrs4anki/issues/239#issuecomment-1544092926
I noted that SuperMemo uses the item difficulty for calculating the stability increase (on recall) but it just uses the number of lapses for calculating the post-lapse stability.
So, perhaps, we should try using the number of lapses instead of difficulty in the post-lapse stability function.
@user1823 I tested your idea
I noted that SuperMemo uses the item difficulty for calculating the stability increase (on recall) but it just uses the number of lapses for calculating the post-lapse stability.
So, perhaps, we should try using the number of lapses instead of difficulty in the post-lapse stability function.
I replaced D with lapses, like this:
```python
# forget
else:
    new_s = self.w[9] * torch.exp(self.w[10] * new_l) * torch.pow(
        s, self.w[11]) * torch.exp((1 - r) * self.w[12])
```
It didn't improve performance.
I tried clamping new_s for post-lapse stability like this:
```python
new_s = (self.w[9] * torch.pow(new_d, self.w[10]) * torch.pow(s, self.w[11]) * torch.exp((1 - r) * self.w[12])).clamp(0.01, 4)
```
@Expertium, I tried this idea of yours and also implemented it in the helper add-on (though I set the maximum limit to 6 instead of 4).
I have been using it for a few days, and I found that I had actually forgotten many of the cards that became due again because of this change.
So, this means that our thinking that FSRS is giving an unduly high post-lapse stability was correct.
And despite the optimizer showing slightly higher loss for this version, I think that I would use this until we find a better solution.
So, this means that our thinking that FSRS is giving an unduly high post-lapse stability was correct.
We thought FSRS underestimates stability, no? Look at this (my collection, v3.17.1). Here it underestimates R (the blue line is above the orange line for most values of predicted R). And if R is underestimated, that means S is underestimated as well.
We thought FSRS underestimates stability, no?
I don't know about stabilities for other reviews. But, in my opinion, FSRS is overestimating the post-lapse stability.
Do you remember that Woz said the post-lapse stability usually oscillates in the range of 1-4 days? But, FSRS usually gives a very high post-lapse stability. So, this means that FSRS is overestimating the post-lapse stability.
Here it underestimates R (the blue line is above the orange line for most values of predicted R). And if R is underestimated, that means S is underestimated as well.
The calibration is drawn from all reviews. We can draw the calibration graph for reviews whose latest rating is `again`.
We can draw the calibration graph for reviews whose latest rating is `again`.
That could be helpful!
```python
plot_brier(dataset[dataset['r_history'].str.endswith('1')]['p'], dataset[dataset['r_history'].str.endswith('1')]['y'], bins=40)
```

Here, `dataset['r_history'].str.endswith('1')` means the latest rating is `again`.
I used it in fsrs4anki_optimizer_alpha.ipynb
RMSE seems to be about the same for both.
I think that instead of looking at reviews where the grade was "Again", we should look at reviews that come immediately after that. In other words, we need to look at reviews where the most recent grade is "Hard", "Good" or "Easy", and the second most recent grade is "Again".
In other words, we need to look at reviews where the most recent grade is "Hard", "Good" or "Easy", and the second most recent grade is "Again".
But they are not post-lapse stability.
the second most recent grade is "Again".
Not second most recent but third most recent because the next review after the lapse is on the same day (with default relearning steps).
In other words, we need to look at reviews where the most recent grade is "Hard", "Good" or "Easy", and the second most recent grade is "Again".
But they are not post-lapse stability.
Maybe I'm missing something. So here's how I understand it:
1. FSRS predicts some S using the regular (not post-lapse) formula
2. User presses "Again"
3. FSRS predicts some S using the post-lapse formula
4. User presses something

So if we want to know how well the post-lapse stability is estimated, we need to look at reviews that come after the user pressed "Again"
the second most recent grade is "Again".
Not second most recent but third most recent because the next review after the lapse is on the same day (with default relearning steps).
I'm pretty sure intraday reviews are excluded from optimization
4. User presses something So if we want to know how well the post-lapse stability is estimated, we need to look at reviews that come after the user pressed "Again"
You are right. My words are misleading. `dataset['r_history'].str.endswith('1')` means the latest rating in the history is `again`. `dataset['y']` reflects the rating that comes after the user pressed "Again".
I replaced '1' with '3' in `y`:

```python
plot_brier(dataset[dataset['r_history'].str.endswith('1')]['p'], dataset[dataset['r_history'].str.endswith('3')]['y'], bins=40)
```
I replaced '1' with '3' in y plot_brier(dataset[dataset['r_history'].str.endswith('1')]['p'], dataset[dataset['r_history'].str.endswith('3')]['y'], bins=40)
The two filter conditions should be consistent.
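A minimal pandas sketch of what "consistent" means here, on toy data (`plot_brier` itself is left out; only the row selection is shown):

```python
import pandas as pd

df = pd.DataFrame({
    'r_history': ['3,1', '3,3', '1', '3,1,3'],
    'p': [0.7, 0.9, 0.6, 0.85],
    'y': [1, 1, 0, 1],
})

# One shared mask keeps p[i] and y[i] pointing at the same review,
# which is what a calibration plot needs to compare prediction and outcome.
mask = df['r_history'].str.endswith('1')
p = df.loc[mask, 'p']
y = df.loc[mask, 'y']
assert len(p) == len(y) == 2  # only '3,1' and '1' end with Again
```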
If you're saying that changing the number in `str.endswith('4')]['y']` shouldn't affect the graph, then I have bad news: it affects the graph a lot.
the second most recent grade is "Again".
Not second most recent but third most recent because the next review after the lapse is on the same day (with default relearning steps).
I'm pretty sure intraday reviews are excluded from optimization
@L-M-Sherlock, can you confirm this?
For me, the post-lapse stability calculated by FSRS is usually very high. Take the example of the following card:
Here, the post-lapse stability is 73 days, which is not within the range of 1-4 days (as suggested by SuperMemo).
Source: https://supermemo.guru/wiki/Post-lapse_stability
Note that I did my reviews on AnkiDroid and then rescheduled them using the FSRS helper.