brishtibheja opened 4 weeks ago

Using difficulty_asc as a sort order, I noticed more of my lapses occurring at later portions of the review session despite retrievability being the same. In Anki's stats screen, searches like prop:d>0.9 and prop:d<0.9 prop:d>0.8 and looking at the graphs make it more evident. I find FSRS consistently underestimating R for easier cards, such that average retention is higher than the set DR for cards low in difficulty. This is in contrast with cards in prop:d>0.9, where retention is almost 10 percentage points below DR. It's erring in both directions. Previously, @richard-lacasse has also reported a similar experience.

One initial idea to solve this might be to try incorporating D in the formula for the forgetting curve. Has this been tried before in some way?
One initial idea to solve this might be to try incorporating D in the formula for the forgetting curve. Has this been tried before in some way?
Nope. It would complicate things a lot. It's not compatible with how the initial S is estimated, it would screw up the "A 100 day interval will become x days." thingy and it would make stability less interpretable.
R should only depend on S and time elapsed. D affects R indirectly by affecting S.
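For concreteness (the constants here are the FSRS-4.5 ones; other versions differ slightly), the current curve is a power function of t/S alone:

$$R(t, S) = \left(1 + \frac{19}{81}\cdot\frac{t}{S}\right)^{-0.5}, \qquad R(S, S) = 0.9$$

D never appears in it; it only moves S up or down during the stability updates.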
So, the actual issue here is that S is not calculated accurately for high and low D values.
I'm not sure if this is actually a problem, or just the algorithm tempering the speed at which it adapts to cards that are subjectively very easy or very hard. You don't want to overfit the model. It might end up being better that it doesn't "jump to conclusions" as it adjusts the difficulty. The variance of subjective difficulty is so broad that there are always going to be cards that are scheduled incorrectly, but those are the cards FSRS will learn the most from each time.
That being said, if it is a problem I agree with @user1823.
So, the actual issue here is that S is not calculated accurately for high and low D values.
I'm guessing you'd want to change how D is handled in the Stability equation.
I'm guessing you'd want to change how D is handled in the Stability equation.
Tried that too. I couldn't find anything that improved the results.
I was just thinking because I often have backlogs, maybe that affected the stats for highly difficult cards.

It's possible. But still, that wouldn't mean cards in the prop:d<0.9 prop:d>0.8 range will look like this:

I am going through a backlog of 2k cards this month, which I got from rescheduling. My DR is set to .85.
I think we should make RMSE (bins) create the bins according to difficulty too. I took a quick look at how it's done and didn't find anything like this.
I think we should make RMSE (bins) create the bins according to difficulty too. I took a quick look at how it's done and didn't find anything like this.
We can't make the binning method depend on D, S or R (well, IIRC we can, it's just painfully slow). The binning depends on:
1) the total number of reviews
2) the interval length
3) the number of lapses (Agains)
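A minimal sketch of what bin-based RMSE looks like under those three keys. This is a toy illustration, not the actual fsrs-optimizer code, and the record fields ('n_reviews', 'delta_t', 'lapses', 'p', 'y') are made up:

```python
from collections import defaultdict
import math

def rmse_bins(reviews):
    bins = defaultdict(list)
    for r in reviews:
        # The key deliberately uses review count, interval length and
        # lapse count -- never D, S or R directly.
        key = (r["n_reviews"], r["delta_t"], r["lapses"])
        bins[key].append((r["p"], r["y"]))  # predicted R, actual recall (0/1)
    sq_err, n = 0.0, 0
    for items in bins.values():
        pred = sum(p for p, _ in items) / len(items)  # mean predicted R
        obs = sum(y for _, y in items) / len(items)   # observed retention
        sq_err += (pred - obs) ** 2 * len(items)      # weight by bin size
        n += len(items)
    return math.sqrt(sq_err / n)
```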
I'd say that the number of lapses is a good proxy for D
I'd say that the number of lapses is a good proxy for D
Yes, you might be right. So we should see metrics improving if R prediction is improved for low/high D cards.
Actually, why is it not something like Pass/Fail ratio though? That sounds better to me.
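Relatedly, a quick way to check the original report against one's own collection is to compare mean predicted R with observed retention per difficulty band. A sketch with hypothetical field names ('d' is difficulty scaled to [0, 1], 'p' the predicted R, 'y' the binary outcome):

```python
def calibration_by_difficulty(reviews, n_bands=5):
    bands = [[] for _ in range(n_bands)]
    for r in reviews:
        idx = min(int(r["d"] * n_bands), n_bands - 1)  # which D band
        bands[idx].append((r["p"], r["y"]))
    for i, items in enumerate(bands):
        if not items:
            continue
        pred = sum(p for p, _ in items) / len(items)   # mean predicted R
        obs = sum(y for _, y in items) / len(items)    # observed retention
        lo, hi = i / n_bands, (i + 1) / n_bands
        print(f"D in [{lo:.1f}, {hi:.1f}): predicted {pred:.3f}, "
              f"observed {obs:.3f}, n={len(items)}")
```

If FSRS underestimates R for easy cards and overestimates it for hard ones, the gap should show up with opposite signs at the two ends.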
R should only depend on S and time elapsed. D affects R indirectly by affecting S.
@user1823 That cannot achieve all the effects we would possibly want.
Consider cards with a stability of 7d. Assuming Anki has correctly assigned the stability, you remember around 90% of the material a week later. But a month and a half later, do we expect to remember the highly difficult cards at all? In some disciplines, it's possible you forget almost all the difficult cards a month later but you still retain some of the relatively easier cards. (I think this happens naturally, but also for reasons like getting more real-life encounters with easier material, or inherent reviews of easier material while doing other, harder Anki cards.)
I think the way you'd make changes that take that into account is by varying the formula for R based on what value D has taken. As D rises, say, the curve for R gets steeper and steeper.
Re: making S meaningful
That can be done if all the forgetting curves for the same S intersect at some point. Then you can possibly still say that S equals the time it takes for R to reach .90, etc.
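As a toy construction with exactly that property (mine, not anything FSRS currently uses): let D steer the exponent of an exponential curve,

$$R(t) = 0.9^{\,(t/S)^{k(D)}}$$

Every member of the family passes through (S, 0.9), so "S is the time for R to fall to 90%" survives, while a larger k(D) keeps R flatter before S and makes it drop faster after S.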
In some disciplines, it's possible you forget almost all the difficult cards a month later but you still retain some of the relatively easier cards.
This is just another way of saying that the stability of some cards is less than that of the others.
I don't get it. In my example, stability was the same because R was really 90% after a week for both the easy and the harder card.
How is it possible that R decreases at the same rate for both cards in the first week but later decreases faster for one of the cards?
If one card is harder, the R should decrease faster in the first week too, which means that its R after one week can't be equal to that of the other card.
It's kind of possible: https://www.desmos.com/calculator/9pbylwb5yu. See how R falls much faster if the function is exponential? You may be surprised what happens if we zoom in to the beginning, when S is less than or equal to 1: all three curves are practically indistinguishable.
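In case the Desmos link rots, here is the same comparison in a few lines of Python. The three shapes are my stand-ins for the ones in the link (one exponential, two power curves), all normalized so that R(S) = 0.9:

```python
import math

S = 7.0  # stability in days; every curve satisfies R(S) = 0.9

def exponential(t):
    return 0.9 ** (t / S)                    # pure exponential decay

def power_steep(t):
    return (1 + 19 / 81 * t / S) ** -0.5     # FSRS-4.5-style power curve

def power_flat(t):
    return (1 + t / S) ** math.log(0.9, 2)   # a flatter power curve

for t in (1, 3, 7, 45, 90):
    print(f"t={t:>2}d  exp={exponential(t):.3f}  "
          f"steep={power_steep(t):.3f}  flat={power_flat(t):.3f}")
```

At t = 1 the three values differ by about 0.005, but by t = 45 the exponential has fallen to roughly 0.51 while the power curves are still above 0.63.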
So yeah, theoretically we could change the shape of the curve based on D, but as I said earlier
It's not compatible with how the initial S is estimated, it would screw up the "A 100 day interval will become x days." thingy and it would make stability less interpretable.
How is it possible that R decreases at the same rate for both cards in the first week but later decreases faster for one of the cards?
I'm sure we'll need evidence for that, but for now only experience guides me. E.g., I remember a lot of the easy stuff I learned in Spanish but have forgotten almost all the hard stuff, though I learned them around the same time.
If one card is harder, the R should decrease faster in the first week too
You failed the easy card and its S became 1 week. I don't see why the stability can't be the same. You'd just need to have started learning the cards at different times.
it would make stability less interpretable.
I think I answered this, but I don't have any solution for the other two. Maybe in the pop-up case, we can show an estimate like "intervals will increase by 13%", which would be the average.
E.g., I remember a lot of the easy stuff I learned in Spanish but have forgotten almost all the hard stuff, though I learned them around the same time.
That just means that stability was different.
You failed the easy card and its S became 1 week.
I am not talking about the stability given by FSRS. I am saying that the actual S for those cards is different, and this is the problem: FSRS is not able to calculate S with 100% accuracy.
theoretically we could change the shape of the curve based on D
Well, long ago (when we were developing FSRS 4), I said that we are introducing a power forgetting curve in FSRS only because we are unable to accurately calculate S. But forgetting is exponential in nature. So, if we are somehow able to make the calculation of S very accurate, we will start using the exponential forgetting curve again.
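A quick sketch of why inaccurate S estimates make a power curve fit better (my illustration of the standard mixture-of-exponentials argument, not anything from the FSRS code): if every card truly decays exponentially but the true stabilities are scattered around our single estimate, the average curve has a heavier tail than any one exponential:

```python
import math, random

random.seed(0)
# Each card truly forgets exponentially, R_i(t) = 0.9 ** (t / S_i),
# but the true S_i are spread around the S = 7 we believe in.
stabilities = [random.lognormvariate(math.log(7), 0.8) for _ in range(10_000)]

for t in (1, 7, 30, 90):
    mixture = sum(0.9 ** (t / s) for s in stabilities) / len(stabilities)
    single = 0.9 ** (t / 7)
    print(f"t={t:>2}d  mixture={mixture:.3f}  single exponential={single:.3f}")
```

The mixture stays well above the single exponential at long intervals, which is the power-law-like behaviour; with per-card S estimated perfectly, the aggregate would collapse back into individual exponentials.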
If we change the shape of the curve based on D, given the same S, we'll obtain a family of forgetting curves intersecting at the point (S, 90%). However, they are non-overlapping.
Khajah, M. M., Lindsey, R. V., & Mozer, M. C. (2014). Maximizing Students’ Retention via Spaced Review: Practical Guidance From Computational Models of Memory. Topics in Cognitive Science, 6(1), 157–169. https://doi.org/10.1111/tops.12077