open-spaced-repetition / py-fsrs

Python Package for FSRS
https://pypi.org/project/fsrs/
MIT License

[Question] Migration to FSRS from an alternative algorithm #20

Closed by l3kn 11 months ago

l3kn commented 11 months ago

I'm not sure if this is the best place to ask this question but it seems like a useful feature to have in this package.

For the past few years, I've been using my own spaced repetition system with an algorithm based on SM2 and modified in some of the same ways as the one in Anki.

For each card in my collection, I have stored the full review history in a way that would allow me to recover both the ease and interval at the time of each review in its history. The algorithm used a list of fixed learning intervals like Anki. One problem is that I never computed or stored something similar to the state (new, learning, relearning, review) but I think I can recover that from the history file.
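A rough sketch of how the missing state might be recovered from such a history. The `LogEntry` fields, the `infer_states` helper and the one-day cutoff are all assumptions made for illustration; a real history file may need a different rule, for example if a one-day step should still count as learning.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    NEW = "new"
    LEARNING = "learning"
    REVIEW = "review"
    RELEARNING = "relearning"


@dataclass
class LogEntry:
    rating: int            # 1 = again, 2 = hard, 3 = good, 4 = easy
    scheduled_days: float  # interval the old algorithm assigned after this review


def infer_states(log: list[LogEntry]) -> list[State]:
    """Return the state the card was in *before* each review in `log`.

    Assumed heuristic: an assigned interval of at least one day means the card
    is in review; a shorter one means learning, or relearning if the card has
    already reached a one-day interval before.
    """
    states = []
    state = State.NEW
    graduated = False  # has the card ever been given an interval >= 1 day?
    for entry in log:
        states.append(state)
        if entry.scheduled_days >= 1:
            graduated = True
            state = State.REVIEW
        else:
            state = State.RELEARNING if graduated else State.LEARNING
    return states


# Example: 15 min, 1 day, 6 days, then a lapse with a 10 minute step, then 1 day.
example_log = [
    LogEntry(3, 15 / 1440),
    LogEntry(3, 1),
    LogEntry(3, 6),
    LogEntry(1, 10 / 1440),
    LogEntry(3, 1),
]
example_states = infer_states(example_log)
```

With this rule the fixed one-day step is already treated as a review interval, which may or may not match how the original algorithm thought about it.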

Now I'm wondering what would be the best way to convert some of these cards to FSRS. Simply replaying the history starting with the initial FSRS state seems wrong because the first (learning) intervals are different. Looking through some of the Anki source code, there seem to be two approaches: one computes FSRS parameters based on the current SM2 ease and interval, and the other preprocesses the review log (revlog) in some way and works on that.

Is there a recommended way of doing this kind of conversion?

A similar issue is how an implementation of FSRS outside of Anki might be used to recompute the FSRS “state” of a card after changing the parameters. Again this would involve some kind of re-processing of the card's review log.

L-M-Sherlock commented 11 months ago

I don't know why your first intervals are different. For the first learning, the elapsed time should be zero.

l3kn commented 11 months ago

The hardcoded intervals my algorithm is using are 15 min, 1 day, 6 days.

From my understanding, and assuming a card is rated as good two times, FSRS would schedule it after 10 minutes (new -> learning), then with an interval that depends on the weights (learning -> review).

At this point, it would take the real elapsed interval into account when computing the next difficulty, stability and interval.

I think I see part of the mistake in my thinking now. It seems like the only difference would be the first interval (fixed at 15min in my algorithm, 5 or 10 or 15 minutes with FSRS unless the rating is easy). If the first rating (on a new card) is easy, my algorithm (and Anki's too, probably) skips directly to an interval that is at least a day long.

Would this mean that simply replaying the card's history on an initially empty FSRS state would be fine? I think switching existing cards to FSRS would be better in the long run, so I'm willing to accept slightly suboptimal spacing behavior for the first few reviews of those cards.
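A minimal sketch of what "replaying the history on an empty state" could look like. The `Memory`, `Review` and `replay` names are hypothetical, and `step` stands in for whatever FSRS implementation does the actual update; `toy_step` below is a placeholder and NOT the FSRS formula. The first review is passed an elapsed time of zero, so only its rating matters.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class Memory:
    stability: float
    difficulty: float


@dataclass
class Review:
    when: datetime
    rating: int  # 1 = again, 2 = hard, 3 = good, 4 = easy


# (memory before, or None for a new card; rating; elapsed days) -> memory after
Step = Callable[[Optional[Memory], int, float], Memory]


def replay(history: list[Review], step: Step) -> Optional[Memory]:
    """Feed every review, oldest first, through `step`, starting from no memory."""
    memory: Optional[Memory] = None
    previous: Optional[datetime] = None
    for review in sorted(history, key=lambda r: r.when):
        if previous is None:
            elapsed_days = 0.0  # first review: only the rating is used
        else:
            elapsed_days = (review.when - previous).total_seconds() / 86400
        memory = step(memory, review.rating, elapsed_days)
        previous = review.when
    return memory


def toy_step(memory: Optional[Memory], rating: int, elapsed_days: float) -> Memory:
    """Placeholder update, not FSRS: it only shows which inputs replay provides."""
    if memory is None:
        return Memory(stability=float(rating), difficulty=5.0)
    return Memory(memory.stability + elapsed_days, memory.difficulty)


history = [
    Review(datetime(2024, 1, 1), 3),
    Review(datetime(2024, 1, 2), 3),
    Review(datetime(2024, 1, 8), 3),
]
result = replay(history, toy_step)
```

The replay uses the real elapsed time between reviews, so the old algorithm's hardcoded intervals only enter through the timestamps in the log.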

L-M-Sherlock commented 11 months ago

I think you misunderstand the review history. For the first learning, FSRS only considers its rating.

l3kn commented 11 months ago

Maybe I'm still missing some aspect. I pointed out the difference in the first (hardcoded) intervals as a potential source of inaccuracy in the conversion. This inaccuracy is probably irrelevant and will diminish with further reviews of the same card.

Regarding my modified SM2 algorithm, I'll experiment with a few cards to see how the predicted intervals differ.

For the second issue of rescheduling FSRS cards when the parameters/weights change, would you consider that as part of the scope of this python library? I think it would be useful to have a reference implementation of this logic outside of Anki and with some guidance on how the process should work, I can try to implement it myself and open a pull request at some time.

L-M-Sherlock commented 11 months ago

“I think it would be useful to have a reference implementation of this logic outside of Anki”

I had developed an add-on for Anki to reschedule cards. Maybe it's helpful to you:

https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/2.1.66/schedule/reschedule.py

l3kn commented 11 months ago

Thanks, I didn't notice there was a version of this file with more code in it.

l3kn commented 11 months ago

I've looked at the code you mentioned and I think I understand most of it.

From the short description in the review, I assume this code can also be used to reschedule cards when the FSRS weights are changed. In this case, I would assume that if I reschedule a card (for the sake of simplicity, one that was always scheduled with the most recent version of FSRS) without changing the weights (or making some change, then reverting it and rescheduling again), the state of the card (stability and difficulty) will stay the same.

If I understand the code correctly (in particular https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/2.1.66/schedule/reschedule.py#L456), there might be rare scenarios where this would not be true.

One example would be a card that is rated “hard” once per day. With the current code of https://github.com/open-spaced-repetition/py-fsrs, such a card would never leave the learning phase and its difficulty and stability would stay constant after the first rating.

For the same card, the rescheduling logic (as I understand it) would detect an interval greater than one day and update the difficulty and stability based on that.

Am I misunderstanding the rescheduling code or is this the desired behavior?
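The daily "hard" scenario can be sketched with a toy model that only counts how often difficulty and stability would be updated. The state transitions are a simplified reading of the behaviour described in this comment (relearning and lapses are left out), and both counting functions are hypothetical helpers, not code from py-fsrs or the helper add-on.

```python
from enum import Enum


class State(Enum):
    NEW = 0
    LEARNING = 1
    REVIEW = 2


AGAIN, HARD, GOOD, EASY = 1, 2, 3, 4


def next_state(state: State, rating: int) -> State:
    """Simplified transitions: "again" and "hard" keep a card in learning."""
    if state is State.NEW:
        return State.REVIEW if rating == EASY else State.LEARNING
    if state is State.LEARNING:
        return State.REVIEW if rating >= GOOD else State.LEARNING
    return State.REVIEW  # lapses and relearning are omitted in this sketch


def count_updates_state_based(ratings: list[int]) -> int:
    """Update D and S only for reviews made while the card is in the review state."""
    state, updates = State.NEW, 0
    for rating in ratings:
        if state is State.REVIEW:
            updates += 1
        state = next_state(state, rating)
    return updates


def count_updates_elapsed_based(elapsed_days: list[float]) -> int:
    """Update D and S for every non-first review made at least a day after the last."""
    return sum(1 for i, elapsed in enumerate(elapsed_days) if i > 0 and elapsed >= 1)


# A new card rated "hard" once per day for five days.
ratings = [HARD] * 5
elapsed = [0, 1, 1, 1, 1]
state_based = count_updates_state_based(ratings)
elapsed_based = count_updates_elapsed_based(elapsed)
```

Under these assumptions the state-based replay never touches the memory state after the first rating, while the elapsed-based replay updates it four times, so the two would end up with different stability and difficulty.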

L-M-Sherlock commented 11 months ago

Yeah. If a user pressed hard once per day for a new card, it would never leave the learning phase. It's unexpected. But I don't know how to fix it. Maybe we need to ignore the state, and only consider whether the review is done in the same day. The behavior of the helper add-on is intended.

L-M-Sherlock commented 11 months ago

Here is related code in Anki:

https://github.com/ankitects/anki/blob/77cb3220c5e83206725b1f3f2fbdb536dc180ec2/rslib/src/scheduler/answering/mod.rs#L363-L398

l3kn commented 11 months ago

Regarding the helper add-on, extracting more information (i.e. changing stability and difficulty) for reviews after a longer (>= 1 day) interval seems reasonable, but it feels weird to me that rescheduling a card that already uses FSRS might change its stability or difficulty, even if the result might be more accurate.

The best alternative I can think of, and the one I'll try to use for my SM2 cards, would be to use something similar to the python code for the initial conversion to FSRS, store the initial stability and difficulty this gives and when rescheduling later, re-run FSRS on the history starting from the time of conversion and with the previously computed stability and difficulty.
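A sketch of that checkpoint idea, under stated assumptions: `Checkpoint` and `reschedule` are hypothetical names, and `update` is a stand-in callback for the FSRS update under the current weights (the toy version below is not the FSRS formula).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class Checkpoint:
    last_review: datetime  # last review that the one-off conversion accounted for
    stability: float       # values produced by that conversion
    difficulty: float


# (stability, difficulty, rating, elapsed days) -> (stability, difficulty)
Update = Callable[[float, float, int, float], tuple[float, float]]


def reschedule(
    checkpoint: Checkpoint,
    reviews: list[tuple[datetime, int]],
    update: Update,
) -> tuple[float, float]:
    """Recompute (stability, difficulty) from the checkpoint onward."""
    s, d = checkpoint.stability, checkpoint.difficulty
    previous = checkpoint.last_review
    for when, rating in sorted(reviews):
        if when <= checkpoint.last_review:
            continue  # older reviews are already folded into the checkpoint
        elapsed_days = (when - previous).total_seconds() / 86400
        s, d = update(s, d, rating, elapsed_days)
        previous = when
    return s, d


def toy_update(s: float, d: float, rating: int, elapsed_days: float) -> tuple[float, float]:
    """Placeholder update, not FSRS."""
    return s + elapsed_days, d + (3 - rating)


checkpoint = Checkpoint(datetime(2024, 1, 10), stability=5.0, difficulty=6.0)
reviews = [
    (datetime(2024, 1, 1), 3),   # before conversion, ignored
    (datetime(2024, 1, 5), 3),   # before conversion, ignored
    (datetime(2024, 1, 20), 3),
    (datetime(2024, 1, 30), 1),
]
first = reschedule(checkpoint, reviews, toy_update)
second = reschedule(checkpoint, reviews, toy_update)
```

Because the checkpoint is never modified, rescheduling twice with the same weights gives the same result, which is the property asked for above.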

By “Maybe we need to ignore the state, and only consider whether the review is done in the same day.”, do you mean changing the FSRS algorithm so it handles these long intervals differently? That seems like a good alternative but I can't judge how often that would apply to the datasets FSRS is trained and evaluated on, or how it affects the accuracy.

L-M-Sherlock commented 11 months ago

Ideally, this lib should work as https://github.com/open-spaced-repetition/fsrs-rs

l3kn commented 11 months ago

“Yeah. If a user pressed hard once per day for a new card, it would never leave the learning phase. It's unexpected. But I don't know how to fix it. Maybe we need to ignore the state, and only consider whether the review is done in the same day. The behavior of the helper add-on is intended.”

My current understanding of the algorithm and conversion/training logic is that only long intervals (greater than one day) are considered relevant when it comes to updating the difficulty and stability of a card.

Where the algorithm and the conversion logic differ is when a card in the "learning" or "relearning" state is reviewed after an interval of more than one day (or within minutes but on different days, which might be a separate issue). I'm not sure if you are willing to make such a large change to the algorithm, but it seems reasonable to have it update the difficulty and stability not only for cards in the review state but for all reviews with an interval longer than a day (or some different threshold).
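The difference between "more than one day elapsed" and "reviewed on a different day" can be shown with two small predicates. Both function names are hypothetical, and the sketch ignores time zones and any day-rollover hour a real scheduler might use.

```python
from datetime import datetime, timedelta


def long_by_elapsed(
    previous: datetime,
    current: datetime,
    threshold: timedelta = timedelta(days=1),
) -> bool:
    """True if at least `threshold` of real time passed between the two reviews."""
    return current - previous >= threshold


def long_by_calendar_day(previous: datetime, current: datetime) -> bool:
    """True if the second review happened on a later calendar date."""
    return current.date() > previous.date()


late_evening = datetime(2024, 1, 1, 23, 55)
after_midnight = datetime(2024, 1, 2, 0, 5)   # ten minutes later, next date
next_evening = datetime(2024, 1, 2, 23, 55)   # a full day later

ten_minutes_elapsed = long_by_elapsed(late_evening, after_midnight)
ten_minutes_calendar = long_by_calendar_day(late_evening, after_midnight)
full_day_elapsed = long_by_elapsed(late_evening, next_evening)
full_day_calendar = long_by_calendar_day(late_evening, next_evening)
```

Two reviews ten minutes apart across midnight count as "long" only under the calendar-day rule, so whichever rule is chosen would need to be the same in the scheduler and in any rescheduling code.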

L-M-Sherlock commented 11 months ago

It takes time. I've been busy with FSRS-4.5 recently.