[Question] - Githubissues

topherbuckley commented 7 months ago

[x] I have checked the FAQ and could not find an answer to my question
[x] I have read the wiki and still felt confused
[x] I have searched for similar existing questions here

Question

Hello again,

Trying this out again after seeing a fair bit of progress on this and anki itself. Thanks for your hard work!

I posted an issue here in the past that may or may not have been fully resolved since. I just wanted to reference it because I had sent you my Anki collection for testing, in case you want to test against it again (though I'm happy to send it again).

As I was afraid of a giant backlog like before, I opted to use the default OFF setting for "Reschedule cards on change". I started answering a few cards, and was watching the card info as I answered to see if the updated intervals were making sense. For the first few, it appeared normal, with somewhat longer intervals than previously (which I expected), but then I ran into one card where answering Good resulted in the interval decreasing. Is this expected? Here is the history for that card, with the last answer being the only one performed after enabling FSRS.

Here are the deck options used after enabling FSRS:

Anki specs below: Version 24.04.2 (82caffec), Python 3.9.18, Qt 6.6.2, PyQt 6.6.1

topherbuckley commented 7 months ago

Had a few more examples in the meantime, and this one seemed particularly extreme:

L-M-Sherlock commented 7 months ago

It's possible when you switch from SM-2 to FSRS, or increase your desired retention.

topherbuckley commented 7 months ago

Ok, thank you for the quick response. I assume after the initial decrease, it should continue to increase with Good responses though right? I'll keep using it and report back here if I see any decreases after a 2nd response using this.

topherbuckley commented 7 months ago

So after a few days I'm getting the general sense that any intervals that were fairly long (e.g. > 3 months) are reduced after turning on FSRS. What is the logic there? Why are these cards, which were obviously easy for me to remember in the past, given such low initial stability and retrievability? E.g., the card below had an interval of 2.9 months for Good.

topherbuckley commented 6 months ago

I'm going to go ahead and reopen this to request some attention to this as I'm continuing to see the same trend, which is going to lead me to have a huge number of cards due within the next few months rather than spread through years. I was under the assumption using this would lead to less total review time, but unless the setting of the initial stability and retrievability can be better explained, this is still a no go for me.

L-M-Sherlock commented 6 months ago

Why are these cards, which were obviously easy for me to remember in the past, given such low initial stability and retrievability?

What are your parameters for this deck? And what are your previous true retention and current desired retention?

Besides, if your decks vary in difficulty (cramming vocabulary vs. immersed learning), I recommend using different presets and parameters.

topherbuckley commented 6 months ago

What are your parameters for this deck? And what are your previous true retention and current desired retention?

The current parameters and the ones used prior to enabling FSRS are shown in my initial comment to this issue. Where would I find the stats to answer the question about "true retention"? I left the desired retention at the default of 0.9.

Besides, if your decks vary in difficulty (cramming vocabulary vs. immersed learning), I recommend using different presets and parameters.

I do this already, but this issue is focusing on my giant immersed learning collection. Here are the stats for that deck:

anki-stats-2024-05-07@14-55-38.pdf

If you see anything in there that strikes you as odd/unexpected, please let me know and I can try to help identify the cause.

This is a collection of subdecks, where I used to have separate deck options for each, but gave up trying to manage them all independently after finding anki had really messed up a lot of my intervals and ease values due to a bug (long story). So I defaulted everything back to one deck option before moving to FSRS in hopes that it would avoid such hassles hereafter. Maybe this was a mistake and I should go back and try to retrain on each subdeck? The content of each sub deck is not significantly different in difficulty just sources (single books, textbooks, etc.). I don't really cram for anything, and keep a pretty consistent pace for reviews, and only add in new cards when my review count is well below what I can handle in a single day. Of course some content within each individual subdeck is much easier than others, but I think that is expected right?

topherbuckley commented 6 months ago

Also just curious, is there any documentation that better discusses how these initial stability and retrievability values are calculated based on past non-FSRS review history? I'm only seeing the calculations assuming we are starting from 0 here unless I'm misunderstanding something.

L-M-Sherlock commented 6 months ago

0.8377, 1.8393, 5.9837, 96.4561, 5.8618, 1.1533, 1.0899, 0.0038, 1.9983, 0.5099, 1.2695, 2.5595, 0.0605, 0.4141, 1.0193, 0.0073, 3.1996

I find that your initial stabilities are very different. The initial stability (good) is 5.9837 and the initial stability (again) is 0.8377. That explains why those cards' intervals are reduced. Because their first ratings are again.

Here is a visualizer: https://open-spaced-repetition.github.io/anki_fsrs_visualizer/
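As a reading aid for the parameter list above: the two values singled out (0.8377 for Again, 5.9837 for Good) are the first and third entries, which suggests the first four entries are the initial stabilities, in days, for a first rating of Again, Hard, Good, and Easy. A minimal sketch under that assumption; the names `RATINGS` and `initial_stability` are illustrative and not part of any FSRS API:

```python
# Parameters quoted in this thread (17 values).
params = [0.8377, 1.8393, 5.9837, 96.4561, 5.8618, 1.1533, 1.0899, 0.0038,
          1.9983, 0.5099, 1.2695, 2.5595, 0.0605, 0.4141, 1.0193, 0.0073, 3.1996]

# Assumption: indices 0..3 hold the initial stability (days) for each
# possible first rating, in the order Again, Hard, Good, Easy.
RATINGS = ["again", "hard", "good", "easy"]
initial_stability = dict(zip(RATINGS, params[:4]))

for rating, days in initial_stability.items():
    print(f"first rating {rating:>5}: initial stability = {days} days")
```

Under this reading, a card whose first rating was Again starts from a stability roughly seven times lower than one first rated Good.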

topherbuckley commented 6 months ago

Thank you for your reply and the link to the visualizer. After playing around with that a bit, I think I may understand the situation better than before. Can you please confirm my understanding? So to calculate the intervals after switching from the old scheduler to FSRS you take the answer history of values [1-4], and plug that into S'_r calculation and model what the interval would be had I given all the same answers under the FSRS scheduler. Did I get that right? Does this also imply that this does not account for the previous scheduler's time intervals and how those relate to the answers? For example, answering 3 after an interval difference from 6.23 months to 1.98 years should indicate I have a very firm grasp on the card, but in the FSRS calculation, this will not look very significant as it would only be modeled as me answering 3 between a 1 month and 2 month (very rough guess) interval. Is that more or less correct? If that is the case, I don't see any issue with the model itself when used from 0 answers, but this calculation for the initial intervals after switching to FSRS I would think needs to somehow account for the significance between answers and previously scheduled intervals. Or maybe this is somehow accounted for in the value of t within the calculation for R?

The initial stability (good) is 5.9837 and the initial stability (again) is 0.8377.

Comparing this to your default values of 0.4 and 2.4, it looks like they are just scaled by a factor of 2, but I'm still not quite grasping the implications of this, as I don't see a way to compare two plots from different parameter values. Without understanding all the details of the paper, is there any generalization you can make as to why the training would result in such values compared to the default ones?

With regards to the visualizer, the y-axis is stability right? What are the percentages above each point? What about the values you see when hovering over each point? Understanding these may give me a bit more insight into what you are trying to tell me :sweat_smile:

Because their first ratings are again.

[Edit] Looking at this again, I think what I wrote below and this quote are beside the point now. I'm not really confused as to how the algorithm calculates intervals, but more confused/concerned as to how these intervals are calculated in spite of previous interval history.

Isn't this normal expected behavior though? Meaning, when I see a brand new card, I need to see it a few times before I remember it at all. At least that is the meaning of the initial "again" answers for me. Thereafter, it appears I had a pretty good memory of it. So although I superficially understand why this is happening from the math you implement, it seems completely counterintuitive to me to set the initial intervals after the switch to FSRS as they are now. If anything, I'd expect the interval to get even longer than it was under the previous scheduler, not dramatically shorter (for the card I posted the history for).

I guess a simpler and more frank question would be: regardless of whether this aligns with the math or not, is this the expected/desired outcome? Thanks again for your time and thoughts.

L-M-Sherlock commented 6 months ago

So to calculate the intervals after switching from the old scheduler to FSRS you take the answer history of values [1-4], and plug that into S'_r calculation and model what the interval would be had I given all the same answers under the FSRS scheduler. Did I get that right? Does this also imply that this does not account for the previous scheduler's time intervals and how those relate to the answers?

FSRS also considers the previous scheduler's time intervals.

For example, answering 3 after an interval difference from 6.23 months to 1.98 years should indicate I have a very firm grasp on the card, but in the FSRS calculation, this will not look very significant as it would only be modeled as me answering 3 between a 1 month and 2 month (very rough guess) interval. Is that more or less correct?

FSRS will output a longer interval if your previous interval is 1.98 years than 6.23 months. But the new interval is not linearly related to the previous interval. The previous intervals affect the stability via the R (retrievability) component of the DSR model.
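To make the role of R concrete, here is a rough sketch of how elapsed time could feed into the stability update after a successful review. It assumes the FSRS-4.5 style forgetting curve and the stability-after-recall formula as they are commonly documented for FSRS v4/4.5; neither formula is spelled out in this thread, so treat both as assumptions. `w8`, `w9`, and `w10` come from the parameter list quoted in this thread, while the starting stability and difficulty are hypothetical.

```python
import math

# Assumed FSRS-4.5 style constants (not stated in the thread).
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that R = 0.9 when elapsed time equals stability


def retrievability(elapsed_days: float, stability: float) -> float:
    """Predicted probability of recall after `elapsed_days`."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


def stability_after_good(stability, difficulty, elapsed_days, w8, w9, w10):
    """New stability after a successful 'Good' review (no Hard/Easy modifiers)."""
    r = retrievability(elapsed_days, stability)
    growth = (math.exp(w8) * (11 - difficulty) * stability ** (-w9)
              * (math.exp(w10 * (1 - r)) - 1))
    return stability * (1 + growth)


# w8, w9, w10 from the parameters quoted in this thread;
# s and d are hypothetical values chosen only for illustration.
w8, w9, w10 = 1.9983, 0.5099, 1.2695
s, d = 60.0, 6.0
for days in (60, 187, 723):  # about 2 months, 6.23 months, 1.98 years
    new_s = stability_after_good(s, d, days, w8, w9, w10)
    print(f"elapsed {days:>4} d: R = {retrievability(days, s):.3f}, new S = {new_s:.1f} d")
```

In this sketch a longer elapsed time lowers R, and a success at lower R yields a larger stability, but the growth is far from proportional to the elapsed time.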

is there any generalization you can make as to why the training would result in such values compared to the default ones?

The initial stabilities are optimized from your first forgetting curves for the four first ratings. Assuming your previous intervals of new cards for again and good are 1 day, and the true retention of them are 0.8 and 0.9, the fitted initial stabilities will be 0.417 and 1 days.
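The two numbers in this example can be reproduced by inverting a forgetting curve at a single point. The sketch below assumes the FSRS-4.5 style curve R = (1 + FACTOR * t / S) ** DECAY with DECAY = -0.5 and FACTOR = 19/81; the thread does not state the curve, but these constants do reproduce 0.417 and 1. The real optimizer fits a curve to many reviews, so this single-point inversion is only an illustration, and `stability_from_retention` is a hypothetical helper name.

```python
# Assumed FSRS-4.5 style constants (not stated in the thread).
DECAY = -0.5
FACTOR = 19 / 81


def stability_from_retention(interval_days: float, retention: float) -> float:
    """Stability S such that predicted recall after `interval_days` equals `retention`."""
    return FACTOR * interval_days / (retention ** (1 / DECAY) - 1)


print(round(stability_from_retention(1, 0.8), 3))  # 0.417
print(round(stability_from_retention(1, 0.9), 3))  # 1.0
```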

What are the percentages above each point?

The percentage is difficulty.

is this the expected/desired outcome

It's expected. But I wonder what the evaluation results for your parameters are. If they are too high, I recommend using different parameters for your decks.

topherbuckley commented 6 months ago

FSRS will output a longer interval if your previous interval is 1.98 years than 6.23 months.

Maybe I'm misunderstanding what you mean, but FSRS output an initial interval of 2.9 months for Good after the 1.98 year interval (i.e. < 6.23 months).

The initial stabilities are optimized from your first forgetting curves for the four first ratings.

I think I'd have to read the paper to understand what exactly is optimized, but your mention that it only takes the first four ratings into account is definitely of interest to me. Since most cards I review are unknown at the start, I often press Again within these first four ratings, but my actual retention starts sometime thereafter. So it would seem it's more important to consider the reviews after this period than the initial four, no?

I would like to return the parameters to their defaults to see what the initial interval after the 1.98 year interval would be according to FSRS to confirm what you are saying will reduce the interval or not. As I already answered that card with Easy (4), do you know of a way to delete this initial FSRS based review from that card so I can actually compare them? Or do you know of a way to easily compare the initial Good interval of a given card under a given set of parameters and answer history? I'm not seeing the actual intervals in the visualizer, only the Difficulty and Stability, right?

L-M-Sherlock commented 6 months ago

Maybe I'm misunderstanding what you mean, but FSRS output an initial interval of 2.9 months for Good after the 1.98 year interval (i.e. < 6.23 months).

Initial interval is the interval for new cards. It's impossible that a new card was reviewed 1.98 years ago.

As I already answered that card with Easy (4), do you know of a way to delete this initial FSRS based review from that card so I can actually compare them?

This add-on could delete the review history: https://ankiweb.net/shared/info/1398071003

topherbuckley commented 6 months ago

Initial interval is the interval for new cards. It's impossible that a new card was reviewed 1.98 years ago.

I see where we are having a misunderstanding. When I said "initial interval of 2.9 months for Good" I meant the first interval I saw generated from FSRS while reviewing (i.e. the interval I saw while reviewing after first switching to FSRS, but with my existing review history). I did not mean the initial interval used by FSRS. Sorry, that was confusing. So in the screenshot I had 14 reviews in the history, so if the first interval when the card is new is referred to as I_0, I was referring to I_14 as this was the first interval generated by FSRS for that card.

This add-on could delete the review history: https://ankiweb.net/shared/info/1398071003

That's not quite what I was looking for; I was looking for a way to delete just the last review. It looks like I can do this manually with an SQLite editor like sqlite3, so I'll play around with that and come back here after I check the difference in I_14 between my parameters and your default parameters, to confirm whether the parameters are the cause of this very short I_14 or whether the algorithm in general would lead to such a short I_14. The result will help me narrow down my issue better.
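For anyone trying the same thing, here is a minimal sketch of removing only the most recent review of one card with Python's built-in `sqlite3` module. It runs against a toy in-memory table rather than a real collection. It assumes that Anki stores reviews in a `revlog` table whose `id` is the review timestamp in epoch milliseconds and whose `cid` is the card id; the rows and the helper name `delete_last_review` are hypothetical. On real data, work on a backup copy with Anki closed, and note that deleting a log row does not revert the card's current scheduling fields.

```python
import sqlite3

# Toy stand-in for a COPY of an Anki collection, with only the columns used here.
# Assumption: `revlog.id` is the review time in epoch milliseconds, `cid` the card id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revlog (id INTEGER PRIMARY KEY, cid INTEGER, ease INTEGER, ivl INTEGER)"
)
conn.executemany(
    "INSERT INTO revlog (id, cid, ease, ivl) VALUES (?, ?, ?, ?)",
    [(1000, 42, 3, 186), (2000, 42, 4, 88), (1500, 7, 3, 10)],  # hypothetical rows
)


def delete_last_review(conn: sqlite3.Connection, card_id: int):
    """Delete the most recent revlog row for one card; return its id, or None."""
    last_id = conn.execute(
        "SELECT MAX(id) FROM revlog WHERE cid = ?", (card_id,)
    ).fetchone()[0]
    if last_id is not None:
        conn.execute("DELETE FROM revlog WHERE id = ?", (last_id,))
        conn.commit()
    return last_id


deleted = delete_last_review(conn, 42)
remaining = conn.execute("SELECT id FROM revlog WHERE cid = 42").fetchall()
print(deleted, remaining)
```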

L-M-Sherlock commented 6 months ago

OK. In the view of FSRS, it only considers the first review per day for the same card. So this card's r_history is 1,1,3,3,3,2,3,3, and its t_history is 0,3,1,1,9,25,50,186.

If we ignore the t_history, the stability will be ~10 days:

https://huggingface.co/spaces/open-spaced-repetition/fsrs4anki_previewer
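The "first review per day" rule above can be sketched as follows: collapse a review log to one entry per day, then record the rating and the number of days since the previous kept review. The review dates below are hypothetical and chosen only so that the output matches the start of the histories quoted above. Real Anki days roll over at a configurable hour rather than at midnight, which this sketch ignores.

```python
from datetime import date

# Hypothetical review log for one card: (review date, rating 1-4).
reviews = [
    (date(2021, 1, 1), 1),
    (date(2021, 1, 1), 3),  # same day as the previous review: ignored
    (date(2021, 1, 4), 1),
    (date(2021, 1, 5), 3),
    (date(2021, 1, 5), 3),  # same day again: ignored
    (date(2021, 1, 6), 3),
]


def to_histories(reviews):
    """Keep the first review of each day; return (r_history, t_history)."""
    r_history, t_history = [], []
    last_day = None
    # sorted() is stable, so same-day reviews keep their original order.
    for day, rating in sorted(reviews, key=lambda item: item[0]):
        if day == last_day:
            continue
        t_history.append(0 if last_day is None else (day - last_day).days)
        r_history.append(rating)
        last_day = day
    return r_history, t_history


r_history, t_history = to_histories(reviews)
print(r_history)  # [1, 1, 3, 3]
print(t_history)  # [0, 3, 1, 1]
```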

L-M-Sherlock commented 6 months ago

I implemented a new feature to calculate the memory state history from your real r_history and t_history:

topherbuckley commented 6 months ago

Thank you. Nice to have that previewer. And thank you for taking the time to add the memory state history feature, though I'm not quite sure I understand where this delta_ts is being used or how it affects the output. Shouldn't we see the same values of delta_ts somewhere in the interval history calculations on the right?

In the view of FSRS, it only considers the first review per day for the same card.

Good to know.

So this card's r_history is 1,1,3,3,3,2,3,3, and its t_history is 0,3,1,1,9,25,50,186.

So this does not include the most recent interval (1.98 years). i.e. the t_history between 2022-05-22 and now (the time I am reviewing now)? This seems important to me, as otherwise it is ignoring what I'd consider the most significant interval.

To put it another way, if I press Good after the existing 1.98 year interval, I expect the next interval to be longer than 1.98 years, not longer than 186 days. Even worse, and maybe I'm misunderstanding the output in the "test sequences", it looks like the calculated interval in place of 1.98 years is 10 days? I'm still not seeing the logic as to why this is expected behavior.

L-M-Sherlock commented 6 months ago

So this does not include the most recent interval (1.98 years). i.e. the t_history between 2022-05-22 and now (the time I am reviewing now)? This seems important to me, as otherwise it is ignoring what I'd consider the most significant interval.

Oops, I forgot to append it to the delta_ts sequence. Here is the updated result:

The last stability is 92.46. If your desired retention is 0.9, the interval will be ~92 days.
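The relationship between stability, desired retention, and interval can be sketched by solving a forgetting curve for the time at which predicted recall drops to the desired retention. The sketch assumes the FSRS-4.5 style curve with DECAY = -0.5 and FACTOR = 19/81, which the thread does not state explicitly; under that curve the interval equals the stability exactly when the desired retention is 0.9. `next_interval` is an illustrative name, and rounding, fuzz, and maximum-interval limits are ignored.

```python
# Assumed FSRS-4.5 style constants (not stated in the thread).
DECAY = -0.5
FACTOR = 19 / 81


def next_interval(stability: float, desired_retention: float) -> float:
    """Days until predicted recall falls to `desired_retention`."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)


# Same stability, different desired retention: only the interval changes.
for retention in (0.85, 0.90, 0.95):
    print(retention, round(next_interval(92.46, retention), 1))
```

The stability is a property of the memory model, while the desired retention only decides where along the curve the next review is scheduled.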

topherbuckley commented 6 months ago

Ok, so rerunning that with the default weights given here

I see a stability of around 295.8, which I suppose is less shocking than 92. I also played around with the retention value and it did not change this final stability at all. Are you sure it is being used in the back-end?

Also, other than resetting all parameters to the default ones, is there any way to set them manually in any intelligent way (i.e., does each one have a non-latent meaning)? As I mentioned before, I don't think the intervals are so far off what I'd expect for shorter intervals, so I'd think tuning one of the parameters that influences the slope/gradient of one of the exponentials directly would be suspect here (w_10 and w_14). Is that logical?

Also, I noticed that the default weights in the visualizer:

0.5614, 1.2546, 3.5878, 7.9731, 5.1043, 1.1303, 0.823, 0.0465, 1.629, 0.135, 1.0045, 2.132, 0.0839, 0.3204, 1.3547, 0.219, 2.7849

are different from those listed here:

0.4, 0.6, 2.4, 5.8, 4.93, 0.94, 0.86, 0.01, 1.49, 0.14, 0.94, 2.18, 0.05, 0.34, 1.26, 0.29, 2.61

Is one set recommended over the other at this point?

Also, I just wanted to also post the results of the Evaluate button in the FSRS Anki options for this deck:

Log loss: 0.2537, RMSE(bins): 3.42%.

Are either of these surprisingly high? I guess I'd expect them to be if the interval proposed by FSRS is 92 days and the one prior was >1.98 years.

The initial stabilities are optimized from your first forgetting curves for the four first ratings.

Finally, I wanted to revisit this point. Do you have the code/math for this handy somewhere? Are you just doing gradient descent on the sum of the errors between the FSRS and previously assigned intervals? I'd like to take a look now that I better understand the situation.

L-M-Sherlock commented 6 months ago

I see a stability of around 295.8, which I suppose is less shocking than 92. I also played around with the retention value and it did not change this final stability at all. Are you sure it is being used in the back-end?

The 295.8 is the stability, not the interval, so the retention value doesn't affect it.

Also, other than resetting all parameters to the default ones, is there any way to set them manually in any intelligent way (i.e., does each one have a non-latent meaning)? As I mentioned before, I don't think the intervals are so far off what I'd expect for shorter intervals, so I'd think tuning one of the parameters that influences the slope/gradient of one of the exponentials directly would be suspect here (w_10 and w_14). Is that logical?

It's not recommended to set the parameters manually.

Are either of these surprisingly high?

They are really small. The average log loss is 0.33 and the average RMSE(bins) is 0.053.
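For orientation, the reported RMSE(bins) of 3.42% is 0.0342 on the same scale as the 0.053 average. Both metrics compare predicted retrievability with what actually happened at review time. The sketch below computes the standard log loss and a simplified calibration RMSE over equal-width bins; the binning used by the actual FSRS evaluation may differ, so `rmse_bins` here is an assumption and not the optimizer's method. The predictions and outcomes are made-up data.

```python
import math


def log_loss(predictions, outcomes):
    """Mean binary cross-entropy between predicted recall and actual recall."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(predictions)


def rmse_bins(predictions, outcomes, n_bins=10):
    """Simplified calibration error: equal-width bins, weighted by bin size."""
    bins = {}
    for p, y in zip(predictions, outcomes):
        index = min(int(p * n_bins), n_bins - 1)
        bins.setdefault(index, []).append((p, y))
    weighted_sq = 0.0
    for items in bins.values():
        mean_p = sum(p for p, _ in items) / len(items)
        mean_y = sum(y for _, y in items) / len(items)
        weighted_sq += len(items) * (mean_p - mean_y) ** 2
    return math.sqrt(weighted_sq / len(predictions))


# Hypothetical predictions (retrievability) and outcomes (1 = recalled).
predicted = [0.95, 0.90, 0.90, 0.80, 0.85, 0.92, 0.70, 0.88]
recalled = [1, 1, 1, 0, 1, 1, 1, 1]
print(f"log loss   = {log_loss(predicted, recalled):.4f}")
print(f"RMSE(bins) = {rmse_bins(predicted, recalled):.2%}")
```

Lower is better for both: log loss penalizes confident wrong predictions, while the binned RMSE measures how closely predicted recall rates match observed ones.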

Finally, I wanted to revisit this point. Do you have the code/math for this handy somewhere? Are you just doing gradient descent on the sum of the errors between the FSRS and previously assigned intervals? I'd like to take a look now that I better understand the situation.

Here is the code to optimize the parameters related to initial stability:

https://github.com/open-spaced-repetition/fsrs-optimizer/blob/b1502092c213f2c652499dec40a8cc2c339d37c4/src/fsrs_optimizer/fsrs_optimizer.py#L813-L874

L-M-Sherlock commented 6 months ago

Any other questions?

open-spaced-repetition / fsrs4anki

[Question] #643