My guess is that the shape of the forgetting curve has changed in FSRS-4.5. The new curve is sharper for R > 90% and flatter for R < 90%. In your case, the desired retention is 94%, which is higher than 90%, so FSRS-4.5 will schedule a shorter interval than before for the same stability.
But, if the true retention was close to 0.94 earlier, the new curve should calculate a higher stability than before.
So, the effects should cancel each other and the intervals should be roughly the same.
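To make the shape change concrete, here is a minimal sketch comparing the intervals the two curves produce for the same stability, assuming the published definitions (FSRS v4: R = (1 + t/(9S))^(-1); FSRS-4.5: R = (1 + 19/81 · t/S)^(-0.5); both give R = 0.9 at t = S):

```python
# Minimal sketch of the two forgetting curves, assuming the published
# definitions above; function names and the sample stability are mine.

def interval_v4(stability: float, desired_retention: float) -> float:
    # Solve (1 + t/(9S))^-1 = r for t.
    return 9.0 * stability * (1.0 / desired_retention - 1.0)

def interval_v45(stability: float, desired_retention: float) -> float:
    # Solve (1 + (19/81) * t/S)^-0.5 = r for t.
    return stability / (19.0 / 81.0) * (desired_retention ** -2.0 - 1.0)

S = 100.0  # days
for r in (0.97, 0.94, 0.90, 0.85):
    print(r, round(interval_v4(S, r), 1), round(interval_v45(S, r), 1))
# At r = 0.94 the FSRS-4.5 interval is slightly shorter than the v4 interval;
# at r = 0.90 they coincide, and below that the relationship flips.
```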
> The actual value of w[3] was 35 but I increased it to 45 manually.
What about tuning it to 60? And did you press Easy frequently? The w[16] seems to reach its upper limit.
> And did you press Easy frequently?
Easy in about 300 cards. The total number of reviews is greater than 100k. So, I don't think that this explains the situation.
> What about tuning it to 60?
Changing w[3] to 60 and w[16] to 7 decreased the total number of due cards by only about 10.
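For reference, here is a rough sketch of where these two parameters enter, simplified from the published FSRS v4/4.5 formulas (the function and variable names are mine, not the optimizer's): w[3] only sets the initial stability of cards whose first rating was Easy, and w[16] only multiplies the stability growth of reviews rated Easy, so with ~300 Easy ratings out of 100k+ reviews, tuning them can barely move the schedule.

```python
import math

# Simplified sketch of the FSRS stability-after-recall update (adapted from the
# published v4/4.5 formulas; not the actual optimizer code).
# d = difficulty, s = current stability, r = retrievability at review time,
# grade: 1 = Again, 2 = Hard, 3 = Good, 4 = Easy.
def stability_after_recall(w, d, s, r, grade):
    hard_penalty = w[15] if grade == 2 else 1.0  # applied only to Hard
    easy_bonus = w[16] if grade == 4 else 1.0    # applied only to Easy
    return s * (1 + math.exp(w[8]) * (11 - d) * s ** (-w[9])
                * (math.exp(w[10] * (1 - r)) - 1) * hard_penalty * easy_bonus)

# w[0..3] are the initial stabilities for a first rating of Again/Hard/Good/Easy,
# so w[3] only affects cards that were introduced with an Easy rating.
```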
I have a strong feeling that the new algorithm is less accurate for me.
As I said earlier, re-optimizing the parameters with 23.10.1 also gave me a backlog, though not as big as the one 23.12.1 gave me. So, I decided to work through the backlog produced by re-optimizing with 23.10.1. As I was working through it, I noticed that I was getting almost every card correct (with Review Sort Order set to "Relative Overdueness"). So, this meant that the new parameters were suboptimal.
After doing more than 200 reviews, I optimized again in 23.10.1 and my due count fell from 1000 to 100. However, re-optimizing with 23.12.1 still gives me 2000+ due cards. So, I think that it is less accurate for me. As for the log loss and RMSE, they change so much between optimizations that I don't think we should rely on them for my collection.
Metrics (all evaluations done on today's collection):
According to my analysis of the benchmark results, FSRS-4.5 is more accurate than FSRS v4 for 81.7% of collections.
from pathlib import Path
import json
import numpy as np

def load_metric(model: str, metric: str) -> list:
    # Collect the chosen metric from every per-collection result file for one model.
    result_dir = Path(f"./result/{model}")
    result_files = sorted(result_dir.glob("*.json"), key=lambda x: int(x.stem))
    values = []
    for result_file in result_files:
        with open(result_file, "r") as f:
            result = json.load(f)
        values.append(result[model][metric])
    return values

metric = "RMSE(bins)"
m1 = load_metric("FSRSv4", metric)
print(np.mean(m1))

m2 = load_metric("FSRS-4.5", metric)
print(np.mean(m2))

# Count the collections where FSRS-4.5 achieves a lower RMSE than FSRS v4.
better = sum(1 for x, y in zip(m1, m2) if y < x)
print(better)
print(len(m1))
print(better / len(m1))
Then, maybe some other change like https://github.com/open-spaced-repetition/fsrs-rs/commit/a9cc36a207e8861a4b7a383b9d3fae4b9d74c2b8 is the cause.
In the comparison between FSRS-rs and FSRS v4, the percentage is 71.3%.
I think that you need to analyse things more deeply in order to find the issue. If you don't have enough time to do so now, it's fine. But it would be great if you could perform a proper analysis whenever you have the time.
For now, I am sticking to FSRS v4.
Did you try the Python optimizer? Is this issue only related to the Rust optimizer?
| Py Optimizer v4.19.2 | Py Optimizer v4.20.4 |
| --- | --- |
| w = 1.2187, 1.8588, 18.5804, 65.7669, 4.3881, 1.7984, 2.0913, 0.0, 1.7866, 0.1608, 1.2135, 1.4601, 0.1772, 0.6982, 0.0114, 0.0, 4.0 | w = 1.2109, 2.0110, 21.5144, 35.2091, 4.4859, 1.7484, 2.1434, 0.0, 1.8002, 0.1564, 1.1763, 1.3683, 0.1759, 0.7184, 0.0118, 0.0, 4.0 |
| Log loss: 0.2110, RMSE(bins): 0.99% | Log loss: 0.2110, RMSE(bins): 0.94% |
| Loss after training: 0.2119, RMSE: 0.0123 | Loss after training: 0.2119, RMSE: 0.0115 |
| Due = 1363 cards | Due = 1388 cards |
So, the two Python optimizer versions are quite similar to each other, but their results fall in between those of the Rust optimizer in 23.12.1 and 23.10.1.
I am not happy with these results either. The reason: as I mentioned above, when I started working through the backlog given by the optimizer, I was getting almost all cards correct. So, I think that I should not have any backlog (or only a very small one).
In the above table, the third row contains metrics calculated by Anki and the fourth row contains the metrics calculated by Py optimizer for the same weights.
By the way, I think that the problem with the Python optimizer is that it always produces w[7] = 0 for my collection. In contrast, the Rust optimizer gives a small but non-zero value (e.g. 0.0193, 0.0088, etc.). This means that with the parameters given by the Python optimizer, the difficulty of my cards can NEVER decrease (because I don't use Easy for review cards). So, if this is fixed, I guess that the Python optimizer would work fine for me.
As an experiment, in the parameters given by Py Optimizer v4.19.2, I replaced w[7] by 0.0193 and then rescheduled. By doing this, the due count decreased from 1363 cards to 559 cards.
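For anyone following along, here is a minimal sketch of the difficulty update, adapted from the published FSRS v4/4.5 formulas (not the actual optimizer code; clamping bounds and names are mine), which shows why w[7] = 0 freezes difficulty for someone who never presses Easy:

```python
# Sketch of the FSRS v4/4.5-style difficulty update.
# grade: 1 = Again, 2 = Hard, 3 = Good, 4 = Easy.
def init_difficulty(w, grade):
    return min(max(w[4] - (grade - 3) * w[5], 1.0), 10.0)

def next_difficulty(w, d, grade):
    new_d = d - w[6] * (grade - 3)              # Again/Hard raise D, Good keeps it, Easy lowers it
    target = init_difficulty(w, 4)              # mean reversion pulls D toward D0(Easy)
    new_d = w[7] * target + (1 - w[7]) * new_d  # with w[7] = 0, this line changes nothing
    return min(max(new_d, 1.0), 10.0)

# With w[7] = 0 and no Easy ratings, difficulty can only stay the same or increase,
# which matches the observation above.
```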
Note: All the testing that I did with the Python optimizer was in Anki 23.10.1. So, there can be some inaccuracy in the number of due cards with the parameters obtained from the Py v4.20.4 optimizer. But the inaccuracy won't be so large that I need to reinstall 23.12.1 just to check the number of due cards.
I guess that I have found the issue.
In some cards, I rated Again in a Filtered deck (with rescheduling) only a few days after I rated them Good. In my opinion, the Good rating was because of interference from other related cards that I reviewed on the same day or adjacent days. This seems to have confused FSRS even though such ratings were present in only 8 cards.
You can find such cards in the deck file shared in the first post of this issue by using the following search query in the Browser:
cid:1700673098704,1691514827314,1689521450583,1684167598943,1672238572726,1661483066910,1661483683057,1664457812204
I used the following in the Anki Debug Console to delete those unexpected Good ratings:
mw.col.db.execute("DELETE from revlog where cid = 1700673098704 and id > 1701801000000 and id < 1702233000000")
mw.col.db.execute("DELETE from revlog where cid = 1691514827314 and id > 1697740200000 and id < 1699122600000")
mw.col.db.execute("DELETE from revlog where cid = 1689521450583 and id > 1701369000000 and id < 1701714600000")
mw.col.db.execute("DELETE from revlog where cid = 1684167598943 and id > 1690223400000 and id < 1690569000000")
mw.col.db.execute("DELETE from revlog where cid = 1672238572726 and id > 1690223400000 and id < 1691173800000")
mw.col.db.execute("DELETE from revlog where cid = 1661483066910 and id > 1692729000000 and id < 1693247400000")
mw.col.db.execute("DELETE from revlog where cid = 1661483683057 and id > 1688149800000 and id < 1688495400000")
mw.col.db.execute("DELETE from revlog where cid = 1664457812204 and id > 1689359400000 and id < 1689791400000")
Then, on re-optimizing and rescheduling, Anki 23.12.1 gave me 597 due cards, which is much better than the previous 2300+.
Thanks for the report. It sounds like the optimizer is sensitive to these reviews. Did you forget them completely? Maybe the better solution is just burying them instead of pressing Again.
> Did you forget them completely? Maybe the better solution is just burying them instead of pressing Again.
Yes, I forgot them. Also, they were not due. I reviewed them in a filtered deck, just to tell Anki that I have forgotten them.
By the way, according to my recent analysis, it's not a good idea to re-optimize frequently.
Based on the experiments, if we re-optimize every 2000 reviews, the new parameters are better than the old ones in 71% of cases. But if we do that every 1000 reviews, the percentage drops to 63%.
By the way, I have performed the above-mentioned deletion of revlogs in my main profile as well as my test profile.
Optimization on my main profile still gives me 2000+ due cards. Parameters: 1.2517, 3.9538, 22.1976, 35.3524, 4.9177, 1.4497, 1.5342, 0.0114, 1.8261, 0.1589, 1.1108, 1.6823, 0.1184, 0.6283, 0.4755, 0.0207, 4.0000
Optimization on my test profile (which contains yesterday's collection + above-mentioned change) gives me 587 cards when applied to my main profile. Parameters: 1.2502, 3.9521, 22.2139, 35.3634, 5.1795, 1.3051, 1.3681, 0.0108, 1.7835, 0.1059, 1.0901, 2.1143, 0.1215, 0.5364, 0.3711, 0.0060, 4.0000
So, it is consistent with your observation that it is not a good idea to re-optimize frequently. But, it also seems to be a serious issue.
If we re-optimize every 4000 reviews, the new parameters are better in 85% of cases.
> If we re-optimize every 4000 reviews, the new parameters are better in 85% of cases.
If frequent optimization can have an adverse effect on the user's parameters, shouldn't FSRS prevent the user from doing this, similar to how there is a 1000-review limit for the initial optimization?
That's an Anki decision, not an issue with the algorithm.
I updated to Anki 23.12.1 and reoptimized my FSRS parameters. It gave me the following parameters:
(The actual value of w[3] was 35 but I increased it to 45 manually. This decreased the log loss and RMSE. Anyway, it doesn't explain the following observations.)
Then, I rescheduled all my cards and got a backlog of 2300+ cards!
Future Due Graph:
Then, I decided to download Anki 23.10.1 and see what parameters it produces. It gave me the following parameters:
Then, I rescheduled all my cards and only about 830 cards were due (out of which 400 were already due, even before I started doing any of the above).
Future Due Graph:
So, FSRS-4.5 decreased the log loss and RMSE. However, it gave me a huge backlog. (FSRS has never given me such a huge backlog until now; even when I switched to FSRS from SM-2, it gave me a backlog of about 900 cards.)
Also, my true retention has not been very different from my desired retention (0.94).
So, is it possible that there is some issue with the new algorithm?
If you need my deck, here it is: Test.zip (change file extension to .apkg)