Anki FSRS Visualizer not matching up with Py-fsrs

SnowdayAurelion commented 4 months ago

Hello,

I'm trying to use the Anki FSRS Visualizer to fine-tune the algorithm's parameters for my Python code, which uses py-fsrs, but I'm having a problem: the results shown in the Anki FSRS Visualizer do not match the results from my Python code. I'm particularly looking at the intervals.

Here are my parameters:

0.4000, 0.5000, 1.0000, 2.0000, 1.0000, 0.9400, 0.8600, 0.0100, 1.3900, 0.1400, 0.9400, 2.1800, 0.0500, 0.3400, 1.2600, 0.2500, 1.5200

request_retention and maximum_interval remain unchanged.

For example, when I rate a session Easy, it works: the interval is 2 days, as expected. Note: the difficulty value in my graph is not the same as FSRS's difficulty. For me, 1 means Easy, 2 means Good, and 3 means Hard.

But it suddenly does not work for Good, which should give an interval of 1 day.
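As a sanity check, the expected first intervals can be computed by hand from the weights above. This is a minimal sketch assuming FSRS-4.5 conventions (the DECAY and FACTOR constants, the initial stability being w[rating - 1], and the default request_retention of 0.9 and maximum_interval of 36500 are all assumptions, not values read from py-fsrs). The interval is the number of days until retrievability drops to request_retention, rounded and clamped to at least 1 day; learning steps are ignored.

```python
# Sketch: first-review intervals implied by the weights, ASSUMING FSRS-4.5 formulas.
WEIGHTS = [0.4, 0.5, 1.0, 2.0, 1.0, 0.94, 0.86, 0.01, 1.39,
           0.14, 0.94, 2.18, 0.05, 0.34, 1.26, 0.25, 1.52]

DECAY = -0.5                      # assumed FSRS-4.5 forgetting-curve exponent
FACTOR = 0.9 ** (1 / DECAY) - 1   # = 19/81, so retrievability is 0.9 when t == S


def initial_stability(rating: int) -> float:
    """Initial stability in days; rating is 1=Again, 2=Hard, 3=Good, 4=Easy."""
    return WEIGHTS[rating - 1]


def next_interval(stability: float, request_retention: float = 0.9,
                  maximum_interval: int = 36500) -> int:
    """Days until retrievability falls to request_retention, rounded and clamped."""
    raw = stability / FACTOR * (request_retention ** (1 / DECAY) - 1)
    return min(max(round(raw), 1), maximum_interval)


for name, rating in [("Good", 3), ("Easy", 4)]:
    print(name, next_interval(initial_stability(rating)), "day(s)")
```

With request_retention at 0.9 the interval simply equals the stability, so w[2] = 1.0 gives Good a 1-day interval and w[3] = 2.0 gives Easy 2 days.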

I don't think the error is in my code, because Easy working while Good doesn't points to an inconsistency elsewhere.

To rule out a rounding error (for example, the interval actually coming out as 0.9 days), I also tried parameters that pushed the Good interval beyond 1 day, and it still did not work.

As a last example, when I do two Good sessions in a row, I should get an interval of 5 days, but I get 1 day.

If I add a third Good session, the interval then becomes 5 days. That's very confusing: it's the result I want, but I had to add an extra study session to get it.

By the way, another variable that could cause this is the study date not lining up with the recommended due date. But I made sure that each study session's date does line up with it.

So I have one Good session, and the algorithm recommends studying again in 1 day. I placed the second Good session one day later and expected the algorithm to schedule the next review in 5 days. Instead, it says 1 day.
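The 5-day expectation can be checked the same way. Below is a sketch of the stability update after a successful (Good) review, assuming the FSRS-4.5 formula S' = S * (1 + e^w8 * (11 - D) * S^(-w9) * (e^(w10 * (1 - R)) - 1)) with no Hard penalty or Easy bonus. Starting from the state after a first Good (S = w[2] = 1.0, D = w[4] = 1.0, both assumed to be the FSRS-4.5 initial values) and reviewing exactly 1 day later should give S' of about 4.957, which agrees with the stability printed in the log later in this thread. The helper names are hypothetical, not py-fsrs API.

```python
import math

# Sketch: one on-time Good review, ASSUMING FSRS-4.5 formulas (helper names are hypothetical).
WEIGHTS = [0.4, 0.5, 1.0, 2.0, 1.0, 0.94, 0.86, 0.01, 1.39,
           0.14, 0.94, 2.18, 0.05, 0.34, 1.26, 0.25, 1.52]
DECAY = -0.5
FACTOR = 0.9 ** (1 / DECAY) - 1


def retrievability(elapsed_days: float, stability: float) -> float:
    """Forgetting curve: probability of recall after elapsed_days."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


def stability_after_good(d: float, s: float, r: float) -> float:
    """New stability after a Good review (no Hard penalty, no Easy bonus)."""
    w = WEIGHTS
    return s * (1 + math.exp(w[8]) * (11 - d) * s ** -w[9]
                * (math.exp((1 - r) * w[10]) - 1))


s0 = WEIGHTS[2]                   # stability after the first Good
d0 = WEIGHTS[4]                   # difficulty after the first Good
r = retrievability(1, s0)         # second review exactly 1 day later
s1 = stability_after_good(d0, s0, r)

# At 90% requested retention the next interval is round(s1) days.
print(round(r, 4), round(s1, 6), round(s1))
```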

Another possibility, which I cannot verify, is that the Anki FSRS Visualizer and py-fsrs do not implement the same algorithm. I think that's the most likely cause. It's also possible that setting the initial difficulty for Good to the minimum value of 1 broke something, but probably not, since it works perfectly in the Visualizer.

Any ideas or help would be really appreciated. It's frustrating to see the perfect spacing algorithm in front of me and not be able to apply it in my code. I really like the FSRS algorithm, and I really want to use it!

By the way, here is my Python code in case it needs to be checked. Lines 86-91 (the loop in dates_and_difficulties) are where I create and review the card. That is the most likely place for an error, although it seemed to work fine before I tried to fine-tune the algorithm, so maybe there's no error there.

import os
import fsrs
import datetime as dt
import pandas as pd
import plotly.express as px
from StudySessions import Study_Session

class Spacing_Table:
    def __init__(self, vault_path):
        self.path = vault_path
        self.spacing_table = self.load_spacing_table()

        self.Study_Sessions = Study_Session(vault_path)
        self.schedule()

    def load_spacing_table(self):
        # where Spacing Table.md is held
        table_path = os.path.join(self.path, r"CBT Plan\iCanStudy\FSRS Spacing Table.md")

        with open(table_path, "r", encoding='utf-8') as file:
            data = file.readlines()

            # Collect Rows
            raw_rows = []
            # filtered rows
            rows = []

            # each line will be read as a row
            for line in data[3:]:
                raw_rows.append(line.split("|")[1:-1])

            # remove rows without an ID
            for row in raw_rows:
                if len(row)>1: # if not extra row from outside of table
                    if row[0].strip():
                        rows.append(row)

            # Collect Columns
            columns = data[1].split("|")[1:-1]
            columns = [word.strip() for word in columns] #remove extra spaces

            # Make Pandas DataFrame
            spacing_table = pd.DataFrame(rows, columns=columns)

            # spacing_table.to_excel(r"C:\Users\light\Downloads\FRSR.xlsx")

        return spacing_table

    def dates_and_difficulties(self, topic):
        # collect data
        index = self.find_study_session_index(topic, "Topic")
        encoded_study_session = self.spacing_table["Encoded"][index].replace("❌","")
        topic = self.spacing_table["Topic"][index].strip()
        difficulty = self.Study_Sessions.metadata(encoded_study_session, "difficulty")
        date = self.Study_Sessions.metadata(encoded_study_session, "date")
        subject = self.Study_Sessions.metadata(encoded_study_session, "subject")

        # Create due dates from each session's difficulty and date
        f = fsrs.FSRS() # algorithm
        card = fsrs.Card() # create card

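        def difficulty_to_rating(difficulty):
            # my difficulty scale: 1 = Easy, 2 = Good, 3 = Hard
            if difficulty == 1:
                return fsrs.Rating.Easy
            if difficulty == 2:
                return fsrs.Rating.Good
            if difficulty == 3:
                return fsrs.Rating.Hard
            raise ValueError(f"Unknown difficulty: {difficulty!r}")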

        due_dates = []
        difficulties = []
        days_last_studied = []
        lapses = []

        session_count = 0
        for study_session in self.spacing_table.iloc[index][1:]: # For every study session for a particular topic
            if "[" not in study_session: # ignore empty cells
                break

            study_session = study_session.replace("❌","") # remove ❌ to prevent errors
            # print(study_session)
            date = self.Study_Sessions.metadata(study_session, "date")
            difficulty = self.Study_Sessions.metadata(study_session, "difficulty")

            card_schedules = f.repeat(card, date) # possible schedules
            card = card_schedules[difficulty_to_rating(difficulty)].card # iterations of next due date
            due_dates.append(card.due) # save due dates of topic
            difficulties.append(difficulty) # save difficulty
            days_last_studied.append((card.due - card.last_review).days) # days from last review until due date
            lapses.append(session_count)    # session number
            session_count += 1

        def filter_empty(element):
            # keep only cells that contain a study-session link
            return "[" in element

        filtered_row = list(filter(filter_empty, self.spacing_table.iloc[index])) # keep only study sessions

        # if latest study session has "❌", then it's not complete. Return second most recent date
        if "❌" in filtered_row[-1]: # check for latest study sesh
            if len(filtered_row) == 1: # if encoding is not complete
                return None, None, None, None, None # schedule() unpacks five values
            return due_dates[-2], difficulties[-2], subject, days_last_studied[-2], lapses[-2]
        else:
            return due_dates[-1], difficulties[-1], subject, days_last_studied[-1], lapses[-1] # return latest date and difficulty

    def schedule(self):
        data = {}

        for i in range(len(self.spacing_table)): # for every row
            row = self.spacing_table.iloc[i]
            row = list(row)

            # Get topic
            topic = row[0]

            # Get date
            # Get difficulty
            date, difficulty, subject, days, lapse = self.dates_and_difficulties(topic)

            if date: # only append if encoding is complete
                data[topic] = [date,difficulty,subject, days, lapse] # get latest date or difficulty

        # graph data
        self.graph(data)

    def graph(self,data):
        dataframe = []

        # convert data into proper dataframe
        for topic, date_difficulty in data.items():
            row = {
                "Topic": topic,
                "Start": date_difficulty[0],
                "Finish": date_difficulty[0] + dt.timedelta(days=1),
                "Difficultly": date_difficulty[1],
                "Subject": date_difficulty[2],
                "Days since studied": date_difficulty[3],
                "Lapse": date_difficulty[4]
            }

            dataframe.append(row)

        data = pd.DataFrame(dataframe)
        colours = self.get_subject_colours("Subject Colours.txt")
        fig = px.timeline(data, x_start="Start", x_end="Finish", y="Topic", title='Retrieval Sessions', color = "Subject", color_discrete_map=colours, hover_data=
                          {"Topic":True,
                              "Start":True,
                              "Difficultly":True,
                              "Days since studied":True,
                              "Lapse":True,
                              "Finish":False})
        fig.update_xaxes(range=[dt.date(2024,5,8),dt.date(2024,8,12)])
        fig.show()

    def get_subject_colours(self, file_path):
        colours = {}
        with open(file_path, "r") as file:
            data = file.readlines()

            for line in data:
                subject = line.split(",")[0]
                colour = line.split(",")[1].strip()

                colours[subject] = colour

        return colours

    def find_study_session_index(self, topic, column = "ID"):
        """
        find the index for the row of the study session
        """        
        for index, row in self.spacing_table.iterrows():
            if row[column].lower().strip() == topic.lower().strip():

                return index

def main():
    Spacing_Table(r"C:\Users\light\Documents\Alex's Vault") # builds the schedule and shows the graph

if __name__ == "__main__":
    main()

Thank you for your time and advice, Aurelion

ishiko732 commented 4 months ago

In py-fsrs/go-fsrs/ts-fsrs, when the card is in State.New, the due times for Rating.Again, Rating.Hard, and Rating.Good are computed as the current time (now) plus a fixed interval, which is usually less than 1 day. https://github.com/open-spaced-repetition/py-fsrs/blob/5d36ec32bcac350b489b78dd6c20b26a38582d09/src/fsrs/fsrs.py#L31-L39
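A stripped-down sketch of the branch being described. The 1-minute Again offset and the day-based Easy rule match the py-fsrs snippet quoted later in this thread; the 5-minute Hard and 10-minute Good offsets are an assumption about the linked lines, so check them against the source. The helper first_review_due is hypothetical and is not part of the py-fsrs API. The point is that a first Good lands minutes later, so the first day-based Good interval only appears at the next review.

```python
from datetime import datetime, timedelta

# Hypothetical helper mirroring the State.New branch; NOT the py-fsrs API.
FIRST_REVIEW_OFFSETS = {
    "Again": timedelta(minutes=1),   # as in the quoted py-fsrs code
    "Hard": timedelta(minutes=5),    # assumed value
    "Good": timedelta(minutes=10),   # assumed value
}


def first_review_due(rating: str, now: datetime, easy_interval_days: int) -> datetime:
    """Due time after the very first review of a New card."""
    if rating == "Easy":
        # Easy uses the stability-based interval in whole days.
        return now + timedelta(days=easy_interval_days)
    return now + FIRST_REVIEW_OFFSETS[rating]


now = datetime(2024, 5, 19, 9, 0)
for rating in ["Again", "Hard", "Good", "Easy"]:
    # easy_interval_days=2 corresponds to w[3] = 2.0 at 90% retention
    print(rating, first_review_due(rating, now, easy_interval_days=2) - now)
```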

SnowdayAurelion commented 4 months ago

Hello, thank you for sharing. This definitely seems like the right direction, but I'm still facing issues.

I tried computing the Hard and Good due dates the same way the Easy due date is calculated:

if card.state == State.New:
    self.init_ds(s)
    s.again.due = now + timedelta(minutes=1)

    hard_interval = self.next_interval(s.hard.stability) # new lines
    s.hard.due = now + timedelta(days=hard_interval)

    good_interval = self.next_interval(s.good.stability) # new lines
    s.good.due = now + timedelta(days=good_interval)

    easy_interval = self.next_interval(s.easy.stability)
    s.easy.scheduled_days = easy_interval
    s.easy.due = now + timedelta(days=easy_interval)

That did not work, so I tried hard-coding them:

if card.state == State.New:
    self.init_ds(s)
    s.again.due = now + timedelta(minutes=1)

    # hard_interval = self.next_interval(s.hard.stability) # new lines
    s.hard.due = now + timedelta(days=0)

    # good_interval = self.next_interval(s.good.stability) # new lines
    s.good.due = now + timedelta(days=1)

    easy_interval = self.next_interval(s.easy.stability)
    s.easy.scheduled_days = easy_interval
    s.easy.due = now + timedelta(days=easy_interval)

This fixed the problem for New cards, but not for the reviews that follow. I checked my code, and the card does move through the Learning and Review states, so the later reviews should work.

For example, here is the printed output for the reviews that follow, each with the recommended next due date. I set each study session's date to the algorithm's recommended due date.

Session: 0
State: 0
Card's Difficulty Metric: 0
My Difficulty Metric: 2
Stability: 0
Date studied: 2024-05-19
Due date: 2024-05-20
---

Session: 1
State: 1
Card's Difficulty Metric: 1.0
My Difficulty Metric: 2
Stability: 1.0
Date studied: 2024-05-20
Due date: 2024-05-21
---

Session: 2
State: 2
Card's Difficulty Metric: 1.0
My Difficulty Metric: 2
Stability: 1.0
Date studied: 2024-05-21
Due date: 2024-05-26
---

Session: 3
State: 2
Card's Difficulty Metric: 1.0
My Difficulty Metric: 2
Stability: 4.957026011186596
Date studied: 2024-05-26
Due date: 2024-06-16
---

So Session 0 is correct: a 1-day interval for Good (although I hard-coded that). Session 1 is wrong: it gives a 1-day interval where it should be 5. But then, all of a sudden, the code starts working afterwards??

Session 2 gives the 5-day interval, and Session 3 a 21-day interval, which follows exactly the schedule shown in the Anki FSRS Visualizer. So why is the algorithm one session off??
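The one-session offset can be read straight off the log above. This small sketch just copies the Date studied / Due date pairs and takes the day differences. The Visualizer list is the Good-only schedule reported for the Visualizer in this thread, not something computed here.

```python
from datetime import date

# (Date studied, Due date) pairs copied from the printed log above.
log = [
    (date(2024, 5, 19), date(2024, 5, 20)),   # Session 0 (Good interval hard-coded)
    (date(2024, 5, 20), date(2024, 5, 21)),   # Session 1
    (date(2024, 5, 21), date(2024, 5, 26)),   # Session 2
    (date(2024, 5, 26), date(2024, 6, 16)),   # Session 3
]

intervals = [(due - studied).days for studied, due in log]

# Good-only schedule reported for the Anki FSRS Visualizer in this thread.
visualizer = [1, 5, 21]

print(intervals)                     # [1, 1, 5, 21]
print(intervals[1:] == visualizer)   # True: the same schedule, shifted by one session
```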

Either way, this is progress, so I'm hyped to hear any more ideas or solutions!

ishiko732 commented 4 months ago

If the ratings are [Rating.Good, Rating.Good, Rating.Good, Rating.Good], the intervals (in days) should be [0, 1, 5, 21], respectively. However, anki_fsrs_visualizer does not seem to apply the special handling for the State.New case, so it produces a different result: [1, 5, 21, 74]. (Its 74 should be 75, which is what I get from ts-fsrs and py-fsrs.)

xiety commented 4 months ago

The difference between 74.08 and 75.04 comes from rounding the interval to whole days, which means retention at the actual review is recalculated as 0.89899 instead of 0.9.
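This explanation can be reproduced with a from-scratch sketch, assuming FSRS-4.5 formulas are what both tools use; the function names are hypothetical and are not ts-fsrs or py-fsrs API. It runs four on-time Good reviews twice: once evaluating retrievability at the unrounded interval (so R stays at 0.9), and once at the whole-day due date, where the fourth review sees R of about 0.89899. The final stabilities should land near 74.08 and 75.04, i.e. intervals of 74 and 75 days.

```python
import math

# Sketch of consecutive on-time Good reviews, ASSUMING FSRS-4.5 formulas.
WEIGHTS = [0.4, 0.5, 1.0, 2.0, 1.0, 0.94, 0.86, 0.01, 1.39,
           0.14, 0.94, 2.18, 0.05, 0.34, 1.26, 0.25, 1.52]
DECAY = -0.5
FACTOR = 0.9 ** (1 / DECAY) - 1
REQUEST_RETENTION = 0.9


def retrievability(elapsed_days, stability):
    """Forgetting curve: probability of recall after elapsed_days."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


def raw_interval(stability):
    """Unrounded days until retrievability falls to REQUEST_RETENTION."""
    return stability / FACTOR * (REQUEST_RETENTION ** (1 / DECAY) - 1)


def next_difficulty(d):
    """Good leaves D unchanged, then mean reversion toward D0(Good) = w[4], clamped to [1, 10]."""
    w = WEIGHTS
    return min(max(w[7] * w[4] + (1 - w[7]) * d, 1.0), 10.0)


def stability_after_good(d, s, r):
    """New stability after a Good review (no Hard penalty, no Easy bonus)."""
    w = WEIGHTS
    return s * (1 + math.exp(w[8]) * (11 - d) * s ** -w[9]
                * (math.exp((1 - r) * w[10]) - 1))


def good_chain(reviews, round_elapsed):
    """Stabilities after `reviews` consecutive Good ratings, each made on the due date."""
    s, d = WEIGHTS[2], WEIGHTS[4]          # state after the first Good
    stabilities = [s]
    for _ in range(reviews - 1):
        t = raw_interval(s)
        if round_elapsed:
            t = max(round(t), 1)           # review on the whole-day due date
        r = retrievability(t, s)           # exactly 0.9 only when t is not rounded
        s = stability_after_good(d, s, r)  # uses the pre-review difficulty
        d = next_difficulty(d)
        stabilities.append(s)
    return stabilities


exact = good_chain(4, round_elapsed=False)
rounded = good_chain(4, round_elapsed=True)
print([round(s, 2) for s in exact])
print([round(s, 2) for s in rounded])
```

Only the Good path is modeled here; Hard penalties, Easy bonuses, and lapses are left out.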

ishiko732 commented 4 months ago


Okay, I've confirmed the data is consistent now.

xiety commented 4 months ago

I've added an option to use the ts-fsrs library.

I would switch to ts-fsrs completely, but it handles the first review of a card differently from my implementation of The Algorithm article.

open-spaced-repetition / anki_fsrs_visualizer

Anki FSRS Visualizer not matching up with Py-fsrs #3