open-spaced-repetition / fsrs-rs

FSRS for Rust, including Optimizer and Scheduler
https://crates.io/crates/fsrs
BSD 3-Clause "New" or "Revised" License
185 stars 19 forks source link

Strange training assert #233

Closed sineptic closed 1 month ago

sineptic commented 1 month ago

When I want to compute params why it is necessary to have item WITH EXACTLY ONE long term review and if it doesn't exist(or all items contain more) you library return NotEnoughData???

to reproduce this you must have more than 7 items, because otherwise DEFAULT_PARAMETERS returned.

items partitioned at the dataset.rs:236

L-M-Sherlock commented 1 month ago

Please see: https://github.com/open-spaced-repetition/fsrs-rs/pull/172

sineptic commented 1 month ago

but your commit not change it i think

L-M-Sherlock commented 1 month ago

why it is necessary to have item WITH EXACTLY ONE long term review and if it doesn't exist(or all items contain more) you library return NotEnoughData

Because only long-term reviews' results are treated as the labels during optimization.

sineptic commented 1 month ago

but why == 1 , not >= 1?

L-M-Sherlock commented 1 month ago

but why == 1 , not >= 1?

Because the pretrainset only need items whose long term reviews ==1.

sineptic commented 1 month ago

Maybe I don't understand smth. What is pretrainset? Or what I must do to compute parameters?

L-M-Sherlock commented 1 month ago

FSRS-5 has 19 parameters. The first four parameters are the initial stability, which is optimized via fitting forgetting curves in the pretrainset, where the data only have one long-term reviews.

sineptic commented 1 month ago

Ok, but isnt this car amount small? why not use just first result of all?

L-M-Sherlock commented 1 month ago

The error is raised here:

https://github.com/open-spaced-repetition/fsrs-rs/blob/189501e3e6133e2d57ced346b4aa5ea6685edea2/src/pre_training.rs#L203-L205

So I change my previous reasoning. Your pretrainset doesn't have data with one long-term review.

sineptic commented 1 month ago

https://github.com/open-spaced-repetition/fsrs-rs/blob/189501e3e6133e2d57ced346b4aa5ea6685edea2/src/dataset.rs#L234-L236

this split items, and do it in strange way. Why you can't use other cards? Why it's necessary to have exactly 1 review?

L-M-Sherlock commented 1 month ago

Why it's necessary to have exactly 1 review?

If the item doesn't have any long-term review, it should be removed, but I didn't remove it in the lib because I assume the user will filter out these data. FSRS is a long-term memory predictor, and every item should have one long-term review at least.

sineptic commented 1 month ago

At least!!! but you code check exactly 1.

use fsrs::{FSRSItem, FSRSReview, DEFAULT_PARAMETERS, FSRS};

fn main() {
    let fsrs = FSRS::new(Some(&DEFAULT_PARAMETERS)).unwrap();

    let item = FSRSItem {
        reviews: vec![
            FSRSReview {
                rating: 1,
                delta_t: 0,
            },
            FSRSReview {
                rating: 1,
                delta_t: 1,
            },
            FSRSReview {
                rating: 1,
                delta_t: 1,
            },
        ],
    };
    let items = vec![item; 140];

    let _best_params = fsrs.compute_parameters(items, None).unwrap();
}

This return error

L-M-Sherlock commented 1 month ago

This returns error because the pretrainset is still empty. All items have two long-term reviews and none of them include one long-term review.

I recommend reading this function, which build the dataset from Anki's collection:

https://github.com/open-spaced-repetition/fsrs-rs/blob/189501e3e6133e2d57ced346b4aa5ea6685edea2/src/convertor_tests.rs#L90-L127

https://github.com/open-spaced-repetition/fsrs-rs/blob/189501e3e6133e2d57ced346b4aa5ea6685edea2/src/convertor_tests.rs#L154-L181

The test file: revlog.csv

sineptic commented 1 month ago

I dont understand how it's related. Also tests require collection.anki21 file.

sineptic commented 1 month ago

I read whole description of algorithm, but don't found pretrainset. I currently can't understand why we can't make item with 1 long-term repetition by removing second.

L-M-Sherlock commented 1 month ago

Also tests require collection.anki21 file.

You can remove anki21_sample_file_converted_to_fsrs() from the list to only run the test in csv.

https://github.com/open-spaced-repetition/fsrs-rs/blob/189501e3e6133e2d57ced346b4aa5ea6685edea2/src/training.rs#L479

I read whole description of algorithm, but don't found pretrainset.

Algorithm only include the information about how the algorithm works. It doesn't explain how to optimize it.

I currently can't understand why we can't make item with 1 long-term repetition by removing second.

What do you mean by "removing second"?

sineptic commented 1 month ago

Algorithm only include the information about how the algorithm works. It doesn't explain how to optimize it.

So I don't understand what is pretrainset.

What do you mean by "removing second"?


vec![
FSRSReview {
rating: 1,
delta_t: 0,
},
FSRSReview {
rating: 1,
delta_t: 1,
},
FSRSReview {
rating: 1,
delta_t: 1,
},
]

// Treat as

vec![ FSRSReview { rating: 1, delta_t: 0, }, FSRSReview { // First long-term repetition rating: 1, delta_t: 1, }, / tail... / ]

L-M-Sherlock commented 1 month ago

So I don't understand what is pretrainset.

The pretrainset is used to optimized the first four parameters of FSRS, which are related to the four initial stabilities corresponding to four ratings of the first review.

L-M-Sherlock commented 1 month ago
vec![
    FSRSReview {
        rating: 1,
        delta_t: 0,
    },
    FSRSReview {
        rating: 1,
        delta_t: 1,
    },
    FSRSReview {
        rating: 1,
        delta_t: 1,
    },
]

// Treat as

vec![
    FSRSReview {
        rating: 1,
        delta_t: 0,
    },
    FSRSReview { // First long-term repetition
        rating: 1,
        delta_t: 1,
    },
    /*
        tail...
    */
]

The function treats every vec as one sample of the dataset. So you need to make a copy of the original vec and then remove the second reviews by yourself.

sineptic commented 1 month ago

but why library doesn't do this?

L-M-Sherlock commented 1 month ago

but why library doesn't do this?

Because this library is only used by Anki. These works have been done in Anki side. So I don't have motivation to implement this in the library.

sineptic commented 1 month ago

OK, thank you for work.

sineptic commented 1 month ago

These works have been done in Anki side.

Is it true, that when I add this (1 long-term repetition) items, other(not first 4 parameters) stay the same?

L-M-Sherlock commented 1 month ago

Is it true, that when I add this (1 long-term repetition) items, other(not first 4 parameters) stay the same?

Yes, because the optimization are divided into two parts: pre-training and training. If you only provide items with one long-term reviewed, the training part will be skipped.

sineptic commented 1 month ago

Is there any other things, that implemented on Anki side to use this library, or this is only issue?

L-M-Sherlock commented 1 month ago

If I remember correctly, they have been implemented in convertor_tests.rs for test use. You can refer the code.

sineptic commented 1 month ago

@L-M-Sherlock, I create this function

fn extract_first_long_term_reviews<'a>(
    items: impl IntoIterator<Item = &'a FSRSItem>,
) -> Vec<FSRSItem> {
    items
        .into_iter()
        .filter_map(|i| {
            let a = i
                .reviews
                .iter()
                .take_while_inclusive(|r| r.delta_t < 1)
                .cloned() // Close my open PR to make it .copied() and 1.5x faster
                .collect_vec();
            if a.last()?.delta_t < 1 || a.len() == i.reviews.len() {
                return None;
            }
            Some(FSRSItem { reviews: a })
        })
        .collect()
}

It uses Itertools and it runs in 1.5s for 10'000'000 items.