open-spaced-repetition / srs-benchmark

A benchmark for spaced repetition schedulers/algorithms
https://github.com/open-spaced-repetition/fsrs4anki/wiki

[Feature Request] Group users into single dataset #17

Closed jamesal1 closed 10 months ago

jamesal1 commented 11 months ago

Currently each user is treated as an isolated dataset, and the results are aggregated as a weighted average under various schemes.

It would be helpful to allow models that learn from one user and can be applied to others, even without card data. A simple approach would take the current FSRS or LSTM benchmark and apply regularization that pulls each user's parameters toward the mean across individuals (or the mean could be its own trainable parameter).

Expertium commented 11 months ago

Aside from using the default parameters as a starting point for optimization and choosing reasonable ranges for the parameters, there is no way (that I can think of) to utilize the parameters of user A (or multiple users) to train FSRS on user B's data. The default parameters are chosen by running FSRS on all collections, recording the optimal values, and taking the median. Btw, if you are curious about the distributions of parameters, check this out: https://github.com/open-spaced-repetition/fsrs-benchmark/tree/main/plots

Simply grouping all reviews into a single dataset wouldn't be meaningful. Also, I don't know what you mean by "apply regularization to the parameters relative to the individual mean"; please explain it in detail.

jamesal1 commented 11 months ago

For each parameter x_i, where i indexes the user, apply the regularization loss l2 * (x_i - x)^2. Here x can either be the median, as it is now, or a free parameter.

Using the median as a default or to determine the range is less effective than regularization, and you can also use a validation set to tune the l2 coefficient, etc.
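In PyTorch terms, the proposal amounts to adding a quadratic penalty that pulls each user's parameters toward a shared center. A minimal sketch (function and variable names are hypothetical, not from the benchmark):

```python
import torch

# Hypothetical sketch of the proposed penalty: pull user i's parameters x_i
# toward a shared center x, which is either the fixed cross-user median or
# itself a trainable free parameter.
def regularized_loss(base_loss, user_params, center, l2_coef):
    penalty = l2_coef * torch.sum((user_params - center) ** 2)
    return base_loss + penalty

# If the center is a free parameter rather than the median, make it trainable
# alongside the per-user parameters (17 assumed here as the parameter count):
center = torch.nn.Parameter(torch.zeros(17))
```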

Expertium commented 11 months ago

Most parameters need "hard" bounds, because, for example, negative values or values outside of [0, 1] wouldn't make sense in the formulas. In fact, I don't think there are any parameters in FSRS that can span from -∞ to ∞. Our current optimizer supports L2 regularization (it's called weight_decay in the documentation). I can benchmark it to see if it helps to decrease RMSE.
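For reference, weight_decay is passed directly to the PyTorch optimizer; this is a generic illustration, not the benchmark's actual setup:

```python
import torch

# weight_decay adds an L2 penalty toward zero during optimization.
model = torch.nn.Linear(4, 1)  # stand-in for the module holding FSRS weights
optimizer = torch.optim.Adam(model.parameters(), lr=4e-2, weight_decay=1e-4)
```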

jamesal1 commented 11 months ago

Sure, having hard bounds could still be useful alongside regularization. You'll have to offset each parameter by its default value, though; otherwise the penalty will pull all parameters towards 0, which isn't good. Thanks for having a look.
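One way to realize that offset is to train a delta around fixed defaults, so that plain weight decay shrinks the delta and therefore pulls the effective parameters toward the defaults. A hypothetical sketch with illustrative numbers only:

```python
import torch

# Hypothetical sketch: reparameterize each weight as default + delta so that
# weight decay shrinks delta toward zero, i.e. pulls the effective
# parameters toward the defaults rather than toward 0.
default_w = torch.tensor([0.4, 0.6, 2.4, 5.8])  # illustrative values only
delta = torch.zeros_like(default_w, requires_grad=True)
optimizer = torch.optim.Adam([delta], lr=4e-2, weight_decay=1e-3)

def effective_params():
    # Apply the hard bounds after adding the offset (one illustrative bound).
    return (default_w + delta).clamp(min=0.1)
```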

L-M-Sherlock commented 11 months ago

Grouping all users into a single dataset for training would run out of my device's RAM.

And it would be biased toward users who have more reviews.

Sometimes I even want to use the mode as the default values instead of the median.

Expertium commented 11 months ago

> Sometimes I even want to use the mode as the default values instead of the median.

Well, it's time to go deep down the rabbit hole of estimating the mode of a continuous variable.

I know 3 ways of doing that: half-range mode, half-sample mode, and kernel density estimation.

- Half-range mode is based on a simple principle: take (x_max - x_min)/2, use it as a sliding window, and slide it across the sample until you find the densest range; then repeat the process within that range.
- Half-sample mode is similar: divide the sample into two groups with an equal number of elements, keep the group with the smallest value of x_max - x_min, and repeat.
- Kernel density estimation builds an empirical probability density function and takes its peak.

So which one is better? No idea. Here's the code, have fun: Modes.zip
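For reference, here is a minimal sketch of two of the three estimators, reconstructed from the descriptions above (not the code in Modes.zip):

```python
import numpy as np
from scipy.stats import gaussian_kde

def half_sample_mode(x):
    # Recursively keep the half of the sorted sample with the smallest
    # range (x_max - x_min), then average the few points that remain.
    x = np.sort(np.asarray(x, dtype=float))
    while len(x) > 3:
        half = (len(x) + 1) // 2
        # Range of every contiguous window containing `half` points.
        ranges = x[half - 1:] - x[: len(x) - half + 1]
        start = int(np.argmin(ranges))
        x = x[start : start + half]
    return float(x.mean())

def kde_mode(x, grid_size=1000):
    # Fit a Gaussian kernel density estimate and take its densest grid point.
    x = np.asarray(x, dtype=float)
    density = gaussian_kde(x)
    grid = np.linspace(x.min(), x.max(), grid_size)
    return float(grid[np.argmax(density(grid))])
```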

L-M-Sherlock commented 11 months ago

At least for the first interval of the Good rating, the mode is smaller than the median.

Expertium commented 11 months ago

@L-M-Sherlock I think we should use the median in cases where the mode is the min (or max) allowed value, like for w_0:

[histogram of w_0 omitted; the mode sits at the lower bound]

Here using the mode would just make w_0 = 0.1. But when the mode is not the min/max value, I think it makes sense to use the mode. For example, here:

[histogram omitted; the mode lies in the interior of the range]

Do you have all of the parameters of all users saved? If so, can you give them to me via a Google Drive link (.json files from the result folder are fine too)? I'll calculate the new default parameters using the median in some cases and the mode in others. So here's my idea: we will do a dry run with 3 sets of default parameters:

1) Median parameters (already done)
2) Mode parameters (I'll calculate them myself)
3) A hybrid set where some values are modes and others are medians

And then we'll see which set results in the lowest RMSE during the dry run.
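A sketch of how those three candidate sets could be assembled from the per-user result files; the path and the "parameters" field are assumptions about the file layout, not verified against the repo:

```python
import glob
import json

import numpy as np
from scipy.stats import gaussian_kde

# Assumed layout: one .json file per user, each containing a "parameters" list.
rows = []
for path in glob.glob("result/FSRSv4/*.json"):
    with open(path) as f:
        rows.append(json.load(f)["parameters"])
params = np.array(rows)  # shape: (n_users, n_params)

def kde_mode(col, grid_size=1000):
    # Mode of one parameter column via kernel density estimation.
    density = gaussian_kde(col)
    grid = np.linspace(col.min(), col.max(), grid_size)
    return grid[np.argmax(density(grid))]

median_set = np.median(params, axis=0)
mode_set = np.array([kde_mode(col) for col in params.T])

# Hybrid: keep the mode unless it coincides with a clamp boundary
# (the min/max observed value), in which case fall back to the median.
at_bound = np.isclose(mode_set, params.min(axis=0)) | np.isclose(mode_set, params.max(axis=0))
hybrid_set = np.where(at_bound, median_set, mode_set)
```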

L-M-Sherlock commented 11 months ago

I saved all parameters in the .json files. You can refer to this code: https://github.com/open-spaced-repetition/fsrs-benchmark/blob/main/analysis.py

> And then we'll see which set results in the lowest RMSE during the dry run.

I think it's not a problem of RMSE; it's a problem for new users. Using the median for w[0], w[1], w[2], and w[3] means the default first intervals are too long for half of the learners (and too short for the rest). But users often complain that the first intervals are too long, not too short.

Expertium commented 11 months ago

> I saved all parameters in the .json files. You can refer to this code

Yes, but I need the files themselves, and I don't want to run the benchmark myself if you have all the parameters saved. EDIT: nevermind, I can just download them from here: https://github.com/open-spaced-repetition/fsrs-benchmark/tree/main/result/FSRSv4

Expertium commented 11 months ago

Seems like mode estimation will have to wait: https://github.com/open-spaced-repetition/fsrs4anki/issues/461#issuecomment-1849374856

Expertium commented 11 months ago

So I did some preliminary testing on a smaller dataset, and it seems like I was right: the mode isn't useful in cases where it doesn't arise naturally but instead arises as an artifact of clamping. Here's an example:

[histogram of w[15] omitted; the mode sits at a clamp boundary]

Expertium commented 11 months ago

[histogram of w[14] omitted]

Interestingly, in this case the mode is in the middle.

Expertium commented 11 months ago

Btw, in order to calculate the mode, I use all three estimators (HRM, HSM, KDE) and then take the average of the two closest ones.
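A minimal sketch of that combination rule (a reconstruction, not the actual code):

```python
def combined_mode(hrm, hsm, kde):
    # Sort the three estimates and average the two that agree most
    # closely; the remaining one is treated as an outlier and dropped.
    a, b, c = sorted((hrm, hsm, kde))
    return (a + b) / 2 if (b - a) <= (c - b) else (b + c) / 2
```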

Expertium commented 11 months ago

I'll make a new issue about modes and all of this because it's technically unrelated to the current issue.