next_day_starts_at = 4
It's 5. It's not super important, but still.
Can you explain what this code is doing? I'm having a hard time understanding it.
Also, seems like you've added some new metrics:
I'm assuming E50 is the median error (based on bins) and E90 is the 90th percentile (also based on bins). What's ICI?
What's ICI?
You can see this issue:
Ok, but what about the code itself? Does it just run the optimizer on every single deck?
ici = np.mean(np.abs(observation - p))
I don't think that's what the paper suggests. In the paper, the values are weighted by the empirical density function of
the predicted probabilities.
So basically, right now we are using the number of reviews in each bin as weights. For ICI, we should use probability density. I think I could do that with FFTKDE; I'll try to tinker with it later and maybe submit a PR.
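For reference, a minimal sketch of that weighting idea, assuming KDEpy's FFTKDE; `p` and `observation` are the same arrays as in the snippet quoted above, and the grid/interpolation step is only a workaround for FFTKDE evaluating on equidistant grids:

```python
import numpy as np
from KDEpy import FFTKDE

# Estimate the density of the predicted probabilities on an equidistant grid,
# then interpolate it back onto the individual predictions.
grid, density = FFTKDE(kernel="gaussian", bw="silverman").fit(p).evaluate()
weights = np.interp(p, grid, density)

# Density-weighted ICI instead of the plain mean.
ici_weighted = np.average(np.abs(observation - p), weights=weights)
```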
I don't think that's what the paper suggests.
Did you check the appendix of the paper?
So basically, right now we are using the number of reviews in each bin as weights.
ICI doesn't require any bins.
Ok, but what about the code itself? Does it just run the optimizer on every single deck?
It keeps only the decks containing >= 1000 reviews, generates deck-level parameters for each one, and predicts each deck separately. We can evaluate the average error after joining the predictions. Then we optimize FSRS on the joined dataset and evaluate it with the collection-level parameters.
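If I understand that correctly, the notebook does something along these lines. This is only an illustrative sketch: `revlog`, `optimize_fsrs`, `predict_recall`, and `rmse_bins` are hypothetical stand-ins, not the actual functions in the repo:

```python
import pandas as pd

THRESHOLD = 1000

# revlog: one row per review, with a deck_id column (hypothetical DataFrame).
big_decks = [deck for deck, df in revlog.groupby("deck_id") if len(df) >= THRESHOLD]
subset = revlog[revlog["deck_id"].isin(big_decks)]

# Deck level: optimize and predict each qualifying deck separately, then join the predictions.
deck_predictions = []
for deck in big_decks:
    deck_df = subset[subset["deck_id"] == deck]
    deck_params = optimize_fsrs(deck_df)                  # deck-level parameters
    deck_predictions.append(predict_recall(deck_df, deck_params))
deck_level_error = rmse_bins(pd.concat(deck_predictions))

# Collection level: one set of parameters for the same joined data, as the baseline.
collection_params = optimize_fsrs(subset)
collection_level_error = rmse_bins(predict_recall(subset, collection_params))
```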
Did you check the appendix of the paper?
That's weird, in the paper they clearly say that it should be weighted.
ICI doesn't require any bins.
I meant RMSE, sorry, my wording wasn't clear. I was trying to say "RMSE uses n reviews in each bin as weights, but since ICI is continuous, it should use a continuous counterpart - probability density".
The probability density is already in the array `p`.
`p` is the predicted probability; `observation` is smoothed using lowess. What I'm saying is that, if I interpreted the paper correctly, then instead of this:
ici = np.mean(np.abs(observation - p))
it should be this:
ici = np.average(np.abs(observation - p), weights=pdf(p))
Where `pdf(p)` is an empirical probability density function. Remember, not all values of `p` are equally likely to occur. This is why bins are used for RMSE.
`observation` is smoothed using lowess
Here, lowess has already applied the pdf to `observation`, because lowess is locally weighted scatterplot smoothing.
Ok, my bad then.
The paper is correct, independently of using lowess in $f$.

Using the notation from the paper, we don't know $\phi$. We can only observe an empirical distribution $\hat{\Phi}_n$ from the predicted probabilities.

There is a result that says that

$$\mathbb{E}_{\hat{\Phi}_n}[f(X)] = \frac{1}{n}\sum_{i=1}^n f(x_i)$$

where $X \sim P$ and $x_1, \dots, x_n$ are observations from $P$.

In our case, the lhs is $\int_0^1 f(x)\, d\hat{\Phi}_n$, which approximates $\int_0^1 f(x)\, \phi(x)\, dx$; the rhs is `np.mean(np.abs(observation - p))`.
See for example https://math.stackexchange.com/q/1267634
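A quick numerical check of that identity (my own illustration, not from the paper): draw the predictions from a known density standing in for $\phi$, and compare the plain sample mean of $f$ with the density-weighted integral:

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(0)
phi = stats.beta(4, 2)                    # stand-in for the unknown density of the predictions
x = phi.rvs(size=100_000, random_state=rng)

f = lambda t: np.abs(t - 0.7)             # any integrable f on [0, 1]

sample_mean = f(x).mean()                                           # (1/n) * sum f(x_i)
weighted_integral, _ = integrate.quad(lambda t: f(t) * phi.pdf(t), 0, 1)

# The two agree up to Monte Carlo error, which shrinks like 1/sqrt(n).
print(sample_mean, weighted_integral)
```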
@L-M-Sherlock there is something I want you to investigate. Try selecting different thresholds, like 1000 reviews, 2000 reviews, 4000 reviews, etc., and seeing how well FSRS performs if all subdecks with <threshold reviews inherit the parent's parameters. The goal is to see whether there is such a thing as an optimal threshold. If the threshold is too low, it may not be a good idea to run FSRS on all decks, since a lot of them will have very few reviews, and we know that RMSE decreases as n(reviews) increases. But if the threshold is too high, we might end up grouping together decks with very different material. So there probably exists an optimal threshold.
Try selecting different thresholds, like 1000 reviews, 2000 reviews, 4000 reviews, etc., and seeing how well FSRS performs if all subdecks with <threshold reviews inherit the parent's parameters.
Assuming the threshold is 1000, the decks and their sizes are
| deck | size |
| ---- | ---- |
| A::1 | 1000 |
| A::2 | 2000 |
| A::3 | 500  |

How to separate them? Which parameters should `A::3` use?
If `A::3` has a parent deck, it should use the parameters of the parent deck. If not, then use the global parameters, which can be obtained by running the optimizer on the entire collection.
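A small sketch of that rule (deck names, sizes, and the parameter values are made up for illustration): walk up the `::` hierarchy until a deck with enough reviews and its own parameters is found, otherwise fall back to the collection-level parameters.

```python
THRESHOLD = 1000

# Review counts per deck; "::" separates parent and child, as in Anki deck names.
deck_sizes = {"A": 3500, "A::1": 1000, "A::2": 2000, "A::3": 500}

# Parameters exist only for decks with >= THRESHOLD reviews (placeholder values).
deck_params = {"A": "params_A", "A::1": "params_A::1", "A::2": "params_A::2"}
global_params = "params_collection"   # from optimizing the entire collection

def params_for(deck: str) -> str:
    """Use the deck's own parameters if it is large enough; otherwise inherit from the parent, then the collection."""
    while deck:
        if deck_sizes.get(deck, 0) >= THRESHOLD and deck in deck_params:
            return deck_params[deck]
        deck = deck.rsplit("::", 1)[0] if "::" in deck else ""
    return global_params

print(params_for("A::3"))   # -> "params_A": A::3 has only 500 reviews, so it inherits from its parent A
```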
OK. I guess the best way here is to optimize FSRS at all deck levels and save all the parameters in a table for the following tests.
@Expertium, I did an experiment in your collection:
https://github.com/open-spaced-repetition/fsrs-when-to-separate-presets/blob/main/split-vs-concat-top-to-bottom.ipynb
It reduces RMSE(bins) by 16% via optimizing at the deck level. I guess you would be interested.