With a low lookahead, i.e. frequent optimisation, pretrain performs well even at n=8 (i.e. trained on only 8 reviews).
The issue is that we can't plot the IQR of this easily, or rather its meaning is limited. For example, if we were plotting accuracy instead of loss and used lookahead=1 (a test set of a single review; this is the ideal lookahead, since it measures how well optimised FSRS performs immediately after optimisation), every per-window accuracy would be either 0% or 100%, so the IQR would just span 0 to 1 and tell us very little.
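To make the lookahead=1 case concrete, here is a minimal sketch (plain NumPy with made-up predictions, not the actual notebook code) of why the per-window accuracy, and hence its IQR, is uninformative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up recall probabilities standing in for FSRS predictions, one per
# single-review test window (lookahead=1).
predicted_p = rng.uniform(0.3, 0.95, size=200)
# Simulated review outcomes: 1 = recalled, 0 = forgotten.
actual = (rng.uniform(size=200) < predicted_p).astype(int)

# Each window contains exactly one review, so its accuracy is exactly 0.0 or 1.0.
window_accuracy = ((predicted_p >= 0.5).astype(int) == actual).astype(float)

q1, q3 = np.percentile(window_accuracy, [25, 75])
print(f"mean accuracy: {window_accuracy.mean():.2f}")  # still informative
print(f"IQR: [{q1:.0f}, {q3:.0f}]")  # quartiles can only be 0 or 1
```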
For the graphs in metric_over_size.ipynb: n=8 has lookahead 3. Hence the IQR is hard to interpret, but the mean is very accurate and shows good results.
For the first set of graphs in minimum_limit.ipynb: n=8 has lookahead 10. The IQR is more representative of the actual results, but early test losses are less accurate because we go longer without re-optimising.
For the second set of graphs in minimum_limit.ipynb: n=8 has lookahead 100. The IQR is now less representative because we optimise on 8 reviews and then test on 100 (100 reviews with a single optimisation after 8 reviews!). Early test losses aren't very accurate because of this.
For the third set of graphs in minimum_limit.ipynb: n=8 has lookahead 1000. The losses for n=8 are now meaningless (1000 reviews with a single optimisation after 8 reviews!).
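As I understand it, the evaluation behind these graphs is a rolling optimise-then-test loop, where lookahead is the number of reviews tested on between optimisations. A minimal sketch, assuming placeholder `optimise` and `test_loss` functions rather than the actual notebook code:

```python
def rolling_losses(reviews, n, lookahead, optimise, test_loss):
    """Optimise on the first n reviews, then alternate between testing on the
    next `lookahead` reviews and re-optimising on everything seen so far.
    Returns one test loss per window; the mean/IQR in the graphs are taken
    over these per-window losses. `optimise` and `test_loss` are placeholders."""
    losses = []
    seen = n
    params = optimise(reviews[:seen])
    while seen + lookahead <= len(reviews):
        window = reviews[seen:seen + lookahead]
        losses.append(test_loss(params, window))
        seen += lookahead
        params = optimise(reviews[:seen])  # frequent optimisation when lookahead is low
    return losses
```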
To summarise: the analyses suggest pretrain works well for all n. It's just hard to show this through the IQR, because a larger test window (higher lookahead) means the test losses reflect the freshly optimised parameters less accurately. Lookahead 10 shows it quite well, and mean log loss vs mean RMSE shows it even better (log loss penalises being very wrong more than RMSE).
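On the last point, the difference in how the two metrics punish a confidently wrong prediction is easy to see numerically (a standalone illustration, not tied to the notebook code):

```python
import numpy as np

# A single review that was actually forgotten (outcome 0), with increasingly
# confident predicted recall probabilities. Log loss blows up as the prediction
# becomes confidently wrong; the squared error can never exceed 1.
for p in (0.6, 0.9, 0.99, 0.999):
    log_loss = -np.log(1 - p)        # -[y*log(p) + (1-y)*log(1-p)] with y = 0
    rmse = np.sqrt((0 - p) ** 2)     # RMSE of a single prediction = |error|
    print(f"p={p:<6} log loss={log_loss:6.2f}  RMSE={rmse:.3f}")
```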