jpaillard opened 1 month ago
Looking forward to it. I'm surprised by the dependence on $n_{train}$ though.
Regarding the dependence on $n_{train}$, I think it is worth pointing out that in the paper "A general framework for inference on algorithm-agnostic variable importance" (Williamson et al., 2020), they also introduce (in equation 5) a variance correction term, very similar to the one mentioned above, also with a dependence on $n_{train}$ and $n_{test}$, to obtain confidence intervals.
Interesting, thx for pointing this out.
Provide a function to compute Nadeau and Bengio's corrected t-test: $t = \frac{\frac{1}{kr} \sum_{i=1}^{k} \sum_{j=1}^{r} x_{ij}}{\sqrt{\left(\frac{1}{kr} + \frac{n_{test}}{n_{train}}\right) \hat\sigma^2}}$, where $k$ is the number of folds, $r$ the number of repetitions of the cross-fitting, $n_{train}$ the number of samples used for training, $n_{test}$ the number of samples used for testing, and $\hat\sigma^2$ the observed (uncorrected) variance.
This would allow computing p-values from cross-fitted variable importance estimators, for instance. I would see this function going in the `stat_tools.py` module.

Nadeau, C., & Bengio, Y. (2000). Inference for the generalization error. In Advances in Neural Information Processing Systems. Also described in the scikit-learn user guide: https://scikit-learn.org/1.5/auto_examples/model_selection/plot_grid_search_stats.html#comparing-two-models-frequentist-approach
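A minimal sketch of what this could look like, assuming a standalone helper in `stat_tools.py` (the function name, the signature, and the use of a two-sided p-value with $kr - 1$ degrees of freedom are my assumptions, the latter following the scikit-learn example linked above):

```python
import numpy as np
from scipy.stats import t as t_dist


def nadeau_bengio_corrected_ttest(x, n_train, n_test):
    """Nadeau & Bengio (2000) corrected t-test.

    Parameters
    ----------
    x : array-like of shape (k * r,)
        Per-split statistics x_ij from k-fold cross-fitting
        repeated r times.
    n_train : int
        Number of samples used for training in each split.
    n_test : int
        Number of samples used for testing in each split.

    Returns
    -------
    t_stat, p_value : float
        Corrected t statistic and its two-sided p-value.
    """
    x = np.asarray(x, dtype=float)
    kr = x.size  # k * r splits in total
    mean = x.mean()
    var = x.var(ddof=1)  # observed, uncorrected variance
    # Corrected variance: (1/(kr) + n_test/n_train) * sigma^2, which
    # inflates the naive 1/(kr) factor to account for the overlap
    # between training sets across splits.
    t_stat = mean / np.sqrt((1.0 / kr + n_test / n_train) * var)
    # Assumption: two-sided p-value with kr - 1 degrees of freedom,
    # as in the scikit-learn example linked above.
    p_value = 2.0 * t_dist.sf(np.abs(t_stat), df=kr - 1)
    return t_stat, p_value
```

For example, with 10-fold cross-fitting repeated 10 times on 1000 samples, one would call `nadeau_bengio_corrected_ttest(x, n_train=900, n_test=100)` on the 100 per-split importance estimates.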