lorentzenchr opened 1 year ago
I'm thinking of a new function compute_score, analogous to compute_bias. It's just simple t-tests, see the code of compute_bias.
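For context, the kind of one-sample t-test meant here looks roughly like the following; this is a minimal sketch with a made-up bias_per_obs array, not the actual compute_bias implementation:

```python
import numpy as np
from scipy import special

# Hypothetical per-observation bias values, for illustration only.
bias_per_obs = np.array([0.2, -0.1, 0.3, 0.05, -0.15])

n = len(bias_per_obs)
mean = bias_per_obs.mean()
stderr = bias_per_obs.std(ddof=1) / np.sqrt(n)
# Two-sided p-value for H0: the mean bias is zero.
p_value = 2 * special.stdtr(n - 1, -abs(mean / stderr))
```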
@lorentzenchr thanks for clarifying.
To put it in pseudocode, compute_score should take at least score_per_obs as an argument and at some point call

```python
import numpy as np
import scipy.special

score_per_obs_de_meaned = score_per_obs - np.mean(score_per_obs)
stderr = np.std(score_per_obs, ddof=1) / np.sqrt(len(score_per_obs))  # standard error of the mean
scipy.special.stdtr(len(score_per_obs) - 1, -np.abs(score_per_obs_de_meaned / stderr))
```
I ignored the weights for the time being. What do you think?
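If the weights are added later, one possible way to fold them in would be the following sketch; the effective sample size via Kish's formula is only one of several conventions and is my assumption, not something taken from compute_bias:

```python
import numpy as np
from scipy import special

def weighted_mean_test_pvalue(x, weights):
    """Two-sided p-value for H0: the weighted mean of x is zero (sketch only)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean = np.average(x, weights=w)
    var = np.average((x - mean) ** 2, weights=w)  # weighted variance around the weighted mean
    n_eff = w.sum() ** 2 / (w**2).sum()  # Kish effective sample size
    stderr = np.sqrt(var / (n_eff - 1))  # standard error of the weighted mean, one convention
    return 2 * special.stdtr(n_eff - 1, -abs(mean / stderr))
```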
I would use the model predictions instead of the score per obs, pretty much a blend of decompose and compute_bias:
```python
def compute_score(
    y_obs,
    y_pred,
    feature,
    weights,
    scoring_function,
    functional,
    level,
    n_bins,
):
    ...
```
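To make the blend concrete, here is a rough sketch of what the body could do, under my own assumptions: scoring_function(y_obs, y_pred) returns the score per observation, feature is numeric and is cut into quantile bins, and weights and functional are ignored for now. Per bin it returns the mean score with a t-based confidence interval at the given level:

```python
import numpy as np
from scipy import stats

def compute_score_sketch(y_obs, y_pred, feature, scoring_function, level=0.95, n_bins=10):
    """Mean score per feature bin with a t-based confidence interval (sketch only)."""
    y_obs, y_pred, feature = map(np.asarray, (y_obs, y_pred, feature))
    score = scoring_function(y_obs, y_pred)  # assumed to return the score per observation

    # Quantile-based binning of the (numeric) feature.
    edges = np.quantile(feature, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.clip(np.searchsorted(edges, feature, side="right") - 1, 0, n_bins - 1)

    rows = []
    for b in range(n_bins):
        s = score[bin_idx == b]
        if len(s) < 2:
            continue
        mean = s.mean()
        stderr = s.std(ddof=1) / np.sqrt(len(s))
        t_crit = stats.t.ppf(0.5 + level / 2, df=len(s) - 1)
        rows.append(
            (edges[b], edges[b + 1], len(s), mean, mean - t_crit * stderr, mean + t_crit * stderr)
        )
    return rows  # (bin_low, bin_high, n, mean_score, ci_low, ci_high) per bin
```

A DataFrame would probably be a nicer return type, but a list of tuples keeps the sketch dependency-free.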
> I'm thinking of a new function compute_score, analogous to compute_bias. It's just simple t-tests, see the code of compute_bias.
The t-test in compute_bias is testing whether the bias per observation has zero mean, right? What would be the null hypothesis in the compute_score case?
Otherwise, to give the user a sense of the uncertainty, one could return a confidence interval on the statistical risk, which would use (among other things, like the empirical risk) the Student-t percentile at the desired confidence level.
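In code, such a confidence interval could look like this (a sketch assuming i.i.d. per-observation scores; score_per_obs is a placeholder name):

```python
import numpy as np
from scipy import stats

def risk_confidence_interval(score_per_obs, level=0.95):
    """t-based confidence interval for the statistical risk, i.e. the expected score."""
    s = np.asarray(score_per_obs, dtype=float)
    n = len(s)
    empirical_risk = s.mean()
    stderr = s.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 1)  # Student-t percentile
    return empirical_risk - t_crit * stderr, empirical_risk + t_crit * stderr
```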
I guess uncertainty / confidence intervals would be enough. As you say, for bias there is a universal reference, i.e. zero; for scores, all pairwise comparisons are options, which is way too many.
@lorentzenchr do you have any reference for implementing this? This feature sounds very useful and I would be happy to contribute.