lorentzenchr / model-diagnostics

Tools for diagnostics and assessment of (machine learning) models
https://lorentzenchr.github.io/model-diagnostics/
MIT License

Add sample uncertainty to score decompose #72

Open lorentzenchr opened 1 year ago

m-maggi commented 11 months ago

@lorentzenchr do you have any reference for implementing this? This feature sounds very useful and I would be happy to contribute

lorentzenchr commented 11 months ago

I'm thinking of a new function compute_score, analogous to compute_bias. It's just simple t-tests; see the code of compute_bias.

m-maggi commented 11 months ago

@lorentzenchr thanks for clarifying. To put it in pseudo code, compute_score should take at least score_per_obs as an argument and at some point compute a one-sample t-test p-value, something like

import numpy as np
import scipy

t_stat = np.mean(score_per_obs) / scipy.stats.sem(score_per_obs)
p_value = 2 * scipy.special.stdtr(len(score_per_obs) - 1, -np.abs(t_stat))

I ignored the weights for the time being. What do you think?

lorentzenchr commented 11 months ago

I would use the model predictions instead of the score per obs, pretty much a blend of decompose and compute_bias:

def compute_score(
    y_obs,
    y_pred,
    feature,
    weights,
    scoring_function,
    functional,
    level,
    n_bins,
):
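To make the blend concrete, here is a minimal sketch of what such a compute_score could do; this is not the library's actual implementation, and the quantile binning, the returned tuples, and the t-based confidence interval per bin are all assumptions for illustration (weights, functional, and the exact signature above are ignored here):

```python
import numpy as np
from scipy import stats


def compute_score(y_obs, y_pred, feature, scoring_function, level=0.95, n_bins=5):
    """Hypothetical sketch: mean score per feature bin with a t-based CI.

    Returns a list of (bin_index, n, mean_score, ci_low, ci_high).
    """
    y_obs, y_pred, feature = map(np.asarray, (y_obs, y_pred, feature))
    score = scoring_function(y_obs, y_pred)  # per-observation scores
    # quantile bin edges over the feature; interior edges go to digitize
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1))
    idx = np.digitize(feature, edges[1:-1])  # bin index in 0 .. n_bins - 1
    rows = []
    for b in range(n_bins):
        s = score[idx == b]
        n = len(s)
        mean = s.mean()
        # half-width from Student's t percentile times the standard error
        half = stats.t.ppf((1 + level) / 2, n - 1) * stats.sem(s)
        rows.append((b, n, mean, mean - half, mean + half))
    return rows
```

For example, with squared error as the scoring function one would call `compute_score(y_obs, y_pred, feature, lambda y, p: (y - p) ** 2)`.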
m-maggi commented 10 months ago

> I'm thinking of a new function compute_score as the compute_bias. It's just simple t-tests, see the code of compute_bias.

The t-test in compute_bias tests whether the bias per observation has mean 0, right? What would be the null hypothesis in the compute_score case? Otherwise, to give the user a sense of the uncertainty, one could return a confidence interval on the statistical risk, which would use (among other things, like the empirical risk) the Student's t percentile at the desired confidence level.
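The confidence-interval idea can be sketched in a few lines; the function name `risk_confidence_interval` is hypothetical and not part of the library:

```python
import numpy as np
from scipy import stats


def risk_confidence_interval(score_per_obs, level=0.95):
    """Two-sided CI for the statistical risk from per-observation scores.

    Centered at the empirical risk (mean score), with half-width equal to
    the Student's t percentile at the desired level times the standard
    error of the mean. Weights are ignored in this sketch.
    """
    score_per_obs = np.asarray(score_per_obs, dtype=float)
    n = len(score_per_obs)
    mean = score_per_obs.mean()
    half = stats.t.ppf((1 + level) / 2, n - 1) * stats.sem(score_per_obs)
    return mean - half, mean + half
```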

lorentzenchr commented 9 months ago

I guess uncertainty / confidence intervals would be enough. As you say, for bias there is a universal reference, namely zero; for scores, all pairwise comparisons are options, and that's way too many.