raamana / confounds

Conquering confounds and covariates: methods, library and guidance
https://raamana.github.io/confounds
Apache License 2.0

Performance score stratified by confound #10

Open raamana opened 4 years ago

raamana commented 4 years ago

utils.score_stratified_by_confound()

Helper to summarize the performance score (accuracy, MSE, MAE, etc.) for each level or variant of a confound. This is helpful for assessing any bias towards a particular value when confounds are categorical (such as site or gender). For example, if the MSE of the target for females is much lower than for males, it may indicate a potential bias of the model towards females (possibly due to an imbalance in group sizes).
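A minimal sketch of what such a helper could look like, assuming it receives the true targets, the predictions, one categorical confound per sample, and any metric callable with a `(y_true, y_pred)` signature (e.g. `sklearn.metrics.mean_squared_error` would also fit). The signature below is hypothetical, not the library's actual API, and the data are toy values for illustration only.

```python
import numpy as np


def score_stratified_by_confound(y_true, y_pred, confound, metric):
    """Hypothetical sketch: compute `metric` separately for each level of a
    categorical confound. Returns a dict {level: score}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    confound = np.asarray(confound)
    if not (len(y_true) == len(y_pred) == len(confound)):
        raise ValueError("y_true, y_pred and confound must have the same length")

    scores = {}
    for level in np.unique(confound):
        mask = confound == level  # samples belonging to this level
        # .item() converts the numpy scalar to a plain Python key
        scores[level.item()] = metric(y_true[mask], y_pred[mask])
    return scores


def mse(y_true, y_pred):
    """Mean squared error, written out to avoid extra dependencies."""
    return float(np.mean((y_true - y_pred) ** 2))


# Toy data: perfect predictions for "F", growing errors for "M"
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_pred = np.array([1.0, 2.0, 3.0, 5.0, 7.0, 9.0])
sex = np.array(["F", "F", "F", "M", "M", "M"])
scores = score_stratified_by_confound(y_true, y_pred, sex, mse)
print(scores)
```

Passing the metric as a callable keeps the helper agnostic to classification vs. regression; the same loop works for accuracy, MSE or MAE.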

raamana commented 4 years ago

@dinga92, this is related to the comparisons we discussed today (the figure in your poster).

dinga92 commented 4 years ago

I wouldn't call it a bias, but the function is useful. What about continuous confound variables?

raamana commented 4 years ago

Good question. Quantizing them is one option, but I haven't thought about it in serious detail yet. Let's get it done for categorical confounds first, like gender, site, etc.
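One way the quantizing option could work, sketched under assumptions: a continuous confound such as age is cut into quantile-based bins, and each bin is then scored like a level of a categorical confound. The helper name `quantize_confound`, the choice of quantile (rather than fixed-width) bins, and `n_bins=3` are all hypothetical choices made for illustration; the data are toy values.

```python
import numpy as np


def quantize_confound(values, n_bins=3):
    """Hypothetical helper: map a continuous confound to bin indices
    0..n_bins-1 using quantile cut points, so each bin holds roughly the
    same number of samples."""
    values = np.asarray(values, dtype=float)
    # interior cut points, e.g. the 1/3 and 2/3 quantiles for 3 bins
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    # digitize gives 0 below the first edge and len(edges) at/above the last
    return np.digitize(values, edges)


# Toy data: age as a continuous confound
age = np.array([20, 25, 30, 45, 50, 55, 70, 75, 80])
y_true = np.arange(9.0)
y_pred = y_true + np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

bins = quantize_confound(age, n_bins=3)
# score each bin exactly as one would score a categorical level
mse_per_bin = {
    int(b): float(np.mean((y_true[bins == b] - y_pred[bins == b]) ** 2))
    for b in np.unique(bins)
}
print(mse_per_bin)
```

Quantile bins are used here so that no bin ends up with only a handful of samples (which would make its score noisy); with many tied values the bins can still be unequal, and the number of bins remains an arbitrary choice.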

mnarayan commented 4 years ago

@raamana, is this like a partial-dependence-type function, one covariate at a time? I would call it something like "dependence on a covariate". The covariate could be a source of bias, or simply a moderator (just as COVID risk varies with age). Any trend that deviates from a flat line would make it important to consider.
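To make the "deviates from a flat line" idea concrete, here is a minimal sketch assuming a simple linear check: fit a straight line to the per-sample absolute error as a function of the covariate and look at the slope. This is not partial dependence (which examines the model's output rather than its error) and not a formal statistical test; the helper name and toy data are hypothetical.

```python
import numpy as np


def error_trend_vs_covariate(y_true, y_pred, covariate):
    """Hypothetical sketch: least-squares line through the per-sample
    absolute error vs. a continuous covariate. Returns (slope, intercept);
    a slope near zero corresponds to the 'flat line' case."""
    abs_err = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    # polyfit returns coefficients highest power first: [slope, intercept]
    slope, intercept = np.polyfit(np.asarray(covariate, float), abs_err, deg=1)
    return slope, intercept


# Toy data: the error grows linearly with age
age = np.array([20, 30, 40, 50, 60])
y_true = np.zeros(5)
y_pred = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
slope, intercept = error_trend_vs_covariate(y_true, y_pred, age)
print(slope, intercept)
```

A linear fit only detects monotonic trends; a U-shaped dependence would need the binned approach or a more flexible fit.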

raamana commented 4 years ago

Probably similar. I jotted this down many months ago, so I don't recall the particular paper or application that prompted it.

But I think even a simpler form would help: imagine a bar plot of a metric for the different levels of a categorical confounder (site, gender, etc.). In a way, this further breaks down the plot from Richard's poster into different levels of age (young vs. old), education (highly educated vs. not), etc.
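A minimal sketch of such a bar plot with matplotlib, assuming the per-level scores have already been computed into a `{level: score}` dict. The helper `plot_score_by_level` is hypothetical, and the site names and MAE values below are placeholders for illustration, not results.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt


def plot_score_by_level(scores, metric_name="MSE", confound_name="confound"):
    """Hypothetical helper: one bar per confound level from a precomputed
    {level: score} mapping. Returns the figure and axes."""
    levels = [str(level) for level in scores]
    values = list(scores.values())
    fig, ax = plt.subplots()
    ax.bar(levels, values)  # string labels give a categorical x-axis
    ax.set_xlabel(confound_name)
    ax.set_ylabel(metric_name)
    ax.set_title(f"{metric_name} by {confound_name}")
    return fig, ax


# Placeholder scores, for illustration only
fig, ax = plot_score_by_level({"site A": 0.8, "site B": 1.3, "site C": 0.9},
                              metric_name="MAE", confound_name="site")
```

Calling `fig.savefig(...)` writes the plot to disk; breaking down further (e.g. young vs. old within each site) would need grouped bars or one panel per level.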

raamana commented 4 years ago

Perhaps we don't need to make it too generic. Let's start with concrete applications and real datasets, and evolve it from there as needed.

Visualization helpers to create this plot from Manjari's slides, along with the result of the H0 test, would already be helpful: between-within-site-H0