kamilest / meds-evaluation


Group stratification #4

Status: Open · Jeanselme opened this issue 3 months ago

Jeanselme commented 3 months ago

For fairness metrics, we might want to take an additional vector of group assignments and stratify the metrics by group to compute fairness. I believe we can start with the simple group difference (when the given vector contains only two unique values) and all pairwise group differences (when there are more than two groups).
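As a minimal sketch of the two cases above (the function name and the example metric values are illustrative, not part of the issue), the pairwise formulation covers both: with exactly two groups it reduces to the simple group difference.

```python
from itertools import combinations


def group_metric_differences(metric_by_group):
    """Absolute pairwise differences of a per-group metric.

    metric_by_group maps a subgroup label to a metric value (e.g. a
    per-group AUROC). With two groups this is the simple group
    difference; with more, all pairwise differences are returned.
    """
    pairs = {}
    for (g1, m1), (g2, m2) in combinations(sorted(metric_by_group.items()), 2):
        pairs[(g1, g2)] = abs(m1 - m2)
    return pairs


# Illustrative per-group metric values (made up for the example).
aurocs = {"A": 0.82, "B": 0.78, "C": 0.90}
print(group_metric_differences(aurocs))
```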

Oufattole commented 1 month ago

I think this is a great idea. More concretely, I was thinking it could be realized with two parquet files: one with the model predictions/outputs, and one with the patient subgroup assignments at the different prediction times:

  1. Model outputs: the meds-evaluation label schema parquet file, which holds the predictions.
  2. Subgroup assignments: a MEDS label schema parquet file (specifically, one output by aces) with the categorical_value field filled in, indicating the subgroup each patient is in.

Nice part: with a separate file for subgroup assignments, you can generate many different subgroup assignments and rerun the evaluation script, changing only the path to the subgroup assignment file. This makes it easy to evaluate many different subgroupings.

Complex, not-so-nice part: the model outputs have a subject_id and prediction_time, and you want the subgroup assignments parquet (which also has subject_id and prediction_time) to align with them, so that you can simply join on those two columns. This may be challenging to deal with when they don't align. Do we want a polars join_asof operation (instead of a plain join), so that for each row in the model outputs parquet we take the most recent prior row in the subgroup assignments parquet, and enter a null subgroup assignment when the subject has no prior row?
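For concreteness, the backward as-of semantics described above can be sketched in plain Python (the function is hypothetical; column names follow the discussion, and the behaviour mirrors polars' `join_asof(..., on="prediction_time", by="subject_id", strategy="backward")`):

```python
from bisect import bisect_right


def asof_backward_join(model_rows, subgroup_rows):
    """For each model-output row, attach the categorical_value of the most
    recent subgroup row at or before its prediction_time, per subject.
    Subjects with no prior subgroup row get a null (None) assignment.
    """
    # Index subgroup assignments per subject, sorted by prediction_time.
    by_subject = {}
    key = lambda r: (r["subject_id"], r["prediction_time"])
    for r in sorted(subgroup_rows, key=key):
        by_subject.setdefault(r["subject_id"], []).append(r)

    joined = []
    for row in model_rows:
        candidates = by_subject.get(row["subject_id"], [])
        times = [c["prediction_time"] for c in candidates]
        # Most recent candidate with prediction_time <= row's time.
        i = bisect_right(times, row["prediction_time"]) - 1
        subgroup = candidates[i]["categorical_value"] if i >= 0 else None
        joined.append({**row, "categorical_value": subgroup})
    return joined
```

In practice this would be a one-line `DataFrame.join_asof` call on the two parquet files; the sketch just pins down what "most recent prior row, else null" means.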

Thoughts @Jeanselme @kamilest @mmcdermott @abinithago

abinithago commented 1 month ago

Instead of using join_asof, we could also modify the aces task YAML file that generated the task labels and define a subgroup predicate within it, so that we get both the task labels in the boolean_value field and the subgroup identities in the categorical_value field. With both fields available, we can use a plain join operation to calculate the task metrics for each subgroup of interest.
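Under this proposal the join is exact rather than as-of, since both files come from the same task extraction and share identical (subject_id, prediction_time) keys. A hedged sketch of that plain inner join (function name hypothetical, column names from the discussion):

```python
def join_labels_and_subgroups(prediction_rows, label_rows):
    """Exact inner join on (subject_id, prediction_time): each prediction
    row is paired with the boolean_value (task label) and categorical_value
    (subgroup identity) from the matching label row; rows without a match
    are dropped.
    """
    label_by_key = {
        (r["subject_id"], r["prediction_time"]): r for r in label_rows
    }
    joined = []
    for r in prediction_rows:
        match = label_by_key.get((r["subject_id"], r["prediction_time"]))
        if match is not None:
            joined.append({
                **r,
                "boolean_value": match["boolean_value"],
                "categorical_value": match["categorical_value"],
            })
    return joined
```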

kamilest commented 1 month ago

I would support having separate files, because when we extend the benchmark to support multi-class classification tasks we would otherwise need to reimplement a lot of the logic to deal with the overloaded categorical_value field.

mmcdermott commented 1 month ago

I don't think we should put subgroup identities in the categorical_value field @abinithago -- for one, it sort of violates the assumptions of the schema, and for another, that data is actually independent of the task prediction step and can be stored once per dataset rather than with every extracted task. I think the idea of taking the single model output (predictions on all samples for a given task) and splitting it into separate files per subgroup via some kind of join operation, potentially as @Oufattole proposed, is likely the right way to go here. @kamilest, is that what you were suggesting as well, or did I misunderstand your comment?

Jeanselme commented 1 month ago

I believe we were thinking more along the lines of saving the different subgroup identities in a separate file, as you are suggesting. That file would then be an input to the evaluation function, which computes the different metrics. I don't think we should split the prediction files, as computing some metrics might need access to all the subgroups at once.