Validate coarse-grained samples

annakrystalli commented 2 weeks ago

Currently, samples must strictly match the expected compound task ID set, and validation rejects coarser-grained compound task ID sets. However, our documentation states that these should be allowed.

This will need more work to implement, but it would be good to first understand the purpose of this feature and the details of its implementation.

Questions:

1. Can different teams submit completely different coarser-grained samples (i.e. completely different compound idxs)?
2. Must the compound idx be consistent at least within an individual sample?
3. Does this affect the way we infer sample dependence (e.g. https://github.com/Infectious-Disease-Modeling-Hubs/hubData/issues/6)? I.e. will different samples be allowed to have different dependence structures? Will it affect plotting in any way? https://github.com/Infectious-Disease-Modeling-Hubs/hubVis/issues/31
4. If target is a compound task ID, would it ever make sense to have a sample associated with more than one target? Would that be a multivariate output of a model that, e.g., predicts hospitalisations and deaths together?

annakrystalli commented 2 weeks ago

@elray1 , @nickreich and @LucieContamin would appreciate your thoughts!

LucieContamin commented 2 weeks ago

Here are my answers to your questions:

1. My understanding is that in the same round, teams can submit files with different compound IDs, but these must stay within the minimal level expected by the hub. For example, suppose a hub defines a round with 6 task ID columns: "origin_date", "scenario_id", "location", "target", "age_group" and "horizon", with "compound_taskid_set": ["origin_date", "scenario_id", "location", "target"]. A team can submit a file with "compound_taskid_set": ["origin_date"] and another with "compound_taskid_set": ["origin_date", "scenario_id", "target"]. However, a file with "compound_taskid_set": ["origin_date", "age_group"] should not be accepted.
2. I am not sure I totally understand the question here. For me, an individual sample should have a consistent compound ID that matches the minimal level expected by the hub, and all samples should follow the same compound ID level.
3. In my experience, having model output files for the same round with different levels of "sample dependence" does affect visualization, but it is easy to adapt to. I guess it depends on what you want to do with the model output (the same goes for data processing). Once you infer the sample structure of each file, it's easy to adapt as necessary, especially as all model outputs should at least follow the same minimal structure.
4. I don't understand that one, so I will wait for the answers from the others!
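Lucie's rule in answer 1 can be read as a subset check: a submitted compound_taskid_set is acceptable only if every task ID in it also appears in the hub's compound_taskid_set. The sketch below encodes that reading using her example. It is an assumption about the intended rule, and the helper names are hypothetical, not hubValidations functions.

```python
from typing import Iterable


def is_valid_compound_taskid_set(submitted: Iterable[str], hub: Iterable[str]) -> bool:
    """Hypothetical check: True if every submitted compound task ID is in the hub's set."""
    return set(submitted) <= set(hub)


def invalid_compound_taskids(submitted: Iterable[str], hub: Iterable[str]) -> list:
    """Hypothetical helper: list submitted task IDs missing from the hub's set."""
    hub_set = set(hub)
    return [tid for tid in submitted if tid not in hub_set]


# The hub-level compound_taskid_set from Lucie's example.
hub_set = ["origin_date", "scenario_id", "location", "target"]

print(is_valid_compound_taskid_set(["origin_date"], hub_set))                           # True
print(is_valid_compound_taskid_set(["origin_date", "scenario_id", "target"], hub_set))  # True
print(is_valid_compound_taskid_set(["origin_date", "age_group"], hub_set))              # False
print(invalid_compound_taskids(["origin_date", "age_group"], hub_set))                  # ['age_group']
```

Returning the offending task IDs, and not only a boolean, would let a validation message name the column (here "age_group") that makes the submission finer-grained than the hub allows.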

annakrystalli commented 3 days ago

I would like to start work on this and have come up with a mechanism to implement it. However, there are still things I am confused about.

Specifically, I can see problems with:

> My understanding is that in a same round, teams can submit files with different compound ids but they should at least include the minimal level expected from the hub. For example, if a hub has a round defined with 6 task id columns: "origin_date", "scenario_id", "location", "target", "age_group" and "horizon" with "compound_taskid_set": ["origin_date", "scenario_id", "location", "target"]. A team can submit a file with "compound_taskid_set": ["origin_date"] and another with "compound_taskid_set": ["origin_date", "scenario_id", "target"]. However, a file with "compound_taskid_set": ["origin_date", "age_group"] should not be accepted.

My worry is this: suppose a team submits two files for the same round_id (ignoring issues of file naming for now!) with different compound_taskid_sets, and uses integer sample indices (say 1:300) in both files. When it comes to accessing the data or determining the compound_taskid_set from the samples, it could be difficult to ensure that rows with the same output type id but different compound_taskid_sets are not mixed up. Some of the finer-grained samples will inevitably share task ID values with some of the coarser-grained samples, so if the output type id also happens to coincide, the two compound_taskid_sets will be impossible to tell apart.

We could specify that such hubs must use a character output type id, but I'm not sure how we would enforce that to avoid the above situation.
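To make the worry concrete, here is a minimal sketch, assuming model output rows are plain dicts and that a task ID belongs to a file's compound_taskid_set exactly when its value is constant within every sample index. The toy task IDs, location codes and the helpers `make_rows` and `infer_compound_taskid_set` are hypothetical illustrations, not hubValidations or hubData functions.

```python
from collections import defaultdict

# Hypothetical task ID columns for a toy hub round.
TASK_IDS = ["origin_date", "location", "horizon"]


def make_rows(samples, origin_date="2024-01-06"):
    """Build toy sample rows; `samples` maps a sample index to (location, horizon) cells."""
    return [
        {"origin_date": origin_date, "location": loc, "horizon": h,
         "output_type": "sample", "output_type_id": idx, "value": 0.0}
        for idx, cells in samples.items()
        for loc, h in cells
    ]


def infer_compound_taskid_set(rows, task_ids, sample_col="output_type_id"):
    """Return the task IDs whose value is constant within every sample index."""
    seen = defaultdict(lambda: defaultdict(set))  # sample index -> task ID -> values seen
    for row in rows:
        for tid in task_ids:
            seen[row[sample_col]][tid].add(row[tid])
    return [tid for tid in task_ids
            if all(len(by_tid[tid]) == 1 for by_tid in seen.values())]


# Finer-grained file: one sample per location, joint across horizons.
file_a = make_rows({1: [("US", 1), ("US", 2)], 2: [("CA", 1), ("CA", 2)]})
# Coarser-grained file: one sample joint across locations and horizons.
file_b = make_rows({1: [("US", 1), ("US", 2), ("CA", 1), ("CA", 2)]})

print(infer_compound_taskid_set(file_a, TASK_IDS))           # ['origin_date', 'location']
print(infer_compound_taskid_set(file_b, TASK_IDS))           # ['origin_date']
# Pooled with colliding integer indices, file A's finer structure is no longer visible.
print(infer_compound_taskid_set(file_a + file_b, TASK_IDS))  # ['origin_date']
```

Each file on its own yields its own structure, but once both reuse sample index 1, the pooled rows look like a single coarser-grained submission.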

elray1 commented 3 days ago

I agree with Lucie's answers to 1 through 3. Clarifying the discussion about 1, I think Lucie was talking about submissions from 2 different teams, or submissions from 1 team in 2 rounds, rather than 2 submissions from a single team for the same round.

For 4, I think it's fair to treat target as just another task id variable for our purposes here. It could make sense to include target in the compound_taskid_set specification, in which case we're allowing for models to produce separate predictions for each target level (e.g., for hospitalizations and deaths separately, or for flu, covid, and rsv separately), or to leave target out of the compound_taskid_set, in which case the hub would be asking for joint predictions of hospitalizations and deaths, or of flu, covid and rsv.
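One way to see the joint-versus-separate distinction is to count how many rows a single sample index has to cover. The sketch below assumes a complete grid (every combination of task ID values is required) and that a sample spans every combination of the task IDs left out of the compound_taskid_set. The task ID values and the helper `rows_per_sample` are hypothetical.

```python
from math import prod


def rows_per_sample(task_id_values, compound_taskid_set):
    """Rows one sample covers under a complete-grid assumption: the product of the
    number of values of every task ID that is NOT in the compound set."""
    return prod(len(vals) for tid, vals in task_id_values.items()
                if tid not in compound_taskid_set)


# Toy round: one origin date, two locations, two targets, four horizons.
task_id_values = {
    "origin_date": ["2024-01-06"],
    "location": ["US", "CA"],
    "target": ["inc hosp", "inc death"],
    "horizon": [1, 2, 3, 4],
}

# target in the compound set: separate samples per target, each covering 4 horizons.
print(rows_per_sample(task_id_values, ["origin_date", "location", "target"]))  # 4
# target left out: joint hospitalization + death samples, 2 targets x 4 horizons.
print(rows_per_sample(task_id_values, ["origin_date", "location"]))            # 8
```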

annakrystalli commented 3 days ago

> I agree with Lucie's answers to 1 through 3. Clarifying the discussion about 1, I think Lucie was talking about submissions from 2 different teams, or submissions from 1 team in 2 rounds, rather than 2 submissions from a single team for the same round.

Thanks @elray1!

OK, great. So just to confirm: you feel samples with different compound_taskid_sets will be clearly distinguishable from each other, correct? I.e. once all the data is accessed as an Arrow dataset?

elray1 commented 3 days ago

One more thought about the third question, "Does this affect the way we infer sample dependence (e.g. https://github.com/hubverse-org/hubData/issues/6)? i.e. will different samples be allowed to have different dependence structure?"

I think that for a single submission and target, we should expect all samples to have the same dependence structure. The only exception to this I can think of would be an ensemble that combines samples from models that had different approaches to dependence. But even in a case like that, I think it would be reasonable to say the ensemble should update the sample indices to get a consistent representation of dependence.
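One possible way an ensemble could "update the sample indices", sketched here as an assumption rather than a settled approach, is to re-index every component sample at the hub's compound_taskid_set: each (model_id, old index, compound task ID values) combination gets a fresh integer. Coarser samples are split by this, so their dependence across the split task IDs is dropped. The helper `reindex_to_compound_set` and the toy rows are hypothetical.

```python
from itertools import count


def reindex_to_compound_set(rows, compound_taskid_set, sample_col="output_type_id"):
    """Give every (model_id, old index, compound task ID values) combination a fresh
    integer index, so all samples share the structure implied by compound_taskid_set.
    Samples coarser than that set are split into several new samples."""
    new_ids = {}
    next_id = count(1)
    out = []
    for row in rows:
        key = (row["model_id"], row[sample_col],
               *(row[tid] for tid in compound_taskid_set))
        if key not in new_ids:
            new_ids[key] = next(next_id)
        out.append({**row, sample_col: new_ids[key]})
    return out


# Toy component outputs: model A samples per location, model B jointly across locations.
rows = [
    {"model_id": m, "origin_date": "2024-01-06", "location": loc, "horizon": h,
     "output_type_id": s, "value": 0.0}
    for m, s, loc in [("A", 1, "US"), ("A", 2, "CA"), ("B", 1, "US"), ("B", 1, "CA")]
    for h in (1, 2)
]

reindexed = reindex_to_compound_set(rows, ["origin_date", "location"])
print(sorted({(r["model_id"], r["location"], r["output_type_id"]) for r in reindexed}))
# [('A', 'CA', 2), ('A', 'US', 1), ('B', 'CA', 4), ('B', 'US', 3)]
```

Splitting toward the finer set is chosen here because it only discards dependence; merging finer samples into coarser ones would require inventing dependence the component models never provided.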

LucieContamin commented 3 days ago

> I agree with Lucie's answers to 1 through 3. Clarifying the discussion about 1, I think Lucie was talking about submissions from 2 different teams, or submissions from 1 team in 2 rounds, rather than 2 submissions from a single team for the same round.

That is what I was trying to say, thanks for clarifying.

For 4, now that I think I understand it, I agree with Evan's answer.

elray1 commented 3 days ago

> samples with different compound_taskid_sets will be clearly distinguishable from each other

Yeah, I think that's right. Those samples will have to come from different models, different rounds, or different task groups within a round. Any time we're working with samples, we'll have to be aware that sample indices are only distinct within combinations of model_id and round (and task group/target?? Actually, maybe we are guaranteeing that within a submission file, the sample indices are different across task groups?). But as long as we keep track of that, we should be able to distinguish between samples that are submitted with the same sample index. Does that seem right?
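Keeping track of where a sample came from amounts to grouping rows by a composite key instead of by the sample index alone. The sketch below assumes rows are plain dicts with a model_id column and a round identifier column; the column name round_id, the model names and the helper `group_samples` are hypothetical.

```python
from collections import defaultdict


def group_samples(rows, key_cols=("model_id", "round_id", "output_type_id")):
    """Group sample rows by a composite key. The sample index (output_type_id) is
    only assumed unique within each model_id/round_id combination."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_cols)].append(row)
    return dict(groups)


# Three toy samples that all reuse sample index 1.
rows = [
    {"model_id": "teamA-model", "round_id": "2024-01-06", "location": "US", "output_type_id": 1},
    {"model_id": "teamA-model", "round_id": "2024-01-13", "location": "US", "output_type_id": 1},
    {"model_id": "teamB-model", "round_id": "2024-01-06", "location": "US", "output_type_id": 1},
]

print(len(group_samples(rows, key_cols=("output_type_id",))))  # 1 (indices collide)
print(len(group_samples(rows)))                                # 3 distinct samples
```

If it turns out indices are only guaranteed unique within a task group, a task-group column could be appended to `key_cols` in the same way.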

hubverse-org / hubValidations

Validate coarse-grained samples #88
