hubverse-org / hubEnsembles

Ensemble methods for combining hub model outputs.
https://hubverse-org.github.io/hubEnsembles/

handle sample types in `linear_pool` #27

Closed elray1 closed 4 months ago

elray1 commented 1 year ago

We pretty much settled on the desired functionality here in discussion on issue #20, but I'm splitting implementation into a separate issue.

elray1 commented 1 year ago

More detailed ideas about handling samples. See comments here and below where we have talked about this before.

Previous decision: This function will not do any validations related to potentially different dependence structures represented by the component models. If a hub cares about enforcing that models use the same dependence structure for their samples, this will be specified in the hub's config, and will be checked at the time that model outputs are submitted to the hub, so we don't need to do validations related to this in this function.

There are two cases.

Case 1: collect component samples

If all three of the conditions below are satisfied, this function simply collects the samples from the component models and updates the sample indices so that they differ across component models:

  1. equal weights for all models,
  2. the same number of samples from each component model
  3. no limit on the number of samples the ensemble is allowed to produce
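A rough sketch of Case 1, written in Python purely for illustration (it is not the package's R implementation). The row layout with `model_id`, `output_type_id` (used here as the sample index) and `value`, and the helper names `can_collect` and `collect_component_samples`, are assumptions made for this example:

```python
def can_collect(sample_counts, weights=None, n=None):
    """Check the three Case 1 conditions.

    sample_counts: {model_id: number of samples} (hypothetical layout).
    weights: {model_id: weight}, or None, treated here as equal weights.
    n: requested number of ensemble samples, or None for "no limit".
    """
    equal_weights = weights is None or len(set(weights.values())) <= 1
    same_counts = len(set(sample_counts.values())) <= 1
    return equal_weights and same_counts and n is None


def collect_component_samples(rows):
    """Pool all component samples and re-index them.

    Each (model_id, original sample index) pair gets one ensemble-wide
    index, so indices never collide across models, while rows that shared
    an index within a model (e.g. across horizons) still share one.
    """
    new_index = {}  # (model_id, original index) -> ensemble-wide index
    pooled = []
    for row in rows:
        key = (row["model_id"], row["output_type_id"])
        if key not in new_index:
            new_index[key] = len(new_index)
        pooled.append(
            {**row, "model_id": "ensemble", "output_type_id": new_index[key]}
        )
    return pooled
```

Keeping one new index per (model, original index) pair means that whatever dependence structure a model encoded through shared sample indices carries over to the pooled output unchanged.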

Case 2: do some sampling

However, if any of those conditions are not satisfied, we have to do something else:

Notes about a new function argument related to desired ensemble sample size

The sampling step in Case 2 requires the user to specify how many ensemble samples they want in the output, so we need to add an argument to `linear_pool` for this, e.g. `n`. We also want to allow `n = NULL`, meaning that the function should just collect the component model samples as in Case 1 when that is possible. Let's make `n = NULL` the default, and throw an error if we end up in Case 2 and the user did not provide an integer value of `n`.

We can have 3 separate validations related to this:

  1. `n` must either be `NULL` or coercible to an integer.
  2. If weights are provided, `n` must be an integer. Error text: "Component model weights were provided, so a number of ensemble samples n must be provided."
  3. If component models provided a different number of samples within any group defined by a combination of task ids, `n` must be an integer. Error text: "Component models provided differing numbers of samples within at least one forecast task id group, so a number of ensemble samples n must be provided."
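
The three validations above could be sketched roughly as follows, again in Python for illustration only. The helper name `resolve_n`, the `weights_provided` flag and the `counts_by_task` layout are hypothetical, `None` stands in for R's `NULL`, and rejecting non-integral numbers such as 2.5 is one possible reading of "coercible to an integer":

```python
def resolve_n(n, weights_provided, counts_by_task):
    """Return n as an int, or None if samples can simply be collected.

    counts_by_task: {task id group: {model_id: number of samples}}
    (hypothetical layout).
    """
    # Validation 1: n must be NULL (None here) or coercible to an integer.
    if n is not None:
        try:
            as_int = int(n)
            exact = float(n) == as_int  # assumption: reject values like 2.5
        except (TypeError, ValueError):
            raise ValueError("n must be NULL or coercible to an integer.") from None
        if not exact:
            raise ValueError("n must be NULL or coercible to an integer.")
        return as_int

    # Validation 2, following its wording literally: any provided weights
    # require n.
    if weights_provided:
        raise ValueError(
            "Component model weights were provided, so a number of "
            "ensemble samples n must be provided."
        )

    # Validation 3: sample counts must match within every task id group.
    for counts in counts_by_task.values():
        if len(set(counts.values())) > 1:
            raise ValueError(
                "Component models provided differing numbers of samples "
                "within at least one forecast task id group, so a number "
                "of ensemble samples n must be provided."
            )
    return None
```

A return value of `None` corresponds to Case 1 (collect the samples), and an integer to Case 2.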

elray1 commented 1 year ago

Couple of additional thoughts about this:

elray1 commented 4 months ago

I'm closing this issue in favor of #109 and #110.