Closed by elray1 10 months ago.
We'd like to think about how we might incorporate trimming here: trimming either the most extreme models at each point on the horizontal axis, or the most extreme models at each point on the vertical axis. We would like either behavior that is consistent across the different output types, or a way to specify which kind of trimming to do, with validation that the combination of output type and trimming type is supported.
From Emily: here's the paper where trimming is proposed, and my paper on LOP vs. Vincenti:
Noting that after discussion with Emily today, we decided to build around distfromq for quantile interpolation, possibly adding functionality to distfromq for simpler, faster methods if the spline-based method is too slow.
I'm going to record some ongoing questions about how to handle samples in the linear pool ensemble function.
First, note that in the simplest case where we have (1) equal weights for all models, (2) the same number of samples from each component model, (3) no limit on the number of samples the ensemble is allowed to produce, and (4) no desire to enforce any particular (or consistent) dependence structure on the ensemble samples, the ensembling operation is straightforward: we simply collect all of the samples from the component models and update the sample index (specified by the `output_type_id`) to be distinct for samples from different component models. However, the operation is more challenging if any of these simplifying conditions is not satisfied. I'll describe considerations for each in the points below:
1. **Unequally weighted models**: If the models have unequal weights, I see two ways forward:
2. **Differing numbers of samples from component models**: If the component models have equal weight but provide different numbers of samples, we effectively need to give the samples from those models different weights, and then we're back at the considerations under point 1 above.
3. **A limit on the number of samples the ensemble is allowed to produce**: Suppose that a hub enforces a limit of 1,000 samples per model, and an ensemble combines predictions from 10 models. If each component model submits 1,000 samples, the naive aggregation strategy will produce an ensemble with 10,000 samples. A hub may decide that this is OK for ensemble models and allow that submission anyway, but I could also see a hub wanting to avoid very large file sizes. In this case, the only way forward I can think of is to take a subsample of the samples provided by each component model. Should this just be done at random, or is there a more systematic way forward?
4. **A desire to enforce a consistent dependence structure**: Suppose we want our ensemble's samples to represent "trajectories", i.e., we want the samples to be draws from a joint distribution across forecast horizons within each combination of other task ID variables like location. If each component model produces samples that capture at least that level of dependence, we are good to go, but if some component models provided samples from a separate marginal distribution at each horizon, there will be a problem. As a simple solution, we could just throw those samples out. Alternatively, we could apply an approach like the Schaake shuffle to try to recover the desired dependence structure.
My basic question is: how many of these considerations should we address in this function? My first vote is that 4 seems out of scope for this function, but we might want to do something about 1, 2, and 3.
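To make points 1–3 concrete, here is a minimal, language-agnostic sketch in Python (the eventual implementation would live in R in hubEnsembles; the function name and interface here are hypothetical). It treats unequal model weights and unequal sample counts uniformly by assigning each individual sample a weight of (model weight) / (model sample count), and handles both reweighting and a sample cap via weighted resampling with replacement, the strategy referred to as 1.i in this thread:

```python
import numpy as np

def pool_samples(model_samples, weights=None, max_samples=None, rng=None):
    """Pool per-model sample arrays into one ensemble sample set.

    model_samples: dict mapping model name -> 1-D array of samples.
    weights: dict of model weights (default: equal weights).
    max_samples: optional cap on ensemble sample count.
    """
    rng = rng or np.random.default_rng(0)
    names = list(model_samples)
    if weights is None:
        weights = {m: 1.0 for m in names}

    # Each individual sample's weight = model weight / model sample count,
    # so models submitting more samples do not get extra influence (point 2).
    values, w = [], []
    for m in names:
        s = np.asarray(model_samples[m], dtype=float)
        values.append(s)
        w.append(np.full(s.size, weights[m] / s.size))
    values = np.concatenate(values)
    w = np.concatenate(w)
    w /= w.sum()

    if max_samples is None or values.size <= max_samples:
        if np.allclose(w, w[0]):
            # Simplest case: equal effective weights, no cap exceeded --
            # just collect all component samples.
            return values
        max_samples = values.size
    # Unequal weights and/or a cap (points 1 and 3): weighted resampling
    # with replacement.
    return rng.choice(values, size=max_samples, replace=True, p=w)
```

In the simplest case this reduces to concatenating all component samples; resampling only kicks in when weights are unequal or a cap is binding. The open question from point 3 (random vs. more systematic subsampling) is left as random here.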
I'm on board with all of Seb's suggestions here (noting for 2 that I think of resampling with replacement as a strategy for dealing with weighting as in 1.i.)
I agree with everything that has been stated so far. A few additional comments about how I've seen samples used/presented (most relevant to (3)).
Should we support converting between output types, e.g., component models with `output_type = sample` and an ensemble with `output_type = quantile`, for example? This would presumably be a simple (likely separate?) function, or should we leave this to the user?

Good points, Emily. Some quick thoughts, essentially agreeing with you:
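On that sample-to-quantile conversion: since a sample-based linear pool is just the collection of all component samples, its quantile representation is simply the empirical quantile function of the pooled samples. A minimal Python sketch (the function name is hypothetical, not a hub API):

```python
import numpy as np

def samples_to_quantiles(samples, quantile_levels):
    """Summarize pooled ensemble samples as empirical quantiles.

    samples: pooled samples from all component models.
    quantile_levels: probability levels requested by the hub, e.g. 0.025..0.975.
    """
    return np.quantile(np.asarray(samples, dtype=float), quantile_levels)
```

This is simple enough that it could plausibly be a small standalone helper rather than part of the linear pool function itself.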
So that we can merge PR Infectious-Disease-Modeling-Hubs/hubUtils#26 soon with some good working functionality, I'm proposing that we move two pieces of functionality we've discussed for this function out to separate issues: handling of samples, and trimming. I have filed Infectious-Disease-Modeling-Hubs/hubUtils#27 and Infectious-Disease-Modeling-Hubs/hubUtils#28 for these.
We would like to have a linear opinion pool method. Here's a proposed function signature:
The operation will vary by the `output_type`:

- For `output_type`s `mean`, `cdf`, and `pmf`, we can call `hubEnsembles::simple_ensemble` directly using a (weighted) mean. Note: in the documentation, we should describe the motivation for this choice for the `mean` output type: the mean of a mixture distribution is the (weighted) average of the means of the component distributions.
- For `output_type` `sample`, the ensemble should collect the samples from all individual models. The `output_type_id` column values, containing sample indices, should be updated to ensure that samples from different individual models are given distinct sample indices. Note that this simple suggestion only works if we have the same number of samples from each component model and the component models have equal weight. Otherwise, we would have to somehow represent weights for these samples.
- For `output_type` `quantile`, the basic idea is to get an estimate of each component model's CDF, average those estimates, and invert the result. There are two reference implementations of this idea out there:
  - the `LOP` method in @eahowerton's `CombineDistributions` package: https://github.com/eahowerton/CombineDistributions/blob/main/R/LOP.R
  - the `distfromq` package has functionality to estimate a CDF or quantile function from provided quantiles, and to generate samples from that distribution. See the vignette here. There is an example of using this for ensemble calculations here.
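The estimate-average-invert idea for the `quantile` output type can be sketched as follows. This is a hypothetical Python illustration, not the `LOP` or `distfromq` implementation: it uses plain linear interpolation to turn each model's quantiles into a CDF on a common grid, where distfromq's spline-based estimate would be a drop-in refinement.

```python
import numpy as np

def lop_quantiles(component_quantiles, levels, weights=None, grid_size=1001):
    """Linear-opinion-pool quantiles from component model quantiles.

    component_quantiles: (n_models, n_levels) array; row i holds model i's
        predictive quantiles at the probability `levels`.
    levels: increasing probability levels shared by all models.
    """
    q = np.asarray(component_quantiles, dtype=float)
    levels = np.asarray(levels, dtype=float)
    n_models = q.shape[0]
    if weights is None:
        weights = np.full(n_models, 1.0 / n_models)
    weights = np.asarray(weights, dtype=float)

    # Common evaluation grid spanning all component quantiles.
    grid = np.linspace(q.min(), q.max(), grid_size)
    # Each model's CDF on the grid: interpolate levels against quantiles
    # (i.e., invert the quantile function by linear interpolation).
    cdfs = np.stack([np.interp(grid, q[i], levels) for i in range(n_models)])
    # The LOP CDF is the pointwise (weighted) average of component CDFs.
    mean_cdf = weights @ cdfs
    # Invert the pooled CDF back to quantiles at the requested levels.
    return np.interp(levels, mean_cdf, grid)
```

For example, pooling two models whose distributions sit on either side of 10 yields a pooled median near 10, since that is where the averaged CDFs cross 0.5.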