Closed nickreich closed 5 months ago
Brief comment re: stacking -- stacking is a more general idea than linear pools: it includes (weighted) linear pools as a special case, but also weighted quantile averaging, etc. The idea of stacking is that you take the outputs/predictions from a group of "level 0" models (our component models) as the inputs to a "level 1" model (our ensemble method), using out-of-sample predictions from the level 0 models to train the level 1 model. I don't object to providing a reference to Wolpert, but that paper doesn't feel as relevant to me as the other things we're already citing like Stone -- Wolpert doesn't directly address anything probabilistic, but Stone is all about linear opinion pools. And Stone is also an earlier reference.
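To make the level 0 / level 1 idea concrete, here is a minimal sketch in Python (not hubEnsembles code). It assumes two hypothetical component models with made-up parameters and a handful of made-up held-out observations, and it "trains" the level 1 model by grid-searching the linear pool weight that maximizes the mean log score of the pooled density. All names (`fit_stacking_weight`, `model_a`, `model_b`) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def log_score(weight, dens_a, dens_b):
    """Mean log density of the two-model linear pool w * a + (1 - w) * b."""
    return sum(
        math.log(weight * a + (1.0 - weight) * b)
        for a, b in zip(dens_a, dens_b)
    ) / len(dens_a)


def fit_stacking_weight(dens_a, dens_b, grid_size=101):
    """Level 1 model: grid-search the pool weight that maximizes the log score."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda w: log_score(w, dens_a, dens_b))


# Hypothetical held-out observations and two level 0 models (made-up numbers).
observations = [0.1, -0.4, 0.8, 1.2, -0.2, 0.5]
model_a = NormalDist(mu=0.3, sigma=1.0)  # roughly centered on the data
model_b = NormalDist(mu=5.0, sigma=1.0)  # badly biased

# Out-of-sample predictive densities evaluated at the observed values.
dens_a = [model_a.pdf(y) for y in observations]
dens_b = [model_b.pdf(y) for y in observations]

weight_a = fit_stacking_weight(dens_a, dens_b)
print(f"weight on model A: {weight_a:.2f}")
```

Swapping the pooled density for a weighted quantile average (and the log score for a quantile-based score) would give a different level 1 model under the same stacking recipe, which is the sense in which stacking is the more general idea.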
About your questions on the case study section:
The flu_forecasts_raw.rds dataset actually only contains forecasts for a single forecast date, 2023-05-15, simply to avoid it being larger than it already was. This likely isn't best practice, but I wasn't sure how to avoid ending up with a really large file or having to pull the forecasts from zoltar or a GitHub repo every time the file was being knit, especially since the resulting ensemble forecasts had already been run and saved in other rds objects.
I'd have to look at other "software demo" papers to know what the standards are here. It does seem complicated to "show all code" when there are quite a few bespoke steps to downloading and scoring the data. My initial take is that it's ok to have some of that processing excluded as long as
@nickreich - for your second suggestion, let me make sure I understand correctly. I think you are suggesting that we add a table that will connect the mathematical descriptions provided in section 2 with the implementation details provided in section 3 (and perhaps consolidate the corresponding discussion that is currently scattered throughout the text). I think this is a really good idea. I see two potential options for implementation:
Option 1: for a given function, list the corresponding mathematical operation
output_type | simple_ensemble() | linear_pool() |
---|---|---|
mean | mean of individual model means | mean of individual model means |
median | mean of individual model means | NA |
quantile | quantile average | probability average |
cdf | probability average | probability average |
pmf | probability average | probability average |
Note, we could also add mathematical notation if it'd be useful.
Option 2: for a given mathematical operation, list the function that will perform it
output_type | quantile average | probability average |
---|---|---|
quantile | simple_ensemble() | linear_pool() |
cdf | NA | simple_ensemble() or linear_pool() |
pmf | NA | simple_ensemble() or linear_pool() |
A couple thoughts from me:
hubEnsembles functions and each output type, which would favor Option 1. For example, there's no clear analog of quantile average for mean output type, which is why I haven't included mean and median output types in Option 2.
pmf or cdf output types represent discrete variables, I think there also could be cases where these output types are used to discretize distributions of continuous random variables (just as we do with quantile output type). In the latter cases, one can imagine performing a quantile average (perhaps forecasts of peak timing could be an example), but I don't think hubEnsembles currently supports this. Is that right? In a case like this, would the user have to interpolate the CDFs externally before using hubEnsembles? @elray1 and @lshandross curious your thoughts on this point as well.
Thanks for this carefully laid-out response, with specific options. What about a slight modification to your "Option 1" to include a column for each of the three implemented options for aggregation functions, like this:
Option 1A: for a given function and arguments, list the corresponding mathematical operation
output_type | simple_ensemble(..., agg_fun = "mean") | simple_ensemble(..., agg_fun = "median") | linear_pool() |
---|---|---|---|
mean | mean of individual model means | median of individual model means | mean of individual model means |
median | mean of individual model medians | median of individual model medians | NA |
quantile | mean at each quantile level | median at each quantile level | average probability at each x |
cdf | mean cdf value at specified x's | median cdf value at specified x's | average probability at each x |
pmf | mean pmf value for each bin | median pmf value for each bin | mean pmf value for each bin |
I changed the language in the table a bit in hopes of making it a bit more readable without notation, but I'm not sure it's an improvement. Specifically, I was finding it hard to read "quantile average" and "probability average" in the tables and get a picture immediately of what those operations were. I'm not sure that my proposed text is better or more accurate.
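For readers who, like me, find "quantile average" and "probability average" hard to picture, here is a small numerical sketch in Python (not hubEnsembles code). It assumes two hypothetical, equally weighted normal component models with made-up parameters. The quantile average takes the mean along the target-variable axis at a fixed quantile level, while the probability average (linear pool) takes the mean along the probability axis at a fixed target value.

```python
from statistics import NormalDist

# Two hypothetical component models (made-up parameters), equally weighted.
components = [NormalDist(mu=0.0, sigma=1.0), NormalDist(mu=4.0, sigma=1.0)]


def quantile_average(theta):
    """Mean of component quantiles (target values) at quantile level theta."""
    return sum(m.inv_cdf(theta) for m in components) / len(components)


def linear_pool_cdf(x):
    """Mean of component cumulative probabilities at target value x."""
    return sum(m.cdf(x) for m in components) / len(components)


q_avg_median = quantile_average(0.5)  # averages along the x axis
lop_cdf_at_2 = linear_pool_cdf(2.0)   # averages along the probability axis

# The quantile average here behaves like N(2, 1); the pool is a bimodal mixture,
# so the 0.9 quantile of one is not the 0.9 quantile of the other.
q_avg_90 = quantile_average(0.9)
lop_prob_below_q_avg_90 = linear_pool_cdf(q_avg_90)

print(q_avg_median, lop_cdf_at_2, q_avg_90, lop_prob_below_q_avg_90)
```

In this toy setup the two ensembles agree at the median by symmetry but disagree in the tails, which is the distinction the table is trying to convey.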
I definitely like one of the two takes on Option 1 over Option 2. I feel like listing the output_type as the first column makes the table more understandable and easy to follow. I also like Nick's addition to show the difference between two different aggregation functions for simple_ensemble and some of the language changes. However, I think we should be more explicit that simple_ensemble(..., agg_fun="mean") yields the same results as linear_pool for the cdf output types.
I'm ok with either orientation, agree with Emily's statement of the pros and cons.
some thoughts about language in 1a since it seems like that's the preferred option so far: can we aim for some formulaic language like one of the below, where
options for formulas using the above terms could be like:
I think that's a helpful suggestion @elray1. I also think @lshandross has a good point, that with more verbiage we risk losing the bigger picture a bit. It seems that there are two important conceptual ideas to convey with this table: (1) multiple functions give the same result for cdf and pmf output types; (2) the linear_pool() function outputs the same (theoretical) result regardless of output_type. Perhaps mixing in a bit of mathematical notation would help this jump out more?
Here's another version that incorporates the wording suggestion from @elray1 and tries to mix in some simple math:
Option 1B:
output_type | simple_ensemble(..., agg_fun = "mean") | simple_ensemble(..., agg_fun = "median") | linear_pool() |
---|---|---|---|
mean | mean of individual model means | median of individual model means | mean of individual model means |
median | mean of individual model medians | median of individual model medians | NA |
quantile | mean of individual model target variable values at each quantile level, $F^{-1}_Q(\theta)$ | median of individual target variable values at each quantile level | mean of individual model target variable values at each quantile level, $F_{LOP}(x)$ |
cdf | mean of individual model quantile levels at each target variable value, $F_{LOP}(x)$ | median of individual model quantile levels at each target variable value | mean of individual model quantile levels at each target variable value, $F_{LOP}(x)$ |
pmf | mean of individual model quantile levels at each target variable value, $F_{LOP}(x)$ | median of individual model quantile levels at each target variable value | mean of individual model quantile levels at each target variable value, $F_{LOP}(x)$ |
I like this latest iteration on the table, including the addition of the notation. Although it did feel funny that there was not notation in the first 2 rows or the 2nd column. But I understand that this is because we don't have convenient/brief notation for these settings...
update -- for the cdf and pmf rows, to me it feels a bit clearer to write "mean of individual model probabilities at each ..."
I also like this latest iteration of the table and agree with @elray1's suggestion to use "mean of individual model probabilities at each..." for the cdf and pmf rows.
The cell describing a linear pool for the quantile output type seems a bit confusing to me since the words are the same as that for the simple_ensemble one with a mean aggregation function. I think it should read something more like "mean of individual model quantile levels at each target variable value" (and then it fits nicely with the cdf and pmf cells beneath it)
Good edits, thanks for catching my careless errors! Here's a new version:
output_type | simple_ensemble(..., agg_fun = "mean") | simple_ensemble(..., agg_fun = "median") | linear_pool() |
---|---|---|---|
mean | mean of individual model means | median of individual model means | mean of individual model means |
median | mean of individual model medians | median of individual model medians | NA |
quantile | mean of individual model target variable values at each quantile level, $F^{-1}_Q(\theta)$ | median of individual target variable values at each quantile level | mean of individual model target variable values at each quantile level, $F^{-1}_{LOP}(x)$ |
cdf | mean of individual model probabilities at each target variable value, $F_{LOP}(x)$ | median of individual model probabilities at each target variable value | mean of individual model probabilities at each target variable value, $F_{LOP}(x)$ |
pmf | mean of individual model probabilities at each target variable value, $f_{LOP}(x)$ | median of individual model probabilities at each target variable value | mean of individual model probabilities at each target variable value, $f_{LOP}(x)$ |
I agree it feels a bit strange that we only use notation in some cells. But I also agree it would probably be more effort/notation than it's worth to formalize something for every cell. A partial solution would be to remove the median column (but keep agg_fun = "mean" in the header of the column that remains). The median column feels a bit redundant to me, but I also see its purpose so I'm fine either way.
This has been a productive set of iterations! I think it's looking good! A few additional, very small, comments:
agg_fun = "mean" in the header) and then we could say in a caption that using "median" would just replace the mean with median in each description.
I like it.
for quantile/linear_pool, the text description still doesn't feel quite right. It says, "mean of individual model target variable values at each quantile level", but that sounds more like a description of a quantile averaging/Vincent approach
You're right @elray1, good catch. Here's the version (I think) we're settling on.
output_type | simple_ensemble(..., agg_fun = "mean") | linear_pool() |
---|---|---|
mean | mean of individual model means | mean of individual model means |
median | mean of individual model medians | NA |
quantile | mean of individual model target variable values at each quantile level, $F^{-1}_Q(\theta)$ | mean of individual model quantile levels at each target variable value, $F^{-1}_{LOP}(x)$ |
cdf | mean of individual model cumulative probabilities at each target variable value, $F_{LOP}(x)$ | mean of individual model cumulative probabilities at each target variable value, $F_{LOP}(x)$ |
pmf | mean of individual model bin probabilities at each target variable value, $f_{LOP}(x)$ | mean of individual model bin probabilities at each target variable value, $f_{LOP}(x)$ |
One more thought related to @nickreich's suggestion - is it confusing that we're using "cumulative probabilities" in the cdf row and "quantile levels" in quantile row, but we mean the same thing?
@eahowerton I actually think that the text is correct as is. I always have to re-look at this page to make sure I get it right, but I think the format is that:
If the above is correct, then I think the table is good as is.
clarifying Emily's comment a little to make sure we're on the same page -- we have these two equations:
The variables $\theta$ and $x$ represent the same thing in these equations, but in the first we call $x$ a "target variable value" and $\theta$ a "cumulative probability", while in the second we call $x$ a "target variable value" in this table but often refer to it as a "quantile", and $\theta$ a "quantile level".
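The two equations themselves are not shown above. Going by the definitions in the methods sentence quoted in the next comment, they are presumably the CDF and its inverse; this is an assumption, not a quote:

$$F(x) = \theta \qquad \text{and} \qquad F^{-1}(\theta) = x$$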
I think that no matter what we do here, it'll be confusing to someone. Maybe the best thing to do is to add something explaining this in the paper. For example, in the methods section, we have this sentence: "To define these two classes of methods, let $F(x)$ be a cumulative density function (CDF) defined over values $x$ of the target variable for the prediction, and $F^{-1}(\theta)$ be the corresponding quantile function defined over quantile levels $\theta \in [0, 1]$." Right after that, we could say something like, "Throughout this article, we may refer to $x$ as either 'a value of the target variable' or 'a quantile' depending on the context, and similarly we may refer to $\theta$ as either 'a quantile level' or 'a (cumulative) probability'."
Double checking the quantile/linear_pool text again -- I would read "mean of individual model quantile levels at each target variable value" as a description of the computation $\frac{1}{N} \sum_i F_i(x)$, which is how we compute the LOP's cdf $F_{LOP}(x)$. But when the output type is "quantile", we invert that cdf to return some quantiles. This is why in an earlier comment I suggested the notation $F_{LOP}^{-1}(\theta)$, indicating that the output is going to be on the scale of the target, i.e., "an $x$". And revising my earlier attempt at a text description, maybe we want something like "Quantile of the distribution obtained by computing the mean of estimated individual model cumulative probabilities at each target variable value". This is a mouthful and I'm not sure how helpful it really is, but it's an attempt to sum up in one sentence the 3-step process of (1) interpolating/extrapolating from quantiles to a full cdf; (2) forming the LOP; (3) getting quantiles of that LOP distribution.
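The 3-step process can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how hubEnsembles implements it: the quantile values are made up, the cdf between quantiles is approximated by linear interpolation, the tails are simply held flat rather than extrapolated (so only quantile levels strictly inside the supplied range are meaningful), and the helper names `interp`, `lop_cdf`, and `lop_quantile` are hypothetical.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise-linear interpolation, held constant outside [xs[0], xs[-1]].

    Assumes xs is strictly increasing.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)
    frac = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + frac * (ys[j] - ys[j - 1])


levels = [0.1, 0.25, 0.5, 0.75, 0.9]
# Hypothetical quantile forecasts from two component models (made-up values).
model_quantiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0, 7.0],
]


def lop_cdf(x):
    # Steps 1 and 2: approximate each model's cdf from its quantiles, then average.
    return sum(interp(x, q, levels) for q in model_quantiles) / len(model_quantiles)


def lop_quantile(theta, lo=0.0, hi=10.0, tol=1e-9):
    # Step 3: invert the pooled cdf by bisection on the target-variable scale.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lop_cdf(mid) < theta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


lop_median = lop_quantile(0.5)
lop_q25 = lop_quantile(0.25)
quantile_avg_q25 = (model_quantiles[0][1] + model_quantiles[1][1]) / 2
print(lop_median, lop_q25, quantile_avg_q25)
```

The output of `lop_quantile` is on the scale of the target ("an $x$"), and at the 0.25 level it differs from the plain quantile average of the two models, which is why the quantile/linear_pool cell needs its own description.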
Thanks for the clarification, @elray1, this is what I had meant. Adding a sentence like you suggest seems like a good solution to me.
RE your second comment, I see your point. I also agree that trying to convey all of this in the table could be difficult. What do you think about putting some of those details in the table caption, with an asterisk or footnote of some kind in the table itself? I think how we decide to handle this depends on what we want the purpose of this table to be: (1) explain exactly what operations are happening when a function is implemented for a particular output type, or (2) give higher-level similarities and differences between the function operations for different output types. My vote would be for (2), but I am open to alternative opinions.
If we opt for something like (2), I wonder if it would be helpful in the caption (or somewhere in the text) to guide the reader through the relationships between rows and columns in this table. I'm thinking something like: "For probabilistic output types (quantile, cdf, pmf), the output type (rows) determines how the resulting ensemble distribution is summarized (as a quantile $F^{-1}(\theta)$, cumulative distribution function $F(x)$, or probability mass function $f(x)$). The function (columns) determines what kind of ensemble distribution is generated (quantile average, $F_Q(x)$, or linear pool, $F_{LOP}(x)$)."
I'm not sure this is beautifully written, but hopefully you get the idea.
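For reference, the notation used in the table can be written out as follows, assuming $N$ equally weighted component models with CDFs $F_i$, quantile functions $F_i^{-1}$, and probability mass functions $f_i$ (the equal weights match the $\frac{1}{N} \sum_i F_i(x)$ form used elsewhere in this thread; weighted versions would replace $\frac{1}{N}$ with model weights):

$$F^{-1}_Q(\theta) = \frac{1}{N} \sum_{i=1}^{N} F_i^{-1}(\theta), \qquad F_{LOP}(x) = \frac{1}{N} \sum_{i=1}^{N} F_i(x), \qquad f_{LOP}(x) = \frac{1}{N} \sum_{i=1}^{N} f_i(x)$$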
I'm also inclined to agree with @eahowerton about option (2) of giving a higher level comparison in the table. We already discuss the need for extra steps in calculating a linear pool for quantile forecasts later in the paper, so perhaps a quick note in the table and reference to the correct subsection would be sufficient.
I also like the suggestion of guiding the reader through relationships between rows and columns in the table either in the caption or somewhere in the text (I don't have a strong preference of where it lives).
I like option (2) for the table too, and the caption suggestion.
I do think we should continue to think about what goes in the text for that particular table cell. I'm on board with not trying to capture all the detail in a brief statement, but I think we should also be careful to ensure that any description we put there is an accurate description of the methods that are used there (or somehow defers and points the reader to a methods description elsewhere). Right now, the text reads to me like a description of the cdf/LOP methods rather than the quantile/LOP methods.
Yes, I think you're right @elray1, it's important to distinguish that cell from the cdf/LOP methods. After trying to come up with some other options, I think the text you suggest may be as concise as we can get. So I'm happy to use it in the quantile/LOP cell.
Let me try to summarize what we've decided on in this discussion:
output_type | simple_ensemble(..., agg_fun = "mean") | linear_pool() |
---|---|---|
mean | mean of individual model means | mean of individual model means |
median | mean of individual model medians | NA |
quantile | mean of individual model target variable values at each quantile level, $F^{-1}_Q(\theta)$ | quantile of the distribution obtained by computing the mean of estimated individual model cumulative probabilities at each target variable value, $F^{-1}_{LOP}(\theta)$ |
cdf | mean of individual model cumulative probabilities at each target variable value, $F_{LOP}(x)$ | mean of individual model cumulative probabilities at each target variable value, $F_{LOP}(x)$ |
pmf | mean of individual model bin probabilities at each target variable value, $f_{LOP}(x)$ | mean of individual model bin probabilities at each target variable value, $f_{LOP}(x)$ |
In the caption of this table, include:
agg_fun = "median" would replace the mean with median in each description for simple_ensemble()
Clarify terminology in the methods section. Add the second sentence suggested here (first sentence already in methods): "To define these two classes of methods, let $F(x)$ be a cumulative density function (CDF) defined over values $x$ of the target variable for the prediction, and $F^{-1}(\theta)$ be the corresponding quantile function defined over quantile levels $\theta \in [0, 1]$. Throughout this article, we may refer to $x$ as either 'a value of the target variable' or 'a quantile' depending on the context, and similarly we may refer to $\theta$ as either 'a quantile level' or 'a (cumulative) probability'."
Let me know if I've missed anything!
@lshandross I believe the first five comments in this list have been addressed. It seems you've been addressing the later comments along the way too, but didn't want to close the issue before checking with you.
the examples feel a bit rushed through. I think separate, complete tabular examples of a single quantile and something like 1-3 separate pmf predictions should be shown. (Currently only quantile is shown, and somewhat awkwardly, with just the top rows of a very large df, as opposed to a single example prediction.)
I suggest adding a table (or adding content to the output_type table shown in section 3.1) that shows, for each output_type, the default agg_fun of each ensemble function. Right now, this is described in passing in section 4.2 but I think having this clearly laid out and summarized would be helpful.
the tables and data frame summaries are not clearly labeled/numbered in all cases. It would be nice to be consistent about this. E.g. the output_type table in section 3.1 does not have a number. We might want the example data tabular displays to also be labeled/numbered? Depending on the final format (html vs pdf), there might be different ways to do this.
Another comment re: data table formatting, that may or may not end up being relevant depending on final file format. In some of the tables, I had two issues with the display/organization. (1) I think we should try to standardize on having model_id column first to show the "provenance" of the data, and then the task_id, then output_type, output_type_id, then value. (2) In some of the data shown, I couldn't see value or model_id. I think we could fix this in HTML by using something like Kable or the df_print: paged option in the yaml header at the top.
a "linear pool" is also sometimes called "stacking" or "weighted density ensemble", right? Maybe worth saying this explicitly, with some references? (e.g. Wolpert 1992, "stacked generalization"; the Ray and Reich 2018 paper already cited)
when introducing/reading in the flu_forecasts_raw.rds dataset, I suggest summarizing some features of the dataset. E.g. how many dates, models, targets, locations, horizons, are represented? Just showing the top six rows of the dataset is not enough context. Were there inclusion/exclusion criteria for models? Were the same number of models included across all weeks? Note that if any inclusion criteria were put in place, that could influence the results of the mean ensemble, as models that are more unreliable in their forecasts (e.g. have outlying forecasts) are maybe also more likely to not be submitted every week. So if those unreliable models are filtered out, it might make the mean ensemble look better than it would in real time without a post-hoc filtering of models in place.
point forecasts are referred to in section 5, but to go along with the hubverse nomenclature, we should refer to these as median forecasts.
We do such a good job showing all of the code, but then in the results section we omit the scoring code. Why?
need to define RWIS and RMAE in the paper.
Table 5.1 caption: suggest adding statement of how bolded numbers were chosen, as well as a statement about when lower is better, etc... Also, given that the differences are quite small and that the median-ensemble is not really a clear "best" model when looking at the figures below, maybe it's not worth boldfacing those numbers at all? As it really looks like we're "crowning a winner" (kind of unintentionally - the text is more ambivalent, but the boldface feels like a stamp of approval somehow) when the evidence for that feels fairly weak to me.
Figure 5.1: I'm torn about using fixed y scale here. It feels important to me to show how the large PIs really are quite different for some methods, but I also feel like it becomes hard to see the actual data and alignment of predictions with the data.
Figure 5.2: can the two legends be merged so we can see point/color on one legend?
Figures 5.2 & 5.3: I found the text to be too small to read and the different shapes not really distinguishable.