comments on manuscript - Githubissues

nickreich commented 7 months ago

the examples feel a bit rushed through. I think separate, complete tabular examples of a single quantile and something like 1-3 separate pmf predictions should be shown. (Currently only quantile is shown, and somewhat awkwardly, with just the top rows of a very large df, as opposed to a single example prediction.)
I suggest adding a table (or adding content to the output_type table shown in section 3.1) that shows, for each output_type, the default agg_fun of each ensemble function. Right now, this is described in passing in section 4.2, but I think having this clearly laid out and summarized would be helpful.
the tables and data frame summaries are not clearly labeled/numbered in all cases. It would be nice to be consistent about this. E.g. the output_type table in section 3.1 does not have a number. We might want the example data tabular displays to also be labeled/numbered? Depending on the final format (html vs pdf), there might be different ways to do this.
Another comment re: data table formatting, that may or may not end up being relevant depending on final file format. In some of the tables, I had two issues with the display/organization. (1) I think we should try to standardize on having model_id column first to show the "provenance" of the data, and then the task_id, then output_type, output_type_id, then value. (2) In some of the data shown, I couldn't see value or model_id. I think we could fix this in HTML by using something like Kable or the df_print: paged option in the yaml header at the top.
a "linear pool" is also sometimes called "stacking" or "weighted density ensemble", right? Maybe worth saying this explicitly, with some references? (e.g. Wolpert 1992, "stacked generalization"; the Ray and Reich 2018 paper already cited)
when introducing/reading in the flu_forecasts_raw.rds dataset, I suggest summarizing some features of the dataset. E.g. how many dates, models, targets, locations, and horizons are represented? Just showing the top six rows of the dataset is not enough context. Were there inclusion/exclusion criteria for models? Were the same number of models included across all weeks? Note that if any inclusion criteria were put in place, they could influence the results of the mean ensemble, since models that are less reliable in their forecasts (e.g. have outlying forecasts) may also be more likely to miss some weekly submissions. So if those unreliable models are filtered out, the mean ensemble might look better than it would in real time without post-hoc filtering of models.
point forecasts are referred to in section 5, but to go along with the hubverse nomenclature, we should refer to these as median forecasts.
We do such a good job showing all of the code, but then in the results section we omit the scoring code. Why?
need to define RWIS and RMAE in the paper.
Table 5.1 caption: suggest adding statement of how bolded numbers were chosen, as well as a statement about when lower is better, etc... Also, given that the differences are quite small and that the median-ensemble is not really a clear "best" model when looking at the figures below, maybe it's not worth boldfacing those numbers at all? As it really looks like we're "crowning a winner" (kind of unintentionally - the text is more ambivalent, but the boldface feels like a stamp of approval somehow) when the evidence for that feels fairly weak to me.
Figure 5.1: I'm torn about using fixed y scale here. It feels important to me to show how the large PIs really are quite different for some methods, but I also feel like it becomes hard to see the actual data and alignment of predictions with the data.
Figure 5.2: can the two legends be merged so we can see point/color on one legend?
Figures 5.2 & 5.3: I found the text to be too small to read and the different shapes not really distinguishable.
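
The dataset summary suggested in this comment (how many dates, models, targets, locations, and horizons are represented, and whether the same number of models appears each week) could be sketched roughly as below. This is only an illustration in Python, whereas the manuscript itself works in R; the rows, dates, and task-ID column names (`forecast_date`, `location`, `horizon`, `target`) are hypothetical toy values, not taken from `flu_forecasts_raw.rds`.

```python
# Toy rows in a hubverse-style layout. Everything here is hypothetical
# illustration data, not the contents of flu_forecasts_raw.rds.
columns = ["model_id", "forecast_date", "location", "horizon", "target"]
raw = [
    ("model-a", "2023-01-02", "US", 1, "hosp"),
    ("model-a", "2023-01-02", "US", 2, "hosp"),
    ("model-b", "2023-01-02", "US", 1, "hosp"),
    ("model-b", "2023-01-02", "MA", 1, "hosp"),
    ("model-c", "2023-01-09", "US", 1, "hosp"),
]
rows = [dict(zip(columns, r)) for r in raw]


def count_distinct(rows, columns):
    """Count the distinct values that appear in each column."""
    return {col: len({row[col] for row in rows}) for col in columns}


def models_per_date(rows):
    """Count how many distinct models submitted for each forecast date."""
    by_date = {}
    for row in rows:
        by_date.setdefault(row["forecast_date"], set()).add(row["model_id"])
    return {date: len(models) for date, models in sorted(by_date.items())}


summary = count_distinct(rows, columns)
per_date = models_per_date(rows)
print(summary)
print(per_date)
```

A per-date model count like `per_date` is one simple way to show readers whether the set of component models changes from week to week.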

elray1 commented 7 months ago

brief comment r.e. stacking -- stacking is a more general idea than linear pools, which includes (weighted) linear pools as a special case, but also weighted quantile averaging, etc. The idea of stacking is that you take the outputs/predictions from a group of "level 0" models (our component models) as the inputs to a "level 1" model (our ensemble method), using out-of-sample predictions from the level 0 models to train the level 1 model. I don't object to providing a reference to Wolpert, but that paper doesn't feel as relevant to me as the other things we're already citing like Stone -- Wolpert doesn't directly address anything probabilistic, but Stone is all about linear opinion pools. And Stone is also an earlier reference.

lshandross commented 7 months ago

About your questions on the case study section:

The suggestion of summarizing the features of the flu_forecasts_raw dataset makes sense. The only inclusion criterion for the analysis was that models provided forecasts for all 23 quantiles for any combination of task ids we included. This basically means no models got excluded, so the mean ensemble is how it would have been calculated in real time.
- Also, it should be noted that the flu_forecasts_raw.rds dataset actually only contains forecasts for a single forecast date, 2023-05-15, simply to avoid it being larger than it already was. This likely isn't best practice but I wasn't sure what to do to avoid ending up with a really large file or having to pull the forecasts from zoltar or a GitHub repo every time the file was being knit, especially since the resulting ensemble forecasts had already been run and saved in other rds objects
I omitted the scoring code since it didn't seem directly related to the hubverse/hubEnsembles, which seem to be the primary focus of this paper. But I am totally fine with taking the code from the separate R script and sticking it in if that's what we want.
I'm fine with removing the bold face in the table. It was a feature requested in Infectious-Disease-Modeling-Hubs/hubEnsembles-manuscript#1, but I see your point about it seeming like we're "crowning a winner"
For figure 5.1, should we try to do something with insets to show the true scale? Or add more plots within the figure? Not sure what the right answer is
I can work on the other figures to improve them

nickreich commented 7 months ago

I'd have to look at other "software demo" papers to know what the standards are here. It does seem complicated to "show all code" when there are quite a few bespoke steps to downloading and scoring the data. My initial take is that it's ok to have some of that processing excluded as long as

- things are well set up for reproducibility, e.g., the code for the data extraction and scoring is available and any pre-processed data live in the repo.
- we add a separate subsection of the paper that has a statement about reproducibility/software/data/links to repo.

eahowerton commented 6 months ago

@nickreich - for your second suggestion, let me make sure I understand correctly. I think you are suggesting that we add a table that will connect the mathematical descriptions provided in section 2 with the implementation details provided in section 3 (and perhaps consolidate the corresponding discussion that is currently scattered throughout the text). I think this is a really good idea. I see two potential options for implementation:

Option 1: for a given function, list the corresponding mathematical operation

| `output_type` | `simple_ensemble()` | `linear_pool()` |
|---|---|---|
| `mean` | mean of individual model means | mean of individual model means |
| `median` | mean of individual model medians | NA |
| `quantile` | quantile average | probability average |
| `cdf` | probability average | probability average |
| `pmf` | probability average | probability average |

Note, we could also add mathematical notation if it'd be useful.

Option 2: for a given mathematical operation, list the function that will perform it

| `output_type` | quantile average | probability average |
|---|---|---|
| `quantile` | `simple_ensemble()` | `linear_pool()` |
| `cdf` | NA | `simple_ensemble()` or `linear_pool()` |
| `pmf` | NA | `simple_ensemble()` or `linear_pool()` |

A couple thoughts from me:

I tend to think it is best to identify a mathematical operation that one would like to perform, and then find the corresponding function (this is better suited to the setup of Option 2). However, there is a clearer map between the hubEnsembles functions and each output type, which would favor Option 1. For example, there's no clear analog of quantile average for mean output type, which is why I haven't included mean and median output types in Option 2.
The structure of Option 2 also raises a bigger question. While I don't think it makes sense to perform a quantile average in cases where pmf or cdf output types represent discrete variables, I think there also could be cases where these output types are used to discretize distributions of continuous random variables (just as we do with quantile output type). In the latter cases, one can imagine performing a quantile average (perhaps forecasts of peak timing could be an example), but I don't think hubEnsembles currently supports this. Is that right? In a case like this, would the user have to interpolate the CDFs externally before using hubEnsembles? @elray1 and @lshandross curious your thoughts on this point as well.

nickreich commented 6 months ago

Thanks for this carefully laid-out response, with specific options. What about a slight modification to your "Option 1" to include a column for each of the three implemented options for aggregation functions, like this:

Option 1A: for a given function and arguments, list the corresponding mathematical operation

| `output_type` | `simple_ensemble(..., agg_fun = "mean")` | `simple_ensemble(..., agg_fun = "median")` | `linear_pool()` |
|---|---|---|---|
| `mean` | mean of individual model means | median of individual model means | mean of individual model means |
| `median` | mean of individual model medians | median of individual model medians | NA |
| `quantile` | mean at each quantile level | median at each quantile level | average probability at each x |
| `cdf` | mean cdf value at specified x's | median cdf value at specified x's | average probability at each x |
| `pmf` | mean pmf value for each bin | median pmf value for each bin | mean pmf value for each bin |

I changed the language in the table a bit in hopes of making it a bit more readable without notation, but I'm not sure it's an improvement. Specifically, I was finding it hard to read "quantile average" and "probability average" in the tables and get a picture immediately of what those operations were. I'm not sure that my proposed text is better or more accurate.
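
To make the "mean at each quantile level" and "mean pmf value for each bin" descriptions concrete, here is a minimal sketch in plain Python. The helper `aggregate_by_id` is a hypothetical stand-in written for this illustration and is not the hubEnsembles API; the model names and values are made up.

```python
from statistics import mean, median


def aggregate_by_id(component_values, agg_fun=mean):
    """Aggregate component model values separately for each output_type_id.

    component_values maps model_id -> {output_type_id: value}. All models are
    assumed to report the same set of output_type_ids.
    """
    models = list(component_values)
    ids = component_values[models[0]].keys()
    return {i: agg_fun([component_values[m][i] for m in models]) for i in ids}


# Hypothetical quantile predictions: output_type_id is the quantile level,
# value is on the scale of the target variable.
quantile_preds = {
    "model-a": {0.25: 8.0, 0.5: 10.0, 0.75: 12.0},
    "model-b": {0.25: 10.0, 0.5: 14.0, 0.75: 20.0},
    "model-c": {0.25: 9.0, 0.5: 15.0, 0.75: 30.0},
}

# Hypothetical pmf predictions: output_type_id is the bin, value is a probability.
pmf_preds = {
    "model-a": {"low": 0.5, "moderate": 0.25, "high": 0.25},
    "model-b": {"low": 0.25, "moderate": 0.5, "high": 0.25},
}

mean_q = aggregate_by_id(quantile_preds, mean)      # mean at each quantile level
median_q = aggregate_by_id(quantile_preds, median)  # median at each quantile level
mean_pmf = aggregate_by_id(pmf_preds, mean)         # mean pmf value for each bin
print(mean_q, median_q, mean_pmf)
```

In both cases the aggregation runs within each `output_type_id`; what differs by output type is whether the averaged `value` is a target variable value (quantile) or a probability (pmf, cdf).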

lshandross commented 6 months ago

I definitely like one of the two takes on Option 1 over Option 2 — I feel like listing the output_type as the first column makes the table more understandable and easy to follow. I also like Nick's addition to show the difference between two different aggregation functions for simple_ensemble and some of the language changes. However, I think we should be more explicit that simple_ensemble(..., agg_fun="mean") yields the same results as linear_pool for the cdf output types

elray1 commented 6 months ago

I'm ok with either orientation, agree with Emily's statement of the pros and cons.

some thoughts about language in 1a since it seems like that's the preferred option so far: can we aim for some formulaic language like one of the below, where

(summary function) is either "mean" or "median"
(name of predicted functional) is either "mean", "median", "quantile value", "cdf value", or "pmf value"
(name of predicted function input) is not defined for mean/median, for quantile is one of "quantile level", "probability level", or "\tau" (iirc our notation in this paper, didn't double check), and for cdf/pmf is one of "target variable value" or "x", with a consistent choice for either using words or tau/x for the quantile and cdf/pmf

options for formulas using the above terms could be like:

"(summary function) of individual model (name of predicted functional)s [at each (name of predicted function input)]". (where the square-bracketed term is dropped for mean/median types)
"(summary function) of (name of predicted functional)s [at each (name of predicted function input)]"

eahowerton commented 6 months ago

I think that's a helpful suggestion @elray1. I also think @lshandross has a good point, that with more verbiage we risk losing the bigger picture a bit. It seems that there are two important conceptual ideas to convey with this table: (1) multiple functions give the same result for cdf and pmf output types; (2) the linear_pool() function outputs the same (theoretical) result regardless of output_type. Perhaps mixing in a bit of mathematical notation would help this jump out more?

Here's another version that incorporates the wording suggestion from @elray1 and tries to mix in some simple math:

Option 1B:

| output_type | simple_ensemble(..., agg_fun = "mean") | simple_ensemble(..., agg_fun = "median") | linear_pool() |
|---|---|---|---|
| mean | mean of individual model means | median of individual model means | mean of individual model means |
| median | mean of individual model medians | median of individual model medians | NA |
| quantile | mean of individual model target variable values at each quantile level, $F^{-1}_Q(\theta)$ | median of individual target variable values at each quantile level | mean of individual model target variable values at each quantile level, $F_{LOP}(x)$ |
| cdf | mean of individual model quantile levels at each target variable value, $F_{LOP}(x)$ | median of individual model quantile levels at each target variable value | mean of individual model quantile levels at each target variable value, $F_{LOP}(x)$ |
| pmf | mean of individual model quantile levels at each target variable value, $F_{LOP}(x)$ | median of individual model quantile levels at each target variable value | mean of individual model quantile levels at each target variable value, $F_{LOP}(x)$ |

elray1 commented 6 months ago

I like this latest iteration on the table, including the addition of the notation. Although it did feel funny that there was not notation in the first 2 rows or the 2nd column. But I understand that this is because we don't have convenient/brief notation for these settings...

elray1 commented 6 months ago

update -- for the cdf and pmf rows, to me it feels a bit clearer to write "mean of individual model probabilities at each ..."

lshandross commented 6 months ago

I also like this latest iteration of the table and agree with @elray1's suggestion to use "mean of individual model probabilities at each..." for the cdf and pmf rows.

The cell describing a linear pool for the quantile output type seems a bit confusing to me since the words are the same as that for the simple_ensemble one with a mean aggregation function. I think it should read something more like "mean of individual model quantile levels at each target variable value" (and then it fits nicely with the cdf and pmf cells beneath it)

elray1 commented 6 months ago

in the pmf row, we should probably use a lower case f, right? $f_{LOP}(x)$
and for the quantile/LOP row, Li raises a good point -- I can't think of a succinct way to describe this, but I think the notation should be $F_{LOP}^{-1}(\theta)$. Maybe "Quantile of the distribution obtained by averaging estimated probabilities from each individual model." Which is... not that helpful of a statement, really?

eahowerton commented 6 months ago

Good edits, thanks for catching my careless errors! Here's a new version:

| output_type | simple_ensemble(..., agg_fun = "mean") | simple_ensemble(..., agg_fun = "median") | linear_pool() |
|---|---|---|---|
| mean | mean of individual model means | median of individual model means | mean of individual model means |
| median | mean of individual model medians | median of individual model medians | NA |
| quantile | mean of individual model target variable values at each quantile level, $F^{-1}_Q(\theta)$ | median of individual model target variable values at each quantile level | mean of individual model target variable values at each quantile level, $F^{-1}_{LOP}(\theta)$ |
| cdf | mean of individual model probabilities at each target variable value, $F_{LOP}(x)$ | median of individual model probabilities at each target variable value | mean of individual model probabilities at each target variable value, $F_{LOP}(x)$ |
| pmf | mean of individual model probabilities at each target variable value, $f_{LOP}(x)$ | median of individual model probabilities at each target variable value | mean of individual model probabilities at each target variable value, $f_{LOP}(x)$ |

I agree it feels a bit strange that we only use notation in some cells. But I also agree it would probably be more effort/notation than it's worth to formalize something for every cell. A partial solution would be to remove the median column (but keep agg_fun = "mean" in the header of the column that remains). The median column feels a bit redundant to me, but I also see its purpose, so I'm fine either way.

nickreich commented 6 months ago

This has been a productive set of iterations! I think it's looking good! A few additional, very small, comments:

I support @eahowerton 's suggestion to drop the median column (keeping agg_fun = "mean" in the header) and then we could say in a caption that using "median" would just replace the mean with median in each description.
how about adding "cumulative" to the cdf description, e.g. "median of individual model cumulative probabilities at each target variable value"

elray1 commented 6 months ago

I like it. for quantile/linear_pool, the text description still doesn't feel quite right. It says, "mean of individual model target variable values at each quantile level". but that sounds more like a description of a quantile averaging/Vincent approach

eahowerton commented 6 months ago

You're right @elray1, good catch. Here's the version (I think) we're settling on.

| output_type | simple_ensemble(..., agg_fun = "mean") | linear_pool() |
|---|---|---|
| mean | mean of individual model means | mean of individual model means |
| median | mean of individual model medians | NA |
| quantile | mean of individual model target variable values at each quantile level, $F^{-1}_Q(\theta)$ | mean of individual model quantile levels at each target variable value, $F^{-1}_{LOP}(\theta)$ |
| cdf | mean of individual model cumulative probabilities at each target variable value, $F_{LOP}(x)$ | mean of individual model cumulative probabilities at each target variable value, $F_{LOP}(x)$ |
| pmf | mean of individual model bin probabilities at each target variable value, $f_{LOP}(x)$ | mean of individual model bin probabilities at each target variable value, $f_{LOP}(x)$ |

One more thought related to @nickreich's suggestion - is it confusing that we're using "cumulative probabilities" in the cdf row and "quantile levels" in quantile row, but we mean the same thing?

nickreich commented 6 months ago

@eahowerton I actually think that the text is correct as is. I always have to re-look at this page to make sure I get it right, but I think the format is that:

For the quantile output type the value column has "target variable values at each quantile level" where the quantile levels are present in the output_type_id column.
For the cdf output type the value column actually does have "cumulative probabilities" and the output_type_id column has a fixed set of target variable values.

If the above is correct, then I think the table is good as is.

elray1 commented 6 months ago

clarifying Emily's comment a little to make sure we're on the same page -- we have these two equations:

$F(x) = \theta$
$F^{-1}(\theta) = x$

The variables $\theta$ and $x$ represent the same thing in these equations, but in the first we call $x$ a "target variable value" and $\theta$ a "cumulative probability", while in the second we call $x$ a "target variable value" in this table but often refer to it as a "quantile", and $\theta$ a "quantile level".

I think that no matter what we do here, it'll be confusing to someone. Maybe the best thing to do is to add something explaining this in the paper. For example, in the methods section, we have this sentence: "To define these two classes of methods, let $F(x)$ be a cumulative distribution function (CDF) defined over values $x$ of the target variable for the prediction, and $F^{-1}(\theta)$ be the corresponding quantile function defined over quantile levels $\theta \in [0, 1]$." Right after that, we could say something like, "Throughout this article, we may refer to $x$ as either 'a value of the target variable' or 'a quantile' depending on the context, and similarly we may refer to $\theta$ as either 'a quantile level' or 'a (cumulative) probability'."
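
For reference, the two ensemble distributions contrasted in this thread can be written side by side. This is a sketch that assumes $N$ component models with CDFs $F_i$ and equal weights; the subscripts $Q$ and $LOP$ follow the notation used in the table drafts in this thread.

$$
F_Q^{-1}(\theta) = \frac{1}{N} \sum_{i=1}^{N} F_i^{-1}(\theta),
\qquad
F_{LOP}(x) = \frac{1}{N} \sum_{i=1}^{N} F_i(x)
$$

The first averages target variable values at a fixed quantile level $\theta$; the second averages cumulative probabilities at a fixed target variable value $x$.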

elray1 commented 6 months ago

Double checking the quantile/linear_pool text again -- I would read "mean of individual model quantile levels at each target variable value" as a description of the computation $\frac{1}{N} \sum_i F_i(x)$, which is how we compute the LOP's cdf $F_{LOP}(x)$. But when the output type is "quantile", we invert that cdf to return some quantiles. This is why in an earlier comment I suggested the notation $F_{LOP}^{-1}(\theta)$, indicating that the output is going to be on the scale of the target, i.e., "an $x$". And revising my earlier attempt at a text description, maybe we want something like "Quantile of the distribution obtained by computing the mean of estimated individual model cumulative probabilities at each target variable value". This is a mouthful and I'm not sure how helpful it really is, but it's an attempt to sum up in one sentence the 3-step process of (1) interpolating/extrapolating from quantiles to a full cdf; (2) forming the LOP; (3) getting quantiles of that LOP distribution.
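
The 3-step process described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the hubEnsembles implementation: it assumes piecewise-linear interpolation between reported quantiles, clamps the CDF outside the reported range instead of extrapolating tails, uses equal weights, and inverts the pooled CDF by bisection. All function names and values are hypothetical.

```python
def cdf_from_quantiles(levels, values):
    """Step 1: build a CDF from (quantile level, value) pairs.

    Assumption: linear interpolation between reported quantiles, with the CDF
    clamped to the lowest/highest reported level outside their range.
    `values` must be sorted in non-decreasing order.
    """
    def cdf(x):
        if x <= values[0]:
            return levels[0]
        if x >= values[-1]:
            return levels[-1]
        for j in range(1, len(values)):
            if x <= values[j]:
                frac = (x - values[j - 1]) / (values[j] - values[j - 1])
                return levels[j - 1] + frac * (levels[j] - levels[j - 1])
    return cdf


def linear_pool_cdf(cdfs):
    """Step 2: equally weighted mean of component CDFs at each x."""
    def pooled(x):
        return sum(f(x) for f in cdfs) / len(cdfs)
    return pooled


def invert_cdf(cdf, theta, lo, hi, tol=1e-9):
    """Step 3: bisection for the smallest x in [lo, hi] with cdf(x) >= theta."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < theta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two hypothetical component models reporting the same quantile levels.
levels = [0.1, 0.25, 0.5, 0.75, 0.9]
model_a = [0.0, 4.0, 10.0, 16.0, 20.0]
model_b = [10.0, 14.0, 20.0, 26.0, 30.0]

pooled = linear_pool_cdf([cdf_from_quantiles(levels, v) for v in (model_a, model_b)])
lo, hi = min(model_a + model_b), max(model_a + model_b)

# Only interior levels are requested, to stay clear of the clamped tails.
requested = [0.25, 0.5, 0.75]
lop_quantiles = [invert_cdf(pooled, t, lo, hi) for t in requested]
quantile_average = [
    (a + b) / 2 for a, b, lev in zip(model_a, model_b, levels) if lev in requested
]
print(lop_quantiles)
print(quantile_average)
```

In this toy example the linear pool quartiles (about 7.6 and 22.4) sit farther apart than the quantile-averaged ones (9 and 21), while both medians are 15, which is the qualitative difference between the two table columns.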

eahowerton commented 6 months ago

Thanks for the clarification, @elray1, this is what I had meant. Adding a sentence like you suggest seems like a good solution to me.

RE your second comment, I see your point. I also agree that trying to convey all of this in the table could be difficult. What do you think about putting some of those details in the table caption, with an asterisk or footnote of some kind in the table itself? I think how we decide to handle this depends on what we want the purpose of this table to be: (1) explain exactly what operations are happening when a function is implemented for a particular output type, or (2) give higher-level similarities and differences between the function operations for different output types. My vote would be for (2), but I am open to alternative opinions.

If we opt for something like (2), I wonder if it would be helpful in the caption (or somewhere in the text) to guide the reader through the relationships between rows and columns in this table. I'm thinking something like: "For probabilistic output types (quantile, cdf, pmf), the output type (rows) determines how the resulting ensemble distribution is summarized (as a quantile $F^{-1}(\theta)$, cumulative distribution function $F(x)$, or probability mass function $f(x)$). The function (columns) determines what kind of ensemble distribution is generated (quantile average, $F_Q(x)$ or linear pool $F_{LOP}(x)$)."

I'm not sure this is beautifully written, but hopefully you get the idea.

lshandross commented 6 months ago

I'm also inclined to agree with @eahowerton about option (2) of giving a higher level comparison in the table. We already discuss the need for extra steps in calculating a linear pool for quantile forecasts later in the paper, so perhaps a quick note in the table and reference to the correct subsection would be sufficient.

I also like the suggestion of guiding the reader through relationships between rows and columns in the table either in the caption or somewhere in the text (I don't have a strong preference of where it lives).

elray1 commented 6 months ago

I like option (2) for the table too, and the caption suggestion.

I do think we should continue to think about what goes in the text for that particular table cell. I'm on board with not trying to capture all the detail in a brief statement, but I think we should also be careful to ensure that any description we put there is an accurate description of the methods that are used there (or somehow defers and points the reader to a methods description elsewhere). Right now, the text reads to me like a description of the cdf/LOP methods rather than the quantile/LOP methods.

eahowerton commented 6 months ago

Yes, I think you're right @elray1, it's important to distinguish that cell from the cdf/LOP methods. After trying to come up with some other options, I think the text you suggest may be as concise as we can get. So I'm happy to use it in the quantile/LOP cell.

Let me try to summarize what we've decided on in this discussion:

Add the following table:

| output_type | simple_ensemble(..., agg_fun = "mean") | linear_pool() |
|---|---|---|
| mean | mean of individual model means | mean of individual model means |
| median | mean of individual model medians | NA |
| quantile | mean of individual model target variable values at each quantile level, $F^{-1}_Q(\theta)$ | quantile of the distribution obtained by computing the mean of estimated individual model cumulative probabilities at each target variable value, $F^{-1}_{LOP}(\theta)$ |
| cdf | mean of individual model cumulative probabilities at each target variable value, $F_{LOP}(x)$ | mean of individual model cumulative probabilities at each target variable value, $F_{LOP}(x)$ |
| pmf | mean of individual model bin probabilities at each target variable value, $f_{LOP}(x)$ | mean of individual model bin probabilities at each target variable value, $f_{LOP}(x)$ |

In the caption of this table, include:
- a brief orientation of the reader to the relationship between the rows/columns, something like: "For probabilistic output types (quantile, cdf, pmf), the output type (rows) determines how the resulting ensemble distribution is summarized (as a quantile $F^{-1}(\theta)$, cumulative distribution function $F(x)$, or probability mass function $f(x)$). The function (columns) determines what kind of operation is performed, and in turn what ensemble distribution is generated (quantile average $F^{-1}_{Q}(\theta)$, or linear pool $F_{LOP}(x)$)."
- a mention of interpolation for quantile/linear_pool() cell and point to the relevant section where the details of this are discussed.
- a note that using agg_fun = "median" would replace the mean with median in each description for simple_ensemble()
Clarify terminology in the methods section. Add the second sentence suggested here (first sentence already in methods): "To define these two classes of methods, let $F(x)$ be a cumulative distribution function (CDF) defined over values $x$ of the target variable for the prediction, and $F^{-1}(\theta)$ be the corresponding quantile function defined over quantile levels $\theta \in [0, 1]$. Throughout this article, we may refer to $x$ as either 'a value of the target variable' or 'a quantile' depending on the context, and similarly we may refer to $\theta$ as either 'a quantile level' or 'a (cumulative) probability'."

Let me know if I've missed anything!

eahowerton commented 6 months ago

@lshandross I believe the first five comments in this list have been addressed. It seems you've been addressing the later comments along the way too, but didn't want to close the issue before checking with you.

hubverse-org / hubEnsemblesManuscript

comments on manuscript #23