Support for Average Adjusted Predictions

lang-benjamin commented 2 years ago

Currently, the package supports Adjusted Predictions (and Marginal Means) and thus essentially holds predictors at typical values such as their mean. This is a reasonable approach. But since the package is about marginal effects, I am of the opinion that there should be another approach available in this package as well, namely Average Adjusted Predictions. I made a similar comment in the margins package, see issue #144. In fact, it is my understanding that this approach is the default approach in stata, see for example https://www3.nd.edu/~rwilliam/stats/Margins01.pdf page 31. This approach is consistent with the corresponding AME estimate, see page 32 and 34 and also my comment in issue #144.

I therefore propose to add this alternative prediction method to this package.

vincentarelbundock commented 2 years ago

Thanks for the suggestion!

Does the counterfactual argument for datagrid work for you? See here for details:

https://vincentarelbundock.github.io/marginaleffects/articles/predictions.html#counterfactual-data-grid

library(tidyverse)
library(marginaleffects)

mod <- lm(mpg ~ hp * am * vs, mtcars)
predictions(mod, newdata = datagrid(am = 0:1, grid.type = "counterfactual")) |>
    group_by(am) |>
    summarize(across(c(predicted, std.error), mean))

## # A tibble: 2 × 3
##      am predicted std.error
##   <int>     <dbl>     <dbl>
## 1     0      18.1      1.54
## 2     1      22.8      1.41

lang-benjamin commented 2 years ago

Thank you, that is indeed what I was looking for. I just did not realize it. There is quite a lot of possibilities via predictions and datagrid, do you think it would make sense to introduce wrapper functions that collapse such kind of information automatically and that are even more convenient to use? For example a function AAP (average adjusted prediction) for the case above (actually, prediction::prediction is essentially already such a wrapper). Similarly for APM, APR, MEM, AME, MER.

vincentarelbundock commented 2 years ago

Thanks for the input and suggestions, I really appreciate it! This is early days for marginaleffects, so there's lots of room for improvement and opportunity to shape the future.

You're right: all of these quantities can be computed using a combination of the predictions/marginaleffects functions, the newdata argument, and the datagrid function.

The idea of thin wrappers around these is very interesting, and I'm definitely open to discussing that. I see why some users may find them convenient. But for the sake of argument, let me push back a bit with a counter.

The main reason I developed the modelsummary package is that I grew increasingly frustrated with the stargazer function. That function is super powerful, but it has 85(!!!) different arguments, so it's basically impossible to master; I always used to get lost in its documentation. I wanted a table-making package with sane user interface, so I wrote one.

That issue was about the proliferation of arguments in functions, but a very similar issue arises with the proliferation of functions in a package. So there's a trade-off, but both the proliferation of functions and the datagrid style abstraction can affect ease-of-use.

All this to say that while I am open to the idea, I want to think really hard about whether this is really necessary. How much typing/learning does a new wrapper save the user?

Two more specific notes:

Should summary.predictions behave in the same way as summary.marginaleffects and report the AAP (AME)?
What do you think of the new documentation?
- marginaleffects has a new Description and Examples section
- predictions has a new Description and Examples section
- marginaleffects vignette has sections on AME, MEM, MER
- predictions vignette has sections on AAP, APM, APR

Can we get away with improving the docs?

Side note: I know it's not quite standard, but I like "counterfactual" much better than Stata's "representative", because the values picked by users may be completely out of range.

lang-benjamin commented 2 years ago

All this to say that while I am open to the idea, I want to think really hard about whether this is really necessary. How much typing/learning does a new wrapper save the user?

Fair enough. My thinking was not to save code, but to make it easier for the user to get what he/she wants and to make it easier to remember. For APMs a simple predictions() call is enough, whereas for AAP additional arguments and summarize-steps are needed. Wrapper functions would take away the need for these additional steps. I would think that having dedicated wrapper functions handling one specific task makes profileration manageable.

1. Should `summary.predictions` behave in the same way as `summary.marginaleffects` and report the AAP (AME)?
Yes, I like this idea.
2. What do you think of the new documentation?
Thank you so much for the swift update. I think this highly improves the documentation and the understanding.

marginaleffects has a new Description and Examples section Just one technical comment, but I will provide more details on this further down: In the examples section you have the examples marginaleffects(mod, newdata = datagrid(hp = c(100, 110))) and marginaleffects(mod, newdata = datagrid(hp = c(100, 110), grid.type = "counterfactual")) Both are labelled with the same comment. However, the first uses the mean for all other variables, whereas the second uses predictor variables as observed in the data. I think the comments should be revised to make clear what the difference is.

predictions has a new Description and Examples section

In the introductory paragraph it reads "or Predictions at User-Specified Values of the regressors". What do you think about replacing this by "or Predictions at User-Specified Values of the regressors (often referred to as Adjusted Predictions at Representative values, APR)"?
In the examples, a new terminology pops up, i.e. the term "Conditional Adjusted Prediction". Looking at the man page it seems this is just plotting the APM (is this correct?) but has an argument condition that provides the possibility for grouping variables (x-axis and additional grouping variables). That is, the "conditional" is not referring to a specific type of prediction in the sense of some conditional probability but rather to grouping variables for plotting purposes. Therefore, I would not call the function "plot_cap" but rather "plot_apm". What do you think? Would it make sense to introduce plotting functions for AAP or APR?

   * `predictions` vignette has [sections on AAP, APM, APR](https://vincentarelbundock.github.io/marginaleffects/articles/predictions.html)

The definition of an "Adjusted Prediction" is fine, but maybe could be refined further. Since AAP is also possible to be computed, what about amending the definition to also refer to for example "or the observed values" (instead of just mean or factor levels). Not sure about the best way to do this. You make it clear in the sentence "This grid may or may not correspond to the actual observations in a dataset" but I would include a hint already in the main definition.
My main confusion comes from the following (by this I take the point up that I mentioned above). predictions (or marginaleffects) using newdata = datagrid(hp = 100) will plug in 100 for hp for every patient (right?), regardless of whether grid.type = "counterfactual" is set or not. In this sense the datagrid function (alone) allows for counterfactual behavior. Then, the additional argument grid.type = "counterfactual" only makes sure that observed values of the predictors will be used and not the mean value (and duplicates the data). Is my understanding correct? If so, the grid.type = "counterfactual" argument is a bit confusing because it does not control the counterfactual behavior, it defines what happens to the rest of the predictor variables; and plugging in the observed values is not counterfactual. I am not sure if I understood it completely, so please correct me if I am wrong.

Can we get away with improving the docs?

As said, it is much clearer now, thank you. The additional wrapper functions are not a must (due to the improved doc) but would be nice to have so that one does not have to remember the different arguments and do the additional grouping and summarize-steps.

vincentarelbundock commented 2 years ago

Super useful stuff!

TODO list:

[x] Improve labels of grid.type examples in ?marginaleffects
[x] Mention "APR" in the intro paragraph
[x] Add "observed values" to the definition of "Adjusted Predictions"
[ ] Resolve confusion about grid.type
[ ] What is a "conditional adjusted prediction"?
[x] summary.predictions and tidy.predictions calculate AAP
[ ] Decide what to do about wrapper functions.

vincentarelbundock commented 2 years ago

My main confusion comes from the following (by this I take the point up that I mentioned above). predictions (or marginaleffects) using newdata = datagrid(hp = 100) will plug in 100 for hp for every patient (right?), regardless of whether grid.type = "counterfactual" is set or not. In this sense the datagrid function (alone) allows for counterfactual behavior. Then, the additional argument grid.type = "counterfactual" only makes sure that observed values of the predictors will be used and not the mean value (and duplicates the data). Is my understanding correct? If so, the grid.type = "counterfactual" argument is a bit confusing because it does not control the counterfactual behavior, it defines what happens to the rest of the predictor variables; and plugging in the observed values is not counterfactual.

Sorry, but I'm not sure I understand what you mean here. Does this bit from the datagrid() docs clear things up?

grid.type: character

            • "typical": variables whose values are not explicitly
              specified by the user in ‘...’ are set to their mean or
              mode, or to the output of the functions supplied to
              ‘FUN.type’ arguments.

            • "counterfactual": the entire dataset is duplicated for
              each combination of the variable values specified in
              ‘...’. Variables not explicitly supplied to ‘datagrid()’
              are set to their observed values in the original dataset.

vincentarelbundock commented 2 years ago

Your interpretation of plot_cap() seems correct: They are adjusted predictions made over a specific grid. Here, "conditional" is basically synonymous with "adjusted", but we use a different word to underline that we want to look at adjusted predictions for different values of a "condition" variable.

I realize that this may lead to confusion, but it looks like the Stata docs for margins use "conditional adjusted predictions" and "conditional marginal effects" in the same way. Am I misreading their documentation?

conditional2

vincentarelbundock commented 2 years ago

Average Adjusted Predictions

library(marginaleffects)
mod <- lm(mpg ~ hp * wt, mtcars)
pred <- predictions(mod)
summary(pred)
#> Average Adjusted Predictions 
#>   Predicted
#> 1     20.09
#> 
#> Model type:  lm 
#> Prediction type:  response

I’m not exactly sure how Stata calculates standard errors and confidence intervals here. They do not simply take the average of the unit-level standard errors.

lang-benjamin commented 2 years ago

Sorry, but I'm not sure I understand what you mean here. Does this bit from the datagrid() docs clear things up?

The documentation is fine. My point was: Consider the following example call

mod <- glm(am ~ hp * wt, data = mtcars, family = binomial)
mariginaleffects(mod, newdata = datagrid(hp = 100) # or predictions(...) with the same input

This marginaleffects function call produces a counterfactual behaviour because hp is set to 100 for all subjects (or to any other arbitrary value). Maybe this is where I am wrong, but to me this is already counterfactual. But if this already produces something counterfactual a user wonders why there is an additional argument grid.type = "counterfactual" and may ask herself the question "what do I need to do if I want something counterfactual?". (Indeed the difference is how the other variables are handled and it gets clear when reading the documentation thoroughly)

lang-benjamin commented 2 years ago

I realize that this may lead to confusion, but it looks like the Stata docs for margins use "conditional adjusted predictions" and "conditional marginal effects" in the same way. Am I misreading their documentation?

I see. I need to check the notation from Stata (I am not that familiar with it). The question is whether we need to adhere to all their naming conventions.

lang-benjamin commented 2 years ago

I’m not exactly sure how Stata calculates standard errors and confidence intervals here. They do not simply take the average of the unit-level standard errors.

I was also thinking about this point. Not sure at the moment either. In any case, IMHO this is another reason why it makes sense to introduce the wrapper functions AAP etc. so that the user does not have to remember and worry about how to summarise results.

vincentarelbundock commented 2 years ago

mod <- glm(am ~ hp * wt, data = mtcars, family = binomial) mariginaleffects(mod, newdata = datagrid(hp = 100) # or predictions(...) with the same input
This `marginaleffects` function call produces a counterfactual behaviour because `hp` is 
set to 100 for all subjects (or to any other arbitrary value). Maybe this is where I am wrong, 
but to me this _is_ already counterfactual.

Aaah! I see where the confusion comes from. To me, this behavior is not "counterfactual". The command you gave does not "set hp to 100 for all individuals." Instead, it creates entirely hypothetical/synthetic/typical individuals with hp=100 and all other variables set to their means or modes.

My use of "counterfactual" is directly inspired by the Neyman-Rubin Potential Outcomes framework, where we usually write:

Y_i(0): outcome of individual i if it had exactly the same characteristics as now, but with treatment = 0. Y_i(1): outcome of individual i if it had exactly the same characteristics as now, but with treatment = 1.

When you call datagrid(treatment = 0:1, grid.type = "counterfactual"), you get something in the same spirit as the two quantities above.

vincentarelbundock commented 2 years ago

The question is whether we need to adhere to all their naming conventions.

The fact that Stata uses similar language does reassure me a bit, but you are absolutely right: we are not bound to use the same conventions at all.

That said, I still don't really see what the big problem is. The expressions "conditional marginal effects" and "conditional adjusted predictions" feel a bit "redundant" or "pleonastic", but they don't seem strictly "incorrect". As long as the docs don't leave any room for misinterpretation, I think we're fine. I'm open to any suggestions for improvement on that front, of course. Here's what we have now:

Plot Conditional Marginal Effects

Description:

     This function plots marginal effects (y-axis) against values of
     predictor(s) variable(s) (x-axis and colors). This is especially
     useful in models with interactions, where the values of marginal
     effects depend on the values of "condition" variables.

Plot Conditional Adjusted Predictions

Description:

     This function plots adjusted predictions (y-axis) against values
     of one or more predictors (x-axis and colors).

lang-benjamin commented 2 years ago

Aaah! I see where the confusion comes from. To me, this behavior is not "counterfactual". The command you gave does not "set hp to 100 for all individuals." Instead, it creates entirely hypothetical/synthetic/typical individuals with hp=100 and all other variables set to their means or modes.

After reading this, I realize that I was wrong (regarding all individuals). Thanks for pointing me towards this.

vincentarelbundock commented 2 years ago

Thanks for engaging so seriously with these issues, and for the really useful suggestions. Thanks to you, I feel that the vignettes and the documentation are now muuuch better. It was a really good idea to use the specific vocabulary from Stata to structure the documentation. See in particular:

https://vincentarelbundock.github.io/marginaleffects/articles/mfx.html https://vincentarelbundock.github.io/marginaleffects/articles/predictions.html

And also the examples section of the main documentation:

I read the thread again, and it sounds like from a pure functionality perspective, the only thing we are still missing is standard errors for Average Adjusted Predictions. I have opened a new issue to track progress on that front: https://github.com/vincentarelbundock/marginaleffects/issues/216

I thought seriously about the idea of include shortcuts for particular quantities, such as MEM, but ultimately decided against it. I want to the keep the interface ultra-simple, and feel that when a user has "groked" the functions, MEMs (and others) are trivial to compute.

Of course, there are good, valid arguments on the other side as well. At the end of the day, I just had to pick one...

I found the engagement very useful and productive, and hope that you won't hesitate to raise other issues in the future.

Thanks!

vincentarelbundock / marginaleffects

Support for Average Adjusted Predictions #192