Open lang-benjamin opened 4 years ago
I realised that what I have proposed above is exactly the predictive margins concept from stata. Although this is not ported into the margins
package I consider those predictions more consistent with the whole package. So, in the example above my suggestion is to still use prediction(fm, at = list(am = c(0, 1)))
(and I would actually reconsider adding the predictive margins option as part of this package directly).
Hi @langbe it's not going to be part of marigns, unfortunately, as I want the concepts to be in different packages for clarity. But you're right that this is what should be happening. cplot()
came before the prediction package supported Stata-style predictive margins. My plan is to deprecate cplot()
as it currently exists and replace it with a ggplot2-based plotting approach. As part of that, the new plotting interface will do as you suggest.
I've unfortunately had negative bandwidth for open source work due to the pandemic.
When calculating predictions with cplot()
what = "prediction"
results in predictions holding values of the data at their mean or mode. Although the question what we are trying to estimate is not unique, in the context of AME it seems more reasonable to me to produce different predictions. My proposal is based on the work from Bartus https://journals.sagepub.com/doi/pdf/10.1177/1536867X0500500303Let us assume we have a binary variable x_i. According to equation (5) the AME estimator can be written as AMEi = 1/n sum{k = 1}^n [F(beta x^k | x_i^k = 1) - F(beta x^k | x_i^k = 0)]
or equivalently
AMEi = 1/n sum{k = 1}^n F(beta x^k | xi^k = 1) - 1/n sum{k = 1}^n F(beta x^k | x_i^k = 0).
In the latter representation we may consider each of the two terms as (conditional) predicted value. The first is for x_i = 1 and the second for x_i = 0. Those individual predicted values are consistent with the corresponding AME estimate in the sense that the difference exactly equals the AME estimate. Currently, the difference of the two predicted values is not the same as the AME estimate.
For example
Applying the above mentioned formula we get:
I realize that
am_1
andam_0
are quite different than the current individual predictions (yvals), but I wanted to nevertheless propose to use those instead. Happy to hear your thoughts on that.Remarks:
Note that I was not sure whether the cplot call as above but with
what = "effect"
is what I was looking for. However, the result has zero observations in it, so I assume it is not.This should analogously generalize to categorical variables with more than 2 levels (and to continuous variables)