
WeightIt: an R package for propensity score weighting
https://ngreifer.github.io/WeightIt/

Entropy balancing with continuous treatment documentation #40

Closed statzhero closed 5 months ago

statzhero commented 1 year ago

I'm unclear why the package's documentation for the ebal method references Vegetabile et al. (2021) when it is based on Tübbicke (2022). (While you're there, the citations' years need to be updated, but that's obviously just a typo.)

The d.moments argument documentation discusses the Vegetabile finding, but does it apply to the moment conditions of Tübbicke? It may well be; I'm just double-checking. It seems to me the moment conditions are slightly different. I think the documentation could be clearer about the moment conditions for continuous treatments, e.g., stating that they are based on Pearson correlations that go to zero, or possibly just adding the formula.

Anyway, thanks for making it available.

ngreifer commented 1 year ago

Vegetabile et al. (2021) and Tübbicke (2022) describe the same method: estimating weights that ensure the covariates are uncorrelated with the treatment in the weighted sample and that the weighted means of the treatment and covariates equal their unweighted means. Tübbicke described the method in simple terms with clear formulas, while Vegetabile investigated the importance of ensuring full distributional balance between the weighted sample and the original sample (i.e., beyond the means). The preprints for both were released around the same time, but I worked with Tübbicke to get an implementation into WeightIt before either one was published. I used a combination of code contributed by Tübbicke and code from Vegetabile's entbal package to implement the method in WeightIt.
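As a sketch in my own notation (not copied verbatim from either paper): for treatment $A_i$, covariates $X_{ik}$, and weights $w_i$, the weights minimize the negative entropy $\sum_i w_i \log w_i$ subject to

$$\sum_i w_i = n, \qquad \sum_i w_i A_i = \sum_i A_i, \qquad \sum_i w_i X_{ik} = \sum_i X_{ik}, \qquad \sum_i w_i (A_i - \bar{A})(X_{ik} - \bar{X}_k) = 0$$

for each covariate $k$. With the means held at their unweighted values, the zero weighted covariances are equivalent to zero weighted treatment-covariate correlations.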

The moments controlled by moments relate to whether the squares, cubes, etc., of the covariates are constrained to be uncorrelated with the treatment. The moments controlled by d.moments relate to whether the variance, skew, etc., of the covariates and the treatment are fixed to be the same in the weighted sample as in the unweighted sample. Only Vegetabile describes anything like d.moments, though they add additional complexity by allowing the fixed moments of the treatment to differ from those of the covariates.
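A minimal sketch of how these arguments fit together (simulated data; the variable names are illustrative, not from your analysis):

library(WeightIt)
library(cobalt)

set.seed(123)
n <- 500
X1 <- rnorm(n); X2 <- rnorm(n)
A <- 0.5 * X1 - 0.3 * X2 + rnorm(n)  # continuous treatment
d <- data.frame(A, X1, X2)

# moments = 2: squares of the covariates are also decorrelated from A
# d.moments = 3: variance and skew of A and the covariates are held
#                at their unweighted values (the Vegetabile idea)
W <- weightit(A ~ X1 + X2, data = d, method = "ebal",
              moments = 2, d.moments = 3)

bal.tab(W)  # treatment-covariate correlations should be near zero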

If someone wanted to understand what the method was doing, I would point them to Tübbicke. But for a more complete investigation, I would point them to Vegetabile. I might do the same for binary treatments by referring curious readers to Hainmueller and readers who wanted a deeper understanding to Zhao & Percival.

statzhero commented 1 year ago

Thank you; I guess footnote 5 in Tübbicke got me confused. Personally, I found Vegetabile's notation easier to follow for seeing what's going on.

ngreifer commented 1 year ago

Actually, I'm realizing that Tübbicke added footnote 5 in the published version but not in the original, probably in response to reviewers wanting the papers to be differentiated, which is why it seems that Vegetabile ignores Tübbicke's work on balancing higher moments of the treatment. The WeightIt implementation was completed before either author was aware of the other, so it is what it is now. This debate is somewhat moot in the face of energy balancing, which was explicitly developed to avoid having to make these decisions. See here for an implementation, which will make its way into WeightIt.

statzhero commented 1 year ago

Never heard of energy balancing, but it sounds fascinating; congrats. How come it's not available via your package if you're the author of both? I assume you are saying it will be very soon.

ngreifer commented 1 year ago

I am saying that :) I haven't gotten around to it yet. The implementation of energy balancing for continuous treatments also has a special history. I implemented it myself in WeightIt using my intuition, and it worked quite well. Meanwhile, Jared and Guanghua had derived a version of it using theory but were struggling to find a successful implementation. We decided to collaborate after I reached out to Jared, and we developed a modification of my implementation whose theoretical properties they then derived. Because my initial implementation was incorrect in some ways, I removed it from WeightIt and was waiting until our paper was finalized and published before restoring it. In the meantime, Jared wrote that package using a combination of his code and mine, meant to be a companion to the paper. I had originally planned to write my own implementation for WeightIt, but since his package seems to be working, I may just call his package instead. This has been a low priority because I have other things I'm working on that are more important to me, but I should add it soon. It probably won't be too much effort.

simonschoe commented 1 year ago

@ngreifer any news on the support of continuous treatment variables or maybe a timeline? Waiting for that feature with great anticipation. 😉 Great work with the package, the workflows are super smooth!

ngreifer commented 1 year ago

It has been available for two weeks! https://ngreifer.github.io/WeightIt/news/index.html

simonschoe commented 1 year ago

> It has been available for two weeks! https://ngreifer.github.io/WeightIt/news/index.html

@ngreifer Awesome, what timing, thanks for the implementation! After reading the docs, there is one thing I am still not 100% certain about, because the example stops after balancing: how would I go about estimating the marginal effects using marginaleffects::avg_comparisons()? In particular, what do I select for the newdata argument? Do I use the subset of my data where the continuous treatment is > 0 (to obtain the treatment group)? Then, what is compared with what in the output? (In my mind, it's more straightforward for the binary or multinomial treatment, where I can have all these pairwise comparisons.)

ngreifer commented 1 year ago

There aren't great procedures for estimating the effects of continuous treatments specified in the literature. You would not use avg_comparisons() at all, though, so your attempt is not right. Typically the estimand is the average dose-response function (ADRF). You can estimate and plot the curve using marginaleffects + ggplot2 or using clarify. For both, you start by fitting a flexible linear model, which can include treatment-covariate interactions:

# Flexible outcome model: 4th-degree polynomial in the treatment,
# interacted with the covariates; weights are the balancing weights
# estimated previously (e.g., by weightit())
fit <- lm(Y ~ poly(A, 4) * (X1 + X2 + X3), data = data, weights = weights)

Using marginaleffects + ggplot2

library(marginaleffects); library(ggplot2)

# Average predicted outcome at 51 values of A across its range,
# using a robust (HC3) variance and the balancing weights
p <- avg_predictions(fit,
                     variables = list(A = seq(min(data$A), max(data$A), length.out = 51)),
                     vcov = "HC3", wts = "weights")

ggplot(p, aes(x = A)) +
  geom_line(aes(y = estimate)) +
  geom_ribbon(aes(ymax = conf.high, ymin = conf.low), alpha = .3)

Using clarify

library(clarify)

# Simulate coefficients from their sampling distribution, then
# compute the ADRF at 51 values of A
s <- sim(fit, vcov = "HC3") |> sim_adrf("A", n = 51)
plot(s)

These will both produce a plot of the ADRF, and you can investigate the p and s objects to examine the point estimates and confidence intervals along the curve. There are more flexible models you can fit, too; for example, the independenceWeights package provides the weighted_kernel_est() function, which fits a flexible nonparametric model, and you would bootstrap it to get pointwise confidence intervals, which is what we do in the distance covariance weights paper.
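If it helps, here is a rough sketch of that bootstrap idea applied to the parametric model from above rather than to weighted_kernel_est() (whose interface I won't reproduce from memory); the grid size and column names are just illustrative:

library(WeightIt)

A_grid <- seq(min(data$A), max(data$A), length.out = 51)

boot_est <- replicate(999, {
  i <- sample(nrow(data), replace = TRUE)
  b <- data[i, ]

  # Re-estimate the weights within each bootstrap sample
  w <- weightit(A ~ X1 + X2 + X3, data = b, method = "ebal")
  f <- lm(Y ~ poly(A, 4) * (X1 + X2 + X3), data = b, weights = w$weights)

  # Weighted average of predicted outcomes at each grid value
  sapply(A_grid, function(a) {
    weighted.mean(predict(f, newdata = transform(b, A = a)), w$weights)
  })
})

# Pointwise 95% percentile intervals along the curve
ci <- apply(boot_est, 1, quantile, probs = c(.025, .975))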

simonschoe commented 1 year ago

@ngreifer thank you so much for taking the time and elaborating, I appreciate it! Just to clarify, two assumptions are implied by your example, right?

  1. We are interested in the ATE, because we are not restricting newdata to the treatment group.
  2. The treatment can take on any value in the range of min(data$A) and max(data$A). Could I also apply the technique to ordinal treatments, e.g., on the scale seq(0, 10, by = 1)?

ngreifer commented 1 year ago

We're not interested in the ATE, we're interested in the ADRF. You can of course choose whatever population you want your ADRF to be estimated for, but you will need to decide whether whatever subset you use represents a meaningful group. If your treatment takes on 11 values (0 through 10), I would not say that estimating the ADRF for those with treatment values between 1 and 10 represents a meaningful effect, but maybe you would.

You can estimate the ADRF for ordinal treatments, but you can also treat the treatment as multi-category and use methods for nominal treatments. The difference is in whether you want to make smoothness assumptions and let information from one group affect the predicted potential outcomes in another group. If not, you need to estimate the expected potential outcome for each group separately; if so, you can use a model that has fewer parameters than there are groups.
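To make that concrete, a sketch of the two outcome models for a treatment A taking values 0 through 10 (hypothetical data; names illustrative):

# Nominal: one parameter per treatment level, no smoothness assumption
fit_nom <- lm(Y ~ factor(A) * (X1 + X2), data = d, weights = w)

# Smooth: a low-degree polynomial borrows information across levels
fit_smooth <- lm(Y ~ poly(A, 3) * (X1 + X2), data = d, weights = w)

The first estimates a separate mean for each of the 11 levels; the second assumes the ADRF is smooth in A and so uses far fewer parameters.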

Inspired by your question, I added a section to the Estimating Effects vignette on continuous treatments.