jtextor / dagitty

Graphical analysis of structural causal models / graphical causal models.
GNU General Public License v2.0

add support for optimal adjustment sets #85

Open jtextor opened 10 months ago

jtextor commented 10 months ago

Dear Johannes

dagitty is a great tool and seems to be used by many people. I use its algorithms in my own causal inference package, tigramite.

I was wondering which of the many possible adjustment sets are available to users and whether there is a default. Recent research by Henckel, Rotnitzky, myself and others has focused on which adjustment set is optimal in the sense of having minimal variance among all valid sets. I haven't found options for these sets in the code. Would you be interested in implementing the O-set for dagitty, if it isn't already there? In the work below I also generalized the O-set to ADMGs with bidirected edges indicating hidden variables.

arXiv.org/abs/2102.10324

Best wishes Jakob
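
(Editorial note: to make the variance claim above concrete, here is a minimal numpy simulation sketch, not dagitty code. It assumes a toy linear SCM X → Y ← Z with Z independent of X; both the empty set and {Z} are valid adjustment sets for the effect of X on Y, and {Z} happens to be the O-set in this graph. All names and parameters are illustrative.)

```python
# Sketch: compare the variance of the OLS effect estimate under the empty
# (minimal) adjustment set vs. the O-set {Z} in the SCM  X -> Y <- Z.
import numpy as np

rng = np.random.default_rng(0)
n, reps, beta = 200, 2000, 1.0

est_empty, est_oset = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    z = rng.normal(size=n)                      # "competing exposure": parent of Y only
    y = beta * x + 2.0 * z + rng.normal(size=n)

    # adjust for nothing (the minimal set): regress Y on X alone
    X0 = np.column_stack([np.ones(n), x])
    est_empty.append(np.linalg.lstsq(X0, y, rcond=None)[0][1])

    # adjust for Z (the O-set): regress Y on X and Z
    X1 = np.column_stack([np.ones(n), x, z])
    est_oset.append(np.linalg.lstsq(X1, y, rcond=None)[0][1])

print("means (both unbiased):", np.mean(est_empty).round(3), np.mean(est_oset).round(3))
print("empirical variances:  ", np.var(est_empty).round(4), np.var(est_oset).round(4))
```

Both estimators recover the true effect on average, but adjusting for the competing exposure Z removes its contribution from the residual variance, so the O-set estimate has a markedly smaller spread.
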

malcolmbarrett commented 10 months ago

This sounds cool, but I don't like the name. Variance is just one aspect of selecting an adjustment set, and I think a user seeing "optimal" will think it's doing something else, e.g. optimally reducing bias (that's what I would think!). If you implement this, maybe something about minimizing variance could be in the name of the option/function.

jtextor commented 10 months ago

I think the reasoning is that all valid adjustment sets give the same guarantees when it comes to bias, so the only thing that distinguishes them is the variance. I agree that "optimal" may come across as too strong, as it's only optimal when used downstream with linear regression or similar estimators.

malcolmbarrett commented 10 months ago

Yes, I reckon that's the idea. But that's true only if the data are perfect. Real-life adjustment sets offer different levels of bias reduction despite being theoretically equivalent. Not to get off topic, but I'm moving away from minimal adjustment sets being sufficient in practice for that reason.

Anyway, it's still a cool idea. Looking forward to learning more about it.

jtextor commented 10 months ago

All adjustment sets, even valid ones, only offer asymptotically unbiased estimation. With finite data, lowering variance will on average get you closer to the true effect. The O-set proposed by Jakob and others (initially described here: https://jmlr.org/papers/volume21/20-175/20-175.pdf) is in fact not a minimal set; it is basically the adjustment set consisting of all parents of the outcome (or parents of mediating variables), which will include variables not related to the exposure at all (elsewhere called competing exposures). Initially this was shown to be variance-optimal for linear regression only, but it has since been generalized to further classes of estimators by Rotnitzky and others.

From my understanding of the current theory, this would be the most sane choice for a "default" adjustment set in many circumstances.
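
(Editorial note: for concreteness, here is a sketch of the O-set construction for DAGs described in the comment above, following the definition in Henckel, Perković & Maathuis (JMLR 2020). It uses networkx rather than dagitty's own API, and the function name and example graph are illustrative only.)

```python
# O-set sketch for a DAG with a single exposure x and outcome y:
#   cn   = nodes on causal paths from x to y, excluding x (mediators and y)
#   forb = descendants of cn, plus x (forbidden as adjustment variables)
#   O    = parents of cn, minus forb
import networkx as nx

def optimal_adjustment_set(g: nx.DiGraph, x, y):
    de_x = nx.descendants(g, x) | {x}
    an_y = nx.ancestors(g, y) | {y}
    cn = (de_x & an_y) - {x}
    forb = set().union(*(nx.descendants(g, v) | {v} for v in cn)) | {x}
    parents = set().union(*(set(g.predecessors(v)) for v in cn))
    return parents - forb

# Example: x -> m -> y, with a competing exposure z -> y and a confounder c.
g = nx.DiGraph([("x", "m"), ("m", "y"), ("z", "y"), ("c", "x"), ("c", "y")])
print(optimal_adjustment_set(g, "x", "y"))   # {'c', 'z'}: keeps the confounder
                                             # and competing exposure, drops the mediator
```
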

malcolmbarrett commented 10 months ago

I have no problem with that idea. In fact, I like it. But again, that assumes that even the finite data are perfect. In real life, adjustment sets are not equal simply because different variables have different measurement quality. If the adjustment set you're talking about includes some badly measured variables, the real-life variance might actually be worse.

I definitely agree, though, that an adjustment set like that is a better default than the minimal set. I'm really starting to think that minimal sets are only theoretically interesting for most use cases and not actually enough to get the right answer with real, messy data, variance aside.