Closed (yuchen-zhu closed 4 years ago)
Many thanks Caroline! I'll take a look later this week or next weekend at the latest :)
Hey Limor, thanks for the comments! See below for some responses.
- It might be helpful to have a short intro, saying something about the intuition behind the definition, or why we should care. Think Wikipedia style, like "In causal inference, it is often the case that we want to estimate counterfactual or interventional quantities we do not actually observe. To do so, we first need to judge whether exchangeability holds, because...". This is of course optional, but I think some intuition around why we need this, how it is useful, and maybe why that name was given could help?
Agreed.
- In the definition section, I think it might not be entirely clear to people less familiar what Y^x stands for. Perhaps it's worthwhile to briefly reiterate the definition of this notation? (I think it becomes clearer with the examples below, but it might be worth it.) Also, maybe just explain that X in this case means X as observed, and the counterfactual relates to an intervention? (This is a tiny note: to me it looks like the notation currently reads Y^{X} \ind X. Should it not be Y^{x} \ind X? Maybe it's my misunderstanding.)
Good question, I find this confusing too. This also shows that it'd be good to have the counterfactual definition pinned down too. I think the clearest form might be "Y^x \ind X for each x" - will change.
- If you used any resources when writing these, would be good to add citations! If not, just further reading link or two at the end, to some textbook or a review paper would be nice. No worries if it's hard to find.
Agreed.
- The examples are super helpful! To my own biased mind, drawing out a DAG to follow was helpful. I wonder if it's worth adding a DAG for the example and counterexample?
Yeah was thinking that too! But then was lazy - I'll add some pictures.
- What do you think about making the connection to confounders and mediators clearer? (i.e. exchangeability doesn't hold for the first but does for the second b/c of these)
Yeah, will add some cf in the examples, thanks.
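To make the confounder/mediator contrast concrete, here is a minimal sketch rather than anything taken from the notebook. The two toy models, the function names, and every probability in them are made up for illustration. It computes P(Y^x = 1 | X = x_obs) exactly: with a common cause Z the value changes with the observed X, so exchangeability fails; with only a mediator M on the path from X to Y it does not change.

```python
# Hypothetical toy models: every number here is an illustrative assumption.

def p_cf_confounder(x, x_obs):
    """P(Y^x = 1 | X = x_obs) when Z is a common cause of X and Y."""
    p_z1 = 0.5                        # P(Z = 1)
    p_x1_given_z = {0: 0.2, 1: 0.8}   # P(X = 1 | Z = z)
    # Joint P(Z = z, X = x_obs), later normalised to P(Z = z | X = x_obs).
    joint = {}
    for z in (0, 1):
        p_z = p_z1 if z == 1 else 1 - p_z1
        p_x = p_x1_given_z[z] if x_obs == 1 else 1 - p_x1_given_z[z]
        joint[z] = p_z * p_x
    total = joint[0] + joint[1]
    # Y^x depends on the intervened value x and on Z, not on the observed X.
    return sum(joint[z] / total * (0.2 + 0.3 * x + 0.4 * z) for z in (0, 1))


def p_cf_mediator(x, x_obs):
    """P(Y^x = 1 | X = x_obs) when X -> M -> Y and X has no other causes."""
    # x_obs is deliberately unused: the noise driving M and Y is
    # independent of X, so observing X carries no information about Y^x.
    p_m1 = 0.3 + 0.5 * x              # P(M^x = 1), uses the intervened x only
    return p_m1 * 0.7 + (1 - p_m1) * 0.1


for x_obs in (0, 1):
    print("confounder:", x_obs, round(p_cf_confounder(1, x_obs), 3))
    print("mediator:  ", x_obs, round(p_cf_mediator(1, x_obs), 3))
```

In the confounded model, observing X changes what we believe about Z, and Z also drives Y^x, which is exactly why Y^x and X are dependent there.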
- In the counterexample, you say at some point "since P(X=0) = P(X=1) = 0.5", and it wasn't clear to me from the outset. Maybe say that it's a randomized controlled study, therefore... ? Is that the case?
Oh good catch! Indeed we don't need the probabilities to be 0.5 for the examples to hold, will change this - thanks!
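As a rough check of this point, the simulation below is a sketch under assumed numbers, not the example from the notebook. The treatment X is randomized with an arbitrary P(X=1), the outcome noise is independent of X, and the function name and the 0.7 outcome probability are hypothetical. The estimated gap between P(Y^1 = 1 | X = 1) and P(Y^1 = 1 | X = 0) should be close to zero whatever the treatment probability is.

```python
import random


def counterfactual_gap(p_treat, n=100_000, seed=0):
    """Estimate P(Y^1 = 1 | X = 1) - P(Y^1 = 1 | X = 0) under randomization.

    p_treat is P(X = 1); it does not need to be 0.5.
    """
    rng = random.Random(seed)
    hits = {0: 0, 1: 0}     # number of units with Y^1 = 1, per observed X
    counts = {0: 0, 1: 0}   # number of units, per observed X
    for _ in range(n):
        x = 1 if rng.random() < p_treat else 0   # randomized treatment
        u = rng.random()                          # outcome noise, independent of X
        y1 = 1 if u < 0.7 else 0                  # counterfactual outcome Y^{x=1}
        counts[x] += 1
        hits[x] += y1
    return hits[1] / counts[1] - hits[0] / counts[0]


for p in (0.5, 0.3, 0.9):
    print(p, round(counterfactual_gap(p), 3))
```

The simulation can record Y^1 for every unit, including those with X = 0, only because it is a simulation; in real data that counterfactual is unobserved, which is why exchangeability has to be argued for rather than checked directly.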
- Another stylistic comment -- the example and counterexample are enumerated (i.e. they start with "1. ...."). Since neither is really a list, should we take the numbering out?
I'm neutral about this. What I was thinking is that we might want to follow the same format of "example(s)" and "counterexample(s)" for all the definitions, and some of them might have more than one. But happy to take out the enumeration for cases with only one example, if people think that looks better.
- There are a couple of TODOs left. I can help out with that, but let me know if you'd like to deal with those. Also, the connection to statistical exchangeability might be a bit loose (unless I'm missing something). We should either state there is not a real connection, clarify it, or just take it out altogether?
With the TODOs - yeah, the first one is just a matter of creating a page for counterfactuals and then linking to it, and the second one stems from a confusion of mine, so it could be worth discussing! With the connection to the statistical definition - hmm, I think there is a real connection: the conditional counterfactuals are exchangeable variables in the statistical sense. Also, the statistical definition definitely came first, then the causal one. Googling exchangeability, I also only find the statistical definition and not the causal one unless I search specifically for 'exchangeability causal inference'. So it seems to me that the causal definition comes from the statistical one, if it is not equivalent to it. Because of this, I think we should point out the connection because 1. the connection itself is part of the motivation for the definition, and is part of the reason that it's called 'exchangeability'; 2. I think it'll be quite unsatisfying to find two seemingly different definitions of the same term without understanding their connection; 3. indeed, I haven't found many/any sources that point out the connection in a satisfying way, when there clearly is a connection, so it seems to call for us to do it.
Perfect! Yeah, I don't mind too much about (7) and agree completely with your points in (8). For some reason, the partial ordering induced by an SCM made exchangeability in the statistical sense (where order doesn't matter) seem odd to me, but you're right, there is a connection between the two, and some reference to the well-known statistical definition should be there.
Want to make a round of edits and leave any ambiguities for discussion this coming Wed in the study group? And then we can merge right after?
Sounds great!
Further reading section:
One more:
P(Y^x | X = x) = P(Y^x) if Y^x \ind X
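One way to spell out why this identity is useful, as a sketch: the second step relies on the consistency assumption (Y = Y^x whenever X = x), which is standard in the literature but is my addition here and is not stated in this thread.

```latex
\begin{align*}
P(Y^x = y) &= P(Y^x = y \mid X = x) && \text{exchangeability: } Y^x \perp\!\!\!\perp X \\
           &= P(Y = y \mid X = x)   && \text{consistency: } Y = Y^x \text{ when } X = x
\end{align*}
```

Under these two assumptions, the counterfactual distribution on the left is identified by the observational conditional on the right.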
Hey! Made some new changes - sorry this took so long!
I left the TODO note on the connection to the statistical definition, as that one requires some more work and I find that a bit hard today.
Any more feedback before merging is welcome! If not, also happy to merge now and update later!
Thanks!!
Hey guys wanna review exchangeability? It is in exchangeability.ipynb