I think the trick in lines 17-20 is unnecessary. All that matters is that the joint is always featurized in the same order relative to the order in which the marginals are stacked, for all the training and test data. You can pick either order (x,y is probably more intuitive than y,x) and you get the same performance.
You don't do this in the causation_learning_theory code, so maybe y'all have already figured this out, but I figured I'd point out that this is still here.
Unless standardizing the order of the joint distribution in this way is somehow empirically superior to standardizing it via the order the marginals are in...?
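To make the ordering argument concrete, here is a minimal sketch. It assumes an RCC-style featurizer (kernel mean embeddings of each marginal and of the joint, computed with random Fourier features); `mean_embedding`, `featurize`, and the `joint_order` parameter are hypothetical names, not the ones in the repo. The sketch checks that featurizing the joint as (y,x) with projection `Wj` gives exactly the same features as featurizing it as (x,y) with the columns of `Wj` swapped. Because the entries of `Wj` are i.i.d. Gaussian, the swapped matrix is just another draw from the same distribution, so either convention should perform the same as long as it is applied to every training and test pair.

```python
import numpy as np


def mean_embedding(points, W, b):
    """Empirical kernel mean embedding via random Fourier features.

    points: (n, d) samples; W: (k, d) random frequencies; b: (k,) phases.
    Returns the k-dim average of cos(W @ p + b) over the n samples.
    """
    return np.cos(points @ W.T + b).mean(axis=0)


def featurize(x, y, Wm, bm, Wj, bj, joint_order="xy"):
    """Hypothetical featurizer: [marginal of x, marginal of y, joint].

    joint_order fixes how the joint sample is stacked. It only has to be
    the same for every training and test pair.
    """
    mx = mean_embedding(x[:, None], Wm, bm)
    my = mean_embedding(y[:, None], Wm, bm)
    if joint_order == "xy":
        joint = np.column_stack([x, y])
    else:
        joint = np.column_stack([y, x])
    mj = mean_embedding(joint, Wj, bj)
    return np.concatenate([mx, my, mj])


rng = np.random.default_rng(0)
n, k = 200, 50
x = rng.normal(size=n)
y = x**3 + 0.1 * rng.normal(size=n)  # toy pair where x causes y

# Random Fourier feature parameters (assumed Gaussian frequencies).
Wm = rng.normal(size=(k, 1))
bm = rng.uniform(0, 2 * np.pi, size=k)
Wj = rng.normal(size=(k, 2))
bj = rng.uniform(0, 2 * np.pi, size=k)

# (y,x) ordering with Wj equals (x,y) ordering with Wj's columns swapped.
f_yx = featurize(x, y, Wm, bm, Wj, bj, joint_order="yx")
f_xy_swapped_W = featurize(x, y, Wm, bm, Wj[:, ::-1], bj, joint_order="xy")
print(np.allclose(f_yx, f_xy_swapped_W))  # True
```

So switching the joint's ordering convention amounts to redrawing the joint projection from the same distribution; what would actually hurt is mixing conventions between pairs.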