py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/
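For orientation, a minimal sketch of the kind of problem the toolkit targets (estimating the effect of a treatment T on an outcome Y while controlling for features X) might look like this; the synthetic data and the choice of LinearDML here are illustrative, not part of the project description:

# Illustrative sketch: estimate the effect of a binary treatment T on an
# outcome Y, controlling for features X, with synthetic data.
import numpy as np
from econml.dml import LinearDML

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))                 # features to control for
T = rng.binomial(1, 0.5, size=n)            # binary treatment
Y = 2.0 * T + X[:, 0] + rng.normal(size=n)  # outcome; true effect is 2.0

est = LinearDML(discrete_treatment=True, random_state=0)
est.fit(Y, T, X=X)
print(est.const_marginal_effect(X[:5]))     # per-sample effect estimates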

Discrepancy between 0.14.1 and 0.15.0 #853

Closed · winston-zillow closed 4 months ago

winston-zillow commented 4 months ago

I have a fixed dataset with ~200 covariates and a 6-category discrete treatment, and I trained CausalForestDML models in both v0.14.1 and v0.15.0 with identical configs and code, but the results don't quite agree. For example, for one of the treatments that is considered "no harm", the v0.14.1 model estimates that 9% of the treated have negative effects due to the treatment, while the v0.15.0 model estimates 42%. Only 4.4% of the training sample have this treatment opted in, but another treatment with 11% prevalence also exhibits this kind of discrepancy. (I weighted samples inversely to the prevalence of the treatments; see the sketch below.) The v0.15.0 figure is also similar to what I would get if I just use econml.grf.CausalForest.
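(For concreteness, inverse-prevalence weighting for a discrete treatment might be computed along these lines; this is an editorial sketch with assumed names, not the author's actual code:)

import numpy as np

def inverse_prevalence_weights(T):
    # Weight each sample by 1 / P(T == t) so that rare treatment arms
    # (e.g. the 4.4%-prevalence arm above) are not swamped by common ones.
    T = np.asarray(T)
    values, counts = np.unique(T, return_counts=True)
    prevalence = dict(zip(values, counts / T.shape[0]))
    return np.array([1.0 / prevalence[t] for t in T])

sample_weight = inverse_prevalence_weights(T)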

My dataset is proprietary real-world data, and I haven't yet tried to reproduce this with synthetic data. The results above are reproducible across multiple runs in each version.

Further, I noticed that v0.15.0 was released a few days ago, on Feb 14, 2024, but has no release tag.

I wonder: what are the differences between these two versions?

My estimator is defined as

# imports implied by the snippet
import os as _os
from sklearn.ensemble import ExtraTreesClassifier, RandomForestRegressor
from econml.dml import CausalForestDML

n_trees, n_subtrees = 128, 128 // 4
# inside a class method:
self.estimator = CausalForestDML(
    # nuisance models: E[Y|X] regression and treatment propensity classifier
    model_y=RandomForestRegressor(n_estimators=n_trees, max_depth=10, min_samples_leaf=10, n_jobs=-1),
    model_t=ExtraTreesClassifier(n_estimators=n_trees, max_depth=10, min_samples_leaf=10, n_jobs=-1),
    criterion='het',
    n_estimators=n_trees,
    discrete_treatment=True,
    categories='auto',
    treatment_featurizer=None,

    # forest regularization and honesty settings
    min_samples_leaf=10,
    max_samples=0.1,
    min_balancedness_tol=0.3,
    max_depth=15,
    min_var_fraction_leaf=0.05,
    min_var_leaf_on_val=True,
    min_impurity_decrease=0.0,
    inference=True,
    fit_intercept=True,
    subforest_size=n_subtrees,
    honest=True,
    verbose=0,
    n_jobs=_os.cpu_count())

# training
X.shape  # => (779744, 222)
self.estimator.fit(X=X, T=T, Y=y, sample_weight=sample_weight)

# eval inference: constant marginal CATEs for every training sample
effects = self.estimator.const_marginal_effect(X.to_numpy())
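(The "% of the treated with negative effects" figures quoted above could be computed from effects along these lines; an editorial sketch that assumes the arms are labeled 0..5 with arm 0 as the baseline, so column k-1 of effects is the effect of arm k:)

# Sketch (assumed, not verbatim from the issue): share of treated units with
# negative estimated effects for a treatment arm k. For a 6-category
# treatment, `effects` has shape (n_samples, 5).
import numpy as np

k = 3                                   # hypothetical arm of interest
treated = (np.asarray(T) == k)          # units that actually received arm k
neg_share = (effects[treated, k - 1] < 0).mean()
print(f"{neg_share:.1%} of the treated have negative estimated effects")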

kbattocchi commented 4 months ago

Hi Winston, thanks for pointing out that we were missing a tag for our 0.15.0 release; I've added it. We added a number of new features in this release, but I don't see any obvious way they would affect your code (and our example notebook generates virtually identical results for CausalForestDML before and after the change from 0.14.1 to 0.15.0). In the meantime I'll try to reproduce the issue, but if you can produce a synthetic dataset that demonstrates it, that would save a lot of time.
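(A synthetic reproduction might start from a sketch like the following; the data-generating process, prevalences, and seed here are entirely assumed, chosen only to mimic the rare-treatment setup described above:)

# Assumed synthetic DGP (not from the thread): a 6-category treatment with
# rare arms and a known, mostly positive heterogeneous effect for arm 4.
# Run under both 0.14.1 and 0.15.0 and compare the printed share.
import numpy as np
from econml.dml import CausalForestDML

rng = np.random.default_rng(42)
n, d = 50_000, 20
X = rng.normal(size=(n, d))
T = rng.choice(6, size=n, p=[0.60, 0.20, 0.08, 0.05, 0.044, 0.026])
tau = 1.0 + X[:, 0]                          # true effect of arm 4, mostly > 0
Y = tau * (T == 4) + X[:, 1] + rng.normal(size=n)

est = CausalForestDML(discrete_treatment=True, n_estimators=128, random_state=0)
est.fit(Y, T, X=X)
eff = est.const_marginal_effect(X)           # shape (n, 5): arm k vs. arm 0
print("share negative for arm 4:", (eff[T == 4, 3] < 0).mean())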

winston-zillow commented 4 months ago

@kbattocchi I just found a small discrepancy in the sample weight inputs in my experiments with the 0.15.0 version. Fixing that, both versions now produce identical results. Thanks for adding the release tag.