Categorical Covariates in Forest models

I am using DROrthoForest to analyze CATE for different populations. Since DROrthoForest does not support string categorical variables, I am converting them to integers to use as categorical variables.
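For reference, the string-to-integer conversion described above could look roughly like the sketch below. The helper `encode_categories` and the sample labels are hypothetical, made up for illustration; they are not part of EconML or of my actual data pipeline.

```python
def encode_categories(values):
    """Map each distinct label to an integer code, in order of first appearance.

    Hypothetical helper for illustration only.
    """
    # dict.fromkeys keeps the first-seen order and drops duplicates
    categories = list(dict.fromkeys(values))
    mapping = {label: code for code, label in enumerate(categories)}
    codes = [mapping[v] for v in values]
    return codes, mapping


# Made-up sample labels standing in for a string column such as "sex"
sex_labels = ["male", "female", "female", "male"]
sex_codes, sex_mapping = encode_categories(sex_labels)
```

Keeping `sex_mapping` around makes it possible to translate the integer codes back to labels when reading the results.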

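# DR OrthoForest
import itertools

import numpy as np
from econml.orf import DROrthoForest
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# df is the DataFrame holding the data (not shown here)
Y = np.ravel(df[["target_y"]])    # outcome
T = np.ravel(df[["treatment"]])   # treatment
W = df[["income", "month"]]       # controls
X = df[["sex", "age_group"]]      # integer-encoded features for heterogeneity

est = DROrthoForest(n_trees=100, max_depth=5, subsample_ratio=1,
                    propensity_model=GradientBoostingClassifier(),
                    model_Y=GradientBoostingRegressor())
est.fit(Y, T, X=X, W=W)

# Every (sex, age_group) pair with sex in {0, 1} and age_group in 0..9
X_test = np.array(list(itertools.product([0, 1], range(10))))
X_test.shape
infer = est.effect_inference(X=X_test)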

I want to find the CATE for each sex-age_group combination, say that age group is 10. So I am testing with [male(0), 10s(1)], [male(0), 20s(2)] ... [female(1), 50s(4)]. However, I noticed that inference also worked on combinations outside that set (e.g. [0, 6], [1, 10]), albeit with results that were not statistically significant. If X was set when fitting, shouldn't inference only be available for the combinations present in the input? Or am I doing something wrong?
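To make the question concrete, here is a minimal sketch of how the test grid could be split into combinations that appear in the training covariates and ones that do not. The `observed_rows` values are made up for illustration and stand in for the rows of df[["sex", "age_group"]]; nothing here calls EconML.

```python
import itertools

# Hypothetical (sex, age_group) rows standing in for the training covariates
observed_rows = [(0, 1), (0, 2), (1, 1), (1, 4), (0, 2)]

# The same full grid that the question builds for X_test
grid = list(itertools.product([0, 1], range(10)))

# Split the grid by whether each combination occurs in the training rows
observed = set(observed_rows)
in_support = [row for row in grid if row in observed]
out_of_support = [row for row in grid if row not in observed]
```

Passing only `in_support` as the test matrix would keep inference to combinations that were actually seen, if that is the behavior wanted.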
