Closed soodimilanlouei closed 3 years ago
Thanks for reaching out. Are you using `UpliftRandomForestClassifier`? Currently, the uplift tree only supports classification; covering the regression use case is on the roadmap.
Thanks for the response.
Yes, I'm using `UpliftRandomForestClassifier`.
When I use `UpliftTreeClassifier` for the continuous response variable, it raises an error (the tree is empty).
However, interestingly, when I run `UpliftRandomForestClassifier`, it trains the model and I can plot the gain and lift graphs as well. I can also visualize one of the trees in the forest; however, the p-values are always NaN. So, I assume I should not trust these results, right?
Correct. Those results do not come from a proper regression-tree implementation, so they should not be trusted.
In this example, shouldn't the base learners for R-Learner be `RandomForestClassifier` instead of `RandomForestRegressor`, since the response variable is binary (Conversion)?
I think you're right, thanks for flagging that! Ideally, the `feature_selection.ipynb` example notebook should use `synthetic_data()` to generate a dataset with a continuous target variable, rather than a binary one, for the regressors.
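To illustrate the point about matching the base learner to the target, here is a minimal sketch that picks a scikit-learn random forest based on the type of the response variable. The helper name `pick_base_learner` and the toy `conversion` / `revenue` arrays are hypothetical and are not part of causalml or the notebook; only `type_of_target`, `RandomForestClassifier`, and `RandomForestRegressor` are real scikit-learn APIs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.utils.multiclass import type_of_target


def pick_base_learner(y, **kwargs):
    """Hypothetical helper: choose a forest that matches the target type."""
    kind = type_of_target(y)
    if kind in ("binary", "multiclass"):
        # Discrete labels such as a 0/1 conversion flag call for a classifier.
        return RandomForestClassifier(**kwargs)
    if kind == "continuous":
        # Real-valued outcomes such as revenue call for a regressor.
        return RandomForestRegressor(**kwargs)
    raise ValueError(f"Unsupported target type: {kind}")


rng = np.random.default_rng(0)
conversion = rng.integers(0, 2, size=200)  # toy binary outcome (assumption)
revenue = rng.normal(size=200)             # toy continuous outcome (assumption)

clf = pick_base_learner(conversion, n_estimators=10)
reg = pick_base_learner(revenue, n_estimators=10)
print(type(clf).__name__, type(reg).__name__)
```

Note that `type_of_target` treats float arrays that contain only whole numbers as discrete, so a check like this is a convenience rather than a guarantee.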
Another question that I have is regarding the arguments of `BaseXClassifier`. In this link, it says that:
"`outcome_learner` (optional): a model to estimate outcomes in both the control and treatment groups. Should be a regressor."
"`effect_learner` (optional): a model to estimate treatment effects in both the control and treatment groups. Should be a classifier."
Shouldn't this be the other way around, considering that we are dealing with a classification problem? That is, shouldn't `outcome_learner` be a classifier and `effect_learner` be a regressor?
In lines 670 and 672, `predict_proba` is called, which is defined for classifiers and not for regressors. Since the predicted probability of belonging to class 1 is subtracted from the actual Y, the new response variables (`d_c` and `d_t`) are continuous and need a regressor for fitting.
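The reasoning above can be sketched with plain scikit-learn. This is a hand-rolled illustration of the X-learner idea for a binary outcome, not causalml's actual code: the function name `x_learner_binary`, the synthetic data, and the constant propensity of 0.5 are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def x_learner_binary(X, treatment, y, propensity=0.5, seed=0):
    """Illustrative X-learner for a binary outcome (not causalml's implementation)."""
    X_c, y_c = X[treatment == 0], y[treatment == 0]
    X_t, y_t = X[treatment == 1], y[treatment == 1]

    # Stage 1: the outcome learners are classifiers because y is binary.
    mu_c = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X_c, y_c)
    mu_t = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X_t, y_t)

    # Imputed effects: actual Y versus the predicted P(y = 1) under the other arm.
    # These differences are continuous values in [-1, 1].
    d_c = mu_t.predict_proba(X_c)[:, 1] - y_c
    d_t = y_t - mu_c.predict_proba(X_t)[:, 1]

    # Stage 2: the effect learners are regressors because d_c and d_t are continuous.
    tau_c = RandomForestRegressor(n_estimators=50, random_state=seed).fit(X_c, d_c)
    tau_t = RandomForestRegressor(n_estimators=50, random_state=seed).fit(X_t, d_t)

    # Combine the two effect estimates, weighting by the propensity score.
    cate = propensity * tau_c.predict(X) + (1 - propensity) * tau_t.predict(X)
    return cate, d_c, d_t


# Toy randomized experiment (assumption): treatment lifts conversion when X[:, 0] > 0.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
treatment = rng.integers(0, 2, size=n)
p_convert = 0.3 + 0.2 * treatment * (X[:, 0] > 0)
y = rng.binomial(1, p_convert)

cate, d_c, d_t = x_learner_binary(X, treatment, y)
print(cate[:5])
```

Swapping the roles (regressors in stage 1, classifiers in stage 2) would fail in this sketch: a regressor has no `predict_proba`, and a classifier cannot be fit on the continuous `d_c` and `d_t`.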
Your points make sense to me: the docstring for the parameters of `BaseXClassifier` needs correcting, since the `predict_proba()` invocation is for classifiers. @ppstacy @jeongyoonlee, or others, can you confirm that for an X-Learner classifier, the outcome learners (M1 and M2) are classifiers and the effect learners (M3 and M4) are regressors?
Thanks @soodimilanlouei and @paullo0106. Yes, in `BaseXClassifier`, `outcome_learner` should be a classifier while `effect_learner` should be a regressor. Please feel free to submit a PR. Thanks again!
The argument documentation was fixed in PR #251.
I'm training a random forest model, where the response variable is continuous. When I look at one tree from the forest, the p-values are always NaN. Why is that?