Describe the bug
When using UpliftTreeClassifier or UpliftRandomForestClassifier with honesty=True, stratified sampling always fails and the model falls back to random sampling. The cause appears to be an invalid call to train_test_split (at least on scikit-learn 1.3.2).
To Reproduce
from causalml.inference.tree import UpliftTreeClassifier
import numpy as np
num_points = 1_000
X = np.random.randn(num_points, 10)
t = (np.random.rand(num_points) < .5).astype(int)
beta1 = np.random.randn(10)
beta2 = np.random.randn(10)
y1 = X @ beta1
y2 = X @ beta2
y = np.where(t == 0, y1, y2) > 0
model = UpliftTreeClassifier("0", evaluationFunction='CTS', honesty=True)
model.fit(X, t.astype(str), y)
---> Stratified sampling failed. Falling back to random sampling.
The underlying error can be reproduced by calling train_test_split directly with a list passed to stratify:
from sklearn.model_selection import train_test_split
train_test_split(X, t.astype(int), y, stratify=[t.astype(int), y], shuffle=True)
Results in:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ian.delbridge/.pyenv/versions/causalml-developer-py38/lib/python3.8/site-packages/sklearn/utils/_param_validation.py", line 214, in wrapper
    return func(*args, **kwargs)
  File "/Users/ian.delbridge/.pyenv/versions/causalml-developer-py38/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 2670, in train_test_split
    train, test = next(cv.split(X=arrays[0], y=stratify))
  File "/Users/ian.delbridge/.pyenv/versions/causalml-developer-py38/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1745, in split
    X, y, groups = indexable(X, y, groups)
  File "/Users/ian.delbridge/.pyenv/versions/causalml-developer-py38/lib/python3.8/site-packages/sklearn/utils/validation.py", line 453, in indexable
    check_consistent_length(*result)
  File "/Users/ian.delbridge/.pyenv/versions/causalml-developer-py38/lib/python3.8/site-packages/sklearn/utils/validation.py", line 407, in check_consistent_length
    raise ValueError(
ValueError: Found input variables with inconsistent numbers of samples: [1000, 2]
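The [1000, 2] in the error suggests that the two-element list passed to stratify is being read as an array with 2 samples rather than as two label columns. A possible workaround, sketched below, is to combine treatment and outcome into a single 1-D label per sample before stratifying. This is only an illustration and not the library's actual fix; the strata variable and the t * 2 + y encoding are assumptions made for this example, and the data is synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
num_points = 1_000
X = rng.standard_normal((num_points, 10))
t = (rng.random(num_points) < 0.5).astype(int)  # binary treatment flag
y = (rng.random(num_points) < 0.5).astype(int)  # binary outcome (synthetic)

# A list of two length-1000 arrays converts to an array of shape (2, 1000),
# i.e. 2 "samples", which does not match the 1000 rows of X.
bad_stratify_shape = np.asarray([t, y]).shape

# Hypothetical encoding: one integer label per (treatment, outcome) pair.
# 0 = (t=0, y=0), 1 = (t=0, y=1), 2 = (t=1, y=0), 3 = (t=1, y=1)
strata = t * 2 + y

X_train, X_test, t_train, t_test, y_train, y_test = train_test_split(
    X, t, y, stratify=strata, shuffle=True, random_state=0
)
```

With this encoding, each treatment/outcome combination should appear in the train and test splits in roughly the same proportion as in the full data set.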
Expected behavior
Stratified sampling by treatment and outcome succeeds, without falling back to random sampling.
Environment
OS: macOS
Python Version: 3.8.17
scikit-learn: 1.3.2 (from pip install ".[test]")