wayfair / pylift

Uplift modeling package.
http://pylift.readthedocs.io
BSD 2-Clause "Simplified" License

Should training and validation data have the same treatment:control ratio? #45

Closed: krithika22 closed this issue 3 years ago.

krithika22 commented 3 years ago

Hi, I have two questions.

Question 1: Consider the following scenario: my treatment is given to 94% of the total base, so the control group is just 6%. Is it the right approach to feed this imbalanced data into the model and specify p=0.94 in the TransformedOutcome method? Or, if sampling is needed, how should the validation data be handled? Should it be sampled as well?
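For context on Question 1, here is a minimal sketch assuming the standard transformed-outcome definition Y* = Y (W - p) / (p (1 - p)), where W is the 0/1 treatment flag and p is the probability of treatment. It is an illustration, not pylift's internal code. With p = 0.94, a treated row's outcome is scaled by 1/p ≈ 1.06 and a control row's by -1/(1 - p) ≈ -16.7, so the few control rows carry large weights. When p equals the treated share of the rows, the mean of Y* equals the treated-minus-control difference in mean outcomes. The helper `stratified_split` is hypothetical. It splits each arm separately so training and validation keep the same treatment:control ratio. scikit-learn's `train_test_split(..., stratify=w)` does the same job.

```python
import random


def transformed_outcome(y, w, p):
    """Standard transformed outcome Y* = Y * (W - p) / (p * (1 - p)).

    y: observed outcomes; w: treatment flags (1 = treated, 0 = control);
    p: probability of treatment, e.g. the treated share of the training rows.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return [yi * (wi - p) / (p * (1 - p)) for yi, wi in zip(y, w)]


def stratified_split(w, valid_frac, seed=0):
    """Hypothetical helper: split row indices so each arm is split separately.

    Treated and control rows are shuffled and split on their own, so the
    treatment:control ratio is (up to rounding) the same in both parts.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for arm in (1, 0):
        idx = [i for i, wi in enumerate(w) if wi == arm]
        rng.shuffle(idx)
        n_valid = round(len(idx) * valid_frac)
        valid.extend(idx[:n_valid])
        train.extend(idx[n_valid:])
    return sorted(train), sorted(valid)


if __name__ == "__main__":
    # Toy data mirroring the question: 94% treated, 6% control (outcomes are synthetic).
    w = [1] * 940 + [0] * 60
    y = [1 if i % 7 == 0 else 0 for i in range(len(w))]
    train, valid = stratified_split(w, valid_frac=0.25)
    p = sum(w[i] for i in train) / len(train)  # 705 / 750 = 0.94, same as in valid
    y_star = transformed_outcome([y[i] for i in train], [w[i] for i in train], p)
    print(f"p = {p:.2f}, mean transformed outcome = {sum(y_star) / len(y_star):.4f}")
```

Splitting each arm separately is one simple way to stop a small control group from ending up noticeably larger or smaller in one part by chance. Whether to additionally resample the arms is a separate modeling choice that this sketch does not settle.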

Question 2: I applied the trained model to hold-out data and obtained uplift scores, but a large share of customers receive exactly the same uplift score. Any thoughts on why this happens?
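One common cause applies if the model behind the scores is tree-based, for example a gradient-boosted regressor (an assumption here). Tree predictions are piecewise constant: every row that falls into the same combination of leaves gets the same score, and a single tree of depth d has at most 2^d leaves. With a noisy target such as the transformed outcome, shallow or heavily regularized trees may make few useful splits, so large groups of customers can share one score. The hypothetical helper below only diagnoses the symptom. It counts the distinct scores and reports the share of rows at the most common values.

```python
from collections import Counter


def score_tie_report(scores, decimals=6, top=5):
    """Hypothetical diagnostic: how concentrated are the uplift scores?

    Scores are rounded to `decimals` places so floating-point noise does not
    split values that are effectively equal. Returns the number of distinct
    rounded scores and up to `top` (score, share_of_rows) pairs, most common first.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    counts = Counter(round(s, decimals) for s in scores)
    n = len(scores)
    return len(counts), [(score, c / n) for score, c in counts.most_common(top)]


if __name__ == "__main__":
    # Toy scores standing in for model output on hold-out data.
    n_distinct, most_common = score_tie_report([0.012, 0.012, 0.012, 0.030, -0.004])
    print(n_distinct, most_common)
```

If a handful of distinct values covers most rows, the model's capacity settings (for tree ensembles, tree depth and the number of trees) may be worth inspecting before looking for a problem in the data.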

Please advise.