wayfair / pylift

Uplift modeling package.
http://pylift.readthedocs.io
BSD 2-Clause "Simplified" License

Question on Package - What is added #10

Closed BrianMiner closed 5 years ago

BrianMiner commented 5 years ago

Thank you for publishing this! I was not sure I entirely followed the documentation. Does the class essentially do the following?

1) Create the transformed class
2) Code some standard and variant evaluation metrics (including weighting them when treatment and control were sampled at uneven rates for some instances)
3) Provide convenience functions for plotting, optimization search, etc.

Otherwise, what I was curious about: once transformed, is the model fit exactly the same as the base regressor (meaning no change to the fit or objective function)?

Also I was curious, is the same method applicable when the target is not binary but is perhaps something like revenue?

rsyi commented 5 years ago

Absolutely! Thank you for taking a look! If there are pain points in the documentation, feel free to point them out to us or contribute a fix.

The TransformedOutcome class does (1) and (3), but the evaluation metrics largely remain unchanged. For example, the default objective function for xgboost is MSE, and optimizing for MSE on the transformed outcome is equivalent to optimizing for lift, so in most cases we don't have to do anything. It is possible to change the objective function, however (in xgboost, at least), and we may release some work on that in the future once it is done, though in our preliminary tests we've found it not to be a substantial improvement.
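
To make point (1) concrete, here is a minimal sketch of the standard transformed-outcome formula from the uplift literature, z = y * (w - p) / (p * (1 - p)), where w is the treatment flag and p is the probability of being treated. It is not taken from the pylift source; the function name, the simulated data, and the treatment rate of 0.3 are all assumptions made for illustration. The point is that the mean of z estimates the uplift, which is why an unmodified MSE regressor fit on z targets lift.

```python
import numpy as np

def transform_outcome(y, w, p):
    """Hypothetical helper (not pylift's API): transformed outcome
    z = y * (w - p) / (p * (1 - p)), with w in {0, 1} and 0 < p < 1."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return y * (w - p) / (p * (1.0 - p))

# Simulated experiment (assumed numbers): 30% of users are treated,
# control converts at 10%, treatment at 15%, so the true uplift is 0.05.
rng = np.random.default_rng(0)
n = 200_000
p = 0.3
w = rng.binomial(1, p, size=n)
y = rng.binomial(1, 0.10 + 0.05 * w)

z = transform_outcome(y, w, p)

# The sample mean of z should land close to the true uplift of 0.05.
print("mean of transformed outcome:", z.mean())
```

Because the treatment rate enters the formula directly, the same transformation also covers the case where treatment and control were sampled at uneven rates.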

is the model fit exactly the same as the base regressor (meaning no change to the fit or objective function)?

Yes, it's the same, but the randomized search is not: we create a custom scoring function that optimizes for the AUC of the cumulative gains curve. This should prevent a lot of the overfitting typically associated with this method.
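
As a rough illustration of what scoring by the area under a cumulative gains curve can look like, here is a sketch. It uses one common definition of cumulative gain (the difference in mean outcome between treated and control rows among the top-k scored rows, scaled by the fraction targeted) and subtracts the area under the random-targeting line. The function name and the simulated data are assumptions; this is not pylift's actual scorer, which may define the curve or the area differently.

```python
import numpy as np

def cumulative_gain_auc(y, w, score):
    """Hypothetical scorer (an assumption, not pylift's implementation):
    area between the cumulative gains curve and the random-targeting line."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=int)
    # Rank rows from highest to lowest predicted uplift.
    order = np.argsort(-np.asarray(score, dtype=float), kind="stable")
    y, w = y[order], w[order]
    n = len(y)

    n_t = np.cumsum(w)            # treated rows among the top k
    n_c = np.cumsum(1 - w)        # control rows among the top k
    y_t = np.cumsum(y * w)        # summed outcome of those treated rows
    y_c = np.cumsum(y * (1 - w))  # summed outcome of those control rows

    # Mean outcome per group among the top k; 0 where a group is still empty.
    rate_t = np.divide(y_t, n_t, out=np.zeros(n), where=n_t > 0)
    rate_c = np.divide(y_c, n_c, out=np.zeros(n), where=n_c > 0)

    frac = np.concatenate([[0.0], np.arange(1, n + 1) / n])
    gain = np.concatenate([[0.0], (rate_t - rate_c) * frac[1:]])

    # Trapezoid rule for the area under the curve.
    area = np.sum((gain[1:] + gain[:-1]) / 2.0 * np.diff(frac))
    # Random targeting is the straight line from (0, 0) to (1, gain[-1]).
    return area - gain[-1] / 2.0

# Simulated data (assumed): uplift grows with feature x.
rng = np.random.default_rng(1)
n = 50_000
x = rng.uniform(0.0, 1.0, size=n)
w = rng.binomial(1, 0.5, size=n)
y = rng.binomial(1, 0.1 + 0.3 * x * w)

good = cumulative_gain_auc(y, w, score=x)    # ranks high-uplift rows first
bad = cumulative_gain_auc(y, w, score=-x)    # ranks them last
print("good ranking:", good, "bad ranking:", bad)
```

A ranking that puts high-uplift rows first scores above zero, and a reversed ranking scores below it. To drive a randomized search, a function like this would still have to be wrapped as a scorer that can see the treatment column; how pylift does that is not shown in this thread.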

Also I was curious, is the same method applicable when the target is not binary but is perhaps something like revenue?

Yes! Everything (except for the show_theoretical_max=True curve and related curves) should work out of the box if you point col_outcome at a column holding a continuous value, like revenue, as long as that value is always positive.