maks-sh / scikit-uplift

:exclamation: uplift modeling in scikit-learn style in python :snake:
https://www.uplift-modeling.com
MIT License
725 stars, 96 forks

Why is perfect uplift calculated differently for uplift curve and qini curve? #93

Open steprandelli opened 3 years ago

steprandelli commented 3 years ago

💡 Feature request

Hi! Perfect uplift is needed to compute both the perfect uplift curve and the perfect qini curve. Why do the two curves use different formulas to generate it? Would it make sense to unify the perfect uplift formula?

perfect uplift curve

cr_num = np.sum((y_true == 1) & (treatment == 0))  # Control Responders
tn_num = np.sum((y_true == 0) & (treatment == 1))  # Treated Non-Responders
summand = y_true if cr_num > tn_num else treatment
perfect_uplift = 2 * (y_true == treatment) + summand

perfect qini curve

perfect_uplift = y_true * treatment - y_true * (1 - treatment)

Irek21 commented 2 years ago

Same question. I also don't understand the idea behind how perfect uplift is computed in perfect_uplift_curve, and I couldn't find a description anywhere.

maks-sh commented 2 years ago

@steprandelli @Irek21 Thanks for your question!

Recall that in the classical uplift problem we are dealing with two binary vectors: target is the value of the target variable, and treatment is the value of the influence (communication in marketing, treatment in medicine, etc.).

Thus, we have only 4 different classes that we need to sort correctly ((1, 1), (0, 0), (0, 1), (1, 0)).
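As a rough illustration of how the two formulas from the question score these four classes, here is a minimal sketch. It assumes each pair is written as (y_true, treatment), which the thread does not state explicitly, and the two helper function names are hypothetical wrappers around the snippets quoted above.

```python
import numpy as np

# One observation per class, written as (y_true, treatment) pairs.
# The pair order is an assumption; the thread does not spell it out.
y_true = np.array([1, 0, 0, 1])
treatment = np.array([1, 0, 1, 0])


def uplift_curve_scores(y_true, treatment):
    """Scores from the 'perfect uplift curve' snippet (hypothetical wrapper)."""
    cr_num = np.sum((y_true == 1) & (treatment == 0))  # Control Responders
    tn_num = np.sum((y_true == 0) & (treatment == 1))  # Treated Non-Responders
    summand = y_true if cr_num > tn_num else treatment
    return 2 * (y_true == treatment) + summand


def qini_curve_scores(y_true, treatment):
    """Scores from the 'perfect qini curve' snippet (hypothetical wrapper)."""
    return y_true * treatment - y_true * (1 - treatment)


uplift_scores = uplift_curve_scores(y_true, treatment)
qini_scores = qini_curve_scores(y_true, treatment)

print(uplift_scores.tolist())  # [3, 2, 1, 0]
print(qini_scores.tolist())    # [1, 0, 0, -1]
```

On this toy input, the first formula gives every class a distinct score, while the second leaves (0, 0) and (0, 1) tied at 0. Note that the first formula also depends on the class counts: when control responders outnumber treated non-responders, the last two classes swap places.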

To understand what an ideal curve should look like, you need to understand in what order these 4 classes (pairs) should be arranged. Reordering observations within a single class does not change the area under the curve, because all observations in a class are identical pairs.

Let's call the ideal curve the curve with the maximum area under it. So, you need to understand how to rank 4 classes so that the area under the curve is maximal.
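One way to explore this numerically is to brute-force all 24 orderings of the four classes and keep the one with the largest area. The sketch below is only an illustration under stated assumptions: the curve definition (difference of cumulative response rates times the number of observations seen so far) is a simplified stand-in that may differ from the library's implementation, pairs are assumed to be (y_true, treatment), and all function names and class counts are hypothetical.

```python
from itertools import permutations

import numpy as np


def uplift_curve_values(y, t):
    """Cumulative uplift after each observation (simplified, assumed definition)."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    n_t = np.cumsum(t)            # treated observations seen so far
    n_c = np.cumsum(1 - t)        # control observations seen so far
    y_t = np.cumsum(y * t)        # treated responders seen so far
    y_c = np.cumsum(y * (1 - t))  # control responders seen so far
    # Response rates, defined as 0 while a group is still empty.
    rate_t = np.divide(y_t, n_t, out=np.zeros_like(y_t), where=n_t != 0)
    rate_c = np.divide(y_c, n_c, out=np.zeros_like(y_c), where=n_c != 0)
    values = (rate_t - rate_c) * (n_t + n_c)
    return np.concatenate([[0.0], values])  # the curve starts at the origin


def area(values):
    """Trapezoidal area under the curve with unit spacing on the x axis."""
    return float(np.sum((values[1:] + values[:-1]) / 2))


def expand(order, counts):
    """Build y and t vectors with whole classes laid out in the given order."""
    y = [cls[0] for cls in order for _ in range(counts[cls])]
    t = [cls[1] for cls in order for _ in range(counts[cls])]
    return y, t


def best_class_order(counts):
    """Try every ordering of the classes and return (best_area, best_order)."""
    best = None
    for order in permutations(counts):
        a = area(uplift_curve_values(*expand(order, counts)))
        if best is None or a > best[0]:
            best = (a, order)
    return best


# Hypothetical class sizes, keyed by (y_true, treatment).
counts = {(1, 1): 3, (0, 0): 3, (0, 1): 2, (1, 0): 2}
best_area, best_order = best_class_order(counts)
print(best_order, best_area)
```

Changing the class sizes in `counts` is a quick way to check whether the best ordering depends on how many observations each class contains.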

In the code, you can find an implementation of how these classes should be sorted. I hope someday we will add a section about metrics, in which there will be material about ideal curves.

If you write up the proofs for this ordering of the classes in more detail, we will be happy to add them to the user guide.

Many thanks to @kirrlix1994 for consultations on the metrics issues.