Closed by AdiVarma27 4 years ago
Hi! I would really appreciate your contribution to the library 👍
Just a few suggestions:
First, I think it would be better not to define a parameter treatment_interaction, but to define a parameter method, as in TwoModels. The approach that is implemented now would be called dummy (SoloModel(CatBoostClassifier(verbose=100, random_state=777), method='dummy')), and the new one would be called treatment_interaction (SoloModel(CatBoostClassifier(verbose=100, random_state=777), method='treatment_interaction')). This would be better for future extensibility.
Also, a very similar approach is called SDR (shared data representation). It is described in: Artem Betlei (Criteo Research), Eustache Diemert (Criteo Research), Massih-Reza Amini (Univ. Grenoble Alpes). Dependent and Shared Data Representations Improve Uplift Prediction in Imbalanced Treatment Conditions. FAIM'18 Workshop on CausalML.
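To make the suggestion concrete, here is a minimal sketch of how a method switch could look. It is not the library's implementation: the class name SoloModelSketch, the _features helper, and the synthetic data are all hypothetical, and it assumes that treatment_interaction means appending each feature multiplied by the treatment flag.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


class SoloModelSketch:
    """Hypothetical single-model uplift wrapper with a `method` switch."""

    METHODS = ("dummy", "treatment_interaction")

    def __init__(self, estimator, method="dummy"):
        if method not in self.METHODS:
            raise ValueError(f"method must be one of {self.METHODS}, got {method!r}")
        self.estimator = estimator
        self.method = method

    def _features(self, X, treatment):
        X = np.asarray(X, dtype=float)
        w = np.asarray(treatment, dtype=float).reshape(-1, 1)
        if self.method == "dummy":
            # Treatment flag appended as one extra column.
            return np.hstack([X, w])
        # Assumption: also append every feature multiplied by the flag.
        return np.hstack([X, w, X * w])

    def fit(self, X, y, treatment):
        self.estimator_ = clone(self.estimator).fit(self._features(X, treatment), y)
        return self

    def predict(self, X):
        # Score everyone as treated, then as control, and take the difference.
        n = len(X)
        p_treated = self.estimator_.predict_proba(self._features(X, np.ones(n)))[:, 1]
        p_control = self.estimator_.predict_proba(self._features(X, np.zeros(n)))[:, 1]
        return p_treated - p_control


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
treatment = rng.integers(0, 2, size=200)
y = (rng.random(200) < 0.3 + 0.2 * treatment).astype(int)

model = SoloModelSketch(LogisticRegression(), method="treatment_interaction")
uplift = model.fit(X, y, treatment).predict(X)
```

With a string-valued method, a further variant such as SDR could later be added as one more branch without changing the constructor signature.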
Second, are there any suggestions about using not only a Logistic Regression estimator (as in the papers) but also a tree-based estimator? It feels like a research question to me.
Hey! Thanks for your input.
I will add a method parameter, similar to TwoModels, which makes the class easier to extend.
1) I will look into SDR (shared data representation) in the reference you provided and include it once I have fully tested it on my end.
2) Regarding the single-model approach only, according to Lo (Lo, Victor. 2002. The True Lift Model - A Novel Data Mining Approach to Response Modeling in Database Marketing. SIGKDD Explorations. 4. 78-86.):
In fact, equation (1) is a general supervised learning model form as f(.) may be nonlinear or other complicated functions such as step-functions (e.g. decision trees such as CART and CHAID, see [6;28]), splines ([36;38]), composite functions (e.g. multi-layer perception in neural networks [5;12]), other neural network models (e.g. [23;35]), mixture models (e.g. [37;12]), Bayesian models (e.g. [13;20]), or hybrids (e.g. [11;14;26]).
Hence, we could technically pass in any supervised model, as long as it can produce predicted probabilities (a predict_proba() method).
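As a rough illustration of that point, the sketch below runs the same single-model procedure with a logistic regression and with a decision tree. The solo_uplift function is a hypothetical helper, not part of scikit-uplift, and the data is synthetic; the only requirement placed on the estimator is a predict_proba() method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def solo_uplift(estimator, X, y, treatment):
    """Fit one model on [X, treatment]; return P(y | treated) - P(y | control)."""
    if not hasattr(estimator, "predict_proba"):
        raise TypeError("estimator must expose predict_proba()")
    X = np.asarray(X, dtype=float)
    w = np.asarray(treatment, dtype=float).reshape(-1, 1)
    estimator.fit(np.hstack([X, w]), y)
    # Score each row twice: once with the flag set to 1, once set to 0.
    treated = np.hstack([X, np.ones_like(w)])
    control = np.hstack([X, np.zeros_like(w)])
    return estimator.predict_proba(treated)[:, 1] - estimator.predict_proba(control)[:, 1]


# Synthetic data, for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
treatment = rng.integers(0, 2, size=300)
y = (rng.random(300) < 0.25 + 0.2 * treatment).astype(int)

uplift_by_model = {
    "logistic_regression": solo_uplift(LogisticRegression(), X, y, treatment),
    "decision_tree": solo_uplift(
        DecisionTreeClassifier(max_depth=3, random_state=0), X, y, treatment
    ),
}
```

Whether a tree-based estimator gives useful uplift estimates is a separate, empirical question. A tree may never split on the treatment column, in which case the predicted uplift is zero everywhere.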
Hey! I was wondering if I could contribute to scikit-uplift by adding a parameter to the SoloModel class.
According to the paper (Lo, Victor. 2002. The True Lift Model - A Novel Data Mining Approach to Response Modeling in Database Marketing. SIGKDD Explorations. 4. 78-86.), the general form in equation (6) includes interaction terms in the model.
The change would look like this:
sm = SoloModel(CatBoostClassifier(verbose=100, random_state=777), treatment_interaction=False)
We would pass treatment_interaction at the class level and check it in both the fit and predict methods, adding all feature interactions to the X array when it is set.
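A small sketch of the feature-building step that fit and predict would share is shown here. The helper name build_features is hypothetical, and it assumes that "feature interactions" means each feature multiplied by the treatment flag.

```python
import numpy as np


def build_features(X, treatment, treatment_interaction=False):
    """Append the treatment flag and, optionally, feature-by-treatment products."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(treatment, dtype=float).reshape(-1, 1)
    columns = [X, w]
    if treatment_interaction:
        # Assumption: one interaction column per original feature.
        columns.append(X * w)
    return np.hstack(columns)


X = np.array([[1.0, 2.0], [3.0, 4.0]])
treatment = np.array([1, 0])

plain = build_features(X, treatment)
interacted = build_features(X, treatment, treatment_interaction=True)
# plain      -> [[1, 2, 1], [3, 4, 0]]
# interacted -> [[1, 2, 1, 1, 2], [3, 4, 0, 0, 0]]
```

The same helper has to be applied at prediction time, once with the flag forced to 1 and once forced to 0, so that the column layout matches what the estimator saw in fit.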
If you have any other suggestions to change it elsewhere, we could do that as well.
Kindly let me know your thoughts.