
wayfair / pylift

Uplift modeling package.

http://pylift.readthedocs.io

BSD 2-Clause "Simplified" License

367 stars, 76 forks

Predict New Data? #11

Closed: BrianMiner closed this issue 5 years ago

BrianMiner commented 5 years ago

Are there any descriptions in the docs for scoring new data?

rsyi commented 5 years ago

Can you clarify what you mean by new data? If you're asking how to use the model to make predictions, the model is saved in up.model (where up is a TransformedOutcome object), and it's just an XGBRegressor object (or whatever sklearn-style regressor you use). You can then use this to predict on new data as you usually would. I'll clarify this in the documentation.
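Since up.model is a fitted sklearn-style regressor, scoring new data comes down to calling predict() on a frame that has exactly the feature columns the model saw during training. A minimal sketch, assuming up.x_train is a pandas DataFrame whose columns are those features; score_new_data and the predicted_uplift name are hypothetical helpers for illustration, not part of pylift:

```python
import pandas as pd


def score_new_data(model, new_df: pd.DataFrame, feature_cols) -> pd.Series:
    """Predict uplift for new rows with an already-fitted sklearn-style regressor.

    feature_cols should list the training features (e.g. up.x_train.columns).
    Selecting them explicitly drops extra columns such as the outcome or
    TransformedOutcome and puts the features in the training order.
    """
    feature_cols = list(feature_cols)
    missing = [c for c in feature_cols if c not in new_df.columns]
    if missing:
        raise ValueError(f"new data is missing feature columns: {missing}")
    preds = model.predict(new_df[feature_cols])
    # Keep the new data's index so scores can be joined back to the original rows.
    return pd.Series(preds, index=new_df.index, name="predicted_uplift")


# Hypothetical usage, where new_customers is a DataFrame of unseen rows:
# scores = score_new_data(up.model, new_customers, up.x_train.columns)
```

Selecting columns by name, rather than dropping a list of known extras, also excludes any other column the model never saw (an ID or a timestamp, for example).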

BrianMiner commented 5 years ago

Thanks for the reply. Do you just call

up.model_final.predict()?

I get an error:

training data did not have the following fields: TransformedOutcome

So I assume this is not correct.

rsyi commented 5 years ago

Ah. There's a quirk in the package: the dataframe df that you pass to TransformedOutcome is modified in place (we automatically add a column called TransformedOutcome to it). So if you then call up.model_final.predict on that dataframe as is, it will fail. It will also fail because the col_outcome column is still in it, and that column should not be passed to up.model_final.predict() either. As a quick test, try up.model_final.predict(up.x_train), which should work fine.
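The failure described above is ordinary pandas aliasing: per this thread, pylift writes a column into the very DataFrame object you pass to TransformedOutcome, so the caller's df changes too. The sketch below reproduces that side effect with a hypothetical stand-in function (attach_transformed_outcome is not pylift code, and its 0.0 value is a placeholder), then shows two ways around it. Dropping the Treatment column along with the outcome is an assumption about which columns are features; compare against up.x_train.columns.

```python
import pandas as pd


def attach_transformed_outcome(df: pd.DataFrame) -> None:
    """Hypothetical stand-in for the side effect described in this thread.

    It writes a new column into the DataFrame it receives rather than into a
    copy. The value is a placeholder; only the in-place mutation matters here.
    """
    df["TransformedOutcome"] = 0.0


raw = pd.DataFrame({"age": [30, 40], "Treatment": [1, 0], "Converted": [1, 0]})

# Passing the frame itself: the caller's `raw` gains the extra column,
# so a later model.predict(raw) would see an unexpected field.
attach_transformed_outcome(raw)
print(list(raw.columns))  # ['age', 'Treatment', 'Converted', 'TransformedOutcome']

# Option 1: pass a copy, which leaves the caller's frame untouched.
clean = pd.DataFrame({"age": [30, 40], "Treatment": [1, 0], "Converted": [1, 0]})
attach_transformed_outcome(clean.copy())
print(list(clean.columns))  # ['age', 'Treatment', 'Converted']

# Option 2: before scoring the mutated frame, drop everything that is not a feature.
# Assumption: Treatment is not a model feature; check against up.x_train.columns.
features = raw.drop(columns=["Converted", "Treatment", "TransformedOutcome"])
print(list(features.columns))  # ['age']
```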

Let me know if that helps!

BrianMiner commented 5 years ago

Success! Thanks.

I'm curious what your team's experience has been with uplift models. I have tried probably every flavor out there and have been building them for quite a few years, going all the way back to the original two-model and Victor Lo interaction approaches. I have mainly worked with them in financial services and have always found them fragile: it was very hard to build something that generalized and stayed stable. Have you had success in a more purely retail / consumer commerce space?

rsyi commented 5 years ago

We've historically had mixed success as well (uplift modeling can indeed feel fragile!), but we have had much more promising, consistent results once we shifted over to the methods laid out in this package. Susan Athey's Transformed Outcome method is simple, but once we adjusted evaluation methods to account for its shortcomings, it turned out to be quite effective!
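For reference, the transformed outcome in Athey and Imbens' formulation is Y* = Y (W - p) / (p (1 - p)), where W is the 0/1 treatment indicator and p is the probability of treatment. Under random assignment, E[Y* | X] equals the conditional uplift, which is why an ordinary regressor fit to Y* can serve as an uplift model. A minimal numpy sketch of that standard formula; it is not pylift's source, and pylift may choose p differently (here it defaults to the treated share of the sample).

```python
import numpy as np


def transformed_outcome(y, w, p=None):
    """Transformed outcome Y* = Y * (W - p) / (p * (1 - p)).

    y: observed outcomes; w: 0/1 treatment indicator; p: treatment propensity.
    If p is None, the share of treated rows is used, which fits a randomized
    experiment with a single assignment probability.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    if p is None:
        p = w.mean()
    # Treated rows get Y / p; control rows get -Y / (1 - p).
    return y * (w - p) / (p * (1 - p))


y = [1, 0, 1, 1, 0]  # observed outcomes
w = [1, 1, 1, 0, 0]  # treatment indicator
ystar = transformed_outcome(y, w)
print(ystar.mean())
```

With p set to the treated share, the sample mean of Y* equals the treated-minus-control difference in mean outcomes (here 2/3 - 1/2), which makes a quick sanity check.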

JordanHagan commented 5 years ago

I just wanted to note that for the up.model_final.predict(up.x_train) solution to work, you have to pass productionize=True to up.fit(); otherwise the model_final attribute never gets set. It took me a little too long to figure that out, so I wanted to pass it on in case it saves anyone else some time. :)

up.fit(**best_params, productionize=True)

rsyi commented 5 years ago

Also, for anyone reading this: you don't necessarily have to do that. I had just assumed he was setting productionize=True, but you can also simply use up.model, the model trained only on the training data, which works the same way (up.model.predict(up.x_train)).
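The two routes in this exchange can be folded into one small helper: prefer up.model_final when fit() was called with productionize=True, otherwise fall back to up.model. This is a sketch based only on the behavior described in this thread; get_scoring_model is hypothetical, and getattr is used because the thread does not say whether model_final is missing or simply None when productionize is off.

```python
from types import SimpleNamespace


def get_scoring_model(up):
    """Return the model to call .predict() on for new data.

    Per this thread, up.model_final is only set by fit(..., productionize=True),
    while up.model (trained on the training split only) is available after a
    plain fit() and is used the same way.
    """
    final = getattr(up, "model_final", None)
    return final if final is not None else up.model


# Stand-ins for fitted TransformedOutcome objects (hypothetical placeholder values).
plain_fit = SimpleNamespace(model="model trained on training split")
productionized = SimpleNamespace(
    model="model trained on training split",
    model_final="productionized model",
)
print(get_scoring_model(plain_fit))       # model trained on training split
print(get_scoring_model(productionized))  # productionized model

# Real usage (new_df is hypothetical):
# get_scoring_model(up).predict(new_df[up.x_train.columns])
```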

I'll adjust the docs to make this clearer!

krithika-vp commented 3 years ago

Hi, I am trying to score new data with my model. I am building the model using a defined train and validation set. While fitting, if I use up.fit(**best_params, productionize=True), will that give me a model trained on the entire dataset or just on the training set?