Closed BrianMiner closed 5 years ago
Can you clarify what you mean by new data?
If you're asking how to use the model to make predictions, the model is saved in up.model (where up is a TransformedOutcome object), and it's just an XGBRegressor object (or whatever sklearn-style regressor you use). You can then use this to predict on new data as you usually would. I'll clarify this in the documentation.
Thanks for the reply. Do you call up.model_final.predict()?
I get an error:
training data did not have the following fields: TransformedOutcome
So I assume this is not correct.
Ah. So there's a quirk with the package: the dataframe df that's passed in to TransformedOutcome actually gets updated (we automatically add a column to it called TransformedOutcome). So if you then try to use up.model_final.predict on the original dataframe as is, it will fail. (It will also fail because you should not pass the col_outcome column into up.model_final.predict().) As a test, you can try up.model_final.predict(up.x_train), and that should work fine.
Let me know if that helps!
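For anyone scoring a fresh dataframe, the column mismatch described above can be avoided by selecting exactly the columns the model was trained on. The following is a minimal sketch, not part of the package: scoring_columns is a hypothetical helper, the column names are made up, and the commented usage assumes up.x_train is a pandas DataFrame whose columns are the training features.

```python
def scoring_columns(available_columns, train_columns):
    """Return the training feature names, in training order.

    Hypothetical helper (not part of the package). Raises if the new data
    lacks a feature the model was trained on. Extra columns, such as the
    outcome column or the added TransformedOutcome column, are simply
    left out of the result.
    """
    available = set(available_columns)
    missing = [c for c in train_columns if c not in available]
    if missing:
        raise ValueError(f"new data is missing training features: {missing}")
    return list(train_columns)


# Column names here are made up for illustration.
train_cols = ["age", "visits"]
new_cols = ["visits", "age", "Converted", "TransformedOutcome"]
cols = scoring_columns(new_cols, train_cols)
print(cols)  # ['age', 'visits']

# Assumed usage with a fitted object `up` and a pandas DataFrame `new_df`:
# preds = up.model.predict(new_df[scoring_columns(new_df.columns, up.x_train.columns)])
```

Selecting by the training column list, rather than dropping known extras by name, also keeps the column order consistent with what the regressor saw during fitting.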
Success! Thanks..
I'm generally curious what your team's experience has been with uplift models. I have tried probably every flavor out there and have attempted to build them for quite a few years, all the way back to the original two-model and Victor Lo interaction approaches. I have mainly worked with them in the financial services industry and have always found them fragile; it is very hard to build something that generalizes and stays stable. In a purer retail / consumer commerce space, have you had success?
We've historically had mixed success as well (uplift modeling can indeed feel fragile!), but we have had much more promising, consistent results once we shifted over to the methods laid out in this package. Susan Athey's Transformed Outcome method is simple, but once we adjusted evaluation methods to account for its shortcomings, it turned out to be quite effective!
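For readers unfamiliar with the method, the transformed outcome is usually written as Y* = Y (W - p) / (p (1 - p)), where Y is the outcome, W is the 0/1 treatment indicator, and p is the probability of being treated. The sketch below uses that textbook form with a constant p and made-up data; it is an illustration only and not necessarily how the package computes its TransformedOutcome column.

```python
def transformed_outcome(y, w, p):
    """Textbook transformed outcome: y * (w - p) / (p * (1 - p)).

    y: observed outcome, w: 1 if treated else 0, p: treatment probability.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return y * (w - p) / (p * (1.0 - p))


print(transformed_outcome(1, 1, 0.5))  # 2.0
print(transformed_outcome(1, 0, 0.5))  # -2.0

# Made-up sample of (outcome, treatment) pairs with p = 0.5:
# four treated rows (3 conversions) and four control rows (1 conversion).
rows = [(1, 1), (1, 1), (0, 1), (1, 1), (0, 0), (1, 0), (0, 0), (0, 0)]
mean_transformed = sum(transformed_outcome(y, w, 0.5) for y, w in rows) / len(rows)
print(mean_transformed)  # 0.5, equal to 0.75 - 0.25, the difference in conversion rates
```

Because the mean of the transformed outcome matches the treated-minus-control difference here, a regressor fit to it is estimating uplift directly, which is why any sklearn-style regressor can be used as the underlying model.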
I just came here to note that in order for the up.model_final.predict(up.x_train) solution to work, you have to set the productionize argument to True; otherwise the self.model_final variable never gets set. Took me a little too long to figure that out and wanted to pass it on in case it would save anyone else some time... :)
up.fit(**best_params, productionize=True)
Ah, also for people looking: you don't necessarily have to do that ^. I just assumed he was setting productionize=True, but you could also simply access up.model for the model trained only on the training data, which functions similarly (up.model.predict(up.x_train)).
I'll adjust the docs to make this clearer!
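The two options discussed above (up.model_final when productionize=True was passed to fit, up.model otherwise) can be wrapped in a small fallback. This is a minimal sketch: pick_model is a hypothetical helper, not something the package provides, and the objects below are stand-ins for a fitted TransformedOutcome.

```python
from types import SimpleNamespace


def pick_model(up):
    """Return up.model_final when it has been set, otherwise up.model."""
    model = getattr(up, "model_final", None)
    return model if model is not None else up.model


# Stand-ins for fitted objects; the strings take the place of real regressors.
fit_without_productionize = SimpleNamespace(model="train-only model")
fit_with_productionize = SimpleNamespace(model="train-only model", model_final="final model")

print(pick_model(fit_without_productionize))  # train-only model
print(pick_model(fit_with_productionize))     # final model
```

With a real fitted object, the assumed usage would be pick_model(up).predict(up.x_train).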
Hi, I am trying to score my model on new data. I am building the model using a defined train and validation set. If I fit with up.fit(**best_params, productionize=True), will that give me a model trained on the entire dataset or just on the training set?
Are there any descriptions in the docs for scoring new data?