iancovert / sage

For calculating global feature importance using Shapley values.
MIT License

adaptive estimator for online data #8

Open Piyushbalwani opened 3 years ago

Piyushbalwani commented 3 years ago

Hi, how can I save and load an estimator, and then run the loaded estimator adaptively on an online stream of batched data? Thanks!

iancovert commented 3 years ago

Hi there, sorry for the slow reply! The easiest way right now would be to save the objects that are necessary to construct the estimator (in many cases, just by pickling the underlying model) and re-create it after loading the necessary objects.

For example, in the bike notebook, you would save the model and the test samples that are used for the imputer. Then to load the estimator, just load the model and data, then create the imputer and then create the estimator.
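A minimal sketch of that save/reload workflow is below. The helper names (`save_artifacts`, `load_artifacts`, `build_estimator`) and the file layout are hypothetical, made up for illustration. The `sage.MarginalImputer` and `sage.PermutationEstimator` calls follow the usage shown in the README, and the sketch assumes both the model and the background samples are picklable.

```python
import pickle


def save_artifacts(model, background, path):
    """Pickle the model and the background samples used by the imputer."""
    with open(path, "wb") as f:
        pickle.dump({"model": model, "background": background}, f)


def load_artifacts(path):
    """Load the model and background samples written by save_artifacts."""
    with open(path, "rb") as f:
        saved = pickle.load(f)
    return saved["model"], saved["background"]


def build_estimator(model, background, loss="mse"):
    """Re-create the imputer, then the estimator, from the loaded objects."""
    import sage  # imported lazily so the save/load helpers work without sage

    imputer = sage.MarginalImputer(model, background)
    return sage.PermutationEstimator(imputer, loss)
```

After loading, something like `estimator = build_estimator(*load_artifacts(path))` followed by `estimator(X, Y)` should give SAGE values as in the notebook. Note that this rebuilds the estimator from scratch; it does not carry over any state from a previous run.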

Let me know if that sounds reasonable. I tried pickling the estimator directly, and the only reason it doesn't work is that the estimator holds a lambda function (due to how we do model conversion), and lambdas can't be pickled. But if it would be useful to pickle/unpickle the estimator directly, let me know and I can try to implement that.
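To illustrate the pickling limitation in isolation, here is a small standalone sketch that does not use sage at all. The `can_pickle` helper and the two dictionaries are hypothetical stand-ins for an object that stores a conversion function. It is not how the estimator is implemented, only a demonstration of why a stored lambda blocks `pickle`.

```python
import functools
import operator
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so pickle cannot store a reference to it.
with_lambda = {"convert": lambda x: 2 * x}

# An importable function wrapped in functools.partial pickles by reference.
with_partial = {"convert": functools.partial(operator.mul, 2)}

print(can_pickle(with_lambda))
print(can_pickle(with_partial))
```

The first check fails and the second succeeds, which is why replacing a lambda with a named function or a `functools.partial` is a common way to make an object picklable.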

Piyushbalwani commented 3 years ago

Thanks for the reply, @iancovert. I don't think this will serve my purpose.

My goal is to calculate feature importance for a classifier trained on a large training set. Because the data is large, I am doing batch learning. At the moment, I calculate feature importance with the estimator for a single batch only. For the next batch, I need to load the imputer or estimator learned from the previous batch and incrementally update it on the new batch.