interpretml / DiCE

Generate Diverse Counterfactual Explanations for any machine learning model.
https://interpretml.github.io/DiCE/
MIT License

Time series data with DICE? #180

Open Saladino93 opened 3 years ago

Saladino93 commented 3 years ago

Hi everyone,

I am still quite new to machine learning coding. I have managed to do a few things with DiCE, but I was wondering how you handle time series data for a general model.

The idea is that the counterfactual explainer has to respect time ordering when generating counterfactuals. The only idea I have so far is to post-filter the counterfactuals after they are generated. Are there any suggestions for better approaches (or is this simple approach OK)?

amit-sharma commented 3 years ago

This is a great question, and the answer depends on the causal relationships between features. We had some preliminary work on this, but it may take some time to integrate it fully with DiCE.

I think your post-filtering step is simple and effective. As long as you generate enough CFs, post-filtering them should return the desired CFs. In the future, we may implement a function for specifying these kinds of constraints. To help design it, can you share what would be ideal for you? Would you want to provide two variables and state which one causes the other? Or do you have some other kind of constraint in mind?
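A minimal sketch of the post-filtering step discussed above. It does not call DiCE. It assumes the generated counterfactuals are available as plain dictionaries (a pandas DataFrame of counterfactuals could be converted with `DataFrame.to_dict("records")`), and it uses one hypothetical constraint that the thread does not define: a list of `frozen` features that a valid counterfactual must leave unchanged. The names `changed_features` and `filter_counterfactuals` are illustrative only.

```python
from typing import Dict, List

Row = Dict[str, float]


def changed_features(query: Row, cf: Row, tol: float = 1e-9) -> List[str]:
    """Return the names of features whose value differs between query and cf."""
    return [name for name in query if abs(cf[name] - query[name]) > tol]


def filter_counterfactuals(query: Row, cfs: List[Row], frozen: List[str]) -> List[Row]:
    """Keep only counterfactuals that leave every feature in `frozen` unchanged."""
    frozen_set = set(frozen)
    return [cf for cf in cfs if not frozen_set.intersection(changed_features(query, cf))]


# Hypothetical query with three lagged values; the two oldest lags are frozen
# (an assumed constraint, chosen only for illustration).
query = {"t-3": 10.0, "t-2": 12.0, "t-1": 11.0}
candidates = [
    {"t-3": 10.0, "t-2": 12.0, "t-1": 15.0},  # changes only the most recent lag
    {"t-3": 7.0, "t-2": 12.0, "t-1": 11.0},   # changes a frozen lag
    {"t-3": 10.0, "t-2": 14.0, "t-1": 16.0},  # changes a frozen lag and t-1
]
kept = filter_counterfactuals(query, candidates, frozen=["t-3", "t-2"])
```

Because the filter only discards candidates, it relies on generating enough counterfactuals that some survive. A stricter or causal constraint would replace the `frozen` check with its own predicate.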

Saladino93 commented 3 years ago

@amit-sharma thanks for your reply! Yes, I am aware of that work, although it does not seem to deal directly with time series. Maybe with some manipulation one could make it work.

For me, the ideal would be to choose the constraints myself, and to respect both causality among features at a fixed time and causality along time.

I do not have specific constraints in mind for now (but if I come up with any I will let you know).

For now, I ran this example, https://machinelearningmastery.com/feature-selection-time-series-forecasting-python/, to see how time series prediction can be done in Python, and then I ran DiCE on top of it.

Basically, after some feature engineering for time series data, you have [t-12, t-11, ..., t-2, t-1, t] lag variables, and you want to predict t from the others. After training, I run DiCE to obtain counterfactuals that show what I could change in my lagged variables (although this is probably more of a what-if analysis than a guide to what I can actually do, since it changes variables of the past).
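The lag-variable setup described above can be sketched with pandas as follows. This is an illustration under assumptions, not the code from the linked tutorial: the helper name `make_lag_frame` is hypothetical, the input is assumed to be a univariate series already sorted by time, and two lags are used instead of twelve to keep the example small.

```python
import pandas as pd


def make_lag_frame(series: pd.Series, n_lags: int) -> pd.DataFrame:
    """Turn a univariate series into columns t-n_lags ... t-1 plus the target t."""
    columns = {f"t-{k}": series.shift(k) for k in range(n_lags, 0, -1)}
    columns["t"] = series
    # The first n_lags rows have no full history, so drop them.
    return pd.DataFrame(columns).dropna().reset_index(drop=True)


values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
frame = make_lag_frame(values, n_lags=2)

# Lag columns are the model inputs; column "t" is the value to predict.
X, y = frame.drop(columns="t"), frame["t"]
```

Each row of `X` then plays the role of a query instance whose lagged values a counterfactual explainer may perturb.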

Here is an example

image

The big picture would be to run this on store sales data, for example, as in https://www.kaggle.com/kyakovlev/m5-lags-features/notebook, although I have yet to try it there.

One last thing. For now, I will generate lots of CFs and check which ones make sense when I filter (I still have to develop an automatic way to do this). Is it safe to generate many CFs, say 10 or 20? Did you see this paper on the robustness of CF generation, https://arxiv.org/pdf/2106.02666v1.pdf (Counterfactual Explanations Can Be Manipulated)? I might open another issue to discuss it, as it seems quite important.

Also, I think I will code the necessity and sufficiency metrics that you implemented in a recent paper of yours. Lots of stuff to discuss here, but step by step.

Thanks, and hopefully I will have something more concrete in the future.

Saladino93 commented 3 years ago

@amit-sharma sorry to disturb you again, but do you have any idea whether DiCE will support time series, or how to do this?

shreyakhandelwal07 commented 1 year ago

Hi @amit-sharma, is there any update on support for time series?

asha24choudhary commented 6 months ago

Hi @amit-sharma. I was wondering whether we should introduce feature lags as new features to deal with time series data. What I mean is, if we have features A, B, C, do you think we should use A, B, C, A_lag1, B_lag1, ... up to a certain lag and then perform CF reasoning?

Otherwise, please let us know if you have any updates.
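A rough sketch of the multi-feature version of this idea, using the `A_lag1`, `B_lag1`, ... naming from the comment above. The helper `add_lags` is hypothetical and not part of DiCE. It assumes the rows are already sorted by time, and it also returns the names of the generated lag columns so they can be handled separately from the current-time features later on.

```python
import pandas as pd


def add_lags(df: pd.DataFrame, features: list, max_lag: int):
    """Append <feature>_lag<k> columns for k = 1..max_lag and list their names."""
    out = df.copy()
    lag_columns = []
    for feature in features:
        for k in range(1, max_lag + 1):
            name = f"{feature}_lag{k}"
            out[name] = df[feature].shift(k)
            lag_columns.append(name)
    # Rows without a full lag history contain NaN and are dropped.
    return out.dropna().reset_index(drop=True), lag_columns


raw = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40], "C": [5, 6, 7, 8]})
lagged, lag_columns = add_lags(raw, features=["A", "B", "C"], max_lag=2)
```

With `max_lag=2`, the first two rows are lost because they lack a complete history, so longer lags need correspondingly more data.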

asha24choudhary commented 6 months ago

Or, I was wondering, should we pass a model that accounts for temporal dependencies as the model parameter?

amit-sharma commented 6 months ago

Yeah, feature lags are the best solution currently available in DiCE.

asha24choudhary commented 6 months ago

Oh thank you :)