interpretml / interpret-community

Interpret Community extends Interpret repository with additional interpretability techniques and utility functions to handle real-world datasets and workflows.
https://interpret-community.readthedocs.io/en/latest/index.html
MIT License
421 stars 85 forks

Interpret explainer module question #514

Open kannanthiru1 opened 2 years ago

kannanthiru1 commented 2 years ago

I ran a forecasting experiment in Azure ML, and it chose the SeasonalAverage algorithm as the best model. In the explanation, only the target column ‘WeeklySales’ had importance; none of the other input columns (CustomerType, CustomerClass) seemed to have any. I wanted to force the model to use the other input columns. I couldn’t find a way to force it in my Python script, so I opened an MS support case. They replied that there is no way to force the model to use the other input columns, and that I could instead use the interpret package explainers to understand why. I tried the TabularExplainer, MimicExplainer and PFIExplainer modules, but I get the error “For a forecasting model, predict is not supported. Please use forecast instead.” I do use the forecasting task in the automl_config. MS support suggested that I get in touch with the developers, and they also suggested that the explainer modules might not have been designed for time series.

I would appreciate your suggestion.

Thank you.
Kannan

imatiach-msft commented 2 years ago

@kannanthiru1 can you send a sample notebook with the automl run and how you are running the explainer? Note that for some forecasting algorithms, like SeasonalAverage (I think!), the algorithm itself ignores the features and only uses the time column for timeseries forecasting, so you wouldn't actually expect the feature importances to be non-zero for any of the other features.
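To illustrate the point about ignored features, here is a minimal sketch of permutation feature importance on a toy model. The model, the data, and the helper `permutation_importance` are all hypothetical and are not part of AutoML or interpret-community; the sketch only shows why a column that never enters the prediction gets an importance of exactly zero.

```python
import numpy as np

rng = np.random.default_rng(0)


def model_predict(X):
    # Toy model (an assumption for this sketch): uses column 0, ignores column 1.
    return 2.0 * X[:, 0]


def permutation_importance(predict, X, y, column, rng):
    # Importance = increase in mean squared error after shuffling one column.
    base_error = np.mean((predict(X) - y) ** 2)
    X_perm = X.copy()
    X_perm[:, column] = rng.permutation(X_perm[:, column])
    return np.mean((predict(X_perm) - y) ** 2) - base_error


X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0]

imp_used = permutation_importance(model_predict, X, y, 0, rng)
imp_ignored = permutation_importance(model_predict, X, y, 1, rng)
print(imp_used, imp_ignored)
```

Shuffling the ignored column leaves every prediction unchanged, so its importance is zero no matter how strongly that column might correlate with the target in the data.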

kannanthiru1 commented 2 years ago

Hello Ilya

In the attached script ‘SKU Forecasting 12’,

  1. for Tabular Explainer I am getting the error in line 155
  2. for Mimic Explainer, the line is 172
  3. for PFI explainer, the line is 185

I would appreciate your help.
Kannan

Attachment: SKU Forecasting 12.txt

imatiach-msft commented 2 years ago

@kannanthiru1 sorry, where can I get the "SKU Model Training 10" and "SKU Model Testing 10" datasets used in the script above? I'm not sure how to reproduce this issue. If the datasets are private, can you send me some dummy data and a script that I can use to reproduce the issue?

imatiach-msft commented 2 years ago

@kannanthiru1 just looking at the code, the main issue is that you are not using automl_explain_utilities. Please see the guide: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-automl

You need to wrap your model using:

from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations

# fitted_model is the AutoML model; X_train, y_train and X_test are your own data.
automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, X=X_train,
                                                             X_test=X_test, y=y_train,
                                                             task='forecasting')

This will resolve the error “For a forecasting model, predict is not supported. Please use forecast instead.”, because the model wrapper from automl_setup_model_explanations simply remaps the forecast() method of the timeseries model so that it can be called as predict() instead.
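The remapping idea can be sketched roughly as follows. This is not the actual AutoML wrapper: `ToyForecaster` and `ForecastAsPredict` are hypothetical names, and the assumption that forecast() returns a (predictions, transformed data) pair is made only for the sake of the example.

```python
import numpy as np


class ToyForecaster:
    """Hypothetical stand-in for a forecasting model that rejects predict()."""

    def predict(self, X):
        raise NotImplementedError(
            "For a forecasting model, predict is not supported. "
            "Please use forecast instead."
        )

    def forecast(self, X):
        # Assumed return shape for this sketch: (predictions, transformed data).
        X = np.asarray(X)
        return np.full(len(X), 42.0), X


class ForecastAsPredict:
    """Hypothetical adapter that exposes forecast() under the name predict()."""

    def __init__(self, model):
        self._model = model

    def predict(self, X):
        # Delegate to forecast() and keep only the predictions.
        predictions, _ = self._model.forecast(X)
        return predictions


wrapped = ForecastAsPredict(ToyForecaster())
result = wrapped.predict([[1, 2], [3, 4]])
print(result)
```

An explainer that only knows how to call predict() can then work with the wrapped object, which is presumably why passing the raw forecasting model directly raised the error.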

kannanthiru1 commented 2 years ago

Hello Ilya

Thank you for the help. After I added the wrapper to the code (lines 145–167 in the attached file ‘SKU Forecasting 13.txt’), it generated the explanation dashboard. The importance was only on the target column. It is hard for me to conceive that the best model (SeasonalAverage in this case) isn’t using the other input columns. Maybe the model doesn’t really see the importance of the other columns. Earlier you had asked for the datasets, and I have attached them for your reference.

I appreciate your help.
Kannan

Attachments: SKU Forecasting 13.txt, SKU Model Testing 10.csv, SKU Model Training 10.csv

imatiach-msft commented 2 years ago

@kannanthiru1 hi, I reached out to the AutoML timeseries forecasting team, and they replied that this is indeed expected. SeasonalAverage is a naive model, and the features are not used for making predictions (however, as I understood from them, some categorical features are used to determine the grains for which predictions are made). After discussing this with them, they said they will follow up on this issue by improving the documentation and possibly explaining how each of the supported forecasting models works. So, to summarize, this is expected, and there is a follow-up to improve the documentation.
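As a rough illustration of what a naive seasonal-average model might look like, here is a minimal sketch. It is one possible reading, not the AutoML implementation, which this thread does not describe: the function `seasonal_average_forecast`, the choice of averaging the last `season_length` observations, and the toy data are all assumptions. A categorical column (here a customer type) only splits the data into grains; it never enters the arithmetic.

```python
from collections import defaultdict


def seasonal_average_forecast(rows, season_length, horizon):
    """Forecast each grain as the mean of its last `season_length` targets.

    rows: iterable of (grain, target) pairs, already sorted by time.
    """
    history = defaultdict(list)
    for grain, target in rows:
        history[grain].append(target)

    forecasts = {}
    for grain, values in history.items():
        last_season = values[-season_length:]
        mean = sum(last_season) / len(last_season)
        # The same average is repeated for every step of the horizon.
        forecasts[grain] = [mean] * horizon
    return forecasts


# Toy weekly sales, with the customer type acting as the grain.
rows = [
    ("Retail", 10.0), ("Wholesale", 100.0),
    ("Retail", 14.0), ("Wholesale", 120.0),
    ("Retail", 12.0), ("Wholesale", 110.0),
]
forecasts = seasonal_average_forecast(rows, season_length=2, horizon=3)
print(forecasts)
```

Because each forecast depends only on the target history within a grain, an explainer would have no reason to attribute importance to any other input column.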

kannanthiru1 commented 2 years ago

Thank you, Ilya, for following up. I deeply appreciate it.
Kannan