microsoft / responsible-ai-toolbox

Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.
https://responsibleaitoolbox.ai/
MIT License

Question: install from local files #1590

Closed (yangwendy closed this issue 2 years ago)

yangwendy commented 2 years ago

The dashboard does not work with a catboost model because of a y_pred dimension mismatch. So I downloaded the files from git and made a simple modification: np.reshape(model.predict(), (len(true_y), 1)). Then I installed from my local copy with pip install -e.

I got the following error when calling ExplanationDashboard(global_explanation, catboost_model, dataset=x_test, true_y=y_test): FileNotFoundError: [Errno 2] No such file or directory: '/mnt/responsible-ai-widgets/raiwidgets/raiwidgets/widget/index.html'

Please help. Thanks a lot.

imatiach-msft commented 2 years ago

@yangwendy if you are doing the local install using pip install -e, you will first need to build the UI code. You can build it by running the following at the root directory of the repository:

yarn install
yarn buildall

Note that you will need to have node, npm and yarn installed. For more detailed information, please see the development process guide: https://github.com/microsoft/responsible-ai-toolbox/blob/main/CONTRIBUTING.md#development-process
The section on running the e2e tests locally might be particularly useful: https://github.com/microsoft/responsible-ai-toolbox/blob/main/CONTRIBUTING.md#run-e2e-tests-locally-with-notebook-data
I hope this helps clarify how to build locally.
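As a quick sanity check before launching the dashboard from an editable install, something like the sketch below could confirm that the UI bundle was actually produced by the build. The relative path is taken from the FileNotFoundError quoted above; the helper names widget_bundle_path and ui_is_built are hypothetical and are not part of raiwidgets.

```python
from pathlib import Path


def widget_bundle_path(repo_root):
    """Return where the built widget page is expected to live.

    Assumption: the layout matches the path in the FileNotFoundError
    above, i.e. <repo_root>/raiwidgets/raiwidgets/widget/index.html.
    """
    return Path(repo_root) / "raiwidgets" / "raiwidgets" / "widget" / "index.html"


def ui_is_built(repo_root):
    """Return True if the widget's index.html exists under repo_root."""
    return widget_bundle_path(repo_root).is_file()
```

For example, ui_is_built("/mnt/responsible-ai-widgets") returning False would suggest that the UI build has not been run yet, or did not complete.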

For the original issue, you can wrap the model to have a predict/predict_proba function in the scikit-learn format. I wonder if using our ml-wrappers repository may resolve this for you:

https://github.com/microsoft/ml-wrappers/

from ml_wrappers import wrap_model
wrapped_model = wrap_model(model, input, model_task='regression')

This function automatically wraps models and tries to put them in a common scikit-learn format that all other RAI-related packages can work with, but it may not always work.

Can you give more information about which specific catboost python class you are using, and how you trained it? If you have a sample notebook, we can also try to debug if the wrap_model function does not work for you.

yangwendy commented 2 years ago

catboost_iris.zip

Thank you so much for your help. I am waiting for the IT team to install yarn. Meanwhile, I tried ml-wrappers. It cannot convert array[[1],[2],[3]] to array[1,2,3]. I have attached a simple notebook for your reference.

imatiach-msft commented 2 years ago

@yangwendy thank you for the example notebook. I've sent a PR here to add support for catboost framework (classifier and regressor) to ml-wrappers: https://github.com/microsoft/ml-wrappers/pull/55

Taken from the PR description: mainly, the current predict and predict_proba functions of the catboost classifier do not follow the scikit-learn specification, due to two issues:

1.) Catboost predict_proba returns differently shaped results depending on whether a single array (one instance) or multiple arrays are passed to the prediction function. In the single-instance case, catboost returns a one-dimensional array of probabilities, whereas scikit-learn models always return a two-dimensional array of probabilities.

2.) Catboost predict returns a two-dimensional array with one column, in the format [[1], [3], [2], [1]], whereas scikit-learn models return a one-dimensional array from predict, in the format [1, 3, 2, 1].

For the first issue, we detect if the output dimensions are different in the single vs multi instance case - and if they differ by one, we add an extra dimension for the single instance prediction scenario.

For the second issue, a simple ravel() to one dimension resolves this inconsistency from catboost.
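The two fixes can be illustrated with a minimal sketch. This is not the code from the PR: FakeCatBoostLikeModel is a hypothetical stand-in that only mimics the output shapes described above (so the example runs without catboost installed), and SklearnShapeWrapper is a hypothetical name. The single-instance check is also simplified to "the output is one-dimensional" rather than comparing single-instance and multi-instance outputs.

```python
import numpy as np


class FakeCatBoostLikeModel:
    """Hypothetical stand-in, NOT catboost: it only mimics the output shapes."""

    def predict(self, X):
        X = np.asarray(X)
        # Column vector of labels, e.g. [[0], [1], [2], [3]]
        return np.arange(len(X)).reshape(-1, 1)

    def predict_proba(self, X):
        X = np.asarray(X)
        if X.ndim == 1:
            # Single instance: one-dimensional array of probabilities
            return np.array([0.2, 0.3, 0.5])
        # Multiple instances: one row of probabilities per instance
        return np.tile([0.2, 0.3, 0.5], (len(X), 1))


class SklearnShapeWrapper:
    """Normalizes output shapes to the scikit-learn convention."""

    def __init__(self, model):
        self._model = model

    def predict(self, X):
        # Second issue: flatten the (n, 1) column to shape (n,)
        return np.asarray(self._model.predict(X)).ravel()

    def predict_proba(self, X):
        probs = np.asarray(self._model.predict_proba(X))
        if probs.ndim == 1:
            # First issue: add the missing leading dimension, giving (1, n_classes)
            probs = probs.reshape(1, -1)
        return probs
```

With this wrapper, predict on four rows yields an array of shape (4,), and predict_proba on a single row yields shape (1, 3) instead of (3,).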

imatiach-msft commented 2 years ago

@yangwendy I've released ml-wrappers 0.2.1 which should now support catboost. Please run:

pip install --upgrade ml-wrappers

Then you can use the wrap_model function to wrap the catboost model: https://github.com/microsoft/ml-wrappers/

from ml_wrappers import wrap_model
wrapped_model = wrap_model(model, input, model_task='regression')

The wrapped catboost model can then be passed to the ExplanationDashboard for viewing.