oegedijk / explainerdashboard

Quickly build Explainable AI dashboards that show the inner workings of so-called "blackbox" machine learning models.
http://explainerdashboard.readthedocs.io
MIT License

Question regarding deployment on Heroku #38

Closed: hkoppen closed this issue 3 years ago

hkoppen commented 3 years ago

I just tried to deploy my app on Heroku by directly importing the github project. However, I did not manage to "add the buildpack" correctly - I'm still generating a slug larger than 500MB. I did

oegedijk commented 3 years ago

Yes, each composite (the base for a tab) simply appends a number to self.name when naming its subcomponents, e.g.

class ImportancesComposite(ExplainerComponent):
    def __init__(self, explainer, title="Feature Importances", name=None,
                    hide_importances=False,
                    hide_selector=True, **kwargs):
        """Overview tab of feature importances

        Can show both permutation importances and mean absolute shap values.

        Args:
            explainer (Explainer): explainer object constructed with either
                        ClassifierExplainer() or RegressionExplainer()
            title (str, optional): Title of tab or page. Defaults to 
                        "Feature Importances".
            name (str, optional): unique name to add to Component elements. 
                        If None then random uuid is generated to make sure 
                        it's unique. Defaults to None.
            hide_importances (bool, optional): hide the ImportancesComponent
            hide_selector (bool, optional): hide the post label selector. 
                Defaults to True.
        """
        super().__init__(explainer, title, name)

        self.importances = ImportancesComponent(
                explainer, name=self.name+"0", hide_selector=hide_selector, **kwargs)

    def layout(self):
        return html.Div([
            dbc.Row([
                make_hideable(
                    dbc.Col([
                        self.importances.layout(),
                    ]), hide=self.hide_importances),
            ], style=dict(margin=25))
        ])

Then ExplainerDashboard instantiates all the tabs using ExplainerTabsLayout, which has the line

self.tabs  = [instantiate_component(tab, explainer, name=str(i+1), **kwargs) for i, tab in enumerate(tabs)]

So each tab gets the name "1", "2", "3", etc. And then each subcomponent gets the name "11", "12", etc.
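A minimal sketch of the naming scheme described above. The `Component` class and helper functions are hypothetical stand-ins, not explainerdashboard code, and the child suffix starting at "0" is an assumption taken from the `self.name+"0"` line in the composite shown earlier. The point is only that names derived from position are identical every time the dashboard is built.

```python
# Hypothetical stand-ins for illustration; these are not explainerdashboard classes.
class Component:
    def __init__(self, name, n_children=0):
        self.name = name
        # Each child appends its own index to the parent's name.
        self.children = [Component(name + str(j)) for j in range(n_children)]


def build_tabs(children_per_tab):
    # Tabs are named "1", "2", ... in the order they are passed in.
    return [Component(str(i + 1), n) for i, n in enumerate(children_per_tab)]


def all_names(tabs):
    # Flatten tab names and child names into one list, in tree order.
    names = []
    for tab in tabs:
        names.append(tab.name)
        names.extend(child.name for child in tab.children)
    return names


print(all_names(build_tabs([1, 2])))  # ['1', '10', '2', '20', '21']
```

Building the same tab list twice gives the same names, which is what lets several processes agree on component ids.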

Are you defining custom components? Or defining them before you add them to ExplainerDashboard?

E.g.

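tab = ImportancesComposite(explainer)
ExplainerDashboard(explainer, tab).run()

Would result in a random uuid name for tab, since it is instantiated without a name before it is passed to ExplainerDashboard.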

moeller84 commented 3 years ago

Hi, we tried running the dashboard on a single container. That works. But running on multiple containers / a swarm causes the problem.

oegedijk commented 3 years ago

So in a swarm it starts generating uuid names but in a single container it doesn't?

That seems super strange... Again, the only thing I can think of is old versions of explainerdashboard in a cached docker layer.

oegedijk commented 3 years ago

I'm gonna see if I can build some diagnostic functionality that makes it easier to see the whole component tree, including .name properties, and would also give a warning when it detects any uuid .name...
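One way such a diagnostic could look, as a rough sketch only. `Node` is a hypothetical stand-in for a component with `.name` and child components, and the pattern assumes random names look like "uuid" followed by five alphanumeric characters. This is not the library's implementation.

```python
import re
import warnings


# Hypothetical stand-in for a dashboard component; not the library's class.
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


# Assumption: random names are "uuid" plus five alphanumeric characters.
UUID_NAME = re.compile(r"^uuid[0-9A-Za-z]{5}")


def describe_tree(node, depth=0):
    """Return one indented line per component, flagging uuid-style names."""
    flag = "  <-- random uuid name" if UUID_NAME.match(node.name) else ""
    lines = ["  " * depth + node.name + flag]
    for child in node.children:
        lines.extend(describe_tree(child, depth + 1))
    return lines


def warn_on_uuid_names(root):
    """Warn once if any component in the tree has a uuid-style name."""
    flagged = [line.strip() for line in describe_tree(root) if "<--" in line]
    if flagged:
        warnings.warn(f"{len(flagged)} component(s) have random uuid names")
    return flagged
```

Printing `"\n".join(describe_tree(root))` would give a quick view of the whole tree with the suspicious names marked.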

moeller84 commented 3 years ago

It also generates uuid names with a single container. But it seems that callback names get mixed up when running on more than one container.
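A toy illustration of why random names only hurt with more than one container. All names here are hypothetical, and the two seeded generators merely stand in for two independent processes. Each worker builds its own copy of the dashboard, so random ids registered by one worker are unknown to the other, while index-based ids line up.

```python
import random
import string

ALPHABET = string.ascii_lowercase + string.digits


def component_ids(n, rng=None):
    """Index-based ids when rng is None, otherwise random uuid-prefixed ids."""
    if rng is None:
        return [str(i + 1) for i in range(n)]
    return ["uuid" + "".join(rng.choices(ALPHABET, k=5)) for _ in range(n)]


def unregistered(layout_ids, registered_ids):
    """Ids in a served layout that the handling worker never registered."""
    return sorted(set(layout_ids) - set(registered_ids))


# Each worker builds the dashboard independently, with its own random state.
worker_a = component_ids(3, random.Random(1))
worker_b = component_ids(3, random.Random(2))

# Layout served by worker A, callback request handled by worker B.
print(unregistered(worker_a, worker_b))                  # ids do not line up
print(unregistered(component_ids(3), component_ids(3)))  # []
```

With a single container the same process serves the layout and handles the callbacks, so the mismatch never shows up.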

oegedijk commented 3 years ago

Ah, okay, that at least is an easier problem to understand. So the example I gave you didn't produce uuid names, right?

Is there any code you can share on how you generate the dashboard? Because you have to be doing something custom otherwise it would just work out of the box.

moeller84 commented 3 years ago

dashboard.yaml

dashboard:
  explainerfile: data/processed/explainer.joblib
  params:
    title: Fastholdelses model
    hide_header: false
    hide_shapsummary: false
    header_hide_title: false
    header_hide_selector: false
    block_selector_callbacks: false
    pos_label: null
    fluid: true
    mode: dash
    width: 1000
    height: 800
    external_stylesheets: null
#    server: true
#    url_base_pathname: null
    responsive: true
    logins: null
    port: 8050
    tabs:
    #- importances
    #- model_summary
    - contributions
    - whatif
    - shap_dependence
    #- shap_interaction
    #- decision_trees

__init__.py

import logging
from pathlib import Path
from flask import Flask

from for_p_afgang_dashboard.extensions import setup_extensions
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
import yaml

# Metadata for the package
# fmt: off
__version__ = "0.1.0"
__url__ = "https://lspgitlab01.alm.brand.dk/advanced-analytics/for_p_afgang_dashboard"
__description__ = "explainer dashboard for for_p_afgang model performance"
__author__ = "Niels Møller-Hansen"
__email__ = "abnimo@almbrand.dk"
# fmt: on

logger = logging.getLogger("api_logger")
file_path = Path(__file__)

def create_app(config):
    logger.info("Starting app...")
    logger.debug(f"Using config {config}")
    app = Flask("for_p_afgang_dashboard")
    app.config.from_object(config)

    @app.route("/health")
    def healthcheck():
        return "Healthy", 200

    setup_extensions(app)

    dashboard_yaml_path = file_path.parent.joinpath("dashboard.yaml")
    explainerfile = str(file_path.parent.joinpath("data").joinpath("explainer.joblib"))
    logger.info(explainerfile)
    config = yaml.safe_load(open(dashboard_yaml_path, "r"))
    params = config["dashboard"]["params"]
    explainer = ClassifierExplainer.from_file(explainerfile)
    print("X:", len(explainer.X))
    logger.info(f"Explainer contains {len(explainer.X)} samples")
    dashboard = ExplainerDashboard(
        explainer, server=app, url_base_pathname="/", **params
    )
    print(list(dashboard.app.callback_map.values()))

    @app.route("/")
    def return_dashboard():
        return dashboard.app.index()

    logger.info("Explainer dashboard loaded")
    return app

oegedijk commented 3 years ago

Ah, I think I got it!

In the yaml I see:

tabs:
    #- importances
    #- model_summary
    - contributions
    - whatif
    - shap_dependence
    #- shap_interaction
    #- decision_trees

So that equates to ExplainerDashboard(explainer, ["contributions", "whatif", "shap_dependence"]).

The string tab indicators get converted by

https://github.com/oegedijk/explainerdashboard/blob/080597aa2d2f13308ffaec9fac110e5f21616d5a/explainerdashboard/dashboards.py#L670

def _convert_str_tabs(self, component):
    if isinstance(component, str):
        if component == 'importances':
            return ImportancesTab
        elif component == 'model_summary':
            return ModelSummaryTab
        elif component == 'contributions':
            return ContributionsTab
        elif component == 'whatif':
            return WhatIfTab
        elif component == 'shap_dependence':
            return ShapDependenceTab
        elif component == 'shap_interaction':
            return ShapInteractionsTab
        elif component == 'decision_trees':
            return DecisionTreesTab
    return component

These ImportancesTab, ModelSummaryTab, etc. have actually been deprecated in favor of ImportancesComposite, etc., and are only there for backward compatibility, but I had not adjusted this helper method. I will fix this in the next release, but in the meantime I think it should work if you change dashboard.yaml to:

dashboard:
  explainerfile: data/processed/explainer.joblib
  params:
    title: Fastholdelses model
    hide_header: false
    hide_shapsummary: false
    header_hide_title: false
    header_hide_selector: false
    block_selector_callbacks: false
    pos_label: null
    fluid: true
    mode: dash
    width: 1000
    height: 800
    external_stylesheets: null
#    server: true
#    url_base_pathname: null
    responsive: true
    logins: null
    port: 8050
    importances: false
    model_summary: false
    shap_interaction: false
    decision_trees: false

This is the equivalent of passing booleans to switch off tabs: ExplainerDashboard(explainer, importances=False, model_summary=False, shap_interaction=False, decision_trees=False)
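To make the conversion from a tabs list to boolean switches concrete, here is a small sketch. The helper `tabs_to_switches` is hypothetical and not part of explainerdashboard. The tab names are the string indicators from the helper method quoted above.

```python
# String tab indicators as listed in the helper method quoted in this thread.
ALL_TABS = [
    "importances",
    "model_summary",
    "contributions",
    "whatif",
    "shap_dependence",
    "shap_interaction",
    "decision_trees",
]


def tabs_to_switches(enabled):
    """Return {tab: False} for every known tab that is not in `enabled`."""
    unknown = set(enabled) - set(ALL_TABS)
    if unknown:
        raise ValueError(f"unknown tab(s): {sorted(unknown)}")
    return {tab: False for tab in ALL_TABS if tab not in enabled}


print(tabs_to_switches(["contributions", "whatif", "shap_dependence"]))
# {'importances': False, 'model_summary': False,
#  'shap_interaction': False, 'decision_trees': False}
```

The resulting keys match the four entries added to the params section of dashboard.yaml.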

oegedijk commented 3 years ago

Just released https://github.com/oegedijk/explainerdashboard/releases/tag/v0.2.20 which should fix this issue...

oegedijk commented 3 years ago

I think you can also simplify the loading of the dashboard:

def create_app(config):
    logger.info("Starting app...")
    logger.debug(f"Using config {config}")
    app = Flask("for_p_afgang_dashboard")
    app.config.from_object(config)

    @app.route("/health")
    def healthcheck():
        return "Healthy", 200

    setup_extensions(app)

    explainerfile = str(file_path.parent.joinpath("data").joinpath("explainer.joblib"))
    dashboard_yaml_path = file_path.parent.joinpath("dashboard.yaml")
    logger.info(explainerfile)

    dashboard = ExplainerDashboard.from_config(
        explainerfile, dashboard_yaml_path, server=app, url_base_pathname="/")
    logger.info(f"Explainer contains {len(dashboard.explainer)} samples")
    print(list(dashboard.app.callback_map.values()))

    @app.route("/")
    def return_dashboard():
        return dashboard.app.index()

    logger.info("Explainer dashboard loaded")
    return app

moeller84 commented 3 years ago

I updated to the latest version and also altered the .yaml file. That leaves me with this error (without having touched anything else):

Traceback (most recent call last):
  File "/home/niels/.pyenv/versions/3.7.9/envs/for_p_afgang_dashboard/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/niels/.pyenv/versions/3.7.9/envs/for_p_afgang_dashboard/lib/python3.7/site-packages/flask/cli.py", line 184, in find_app_by_string
    app = call_factory(script_info, attr, args)
  File "/home/niels/.pyenv/versions/3.7.9/envs/for_p_afgang_dashboard/lib/python3.7/site-packages/flask/cli.py", line 115, in call_factory
    return app_factory(*arguments)
  File "/home/niels/projektmappe/for_p_afgang_dashboard/app/for_p_afgang_dashboard/__init__.py", line 43, in create_app
    explainer, server=app, url_base_pathname="/", **params
  File "/home/niels/.pyenv/versions/3.7.9/envs/for_p_afgang_dashboard/lib/python3.7/site-packages/explainerdashboard/dashboards.py", line 465, in __init__
    fluid=fluid))
  File "/home/niels/.pyenv/versions/3.7.9/envs/for_p_afgang_dashboard/lib/python3.7/site-packages/explainerdashboard/dashboards.py", line 88, in __init__
    self.tabs  = [instantiate_component(tab, explainer, name=str(i+1), **kwargs) for i, tab in enumerate(tabs)]
  File "/home/niels/.pyenv/versions/3.7.9/envs/for_p_afgang_dashboard/lib/python3.7/site-packages/explainerdashboard/dashboards.py", line 88, in <listcomp>
    self.tabs  = [instantiate_component(tab, explainer, name=str(i+1), **kwargs) for i, tab in enumerate(tabs)]
  File "/home/niels/.pyenv/versions/3.7.9/envs/for_p_afgang_dashboard/lib/python3.7/site-packages/explainerdashboard/dashboard_methods.py", line 431, in instantiate_component
    component = component(explainer, name=name, **kwargs)
  File "/home/niels/.pyenv/versions/3.7.9/envs/for_p_afgang_dashboard/lib/python3.7/site-packages/explainerdashboard/dashboard_components/composites.py", line 271, in __init__
    hide_selector=hide_selector, **kwargs)
  File "/home/niels/.pyenv/versions/3.7.9/envs/for_p_afgang_dashboard/lib/python3.7/site-packages/explainerdashboard/dashboard_components/shap_components.py", line 1027, in __init__
    if not self.explainer.onehot_cols:
AttributeError: 'XGBClassifierExplainer' object has no attribute 'onehot_cols'

oegedijk commented 3 years ago

Ah, yeah, you have to rebuild the explainer with the new version: I made some breaking changes to how categorical features and one-hot encoded features are handled internally in order to support categorical features. (On the plus side: categorical features are supported now!)
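A stale explainer file could be caught before the dashboard is built with a small pre-flight check, sketched below. The helper, the `OldExplainer` stand-in and the `REQUIRED` list are hypothetical. Only `X` and `onehot_cols` are taken from this thread, and the real set of attributes a given release needs is not known here.

```python
def missing_attributes(obj, required):
    """Return the names in `required` that `obj` does not have."""
    return [attr for attr in required if not hasattr(obj, attr)]


# Hypothetical stand-in for an explainer saved by an older release.
class OldExplainer:
    X = []


# Assumption: attributes the newer dashboard code expects to find.
REQUIRED = ["X", "onehot_cols"]

missing = missing_attributes(OldExplainer(), REQUIRED)
if missing:
    print(f"Explainer looks outdated, missing: {missing}; rebuild it.")
```

Failing early with a message like this is easier to act on than an AttributeError deep inside component construction.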

carlryn commented 3 years ago

Is there a reason why you are using UUIDs in the first place? I'm thinking you could just set a seed and generate the names from seeded random numbers to get deterministic names.

E.g. line 177 in dashboard_methods.py:

if not hasattr(self, "name") or self.name is None:
    self.name = name or "uuid"+shortuuid.ShortUUID().random(length=5)

oegedijk commented 3 years ago

Original goal was to generate a unique name that is both short and url-friendly (planning on adding querystring support at some point). But I guess that could be done simpler and without the shortuuid dependency, e.g.: https://proinsias.github.io/til/Python-UUID-generate-random-but-reproducible-with-seed/

Got a code suggestion?
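A possible starting point, using only the standard library. It follows the seeded-UUID idea from the linked post: draw 128 bits from a seeded `random.Random` and pass them to `uuid.UUID`. The factory name, the default seed and the five-character truncation are assumptions for illustration, not the library's code.

```python
import random
import uuid


def make_name_factory(seed=0, length=5):
    """Return a function that yields reproducible, url-friendly names."""
    rng = random.Random(seed)  # private generator; leaves global random state alone

    def next_name():
        # Same seed and same call order give the same UUIDs on every worker.
        generated = uuid.UUID(int=rng.getrandbits(128), version=4)
        return "uuid" + generated.hex[:length]

    return next_name


next_name = make_name_factory(seed=42)
print(next_name(), next_name())
```

Two caveats: the names only match across workers if components are created in the same order everywhere, and truncating to a few hex characters makes collisions unlikely rather than impossible.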

oegedijk commented 3 years ago

Is it working now? Shall I close the issue?

carlryn commented 3 years ago

This seems to be working now! I ran it with several gunicorn workers and also saved the callback id names, which all match.
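For anyone wanting to repeat this check, one simple approach is to reduce each worker's callback ids to a fingerprint and compare the fingerprints. The sketch below is an assumption about how that could be done, not what was actually run. With a real dashboard the ids would presumably come from the keys of `dashboard.app.callback_map`, the mapping printed in the code earlier in this thread.

```python
import hashlib


def callback_fingerprint(callback_ids):
    """Order-independent SHA-256 digest of a collection of callback ids."""
    joined = "\n".join(sorted(callback_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical ids as two workers might report them, in different order.
worker_1 = ["10.children", "1.children", "20.figure"]
worker_2 = ["1.children", "20.figure", "10.children"]

print(callback_fingerprint(worker_1) == callback_fingerprint(worker_2))  # True
```

Logging the fingerprint at startup in each worker makes a mismatch visible without comparing long id lists by hand.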

oegedijk commented 3 years ago

Awesome!