oegedijk / explainerdashboard

Quickly build Explainable AI dashboards that show the inner workings of so-called "blackbox" machine learning models.
http://explainerdashboard.readthedocs.io
MIT License

Feature : Export in plain .html + javascript #90

Closed arita37 closed 3 years ago

arita37 commented 3 years ago

Is there a way to export to static HTML (i.e. with limited features) directly? Sometimes we want to store it as part of the model training info.

Thanks

oegedijk commented 3 years ago

Hi @arita37 I agree that that would be extremely cool. I looked into saving dash apps to .pdf but there is no straightforward way of doing it with the open source version it seems. (the paid corporate version does have this as a feature).

However you could generate the relevant plots from the explainer directly, e.g.:

fig = explainer.plot_importances()

And then export the resulting fig to a static image: https://plotly.com/python/static-image-export/

Would then be cool to combine multiple figures to a single pdf. If you could get something like that to work would be happy to help integrating it to the library!

arita37 commented 3 years ago

A small hack would be :

Launching the server and saving using chrome.....

oegedijk commented 3 years ago

Would you be able to automate that? In that case, yes!

oegedijk commented 3 years ago

Guess, maybe with selenium you could do something like that?

psmgeelen commented 3 years ago

Hi there, I would love to have these features as a standalone HTML file. How can I support this?

psmgeelen commented 3 years ago

@oegedijk , does it make sense for me to commit to this?

oegedijk commented 3 years ago

Hi @psmgeelen, yeah this would be a very useful feature if we could add it!

I looked into it some time ago, and then my impression was that it would be quite complicated to support. There is support for this in dash enterprise, but not in the open source version.

I think you can export plotly figures to .html: https://plotly.com/python/interactive-html-export/

So would be a matter of somehow exporting the other elements as well. Not sure how difficult that would be, especially since the library heavily uses dash-bootstrap-components as well.

If this sounds like a fun project you could first try to get it to work with a simple dash-bootstrap-components demo and then we can see if we can extend it to more complex dashboards.

psmgeelen commented 3 years ago

Let's make it happen!

psmgeelen commented 3 years ago

@oegedijk, am I doing something wrong? I cloned the repo and set up an environment with requirements_testing.txt. The tests didn't run at first. I installed additional dependencies like jupyter_dash and still had tests failing (46 out of 416). The tests that fail come from the

The error seems to be consistent:

FileNotFoundError: [Errno 2] No such file or directory: '/home/ComputerName/PycharmProjects/explainerdashboard/tests/tests/test_assets/explainer.yaml'

My first question is: how well designed are the tests atm? I am surprised to have the amount of issues that I am having, but this can also relate to the fact that I prefer to use conda and PyCharm. The second question: Is this a known error? Am I doing something fundamentally wrong here?

Regards

oegedijk commented 3 years ago

Ah, that's weird. The test_assets/explainer.yaml file should be generated during the test itself.

Probably the main reason these tests fail is that you need to install chromedriver to run them: https://chromedriver.chromium.org/getting-started

Alternatively you can just fork, submit a PR and let the tests run on github.

oegedijk commented 3 years ago

So from this discussion: https://github.com/plotly/dash/issues/145

It seems that the way to do this probably is to first render the page using a browser and then save as html. (some of the html gets dynamically rendered on the client side, so hard to generate precisely on the server side). Could try to automate this with selenium.

Not a super satisfying solution though.

Another solution could be a specific static dashboard layout option, where you know the html layout exactly and you only need to export the plotly figures to html and include them.
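
The static-layout idea can be sketched with only the standard library. This is a rough illustration, not code from explainerdashboard: the `PAGE` scaffold, the slot names and `render_static_page` are all hypothetical, and the placeholder strings stand in for whatever HTML each plotly figure would be exported to.

```python
from html import escape
from string import Template

# Hypothetical fixed scaffold: the layout is known in advance,
# only the title and the figure slots change between exports.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<div class="figure">$importances</div>
<div class="figure">$confusion</div>
</body>
</html>""")


def render_static_page(title, fragments):
    """Fill the scaffold with pre-rendered HTML fragments.

    `fragments` maps slot names to HTML strings. substitute() raises
    KeyError when a slot is missing, so an incomplete export fails loudly.
    """
    return PAGE.substitute(title=escape(title), **fragments)


page = render_static_page(
    "Titanic Explainer",
    {
        # Placeholders: in practice these would be exported figure html
        "importances": "<div id='fig-importances'>placeholder</div>",
        "confusion": "<div id='fig-confusion'>placeholder</div>",
    },
)
```

Because the scaffold is fixed, no browser is needed at export time; the trade-off is that every layout you want to export needs its own template.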

psmgeelen commented 3 years ago

Thanks!

psmgeelen commented 3 years ago

Then that's how we do it. I am hoping to have something by the end of this week.

psmgeelen commented 3 years ago

OK, so here are some preliminary results:

I am able to store and read the HTML file. With an approach like the one below, it reads the HTML and stores it as it should. The problem, however, is that it doesn't store the 'mechanics' of the website. It seems to be a literal copy of the render made within the browser. The approach was tested as follows:

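import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Start chromedriver and attach a remote driver to it
service = Service('/home/pieter/Downloads/chromedriver')
service.start()
driver = webdriver.Remote(service.service_url)
driver.get('http://0.0.0.0:8050/')

# Give the dashboard time to render client-side before saving
time.sleep(5)

# Save whatever the browser currently holds for the page
with open('test.html', 'w', encoding='utf-8') as f:
    f.write(driver.page_source)

driver.quit()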

This probably has to do with the 'get' method within Selenium.

These results (or the lack of them) made me test the basic principle we are trying to automate: manually opening the dashboard in a browser and saving the page. So I did exactly that: I opened the dashboard in Chrome and saved it locally. The result seemed good at first. Chrome stores an HTML file and creates a matching folder with some JS files in it. So far so good... When I opened the local HTML file, I was pleasantly surprised to see the exact dashboard. This lasted a split second, as the screen then changed into a basic loading screen, displaying the literal text loading... and nothing else.

My concrete questions are: Was this behaviour known? Does anyone know a fix for this? Am I doing something wrong here?

Thanks for considering my questions.

Regards

PS: This is the script I use to get a dashboard going:

from sklearn.ensemble import RandomForestClassifier
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_survive, titanic_names

feature_descriptions = {
    "Sex": "Gender of passenger",
    "Gender": "Gender of passenger",
    "Deck": "The deck the passenger had their cabin on",
    "PassengerClass": "The class of the ticket: 1st, 2nd or 3rd class",
    "Fare": "The amount of money people paid", 
    "Embarked": "the port where the passenger boarded the Titanic. Either Southampton, Cherbourg or Queenstown",
    "Age": "Age of the passenger",
    "No_of_siblings_plus_spouses_on_board": "The sum of the number of siblings plus the number of spouses on board",
    "No_of_parents_plus_children_on_board" : "The sum of the number of parents plus the number of children on board",
}

X_train, y_train, X_test, y_test = titanic_survive()
train_names, test_names = titanic_names()
model = RandomForestClassifier(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)

explainer = ClassifierExplainer(model, X_test, y_test, 
                                cats=['Deck', 'Embarked',
                                    {'Gender': ['Sex_male', 'Sex_female', 'Sex_nan']}],
                                cats_notencoded={'Embarked': 'Stowaway'}, # defaults to 'NOT_ENCODED'
                                descriptions=feature_descriptions, # adds a table and hover labels to dashboard
                                labels=['Not survived', 'Survived'], # defaults to ['0', '1', etc]
                                idxs = test_names, # defaults to X.index
                                index_name = "Passenger", # defaults to X.index.name
                                target = "Survival", # defaults to y.name
                                )

db = ExplainerDashboard(explainer, 
                        title="Titanic Explainer", # defaults to "Model Explainer"
                        shap_interaction=False, # you can switch off tabs with bools
                        )
db.run(port=8050)

oegedijk commented 3 years ago

Yeah, the mechanics will be lost in any case when you do a static export. The interactivity of a dash dashboard happens on the server side (although there are also client side callbacks now, but that requires writing javascript), so once you export it you lose that interactivity. Only the basic plot interactivity (hover overs, etc) will remain.

So the idea would be that you would 'freeze' the dashboard in place and build a static website based on that, which could more easily be shared. Another cool feature would be to do an automated PDF export.

I'm not that familiar with the underlying mechanics of dash, but I'm guessing there is some javascript in there that makes an initial call to the server for the layout and displays "loading..." while it is waiting for a response. So you would have to rip out that piece of javascript.
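
One blunt way to try this on a page saved from the browser is to drop every script element and keep the rest of the markup. The sketch below uses only the standard library; `ScriptStripper` and `strip_scripts` are hypothetical names. Whether this is enough to stop the "loading..." screen is an untested assumption, and since it removes all javascript, any plot interactivity that depends on scripts goes with it.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit an HTML document while dropping every <script> element."""

    def __init__(self):
        # Keep entities such as &amp; untouched instead of decoding them
        super().__init__(convert_charrefs=False)
        self.parts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        else:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != "script":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Script bodies arrive here as data; skip them
        if not self._in_script:
            self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_scripts(html_text):
    """Return html_text with all <script> elements removed."""
    parser = ScriptStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

Usage would be along the lines of `frozen = strip_scripts(open('test.html', encoding='utf-8').read())`, followed by writing `frozen` to a new file.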

So yeah, as I said, don't think this is going to be trivial :)

psmgeelen commented 3 years ago

Awesome, step by step we go. I am going to see how I can rip the 'loading..' part out of the JS and see how we can go from there. Hopefully by tomorrow I will have more.

oegedijk commented 3 years ago

So one way about it is to first focus on the simple layout: https://explainerdashboard.readthedocs.io/en/latest/tabs.html#simplifiedclassifiercomposite

or:

ExplainerDashboard(explainer, simple=True)

Take the html from that as a scaffold and then insert the fig.to_html() into them.

oegedijk commented 3 years ago

That way you can avoid using selenium (you only have to use it once to generate a template)

psmgeelen commented 3 years ago

That probably alleviates a lot of the issues

psmgeelen commented 3 years ago

So I have made a few steps and did some more research overall into the matter. Dash is fundamentally designed around a server to manage client requests (like clicking on some visual by which the visual would change) in order to assure performance. This is fair, as we all might have had the experience of loading in a 10 MB offline plotly html render that took ages. I also found that the co-founder of Dash is not interested in looking into any offline capability for Dash.

I think we are basically stuck at a performance crossroads, where offline would be nice but is not necessarily achievable with modern-day technologies like HTML5 and JS. On the other hand, we can consider the issue in a functional manner. Dash is not able to run offline and never will be. The interactive nature of ExplainerDashboards is probably not achievable without a small server running in the background.

I think the point of the matter is portability. I understood that the deployment through heroku is possible and common-place. My proposal would be that we make a heroku container, wrapped in some kind of executable, so that even idiots can easily open them. Would this be interesting for you as well? @oegedijk and @arita37 ? I am thinking that the point of SHAP and ExAi is that it can be shared throughout, to make ML performance communicable; hence a wrapper that is able to start up the heroku container and opens a browser would be ideal, I'd say.

Let me know whether I am making the right assumptions and whether this route is interesting for you guys!

Thanks for your consideration!

oegedijk commented 3 years ago

Sorry for taking some time to get back to you: was on holiday. But while on holiday I had some time to think about this and think I have a possible solution, although it will require some work to get it done.

As a demo here's an example of downloading a simple dashboard with only a confusion matrix to html:

from explainerdashboard.custom import *

class CustomDashboard(ExplainerComponent):
    def __init__(self, explainer, name=None):
        super().__init__(explainer, title="Custom Dashboard")
        self.confusion = ConfusionMatrixComponent(explainer)

    def layout(self):
        return dbc.Container([
            dbc.Row([
                dbc.Col([
                    html.H1("Download .html demo"),
                    dbc.Button("download .html", id='download-button'),
                    dcc.Download(id='download-html'),
                ])
            ]),
            dbc.Row([
                dbc.Col([
                    self.confusion.layout(),
                ]),
            ])
        ])

    def component_callbacks(self, app):
        @app.callback(
            Output('download-html', 'data'),
            Input('download-button', 'n_clicks'),
            State('confusionmatrix-cutoff-'+self.confusion.name, 'value'),
            State('confusionmatrix-percentage-'+self.confusion.name, 'value'),
            State('confusionmatrix-binary-'+self.confusion.name, 'value'),
            State('pos-label-'+self.confusion.name, 'value')
        )
        def download_html(n_clicks, cutoff, normalized, binary, pos_label):
            if n_clicks is not None:
                fig = self.explainer.plot_confusion_matrix(
                            cutoff=cutoff, normalized=bool(normalized), 
                            binary=bool(binary), pos_label=pos_label)
                html = f"<html><H1>DOWNLOAD TEST TITLE</H1><div>{fig.to_html()}</div></html>"
                return dict(content=html, filename="dashboard.html")
            raise PreventUpdate

db = ExplainerDashboard(explainer, CustomDashboard, hide_header=True)
db.run()

What we could then do is add a .to_html() method to every ExplainerComponent that returns the html for that component. Components with subcomponents would have to define the bootstrap scaffolding and then add the .to_html() output of the subcomponents. The tricky thing is making this dependent on the state of the component, as in the above component. You would need to write a callback with all the State(id, prop) of each subcomponent and somehow route these to the right component.

oegedijk commented 3 years ago

So something like this:

from explainerdashboard.custom import *

class ConfusionMatrixComponent(ExplainerComponent):

    _state_props = [('confusionmatrix-cutoff-', 'value'),
             ('confusionmatrix-percentage-', 'value'),
             ('confusionmatrix-binary-', 'value'),
             ('pos-label-', 'value')]

    def __init__(self, explainer, title="Confusion Matrix", name=None,
                    subtitle="How many false positives and false negatives?",
                    hide_title=False, hide_subtitle=False, hide_footer=False,
                    hide_cutoff=False, hide_percentage=False, hide_binary=False,
                    hide_selector=False, hide_popout=False, pos_label=None,
                    cutoff=0.5, percentage=True, binary=True, description=None,
                    **kwargs):
        """Display confusion matrix component

        Args:
            explainer (Explainer): explainer object constructed with either
                        ClassifierExplainer() or RegressionExplainer()
            title (str, optional): Title of tab or page. Defaults to 
                        "Confusion Matrix".
            name (str, optional): unique name to add to Component elements. 
                        If None then random uuid is generated to make sure 
                        it's unique. Defaults to None.
            subtitle (str): subtitle
            hide_title (bool, optional): hide title.
            hide_subtitle (bool, optional): Hide subtitle. Defaults to False.
            hide_footer (bool, optional): hide the footer at the bottom of the component
            hide_cutoff (bool, optional): Hide cutoff slider. Defaults to False.
            hide_percentage (bool, optional): Hide percentage toggle. Defaults to False.
            hide_binary (bool, optional): Hide binary toggle. Defaults to False.
            hide_selector(bool, optional): hide pos label selector. Defaults to False.
            hide_popout (bool, optional): hide popout button. Defaults to False.
            pos_label ({int, str}, optional): initial pos label. Defaults to explainer.pos_label
            cutoff (float, optional): Default cutoff. Defaults to 0.5.
            percentage (bool, optional): Display percentages instead of counts. Defaults to True.
            binary (bool, optional): Show binary instead of multiclass confusion matrix. Defaults to True.
            description (str, optional): Tooltip to display when hover over
                component title. When None default text is shown. 
        """
        super().__init__(explainer, title, name)

        self.cutoff_name = 'confusionmatrix-cutoff-' + self.name

        if len(self.explainer.labels) <= 2:
            self.hide_binary = True

        if self.description is None: self.description = """
        The confusion matrix shows the number of true negatives (predicted negative, observed negative), 
        true positives (predicted positive, observed positive), 
        false negatives (predicted negative, but observed positive) and
        false positives (predicted positive, but observed negative). The amount
        of false negatives and false positives determine the costs of deploying
        an imperfect model. For different cutoffs you will get a different number
        of false positives and false negatives. This plot can help you select
        the optimal cutoff.
        """

        self.selector = PosLabelSelector(explainer, name=self.name, pos_label=pos_label)
        self.popout = GraphPopout('confusionmatrix-'+self.name+'popout', 'confusionmatrix-graph-'+self.name, 
                            self.title, self.description)
        self.register_dependencies("preds", "pred_probas", "pred_percentiles", "confusion_matrix")

    def layout(self):
        return dbc.Card([
            make_hideable(
                dbc.CardHeader([
                    html.Div([
                        html.H3(self.title, id='confusionmatrix-title-'+self.name),
                        make_hideable(html.H6(self.subtitle, className='card-subtitle'), hide=self.hide_subtitle),
                        dbc.Tooltip(self.description, target='confusionmatrix-title-'+self.name),
                    ]), 
                ]), hide=self.hide_title),
            dbc.CardBody([
                dbc.Row([
                    make_hideable(
                        dbc.Col([self.selector.layout()], width=3), hide=self.hide_selector)
                ], justify="end"),
                dcc.Graph(id='confusionmatrix-graph-'+self.name,
                                config=dict(modeBarButtons=[['toImage']], displaylogo=False)),
                dbc.Row([
                    make_hideable(
                        dbc.Col([
                            self.popout.layout()
                        ], md=2, align="start"), hide=self.hide_popout),
                ], justify="end"),
            ]),
            make_hideable(
            dbc.CardFooter([
                make_hideable(
                    html.Div([
                    html.Div([
                        html.Label('Cutoff prediction probability:'),
                        dcc.Slider(id='confusionmatrix-cutoff-'+self.name, 
                                    min = 0.01, max = 0.99, step=0.01, value=self.cutoff,
                                    marks={0.01: '0.01', 0.25: '0.25', 0.50: '0.50',
                                            0.75: '0.75', 0.99: '0.99'}, 
                                    included=False,
                                    tooltip = {'always_visible' : False},
                                    updatemode='drag'),
                    ], id='confusionmatrix-cutoff-div-'+self.name),
                    dbc.Tooltip(f"Scores above this cutoff will be labeled positive",
                                    target='confusionmatrix-cutoff-div-'+self.name,
                                    placement='bottom'),
                    ], style={'margin-bottom': 25}), hide=self.hide_cutoff),
                make_hideable(
                    html.Div([
                        dbc.FormGroup([
                            #dbc.Label("Percentage:", id='confusionmatrix-percentage-label-'+self.name),
                            dbc.Tooltip("Highlight the percentage in each cell instead of the absolute numbers",
                                    target='confusionmatrix-percentage-'+self.name),
                            dbc.Checklist(
                                options=[{"label":  "Highlight percentage", "value": True}],
                                value=[True] if self.percentage else [],
                                id='confusionmatrix-percentage-'+self.name,
                                inline=True,
                                switch=True,
                            ),
                        ]),
                    ]), hide=self.hide_percentage),
                make_hideable(
                    html.Div([
                        dbc.FormGroup([
                            dbc.Label("Binary:", id='confusionmatrix-binary-label-'+self.name),
                            dbc.Tooltip("display a binary confusion matrix of positive "
                                            "class vs all other classes instead of a multi"
                                            " class confusion matrix.",
                                        target="confusionmatrix-binary-label-"+self.name),
                            dbc.Checklist(
                                options=[{"label":  "Display one-vs-rest matrix", "value": True}],
                                value=[True] if self.binary else [],
                                id='confusionmatrix-binary-'+self.name,
                                inline=True,
                                switch=True,
                            ),
                        ]),
                    ]), hide=self.hide_binary),
            ]), hide=self.hide_footer)
        ])

    def component_callbacks(self, app):
        @app.callback(
             Output('confusionmatrix-graph-'+self.name, 'figure'),
            [Input('confusionmatrix-cutoff-'+self.name, 'value'),
             Input('confusionmatrix-percentage-'+self.name, 'value'),
             Input('confusionmatrix-binary-'+self.name, 'value'),
             Input('pos-label-'+self.name, 'value')],
        )
        def update_confusionmatrix_graph(cutoff, normalized, binary, pos_label):
            return self.explainer.plot_confusion_matrix(
                        cutoff=cutoff, normalized=bool(normalized), 
                        binary=bool(binary), pos_label=pos_label)

    def _get_state_props(self):
        return [(id_+self.name, prop_) for id_, prop_ in self._state_props]

    def to_html(self, state_dict):
        kwargs = dict(
            cutoff=('confusionmatrix-cutoff-'+self.name, 'value'),
            normalized=('confusionmatrix-percentage-'+self.name, 'value'),
            binary=('confusionmatrix-binary-'+self.name, 'value')
        )
        kwargs = {k:state_dict[v] for k, v in kwargs.items() if v in state_dict}
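        # checklist values arrive as lists ([True] or []), so cast both toggles to bool
        for key in ('normalized', 'binary'):
            if key in kwargs:
                kwargs[key] = bool(kwargs[key])

        fig = self.explainer.plot_confusion_matrix(**kwargs)
        html = f"<div>{fig.to_html()}</div>"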
        return html

class CustomDashboard(ExplainerComponent):

    def __init__(self, explainer, name=None):
        super().__init__(explainer, title="Downloadable Dashboard")
        self.confusion = ConfusionMatrixComponent(explainer)

    def layout(self):
        return dbc.Container([
            dbc.Row([
                dbc.Col([
                    html.H1("Download .html demo"),
                    dbc.Button("download .html", id='download-button'),
                    dcc.Download(id='download-html'),
                ])
            ]),
            dbc.Row([
                dbc.Col([
                    self.confusion.layout(),
                ]),
            ])
        ])

    def component_callbacks(self, app):
        @app.callback(
            Output('download-html', 'data'),
            [Input('download-button', 'n_clicks')],
            [State(id_, prop_) for id_, prop_ in self.confusion._get_state_props()]
        )
        def download_html(*args):
            if args[0] is not None:
                state_dict = dict(zip(self.confusion._get_state_props(), args[1:]))
                html = f"<html><H1>DOWNLOAD TEST TITLE</H1><div>{self.confusion.to_html(state_dict)}</div></html>"
                return dict(content=html, filename="dashboard.html")
            raise PreventUpdate

db = ExplainerDashboard(explainer, CustomDashboard, hide_header=True)
db.run()

So then would just have to add a recursive function to the ExplainerComponent class that recursively collects all the _state_props attributes, and add to_html() methods to all the existing ExplainerComponents (and tabs, etc)...
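
A rough sketch of that recursive collection, stripped of all dash specifics so it runs on its own. Everything here is hypothetical: `Component` only stands in for ExplainerComponent, and `children`, `get_state_props` and `build_state_dict` are illustrative names, not existing library attributes.

```python
class Component:
    """Stand-in for ExplainerComponent (hypothetical, for illustration)."""

    _state_props = []  # (id_prefix, property) pairs owned by this component

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def get_state_props(self):
        """Collect (id, prop) pairs from this component and all descendants."""
        props = [(prefix + self.name, prop) for prefix, prop in self._state_props]
        for child in self.children:
            props.extend(child.get_state_props())
        return props

    def to_html(self, state_dict):
        """Default scaffolding: wrap the children's html in a div."""
        inner = "".join(child.to_html(state_dict) for child in self.children)
        return f"<div>{inner}</div>"


class Slider(Component):
    """Leaf component that owns one piece of state."""

    _state_props = [("cutoff-", "value")]

    def to_html(self, state_dict):
        # Each component looks up only its own keys, so routing is implicit
        cutoff = state_dict.get(("cutoff-" + self.name, "value"), 0.5)
        return f"<p>cutoff={cutoff}</p>"


def build_state_dict(root, callback_args):
    """Pair a callback's positional State values with their (id, prop) keys."""
    return dict(zip(root.get_state_props(), callback_args))


dashboard = Component("root", [Slider("a"), Component("tab", [Slider("b")])])
state = build_state_dict(dashboard, [0.3, 0.7])
html_out = dashboard.to_html(state)
```

The one state_dict is passed down the whole tree unchanged, which relies on component names being unique so that the (id, prop) keys never collide.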

What do you think of this approach? You would still lose the interactivity, but you would be able to export a single .html file that you could easily share with people...

psmgeelen commented 3 years ago

Hi @oegedijk, I think this can definitely work. My issue is a little academic too. I think that the interaction is fundamental to the understanding of algorithms. In other words, the interactivity, in my opinion, makes the explainability viable in the first place. So I would love to keep the interactivity. I also realise that this means that you need a server, because otherwise there is simply too much information. For now I am checking out the explainer-hub and the docker options that you have already developed quite nicely. Thanks for all your efforts, you are changing the landscape of ML!

oegedijk commented 3 years ago

Alright @arita37 and @psmgeelen, it has been released. Check out the latest version.

You can do static html export both directly from the dashboard itself with the new download link in the header, and from a dashboard or component directly, e.g. dashboard.save_html("dashboard.html")

psmgeelen commented 3 years ago

@oegedijk, that's awesome. As discussed, it does make the dashboards static, but I think that this is a great middle ground. Thanks for your efforts!