dash_duo does not work with All-in-One components

lucassus commented 2 years ago

dash                      2.1.0
dash-bootstrap-components 1.0.3
dash-core-components      2.0.0
dash-html-components      2.0.0
dash-table                5.0.0
pytest                    7.0.1

Describe the bug

It seems that dash testing utils don't work property with All-in-One components. For instance, given the following test scenario:

import uuid
from typing import Optional

import dash
from dash import MATCH, Input, Output, callback, html

class SomeButton(html.Button):
    class ids:
        button = lambda aio_id: {"component": "SomeButton", "aio_id": aio_id}

    def __init__(self, aio_id: Optional[str] = None):
        if aio_id is None:
            aio_id = str(uuid.uuid4())

        super().__init__(
            "Loading...",
            id=self.ids.button(aio_id),
            disabled=True,
        )

    @callback(
        Input(ids.button(MATCH), "children"),
        output=dict(
            label=Output(ids.button(MATCH), "children"),
            disabled=Output(ids.button(MATCH), "disabled"),
        ),
    )
    def update_button(_):
        return dict(label="Loaded", disabled=False)

def test_some_button_first_time(dash_duo: dash.testing.composite.DashComposite):
    app = dash.Dash("foo123")
    app.layout = html.Div(SomeButton(), id="wrapper")

    dash_duo.start_server(app)
    dash_duo.wait_for_contains_text("#wrapper", "Loaded")

def test_some_button_second_time(dash_duo: dash.testing.composite.DashComposite):
    app = dash.Dash("bar123")
    app.layout = html.Div(SomeButton(), id="wrapper")

    dash_duo.start_server(app)
    dash_duo.wait_for_contains_text("#wrapper", "Loaded")

The first run test_some_button_first_time is perfectly fine. But test_some_button_second_time results with selenium.common.exceptions.TimeoutException: Message: text -> Loaded not found inside element within 10s error. I ran it with --pause option and in the DevTools I noticed that for the second test the update callback _dash-update-component is not being triggered (see the attached screenshots).

Moreover, if I run test_some_button_second_time separately or if I comment out the first test everything is fine. For some reason callbacks are fired only for the first test.

Expected behavior

It seems that dash_duo does not work with @callback for all-in-one components.

Screenshots

CleanShot 2022-02-17 at 17 25 05

CleanShot 2022-02-17 at 17 26 26

lucassus commented 2 years ago

It works fine with in-lined components in test functions, see https://gist.github.com/lucassus/32256b4cc4587d3188d56c3a83ca0bf4

lucassus commented 2 years ago

Update: It works fine when I used a shared dash app instance, see https://gist.github.com/lucassus/892e593366e04ca207722c73af6a34f2

T4rk1n commented 2 years ago

This is a bug with dash.callback, when the server is setup, it removes the callbacks from the global callbacks dictionary to insert them in the dash server callback map. So when you start the second server in the test, the callback doesn't exist in the global map and is not registered.

https://github.com/plotly/dash/blob/6ee23288513d2251d3c972dcef836da1094eb6ec/dash/dash.py#L1378-L1381

MrTeale commented 2 years ago

@T4rk1n - What do you think a solution would be for this? If you have something in mind, I'd be happy to give it a go at implementing it.

My initial thoughts was to duplicate the GLOBAL_CALLBACK_LIST after it is cleaned of duplicates but before it is cleared (As shown in the above snippet) but unsure of the performance implications of this.

T4rk1n commented 2 years ago

@MrTeale

There could be another callback map, something like ORIGINAL_CALLBACK_MAP, inserting the callbacks into it at the same time as the app callback_map at line 1378. Then you can take the difference of the the original callback and the app callback map and insert those missing.

alexcjohnson commented 2 years ago

This is a tricky one. I'm looking at #2062 removing the code that clears the global stores, but I'll comment here so we can decide on the right direction to go. For running an app normally (not during tests) I don't think it matters either way - the only case this would affect is multiple dash apps running side-by-side in the same Python process. That's a discouraged pattern where you just shouldn't be using dash.callback anyway because it's unclear which app the callback should be attached to. I suppose making the change proposed in #2062 would allow AIO components with multiple apps, provided you use app.callback everywhere else - but again, this pattern isn't a high priority.

During testing it's more complicated, at least in principle, because we're repeatedly creating and destroying apps, and we don't want these apps affecting each other. If we think dash.callback is exactly equivalent to app.callback, we should be able to replace all app.callback calls in our test suite with dash.callback. And as it stands, with Dash clearing the global stores when a callback has been taken by an app, that would work. But after #2062 we couldn't do that, each test app would add callbacks to all the tests that run after it and at some point they'd conflict with each other. I guess the point is the callbacks defined in AIO components are qualitatively different from other callbacks, in that they're intended to be available for any app to use; whereas other callbacks defined with dash.callback, even pattern-matching callbacks, are intended for use just within one app.

I can think of two directions to go with this:

Go ahead with #2062 as is, but then if and when we want to start using dash.callback inside dash_duo tests we stash the state of the global stores when dash_duo is initializing a test, and restore it to that state when the test is tearing down. That way any dash.callback calls inside the test case only apply to that case, but those at the global scope apply everywhere. The concern I have with doing it that way is in most tests we wouldn't actually use these AIO callbacks, they'd just be sitting there doing nothing (though I guess there's some value in that too, ensure that they don't break anything just by being present)
A different pattern for testing AIO components: import the AIO component and then immediately clear the global stores to get them back to the base state. In each test that uses an AIO component, reload the module that defines it: from importlib import reload; reload(my_aio_module). AFAICT from some brief testing, this successfully re-adds the callbacks, though if the AIO component had a submodule that defined callbacks it would be necessary to explicitly reload that too.
- A variant of this pattern that might be more robust but reaches farther into the internals of dash: after importing the AIO component, pull its callbacks out into a local stash (removing them from the global stores) and put them back into the global stores at the beginning of each test that needs them. We could make this a dash.testing util so users don't need to directly touch the global stores, something like:
```
from dash.testing import stash_callbacks
from my_aio_component import SomeButton
my_aio_callbacks = stash_callbacks()
```

def test_my_aio_component(dash_duo): my_aio_callbacks.apply() app = Dash() app.layout = html.Div(SomeButton())


In this scenario, clearing the global store after loading an AIO component and before any test case runs is important because otherwise the first test case will get these callbacks no matter what. If that first test case happens to also reimport this component, these callbacks will be duplicated and there will be an error.

---

Curious what @T4rk1n and @AnnMarieW think, but I guess after all that I'm inclined toward the first option: merge #2062 as is, then at some point modify `dash_duo` to allow case-scoped `dash.callback` calls.

AnnMarieW commented 2 years ago

@alexcjohnson It looks like this was added to make the long_callback tests pass. Will this become a problem again if it's removed? https://github.com/plotly/dash/pull/1718#issuecomment-902729842

The last scenario looks like a nice solution. :+1:

alexcjohnson commented 2 years ago

Ah, thanks @AnnMarieW - I can't see the specific failures over there anymore, and the tests in #2062 do pass, but maybe just because of how they test files are currently being partitioned for parallel execution, or the order the tests happen to run. We do have a dash.callback test with IDs that match those in various other tests, so in principle seems like this could break at any time. To make sure whatever we implement doesn't break this, let's run the existing dash.callback test twice like this -> https://github.com/plotly/dash/commit/f1a479655a32ed88ac7ea9b7d949a6a7e75b9620 (currently fails if I run this locally on the #2062 branch, but it passes if I remove those changes)

I do like the simplicity of "callbacks defined within a test are local to that test, callbacks defined at the top level are global" from the first option, and I bet we can do this fairly easily in dash_duo, probably by stashing and resetting the global stores in __enter__ and __exit__ here

alexcjohnson commented 2 years ago

Also GLOBAL_INLINE_SCRIPTS should be handled similarly to the other two:

https://github.com/plotly/dash/blob/34fd84879ef6844760151340ce19599f654e9446/dash/dash.py#L1491

T4rk1n commented 2 years ago

I thought it was clearing the callbacks for tests where the app is restarted, maybe the tests got refactored but I think there is a valid use case. If there is no clear, when you define an app globally and use the app in parameterized tests, the server will fail to start since it will have duplicate callback.

Example:

import pytest

from dash import Dash, html, callback, Input, Output

app = Dash(__name__)

app.layout = html.Div([
    html.Button("click", id="input"),
    html.Div(id="output")
])

@callback(Output("output", "children"), Input("input", "n_clicks"))
def on_click(n_clicks):
    return n_clicks

@pytest.mark.parametrize("counter", [(1,), (2,), (3,)])
def test_usage_params(dash_duo, counter):
    dash_duo.start_server(app)
    dash_duo.wait_for_element("#input").click()
    dash_duo.wait_for_text_to_equal("#output", "1")

alexcjohnson commented 2 years ago

~Yes that’s exactly the case covered in https://github.com/plotly/dash/commit/f1a479655a32ed88ac7ea9b7d949a6a7e75b9620 - that I thought we could handle within dash_duo so callbacks within the test case would be cleared but those defined outside would remain.~

alexcjohnson commented 2 years ago

Oh sorry - you’re saying restarting the same app defined outside the test using dash.callback. That seems like a dangerous pattern…

T4rk1n commented 2 years ago

Oh sorry - you’re saying restarting the same app defined outside the test using dash.callback. That seems like a dangerous pattern…

User apps may be imported and started across multiple tests, but then we have import_app function that doesn't import but exec with runpy that can be used in that case so I guess that is kinda covered.

alexcjohnson commented 2 years ago

User apps may be imported and started across multiple tests

I see, I was thinking about the use case of creating an app purely for test purposes but you're referring to testing a specific app. Yes, that's a very important use case.

One way around that is what we do in our docs (that's the old public repo but still valid) - start the app once (session scope) and only restart the browser with each test:

@pytest.fixture(scope="session")
def doc_server():
    with ProcessRunner() as server:
        server(raw_command="python index.py", port=8060, start_timeout=60)
        yield server

@pytest.fixture
def dash_doc(doc_server, request):
    with Browser(...) as browser:
        browser.server_url = doc_server.url
        yield browser

In addition to avoiding the problem here, this skips the overhead of multiple app starts - huge for big apps like the docs which can take 30 sec to start up, if you're running maybe 100 tests on them! So we could probably do something to make that pattern easier to replicate.

Leaving that aside, how do we handle the three cases here?

App defined globally and run in multiple tests (Make sure we only get one copy of each global callback in all tests)
AIO component defined globally and used in apps in multiple tests (Make sure every app gets the global callbacks)
Callbacks defined within one test case that may conflict with those defined in other tests (Make sure global callbacks defined in the test case are dropped after that test case)

I don't see any one rule that would satisfy all of these, but we can take them one at a time:

If we leave the global stores intact when the app is started (as in #2062, plus GLOBAL_INLINE_SCRIPTS), case (2) is satisfied.
Case (1) we could cover by, inside dash_duo.start_server (probably ThreadedServer.start), inspecting app._callback_list and removing any item that's present in GLOBAL_CALLBACK_LIST. Same with app. _inline_scripts and GLOBAL_INLINE_SCRIPTS. I think it's OK to leave app.callback_map as is, the new ones will overwrite the old ones, right?
Case (3) we could cover as I said above: "by stashing and resetting the global stores in __enter__ and __exit__ here"

MrTeale commented 2 years ago

I don't see any one rule that would satisfy all of these, but we can take them one at a time:

If we leave the global stores intact when the app is started (as in Fixes AIO Callback testing with Dash-Duo #2062, plus GLOBAL_INLINE_SCRIPTS), case (2) is satisfied.

Case (1) we could cover by, inside dash_duo.start_server (probably ThreadedServer.start), inspecting app._callback_list and removing any item that's present in GLOBAL_CALLBACK_LIST. Same with app. _inline_scripts and GLOBAL_INLINE_SCRIPTS. I think it's OK to leave app.callback_map as is, the new ones will overwrite the old ones, right?

Case (3) we could cover as I said above: "by stashing and resetting the global stores in __enter__ and __exit__ here"

@alexcjohnson - I'll give these changes a go in #2062 and see if there are any issues. Reading the above discussions and the different parts that have been effected by this, is there any way to write some tests for this to pick up any future issues changes like this could have?

alexcjohnson commented 2 years ago

is there any way to write some tests for this to pick up any future issues changes like this could have?

I think we'd just want to create some tests covering the three cases I mentioned above. Case 3 (defining callbacks via dash.callback inside a test case) we should be able to include two tests in the regular test suite that, if the callbacks from the first were not removed, would cause the second test to fail. The other two same thing, but these might need to be broken out into a different location and run separately on CI, since we don't want the global state they depend on to influence all the other tests we run.

plotly / dash

dash_duo does not work with All-in-One components #1933