plotly / dash-labs

Work-in-progress technical previews of potential future Dash features.
MIT License
139 stars 39 forks source link

Long callbacks triggered multiple times when running with multiple processes #39

Closed sebbegg closed 2 years ago

sebbegg commented 3 years ago

Hi there,

we're testing the long_callback to issue somewhat long (several seconds) running SQL queries. When running with gunicorn with multiple workers I noticed that the queries are started/sent multiple times, from different workers when the polling-requests are routed to a worker which hasn't started the callback in the first place.

A tiny sample that reproduces the issue:

import logging
import os
import sys
import time
import dash
import dash_bootstrap_components as dbc
import dash_core_components as dcc
import dash_html_components as html
import dash_labs
import diskcache
from dash_labs import Input, Output
from dash_labs.plugins import DiskcacheCachingCallbackManager
import multiprocessing

# workaround issues with spawn method on macos
multiprocessing.set_start_method("fork")

worker_pid = os.getpid()

cache = diskcache.Cache("./dash-cache")
app = dash.Dash(
    plugins=[
        dash_labs.plugins.FlexibleCallbacks(),
        dash_labs.plugins.HiddenComponents(),
        dash_labs.plugins.LongCallback(DiskcacheCachingCallbackManager(cache)),
    ]
)
wsgi_app = app.server

app.layout = html.Div(
    [
        dbc.Button(id="button-id", children="Run Query!", n_clicks=0, color="primary"),
        html.P(id="paragraph", children="Click the button")
    ]
)

@app.long_callback(
    output=Output("paragraph", "children"),
    args={"n_clicks": Input("button-id", "n_clicks")},
)
def callback(n_clicks):
    if n_clicks == 0:
        return "Click the button"

    print(f"Starting query on worker {worker_pid}")
    time.sleep(5)
    print(f"Query done on worker {worker_pid}")
    return f"Query done on worker {worker_pid}"

if __name__ == "__main__":
    app.run_server(port=8000, debug=True)

Running that with

gunicorn long_callback_test:wsgi_app --access-logfile - --capture-output --log-file - -w 4 -b :8000

produces something like this:

127.0.0.1 - - [13/Aug/2021:12:11:18 +0200] "POST /_dash-update-component HTTP/1.1" 200 344 "http://localhost:8000/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
Starting query on worker 59201
127.0.0.1 - - [13/Aug/2021:12:11:29 +0200] "POST /_dash-update-component HTTP/1.1" 200 326 "http://localhost:8000/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
Starting query on worker 59203
127.0.0.1 - - [13/Aug/2021:12:11:30 +0200] "POST /_dash-update-component HTTP/1.1" 200 326 "http://localhost:8000/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
Starting query on worker 59202
127.0.0.1 - - [13/Aug/2021:12:11:31 +0200] "POST /_dash-update-component HTTP/1.1" 200 326 "http://localhost:8000/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
127.0.0.1 - - [13/Aug/2021:12:11:32 +0200] "POST /_dash-update-component HTTP/1.1" 200 326 "http://localhost:8000/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
Starting query on worker 59197
127.0.0.1 - - [13/Aug/2021:12:11:33 +0200] "POST /_dash-update-component HTTP/1.1" 200 326 "http://localhost:8000/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
Query done on worker 59201
127.0.0.1 - - [13/Aug/2021:12:11:34 +0200] "POST /_dash-update-component HTTP/1.1" 200 344 "http://localhost:8000/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
Query done on worker 59203
Query done on worker 59202
Query done on worker 59197

I that a known issue/limitation with the diskcache or am I doing something wrong here? Would using the celery backend solve this?

jonmmease commented 3 years ago

Thanks for the report and example @sebbegg. We'll get this fixed before long_callback is integrated into Dash for Dash 2.

uns1 commented 2 years ago

@jonmmease - was this fixed?

alexcjohnson commented 2 years ago

@T4rk1n would you please test this and verify whether it's still an issue?

T4rk1n commented 2 years ago

I confirm this is fixed.