plotly / dash

Data Apps & Dashboards for Python. No JavaScript Required.
https://plotly.com/dash
MIT License

[BUG] long_callback fails when bundling python code #1885

Closed. JonThom closed this issue 3 months ago.

JonThom commented 2 years ago

Describe your context

When using PyInstaller to bundle a Dash app that uses long_callback as a single macOS .app file, the app fails to register the long_callback. The error occurs at https://github.com/plotly/dash/blob/78ca3ec1752f6178fe25a02753b8a9f9946a5772/dash/long_callback/managers/__init__.py#L38 in the call to inspect.getsource, which throws an OSError stating that the source code cannot be retrieved. In this case, the long_callback calls external libraries to query a database. The error only occurs when packaging with PyInstaller. Although PyInstaller packaging is probably a fairly rare use case, I was wondering why access to the source is needed, and of course whether there is an obvious workaround.
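A minimal way to reproduce the failing call outside of any bundler is sketched below. It assumes the relevant condition is that the function's code object points at a .py file that does not exist at runtime; the names frozen_callback, get_source_or_none and the path /nonexistent/long_callbacks.py are hypothetical, and real bundlers such as PyInstaller store code differently, but inspect.getsource raises an OSError of the same kind.

```python
import inspect

# Hypothetical setup: compile a callback whose code object names a .py file
# that does not exist on disk, roughly what a bundle without sources looks
# like to inspect.
SOURCE = "def frozen_callback(x):\n    return x * 2\n"
namespace = {"__name__": "hypothetical_frozen_module"}
exec(compile(SOURCE, "/nonexistent/long_callbacks.py", "exec"), namespace)
frozen_callback = namespace["frozen_callback"]


def get_source_or_none(func):
    """Return func's source text, or None when inspect cannot locate it."""
    try:
        return inspect.getsource(func)
    except OSError as err:
        print(f"inspect.getsource failed: {err}")
        return None


print(frozen_callback(4))  # 8: the function itself still runs fine
print(get_source_or_none(frozen_callback))  # error message, then None
```

The function is perfectly callable; only the lookup of its source text fails, which is why the problem shows up at callback registration rather than when the callback runs.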

Expected behavior

The long_callback should register correctly.

Simon-12 commented 2 years ago

I ran into the same problem on Windows 10. Any updates on this issue?

JonThom commented 2 years ago

@Simon-12 a workaround with PyInstaller is to write the long callbacks in a separate file and include it using 'datas'. Downside is that this exposes the long callbacks' source code to the user. Let me know if you find any other way.

Simon-12 commented 2 years ago

@JonThom thanks for the fast response. I think I know what you mean, but what do you mean by a separate file? The whole callback function in a separate .py file, as a function? And what about the path to that function?

I will test it during the next week and report back here.

JonThom commented 2 years ago

@Simon-12

Here is a sketch of the suggested approach. Create a file long_callbacks.py in the root project folder (for simplicity; you probably want to put it somewhere else) with the following content:


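from dash import Input, Output


def register_long_callbacks(app):
    # The decorators run when this function is called, registering the
    # callbacks on whichever Dash app object is passed in.
    @app.long_callback(
        Output("id1", "prop1"),
        Input("id2", "prop2"),
    )
    def lc1(input_1):
        ...

    # @app.long_callback(...) ... further long callbacks follow the same pattern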

Then, in the main app file, main_app_file.py, import that function and call it, passing the Dash app object:

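# Absolute import: the bundle's entry script runs as __main__, where a
# relative import (from .long_callbacks ...) would fail.
from dash import Dash
from long_callbacks import register_long_callbacks

my_app = Dash(__name__)  # also pass a long_callback_manager, e.g. a DiskcacheManager
register_long_callbacks(my_app)

...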

Now, in the PyInstaller .spec file, include long_callbacks.py in the datas argument to Analysis:

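a = Analysis(
    ['my_app/main_app_file.py'],
    pathex=[],
    binaries=[],
    datas=[
        # (source path, destination folder inside the bundle)
        ('path/to/long_callbacks.py', '.'),
    ],
    # ... remaining Analysis arguments unchanged
)
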
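On the question of paths: with ('path/to/long_callbacks.py', '.') the file is placed at the root of the bundle. A common pattern for checking where such a file ends up at runtime is sketched below. It assumes PyInstaller's sys._MEIPASS attribute, which is only set inside a PyInstaller bundle, and falls back to the current directory when running from source; the helper names is_frozen and bundled_path are hypothetical.

```python
import os
import sys


def is_frozen():
    # Bundlers such as PyInstaller and cx_Freeze set sys.frozen in bundled apps.
    return getattr(sys, "frozen", False)


def bundled_path(relative_name):
    """Return the runtime location of a file shipped via `datas` (sketch).

    Inside a PyInstaller bundle, sys._MEIPASS points at the directory where
    a ('...', '.') datas entry typically lands; from source, fall back to
    the current working directory.
    """
    base = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base, relative_name)


if __name__ == "__main__":
    path = bundled_path("long_callbacks.py")
    print(f"frozen={is_frozen()} path={path} exists={os.path.exists(path)}")
```

Printing this once from inside the built app is a quick way to confirm that the data file actually shipped before debugging the callback registration itself.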
alexcjohnson commented 2 years ago

@JonThom I missed this issue when you first raised it; apologies, and thanks for bringing this case to our attention. The problematic line you link to builds a cache key for this function invocation. I believe the reason we use the function source there is so that the cache is invalidated if you modify the callback code itself. That's clearly not a concern once you've wrapped the app up and shipped it; it really only matters for apps under active development.

@T4rk1n I bet we can fall back on callback_id when getsource fails - want to give that a shot? Actually... we might even want to always include callback_id in generating that cache key, and only add the source if it's available. It would be weird, but I think you could construct two functions with identical source and identical arguments that nevertheless behave differently - maybe because they're in different modules and call out to different inner functions, or maybe because there's a factory function creating these long callbacks and the meat of the operation is passed in to the outer scope.
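The fallback suggested here could look roughly like the sketch below. This is not Dash's actual implementation; make_cache_key, make_callback and the callback_id strings are hypothetical. It always hashes callback_id and the arguments, adds the source only when inspect.getsource succeeds, and uses a small factory to show why source alone is not enough: every function the factory returns shares the same code object, and therefore the same source text.

```python
import hashlib
import inspect
import json


def make_cache_key(fn, callback_id, args):
    """Hypothetical cache-key builder (a sketch, not Dash's actual code).

    callback_id and the arguments are always hashed; the function's source
    is added only when inspect can find it.
    """
    parts = [callback_id, json.dumps(args, sort_keys=True, default=str)]
    try:
        parts.append(inspect.getsource(fn))
    except (OSError, TypeError):
        # Frozen apps (PyInstaller, cx_Freeze) may ship without .py files,
        # so getsource raises; fall back to callback_id + args alone.
        pass
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


def make_callback(operation):
    # Hypothetical factory: each returned callback has identical source,
    # but calls a different operation through its closure.
    def callback(value):
        return operation(value)

    return callback


double = make_callback(lambda v: v * 2)
negate = make_callback(lambda v: -v)

# Both callbacks share one code object, so their source text is identical...
print(double.__code__ is negate.__code__)  # True
# ...but including callback_id still gives them distinct cache keys.
key_a = make_cache_key(double, "out-a.children", [3])
key_b = make_cache_key(negate, "out-b.children", [3])
print(key_a == key_b)  # False
```

Because callback_id is always part of the key, a frozen app without .py files still gets distinct keys per callback; it only loses automatic cache invalidation on code edits, which matters mainly during development.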

Simon-12 commented 2 years ago

@JonThom I was able to implement the workaround, and the error message disappeared. But when I trigger the long callback, nothing happens and the app magically restarts. I have no clue... Has anyone had similar experiences?

T4rk1n commented 2 years ago

@JonThom @Simon-12

> Downside is that this exposes the long callbacks' source code to the user.

Just a warning: PyInstaller doesn't obfuscate any part of the code; I can take the executable and get all the original source quite easily. The only way to keep the code secure is to keep it private and deploy it on a server.

T4rk1n commented 2 years ago

> It would be weird, but I think you could construct two functions with identical source and identical arguments that nevertheless behave differently - maybe because they're in different modules and call out to different inner functions, or maybe because there's a factory function creating these long callbacks and the meat of the operation is passed in to the outer scope.

I agree the callback_id should be included in the cache key for callback factories that may share the same function but have different outputs/inputs.

JonThom commented 2 years ago

> @JonThom @Simon-12
>
> Downside is that this exposes the long callbacks' source code to the user.
>
> Just a warning: PyInstaller doesn't obfuscate any part of the code; I can take the executable and get all the original source quite easily. The only way to keep the code secure is to keep it private and deploy it on a server.

Thanks @T4rk1n, I should have made clear that including the source as plain text just makes it even more trivial to access. I was wondering, have you tried different approaches to obfuscating Python code? Could one hope for an approach where reversing it is at least a very laborious process? I do not have the knowledge to judge a priori whether that is possible. I have tried Nuitka, but overcoming the many errors arising with Dash and PySide required many workarounds, and some issues I could not overcome at all. I also considered PyArmor, but haven't tried it yet.

T4rk1n commented 2 years ago

> I was wondering, have you tried different approaches to obfuscating Python code?

No, I don't think obfuscation is a reliable way to protect code. If there is intellectual property, it should be protected by licensing. Otherwise, private code and secrets used to access databases belong on servers.

Simon-12 commented 2 years ago

I don't want to secure my code; I just want to distribute the app to my friends, who are not at all familiar with programming.

I wrote a small example, just a simple Dash app with a long callback: github.com/Simon-12/simple-dash. If someone wants to test it, just open a terminal inside the folder and run: pyinstaller specs.spec

When I trigger the long callback, nothing happens; the app magically restarts and opens a new tab. If anyone has an idea, feel free to help.

Cheers

DeKhaos commented 1 year ago

> I don't want to secure my code; I just want to distribute the app to my friends, who are not at all familiar with programming.
>
> I wrote a small example, just a simple Dash app with a long callback: github.com/Simon-12/simple-dash. If someone wants to test it, just open a terminal inside the folder and run: pyinstaller specs.spec
>
> When I trigger the long callback, nothing happens; the app magically restarts and opens a new tab. If anyone has an idea, feel free to help.
>
> Cheers

Hi @Simon-12, have you solved this problem yet? It would be great if you could share the workaround if you figured it out. I'm deploying a desktop dashboard using cx_Freeze to convert Python to .exe, with lots of background callbacks and multiple pages. Every time I try to activate a background callback with a button, nothing happens, and then the application restarts and the console output starts over.

JonThom commented 1 year ago

Hi @DeKhaos, I'm not sure about cx_Freeze, but PyInstaller makes it possible to include data files with the app, and, as described above, I included the long callbacks in such a data file when using PyInstaller.

DeKhaos commented 1 year ago

@JonThom, I tried as you suggested, but I got the same result as @Simon-12: the error message doesn't show up, but the background callback doesn't work either. Every time I trigger the Run button, the server restarts and nothing happens.

I used example 4 from the Background Callbacks docs, moved the callback into a separate file as you suggested, and included it in datas in main.spec.

main.py

import time
import os
from long_callbacks import register_long_callbacks
import dash
from dash import DiskcacheManager, Input, Output, html

# Diskcache for non-production apps when developing locally
import diskcache
cache = diskcache.Cache("./cache")
background_callback_manager = DiskcacheManager(cache)

app = dash.Dash(__name__, background_callback_manager=background_callback_manager)
register_long_callbacks(app)
app.layout = html.Div(
    [
        html.Div(
            [
                html.P(id="paragraph_id", children=["Button not clicked"]),
                html.Progress(id="progress_bar", value="0"),
            ]
        ),
        html.Button(id="button_id", children="Run Job!"),
        html.Button(id="cancel_button_id", children="Cancel Running Job!"),
    ]
)

if __name__ == "__main__":
    app.run_server(debug=False)

long_callbacks.py

import time
import os

import dash
from dash import DiskcacheManager, Input, Output, html

def register_long_callbacks(app):
    @app.callback(
        output=Output("paragraph_id", "children"),
        inputs=Input("button_id", "n_clicks"),
        background=True,
        running=[
            (Output("button_id", "disabled"), True, False),
            (Output("cancel_button_id", "disabled"), False, True),
            (
                Output("paragraph_id", "style"),
                {"visibility": "hidden"},
                {"visibility": "visible"},
            ),
            (
                Output("progress_bar", "style"),
                {"visibility": "visible"},
                {"visibility": "hidden"},
            ),
        ],
        cancel=Input("cancel_button_id", "n_clicks"),
        progress=[Output("progress_bar", "value"), Output("progress_bar", "max")],
        prevent_initial_call=True
    )
    def update_progress(set_progress, n_clicks):
        total = 5
        for i in range(total + 1):
            set_progress((str(i), str(total)))
            time.sleep(1)

        return f"Clicked {n_clicks} times"


JonThom commented 1 year ago

@DeKhaos

I have taken your code and made a working app using PyInstaller; instructions are in the README. I used PyInstaller since I have no experience with cx_Freeze, and I imagine the problem and solution are the same. Hope it is of use.

DeKhaos commented 1 year ago

> @DeKhaos
>
> I have taken your code and made a working app using PyInstaller; instructions are in the README. I used PyInstaller since I have no experience with cx_Freeze, and I imagine the problem and solution are the same. Hope it is of use.

Thank you for your time and the clear instructions, @JonThom.

Although I copied your whole repository, installed Poetry and followed every step, running ./dist/background_callbacks/background_callbacks still doesn't work. At this point I have no idea what the problem is. Also, the dist folder is very large (1 GB).

My environment is Windows 10 64-bit with Dash 2.7.0.

JonThom commented 1 year ago

@DeKhaos

Sorry to hear the PyInstaller repo isn't working for you either. Maybe it is linked to the OS (I am on macOS 13.2). I would at least try to get as much information as possible by setting app.run_server(debug=True) and checking the browser devtools console (open with Cmd + Shift + C on Mac, then the 'Console' tab) for any error messages when navigating to http://127.0.0.1:8050/.

Edit: I looked at your debug info again; have you checked out that click module error?

Simon-12 commented 1 year ago

Hi @DeKhaos,

sorry for my late response, but there was a lot going on. I have tried all kinds of Python bundling tools, like PyInstaller, cx_Freeze and even Nuitka. Sadly, they all end up with the same problem you described.

My final workaround: I simply copied my whole Python environment folder to the target device. For example, my Python environment is under <CONDA_PATH>/envs/dash. I copied the whole folder to C:/app_build and placed my Dash app code inside it, in C:/app_build/code. Then I created a short batch file, C:/start_app.bat, to start the Dash app:

@echo off
echo  ------------------------------
echo     Start Dash Application
echo  ------------------------------
rem Change to the code folder next to this script (works from any working directory)
cd /d "%~dp0app_build\code"
echo on
"..\python.exe" "run_app.py"
pause

And that's it; I tested it on three different devices. Under the hood, the Python bundling tools do essentially the same thing: they take your whole Python environment and pack it into an executable (.exe). When you start the exe, everything gets unpacked into a temp folder (which is actually quite time-consuming). I also wrote a Python script to automate the copy process; maybe I can share it next week.

In the end, it's not the perfect solution, but it works fine for me 😊

Cheers

DeKhaos commented 1 year ago

@JonThom Yeah, maybe it's because of OS differences. Things that work in your environment don't seem to work for me. With debug=True, the compiled executable raises a different error, so I can't really get a good look at the original one. According to this discussion, that error can be avoided by using debug=False; you can see the irony here :joy:

@Simon-12, really nice approach. I think that might work for me; it would be great to see your code snippet soon.

Simon-12 commented 1 year ago

Hi,

here is my build script:

import os
import sys
import shutil

target_path = 'build/python-dash'
target_code = target_path + '/code'
target_assets = target_code + '/assets'
data = ['config.ini']
start_file = 'start_app.bat'

copy_interpreter = True
create_zip = False

def main():

    print('Creates build folder ...')
    if not os.path.exists('build'):
        os.mkdir('build')

    # Python interpreter
    if copy_interpreter:

        print('Copy python interpreter ...')
        if os.path.exists(target_path):
            shutil.rmtree(target_path)

        path_env = sys.executable
        path_env = path_env.replace('python.exe', '')
        shutil.copytree(path_env, target_path)

    # Copy code
    if os.path.exists(target_code):
        shutil.rmtree(target_code)
    os.mkdir(target_code)

    files = os.listdir('.')
    for f in files:
        if '.py' not in f or '.pytest' in f:
            continue  # skip
        shutil.copyfile(f, f'{target_code}/{f}')

    # Needed files
    for d in data:
        shutil.copyfile(d, f'{target_code}/{d}')

    # assets folder
    if os.path.exists(target_assets):
        shutil.rmtree(target_assets)
    shutil.copytree('assets', target_assets)

    # Start up file
    if not os.path.isfile(f'build/{start_file}'):
        shutil.copyfile(start_file, f'build/{start_file}')

    if create_zip:
        print('Create zip folder ...')
        if os.path.isfile('python-dash.zip'):
            os.remove('python-dash.zip')
        shutil.make_archive('python-dash', 'zip', 'build')

    print('Finished!')

if __name__ == '__main__':
    main()

I also updated my repository with a working example: github.com/Simon-12/simple-dash

Cheers

JonThom commented 1 year ago

@DeKhaos I had forgotten that the PyInstaller workflow and .spec files are OS-specific.

Can you try the updated example https://github.com/JonThom/dash-background-callbacks-pyinstaller?

For simplicity, I have updated the project to use venv and pip rather than Poetry.

PS: At least on macOS, PyInstaller builds both a single executable and an app bundle, which is why the dist folder is so large. The app bundle loads faster because it doesn't need to unpack files to temporary directories.

corebit-nl commented 8 months ago

Adding the Python source to the .spec file and calling

import multiprocessing
multiprocessing.freeze_support()

as described by @rokm in https://github.com/pyinstaller/pyinstaller-hooks-contrib/issues/493 did the trick for me!
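For context on where that call goes: the Python docs say freeze_support() should be called straight after the if __name__ == "__main__": line of the main module. A minimal sketch of that layout follows; main() is a hypothetical placeholder for building the Dash app and starting the server. One plausible reading of the restarts reported in this thread is that, in a frozen app, a background worker process is started by re-running the executable, and without freeze_support() that child falls through to the startup code and launches the app again; treat this as an explanation consistent with the symptoms, not a confirmed diagnosis.

```python
import multiprocessing


def main():
    # Hypothetical entry point: build the Dash app (with its background
    # callback manager) and start the server here, e.g. app.run(debug=False).
    return "app started"


if __name__ == "__main__":
    # Call this first, straight after the __main__ check. When the frozen
    # executable is re-run as a multiprocessing child, freeze_support()
    # runs the child's task and exits instead of reaching main().
    multiprocessing.freeze_support()
    print(main())
```

When the script runs unfrozen, freeze_support() does nothing, so the same entry point works both from source and inside the bundle.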