sanic-org / sanic

Accelerate your web app development | Build fast. Run fast.
https://sanic.dev
MIT License

Triggering a worker reload does not take the code change into consideration #2557

Closed cnicodeme closed 1 year ago

cnicodeme commented 2 years ago

Describe the bug
Changing the code - like the response returned by a view - should be taken into account when the Sanic instance is reloaded (triggered either via the CLI --trigger-reload or via app.m.restart("__ALL_PROCESSES__")).

Right now, that doesn't seem to be the case.

Code snippet

from sanic import Sanic
from sanic.response import text

app = Sanic('Test')

@app.get('/reload')
async def re(request):
    request.app.m.restart("__ALL_PROCESSES__")
    return text('Restart triggered')

@app.get('/')
async def index(request):
    return text('Hello world')

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=1337, access_log=True, debug=True)

Expected behavior

  1. Run the above code
  2. Go to "http://127.0.0.1:1337/"
  3. See the data: "Hello world"
  4. Change the code and replace "Hello world" by something else
  5. Go to "http://127.0.0.1:1337/reload"
  6. The console shows that the server was restarted
  7. Go back to "http://127.0.0.1:1337/"
  8. The text displayed is still "Hello world", not the new one.

(Restarting the server via the command line sanic --trigger-reload app:app has the same behavior.)

Environment (please complete the following information):

cnicodeme commented 2 years ago

I've tried to follow the code to understand the difference in behavior between a reload triggered by a code change and a reload triggered by the new Worker system (m.restart()), and I'm lost.

It seems that both are using the same logic: when setting auto_reload, Sanic adds a new "Reloader" Worker that watches the files and triggers a restart('__ALL_PROCESSES__:[list of files changed]'), which is caught by the WorkerManager to restart all the instances.

When calling request.app.m.restart('__ALL_PROCESSES__'), it calls the same "WorkerManager" code that restarts all the workers.

So why does auto_reload take the changed files into consideration, while a call to restart does not?

Logically, I thought it was related to the fact that the Reloader module passes along the list of changed files, which would somehow update them before really restarting the app, something that is not done when calling m.restart('__ALL_PROCESSES__').

So what I did was modify the Reloader class.

I first removed the passing of the list of files. The Reloader module was now sending the trigger using only '__ALL_PROCESSES__', with no files appended to it.

I tried it, and the modifications were still taken into consideration.

I then did the opposite: I modified the WorkerManager to include a specific file (the one I was modifying) when receiving a '__ALL_PROCESSES__' trigger, and disabled auto_reload. Using the app shown above, I would go to the "/" page, modify the output in the code, save, and go to /reload. That reload would send the '__ALL_PROCESSES__' trigger along with the path of the modified file, to indicate that this was the code that had been modified.

It didn't work.

So somehow, somewhere I haven't looked, setting auto_reload makes the modified code take effect, while not setting auto_reload leaves the modified files out of consideration.

I wasn't able to find where this is happening.

ahopkins commented 2 years ago

@cnicodeme I think you stumbled into an interesting edge case on Linux where a decision was made for a slight performance boost. If auto-reload is not enabled, Sanic is not expecting a reload call to be accompanied by a desire for the app to be refreshed from source (after all, that is what auto-reload is meant for). In this case, and only if running on Linux, Sanic will opt for fork instead of spawn. And in this instance, since it has a copy in memory, it is not actually reloading the source.

I suppose one option might be to skip this when DEBUG is enabled. But that sort of circumvents the idea that the two environments should be as similar as possible. Another alternative would maybe be an ALLOW_RELOAD="always" type config value.

cnicodeme commented 2 years ago

That is definitely interesting (and yes, I'm on Fedora 36).

What happens when ALLOW_RELOAD is set to always? Where in Sanic is it decided whether the code is reloaded from memory or fully loaded from the files? Or is this something handled entirely by Python/Linux?

ahopkins commented 2 years ago

That is definitely interesting (and yes, I'm on Fedora 36).

What happens when ALLOW_RELOAD is set to always? Where in Sanic is it decided whether the code is reloaded from memory or fully loaded from the files? Or is this something handled entirely by Python/Linux?

I'm on my phone, so I can't copy with line numbers. Look at line 693 of https://github.com/sanic-org/sanic/blob/main/sanic/mixins/startup.py
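
For reference, here is a rough sketch of the kind of decision being described (a simplification based on this thread, not the actual Sanic source):

import sys
from multiprocessing import get_context

def pick_start_method(auto_reload: bool):
    # Sketch only: on Linux with auto-reload disabled, fork is preferred for
    # speed, so a restarted worker reuses the in-memory copy of the code and
    # never re-reads the source files. spawn starts a fresh interpreter and
    # re-imports the app module, which is why it picks up code changes.
    method = "fork" if sys.platform == "linux" and not auto_reload else "spawn"
    return get_context(method)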

cnicodeme commented 2 years ago

Oh yes, I saw this one.

It might be interesting to be able to modify that behavior in this case?

My first thought was to pass more parameters to the restart method, but rather than adding more and more parameters over time, maybe pass the context object directly, which would accept the list of reloaded files as is already the case, and let the restart behavior be customized?

(Except if the spawn vs fork is defined at the initial run, and cannot be changed after that?)

ahopkins commented 2 years ago

(Except if the spawn vs fork is defined at the initial run, and cannot be changed after that?)

Correct. This needs to be decided up front. My point earlier would be to always select spawn if ALLOW_RELOAD="always" or something like that. In the meantime, you could monkeypatch it so that auto-reload is off, but you can still manually reload with the spawn context:

from multiprocessing import get_context
from sanic import Sanic

Sanic._get_context = lambda *_: get_context("spawn")

cnicodeme commented 2 years ago

I'm trying to implement a restart based on a watcher but I'm stumbling upon an issue I can't really understand why:

from sanic import Sanic
from sanic.response import text

app = Sanic('test')

@app.get('/')
async def index(request):
    return text('ok')

app.prepare(host='0.0.0.0', port=1337)
print(app.m) # <-- It will fail here
Sanic.serve()

AttributeError: 'Sanic' object has no attribute 'multiplexer'

Why can't I access the multiplexer (m) right after the app is ready?

I want to pass it to another class that will watch files for changes and trigger the m.restart() command.

ahopkins commented 2 years ago

The multiplexer is only available in worker processes. In what process is your watcher running? Why not use the built-in reloader?
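
To illustrate (a hedged example with a made-up app, not the code from this issue): app.m is attached inside each worker process, for example in a listener that runs after the server starts, but it does not exist in the main process where prepare()/serve() run.

from sanic import Sanic
from sanic.response import text

app = Sanic("Example")

@app.after_server_start
async def show_multiplexer(app, _):
    # This listener runs inside a worker process, so the multiplexer exists.
    print(app.m)

@app.get("/")
async def index(request):
    return text("ok")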

cnicodeme commented 2 years ago

I think it will make more sense if I share what I'm trying to achieve here, so:

I've created a secondary Sanic app on my server that solely listens for a POST request made from GitHub. When I push my code to the "prod" branch, that script is triggered to update the code locally, install the new packages from requirements.txt, update the database (via alembic), and then, and only then, restart the Sanic instance of the main app.

The initial issue I had was that my main app was set up with AUTO_RELOAD, but Sanic was restarting upon file changes before the database migration was done, causing bugs in the newly restarted app. That's why I had to delay the restart and do it manually.
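
For illustration, a minimal sketch of that flow (the endpoint path, repo path, and commands below are placeholders, not the real setup):

import subprocess

from sanic import Sanic
from sanic.response import text

deploy_app = Sanic("DeployHook")
REPO_PATH = "/srv/main-app"  # placeholder path to the main app checkout

@deploy_app.post("/github-webhook")
async def deploy(request):
    # Update the code, install dependencies, and run the migrations first...
    subprocess.run(["git", "pull"], cwd=REPO_PATH, check=True)
    subprocess.run(["pip", "install", "-r", "requirements.txt"], cwd=REPO_PATH, check=True)
    subprocess.run(["alembic", "upgrade", "head"], cwd=REPO_PATH, check=True)
    # ...and only then signal the main app to restart (here by touching a
    # watched "reload" file, as in option 3 below).
    subprocess.run(["touch", f"{REPO_PATH}/reload"], check=True)
    return text("deployed")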

With the new Sanic v22.9.0, I have a few options for how to do it:

  1. Create a new endpoint on my main app, secured, that when hit, would trigger the request.app.m.restart
  2. Use the INSPECTOR parameter, and call restart via the command line from my secondary app
  3. Listen for changes to a specific file (reload) and trigger the restart on that change.

I'm not a fan of 1 because it can hardly be properly secured, and someone could manage to access it and trigger restarts endlessly. Using 2 is a good new way, but from the secondary app, doing a touch on the file is easier, and from my point of view (maybe I'm wrong), listening for a (single) file change via Watchdog is less resource intensive than having to open another socket to listen on (via INSPECTOR). That's why I went with 3.

So what I actually do is create a Watchdog instance, passing it the app before running it. When the reload file is changed, I try to call self.app.m.restart, which is why I'm running into this issue.

Now, you mention the "built-in reloader", which seems to be what I'm looking for, but I wasn't able to find a way to call restart directly from the created app that would propagate the reload to all the processes.

Sorry for the long post, but maybe it will help you grasp my approach :)

ahopkins commented 2 years ago

OOo.... I like this! :heart:

There seems to be a legit use case for getting that Multiplexer into a managed process. Perhaps we should create a nicer API for it. Creating managed processes is something I want to make simpler in general; this would fit nicely.

In the meantime, you should be able to do it. Let me noodle on the idea a little and get back to you.

ahopkins commented 2 years ago

This should do the trick.

from itertools import count
from multiprocessing import get_context
from multiprocessing.connection import Connection
from time import sleep
from typing import Any, Dict

from sanic import Sanic
from sanic.worker.multiplexer import WorkerMultiplexer

app = Sanic("MyApp")
Sanic._get_context = lambda *_: get_context("spawn")

def all_acked(worker_state: Dict[str, Any]):
    return all(
        state.get("state") == "ACKED"
        for state in worker_state.values()
        if state.get("server")
    )

def my_reloader(monitor_publisher: Connection, worker_state: Dict[str, Any]):
    multiplexer = WorkerMultiplexer(monitor_publisher, worker_state)
    while not all_acked(multiplexer.workers):
        sleep(1)
    try:
        tick = count()
        while True:
            current = next(tick)
            if current and current % 10 == 0:
                multiplexer.reload("__ALL_PROCESSES__")
            sleep(1)
    except KeyboardInterrupt:
        print("done")

@app.main_process_ready
async def ready(app: Sanic, _):
    app.manager.manage(
        "MyReloader",
        my_reloader,
        {
            "monitor_publisher": app.manager.monitor_publisher,
            "worker_state": app.manager.worker_state,
        },
    )

In the future, I sort of feel like we need something a little nicer. Maybe it should look something like handlers with a decorator:

@app.managed_process("MyProcess", with_multiplexer=True)
def my_process(multiplexer):
    ...

Or, we could make it look more like a Blueprint:

process = ManagedProcess("MyProcess", with_multiplexer=True)

@process.target
def my_process(process: ManagedProcess):
    assert hasattr(process, "m")

Open to other API ideas.

cnicodeme commented 2 years ago

Welp ... I'm not sure what this part does:

    try:
        tick = count()
        while True:
            current = next(tick)
            if current and current % 10 == 0:
                multiplexer.reload("__ALL_PROCESSES__")
            sleep(1)
    except KeyboardInterrupt:
        print("done")

(count() ?)

But you did put me on a great track, and I now have something that works great. Here it is:

app = Sanic("MyApp")
Sanic._get_context = lambda *_: get_context("spawn")

autoreload = AutoReload()

@app.main_process_ready
def on_ready(app):
    autoreload.listen(app)

@app.main_process_stop
def on_stop(app):
    autoreload.stop()

app.run(**params)

And AutoReload:

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer
import os

class ModificationEventLoader(FileSystemEventHandler):
    def __init__(self, app):
        self.app = app

    def on_modified(self, event):
        if event.is_directory:
            return

        self.app.manager.restart()

class AutoReload:
    """Watches the "reload" file for any changes and triggers a reload of Sanic"""

    def __init__(self):
        self.app = None
        self.observer = None

    def listen(self, app):
        reload_file_path = os.path.join(app.ctx.root_path, 'reload')
        if not os.path.isfile(reload_file_path):
            # We print here because the app was not instantiated!
            print('Missing reload file. Auto reload will be disabled')
            return

        self.observer = Observer()
        self.observer.schedule(ModificationEventLoader(app), reload_file_path, recursive=False)
        self.observer.start()

    def stop(self):
        """Stops the observer, if initiated"""
        if self.observer:
            self.observer.stop()
            self.observer.join()

Tested, it works perfectly!

cnicodeme commented 2 years ago

As for your suggestion to use decorators to access managers, it seems great, but I don't know what the usage would be, so I can't provide any good suggestions on this, I'm sorry.

cnicodeme commented 2 years ago

I updated my code to be clearer and easier to use:

app = Sanic("MyApp")
Sanic._get_context = lambda *_: get_context("spawn")

autoreload = AutoReload(app)

app.run(**params)

AutoReload:

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer
import os

class ModificationEventLoader(FileSystemEventHandler):
    def __init__(self, app):
        self.app = app

    def on_modified(self, event):
        if event.is_directory:
            return

        self.app.manager.restart()

class AutoReload:
    """Watches the "reload" file for any changes and triggers a reload of Sanic"""

    def __init__(self, app):
        self.app = None
        self.observer = None

        self.reload_file_path = os.path.join(app.ctx.root_path, 'reload')
        if not os.path.isfile(self.reload_file_path):
            # We print here because the app was not instantiated!
            print('Missing reload file. Auto reload will be disabled')
            self.reload_file_path = None
            return

        app.register_listener(self.listen, 'main_process_ready')
        app.register_listener(self.stop, 'main_process_stop')

    def listen(self, app):
        if self.reload_file_path:
            self.observer = Observer()
            self.observer.schedule(ModificationEventLoader(app), self.reload_file_path, recursive=False)
            self.observer.start()

    def stop(self):
        """Stops the observer, if initiated"""
        if self.observer:
            self.observer.stop()
            self.observer.join()

ahopkins commented 2 years ago

Welp ... I'm not sure what this part does:

I was just simulating your reload watcher and making it restart every 10 seconds.

cnicodeme commented 2 years ago

Oh ok! That's where I would put the reloader system then.

I re-read the part about accessing the process, and I think the first option, with @app.managed_process, makes more sense: in a way, you want to access the process from the app, so it fits.

(But I haven't fully grasped what this would be useful for, or in which cases, so maybe I'm wrong.)

ahopkins commented 2 years ago

It would be useful in your use case: wanting to run something that interacts with your application. In many use cases, trying to do what you are doing might lock up the main process and make it block. This could be a problem. In your case it does not seem to be, because it looks like watchdog is not blocking the main process (not sure whether it's threading or multiprocessing under the hood).

For your use case, using these seems okay:

        app.register_listener(self.listen, 'main_process_ready')
        app.register_listener(self.stop, 'main_process_stop')

But, that likely will not be the case for many other implementations. Another use case is this: https://sanic.dev/en/plugins/sanic-ext/logger.html

Instead of logging in the worker process, it pushes every log call to a Queue that is then handled inside of a dedicated process. This would be super helpful if you were logging to some third-party application.
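
The general shape of that pattern, as a generic Python sketch (not the sanic-ext implementation): workers put log records on a shared queue, and a single dedicated consumer drains it.

import logging
from logging.handlers import QueueHandler, QueueListener
from multiprocessing import Queue

log_queue = Queue()

def setup_worker_logging():
    # In each worker: log calls only enqueue the record; no I/O in the worker.
    root = logging.getLogger()
    root.handlers = [QueueHandler(log_queue)]
    root.setLevel(logging.INFO)

def start_log_consumer():
    # In a dedicated process: drain the queue and do the real work, e.g.
    # shipping records to a third-party service.
    listener = QueueListener(log_queue, logging.StreamHandler())
    listener.start()
    return listener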

cnicodeme commented 1 year ago

Oh yes, it makes sense. I did use Watchdog instead of the custom code available in Sanic precisely because Watchdog is non-blocking ;)

cnicodeme commented 1 year ago

In that case, I think we can close this ticket; it is working fine and resolved on my end.

cnicodeme commented 1 year ago

(Except if the spawn vs fork is defined at the initial run, and cannot be changed after that?)

Correct. This needs to be decided up front. My point earlier would be to always select spawn if ALLOW_RELOAD="always" or something like that. In the meantime, you could monkeypatch it so that auto-reload is off, but you can still manually reload with the spawn context:

from multiprocessing import get_context
from sanic import Sanic

Sanic._get_context = lambda *_: get_context("spawn")

I'm having issues starting an instance with the above hack present (with version 22.9.0). Removing that line works.

@ahopkins Can you tell me if there were any changes that could explain why it's not working anymore?