microsoft / playwright-python

Python version of the Playwright testing and automation library.
https://playwright.dev/python/
Apache License 2.0
11.91k stars 915 forks source link

[Feature] A way to prevent SIGINT (cmd+c) being passed to the browser processes so that they can be shutdown cleanly. #1170

Open samwillis opened 2 years ago

samwillis commented 2 years ago

I have been using Playwright with the Scrapy web scraping framework, this is the plugin: https://github.com/scrapy-plugins/scrapy-playwright

Scrapy is designed to cleanly shutdown on SIGINT, saving its crawl state so that it can be resumed. When used with Playwright the SIGINT immediately closes the browsers (with both handle_sigint=False or True), this results in the currently processing pages crashing and the shutdown failing with the state not being saved. The ideal way to fix this is to prevent the SIGINT from being passed to the browsers so that scrapy-playwright can handle their clean shutdown as active pages finish being processed.

I have successfully made this work on posix by monkey patching Playwright based on this from stackoverflow to include a preexec_fn:

# Monkey patch Playwright
from playwright._impl._transport import PipeTransport, _get_stderr_fileno

def new_pipe_transport_connect_preexec(): # Don't forward signals.
    os.setpgrp()

async def new_pipe_transport_connect(self):
    self._stopped_future: asyncio.Future = asyncio.Future()
    # Hide the command-line window on Windows when using Pythonw.exe
    creationflags = 0
    if sys.platform == "win32" and sys.stdout is None:
        creationflags = subprocess.CREATE_NO_WINDOW

    try:
        # For pyinstaller
        env = os.environ.copy()
        if getattr(sys, "frozen", False):
            env.setdefault("PLAYWRIGHT_BROWSERS_PATH", "0")

        self._proc = await asyncio.create_subprocess_exec(
            str(self._driver_executable),
            "run-driver",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=_get_stderr_fileno(),
            limit=32768,
            creationflags=creationflags,
            env=env,
            preexec_fn=new_pipe_transport_connect_preexec,  # My new line!!
        )
    except Exception as exc:
        self.on_error_future.set_exception(exc)
        raise exc

    self._output = self._proc.stdin

PipeTransport.connect = new_pipe_transport_connect

What would be ideal is either a flag that could be set to enable this (for both Posix and Windows) or a way to pass a custom preexec_fn when starting a browser.

See the related bug on scrapy-playwright https://github.com/scrapy-plugins/scrapy-playwright/issues/62

zidokobik commented 1 year ago

Having the same issue where SIGINT closes the browser even with handle_sigint=False. I have a long running program that would appreciate graceful shutdown, I do so with loop.add_signal_handler() in asyncio, but SIGINT still kills the browser process.

Rafiot commented 1 year ago

Not as handy as SIGINT, but I use SIGTERM and it works doesn't kill the browser: https://github.com/ail-project/lacus/blob/main/bin/capture_manager.py#L66 & https://github.com/ail-project/lacus/blob/main/bin/stop_capture_manager.py

blksith0 commented 1 year ago
while True:
  do_stuff(playwright)
  sleep(long_time)

Look at this beautiful loop. I cancel it with Ctrl+C, then I handle the KeyboardInterrupt exception. Except Ctrl+C messes with the playwright browser. That's not nice. Alright, let's do handle_sigint=False, handle_sigterm=False Alas, to no avail. God please send divine knowledge and wisdom to my python code.

Rafiot commented 1 year ago

@blksith0 as far as I can tell, it is a bug in Playwright, you cannot solve it on your side.

If I may, I'd recommend you to use SIGTERM instead of SIGINT (as I explained in the message above), and it then works fine. But you cannot use Ctrl+C.

blksith0 commented 1 year ago

@Rafiot Yes, I see that. I'm just not smart enough to know how I can send a SIGTERM to my program and use that to stop the loop.

blksith0 commented 1 year ago

It would be really great if I could terminate my loop and not have it kill the playwright browser. Can you please look into this?

ustulation commented 1 year ago

While I'm aware what I'm going to write is not an exact solution to the mentioned problem, I've got a slightly different perspective due to the "other" stuff going on with playwright. Exactly like yous I needed these feature, because I wanted to handle playwright error separately and do something except for ctrl-c etc (mainly SIGINTs and SIGTERMs). I would expect them to generate asyncio.CancelledError which doesn't derive from Exception and is thus not accidentally caught unless you wanted to, and hence ending the application gracefully. That doesn't happen because while every other await gets the cancelled-error, awaits on playwright stuff gets a normal playwright-error about target being closed and so on.

However, playwright has another bug (hopefully soon to be solved after 2.5 yrs of it being an open issue) - it mem-leaks. It's a big thread but I face most problems mentioned there for long running applications. The TL;DR from there is that the browser will unexpectedly close after certain num of context-creates and destroys and mem just keeps rising (so it sometimes closes and sometimes crashes with the app with OOM in dockers/fargates but amounts to the same thing - the browser's gone).

Which means that the browser close not only happens on ctrl-c etc. but can happen due to other bugs like above. So instead of waiting for the solution from here (for ctrl-c to not raise playwright error due to browser close), I ended up doing the less desirable thing of catching playwright error, inspecting the string message and if it's anything like regex r'Target.*close.*' (because there seems to be some permutations like Target closed., Target page, context or browser closed. and so on) then raise the asyncio.CancelledError myself from that exception.

Ofc this may not be suitable for everyone as their use case might be different, but for someone who otherwise wants to do something separate for playwright-errors except browser close (e.g you might want to backoff and retry if doing a web-scrape) and interpret browser-close as an unexpected irrecoverable scenario, then this is perhaps the most straightforward solution (unless somebody knows a better way ofc).

Rafiot commented 1 year ago

@ustulation that's also what I'm doing. In case you want a pretty complete list of possible errors, I have one there: https://github.com/Lookyloo/PlaywrightCapture/blob/main/playwrightcapture/capture.py#L837

KRRT7 commented 7 months ago

following.

telecran-telecrit commented 7 months ago

Still not fixed! Can not stop playwright correctly after KeyboardInterrupt, and infinite awation produced.

    finally:
        print('Closing...')
        _BLOCK_IMAGES = _BLOCK_IMAGES_old
        try:
            if (context is not None):
                await context.close()
                context = None
            if (playwright is not None):
                await playwright.stop()
                playwright = None
            print(context, playwright)
        except:
            pass
        print('Done.[Closing]')
junyeong-huray commented 6 months ago

Hello folks,

Here's an OS tip related to this issue.

Have you heard about PGID (Process Group ID)? When you execute your program in a terminal, and your program fork()-ed multiple child processes, than all of the processes have the same PGID unless you change it explicitly.

The OS kernel knows your processes are running at the foreground of the terminal and it also knows the PGID of them.

IF you type Ctrl+c in a terminal, the terminal driver of the OS kernel sends SIGINT signals to all processes that belong to the PGID.

If your program imports playwright package, and it creates playwright instances, some of playwright scripts are executed in a child process, and browser process runs in a child process also.

In this setup, if you hit Ctrl+c, then all of these processes receive SIGINT signal simultaneously. Even if your main program handles SIGINT signal and does some cleanup including shutting down playwright, you can not stop SIGINT not to signal the child processes.

So in this case, just send SIGTERM to your main process instead of hitting ctrl+c in the terminal.

I hope this tip helps.