microsoft / playwright-python

Python version of the Playwright testing and automation library.
https://playwright.dev/python/
Apache License 2.0

[Bug]: Using the async API with the threading module causes the process to hang forever #2444

Closed (goofy-z closed this issue 5 months ago)

goofy-z commented 5 months ago

Version

1.35.0

Steps to reproduce

Python version: 3.7.9

import asyncio
import threading
from playwright.async_api import async_playwright

# Pages to scrape concurrently.
lst = [
    'https://www.sncf-connect.com/aide/contact#conseiller',
    'https://www.sncf-connect.com/train/bons-plans/budget-mobilite?prex=homepage_footer',
    'https://www.sncf-connect.com/aide/le-paiement-de-vos-billets-de-train-modes-de-paiement-acceptes',
    'https://www.sncf-connect.com/conditions-generales-presentation-offres-agence'
]

async def run():
    # Launch one headed Chrome instance and scrape every URL concurrently.
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            channel='chrome',
            args=[
                "--no-sandbox",
                "--disable-dev-shm-usage",
                "--blink-settings=imagesEnabled=false"
            ],
            headless=False
        )
        await asyncio.gather(*(_scrape(browser, j) for j in lst))

async def _scrape(browser, url):
    # Each URL gets its own browser context and page.
    context = await browser.new_context()
    page = await context.new_page()
    async with page:
        try:
            await page.goto(url)
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            html = await page.content()
        except Exception:
            # Any navigation or evaluation error yields an empty result.
            html = ""
        return html

def test_async():
    asyncio.run(run())

# Run the same coroutine from a child thread; this is where the hang occurs.
p = threading.Thread(target=test_async)
p.daemon = True
p.start()
p.join()

Expected behavior

The code should execute successfully

Actual behavior

When I call asyncio.run directly to execute my code, it works fine and exits successfully. But when I create a child thread to execute the same code, the whole process hangs forever.

I found that execution reaches the end of the async_playwright context and then goes no further. In the background, all Chrome processes have already quit. It looks like the main process fails to detect that the child thread has finished. But why is the thread still considered unfinished when the crawl has completed and the browser has exited?

If I call asyncio.run directly instead of using a thread, the process exits normally. I hope someone can help me solve this problem. Thank you.
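To make the hang observable rather than blocking forever, the thread can be joined with a timeout. The sketch below is only a diagnostic aid and does not explain or fix the hang. The helper `run_in_thread` and the placeholder coroutine `stand_in` are hypothetical names introduced here; `stand_in` replaces the report's `run()` so the example works without Playwright installed.

```python
import asyncio
import threading


def run_in_thread(coro_factory, timeout):
    """Run coro_factory() via asyncio.run in a daemon child thread.

    Returns (finished, result). finished is False if the thread is
    still alive after `timeout` seconds, which is what a hang looks like.
    """
    box = {}

    def target():
        box["result"] = asyncio.run(coro_factory())

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)  # unlike a bare join(), this always returns
    return (not worker.is_alive(), box.get("result"))


async def stand_in():
    # Hypothetical placeholder for the report's run() coroutine.
    await asyncio.sleep(0.01)
    return "done"


if __name__ == "__main__":
    finished, result = run_in_thread(stand_in, timeout=5)
    print("finished:", finished, "result:", result)
```

Passing the report's `run` in place of `stand_in` would show whether the child thread returns within the timeout. Because the thread is a daemon, the process can still exit if it does not.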

Additional context

No response

Environment

- Operating System: macOS
- CPU: arm64
- Browser: chrome
- Python Version: 3.7.9
- Other info:

mxschmitt commented 5 months ago

Python 3.7 is not supported anymore. Also, you are not using the latest version of Playwright. Could you try updating both before we investigate further?
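After updating, a quick way to confirm which versions are actually in use is to print them from the same interpreter that runs the script. This is a minimal sketch: the helper names are hypothetical, the only threshold taken from this thread is that Python 3.7 is unsupported, and `importlib.metadata` itself requires Python 3.8 or newer.

```python
import sys
from importlib import metadata  # standard library on Python 3.8+


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def python_newer_than_37(version_info=sys.version_info):
    # Assumption: the thread only states that 3.7 is unsupported,
    # so this checks for anything newer than 3.7.
    return tuple(version_info[:2]) > (3, 7)


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("Newer than 3.7:", python_newer_than_37())
    print("playwright:", installed_version("playwright"))
```

If the last line prints `None`, the `playwright` package is not installed in the interpreter that ran the check.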

mxschmitt commented 5 months ago

Please re-file as per above.