microsoft / playwright-python

Python version of the Playwright testing and automation library.
https://playwright.dev/python/
Apache License 2.0

[Bug]: sync_api switches to async context after init #2649

Closed by developer992 20 hours ago

developer992 commented 22 hours ago

Version

sync_api switches to async context after init

Steps to reproduce

Hello, is this a bug?

from playwright.sync_api import sync_playwright

import asyncio

class BaseScrapper(object):
    @property
    def playwright(self):
        print('basescrapper.playwright')
        return sync_playwright().start()

    @property
    def browser(self):
        print('basescrapper.browser')
        return self.playwright.chromium.launch(headless=True)

    def open_page(self, url):
        print('basescrapper.open page')
        page = self.browser.new_page()
        page.goto(url)
        return page

class SampleScrapper(BaseScrapper):
    def run(self):
        print(f'before - {"async" if asyncio.get_event_loop().is_running() else "sync"}')
        page = self.open_page(url='https://www.google.com')
        print(f'after - {"async" if asyncio.get_event_loop().is_running() else "sync"}')

def run():
    print(f'run 1 - {"async" if asyncio.get_event_loop().is_running() else "sync"}')
    s = SampleScrapper()
    print(f'run 2 - {"async" if asyncio.get_event_loop().is_running() else "sync"}')
    s.run()
    print(f'run 3 - {"async" if asyncio.get_event_loop().is_running() else "sync"}')

>>> run()
run 1 - sync
run 2 - sync
before - sync
basescrapper.open page
basescrapper.browser
basescrapper.playwright
after - async
run 3 - async

It switches to async context for some reason, any idea why?

Expected behavior

I expected it to keep running in a sync context.

Actual behavior

It didn't.

Additional context

No response

Environment

- Operating System: [Ubuntu 22.04]
- CPU: [arm64]
- Browser: [All, Chromium, Firefox, WebKit]
- Python Version: [3.12]
- Other info:

mxschmitt commented 22 hours ago

This looks like it is working as expected: Playwright uses asyncio under the hood, even in our sync implementation.
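
A minimal stdlib-only sketch of that effect, not Playwright's actual implementation: plain synchronous code reports a running loop whenever it executes in a thread where an asyncio loop is active. The helper names (`context_label`, `plain_sync_function`, `driver`) are hypothetical, and the check uses `asyncio.get_running_loop()` rather than the `get_event_loop().is_running()` call from the report.

```python
import asyncio


def context_label() -> str:
    """Return "async" if an event loop is running in this thread, else "sync"."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in the current thread.
        return "sync"
    return "async"


def plain_sync_function() -> str:
    # An ordinary function with no await; it only inspects its surroundings.
    return context_label()


async def driver() -> str:
    # The sync function is called while the loop started by asyncio.run() is active.
    return plain_sync_function()


before = context_label()
during = asyncio.run(driver())
after = context_label()

print(f"before - {before}")
print(f"during - {during}")
print(f"after - {after}")
```

In this sketch the loop stops when `asyncio.run()` returns, so the last check reports "sync" again. The output in the report suggests that, with the sync API, the loop is still active after `open_page()` returns, which would explain the "after - async" line.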

developer992 commented 22 hours ago

I am trying to parse a table and process each row individually, but the rows come back in random order, which breaks my logic.

I need to rely on sequential order during processing so that I can stop at the right time.

Something like this:

async def get_objects(self):
    # sketch only: assumes open_page() is wrapped as an async context manager
    async with self.open_page(url=self.PORTAL_URL) as page:
        # click some buttons, fill in form fields, click submit;
        # that triggers an AJAX call, which I intercept to store the objects on self._data,
        # then I yield those objects one by one
        for obj in self._data:
            yield obj

My function:

async def func():
    async for obj in scrapper.get_objects():
        print(f'process obj={obj}')

The objects are processed in a different order than they appear in self._data.
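
For comparison, a small self-contained sketch with no Playwright or Django involved: `async for` pulls one item at a time from an async generator, so iterating over a list inside the generator should preserve the list's order. `FakeScrapper` and `collect` are hypothetical stand-ins. If the order still differs in the real code, one thing worth checking (an assumption, since the interception code is not shown) is the order in which the intercepted responses are appended to `self._data`.

```python
import asyncio
from typing import AsyncIterator, Iterable, List


class FakeScrapper:
    """Hypothetical stand-in for the scrapper; holds pre-collected rows."""

    def __init__(self, data: Iterable[str]) -> None:
        self._data: List[str] = list(data)

    async def get_objects(self) -> AsyncIterator[str]:
        for obj in self._data:
            # Give control back to the event loop between items.
            await asyncio.sleep(0)
            yield obj


async def collect(scrapper: FakeScrapper) -> List[str]:
    seen: List[str] = []
    async for obj in scrapper.get_objects():
        # Each item is fully handled before the next one is requested.
        seen.append(obj)
    return seen


data = ["row-1", "row-2", "row-3"]
processed = asyncio.run(collect(FakeScrapper(data)))
print(processed)
```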

I tried switching completely to a sync context, but then I ran into the problem above, because Django would not cooperate on DB calls.
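
One possible way around this, sketched with the stdlib only and under the assumption that the DB code rejects calls made from a thread with a running event loop: keep the loop-driven work in a worker thread, so the calling thread never sees a running loop. `scrape_in_worker` and `fake_scrape` are hypothetical stand-ins for the Playwright work, and the returned rows are placeholder values.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def loop_is_running_here() -> bool:
    """True if an asyncio event loop is running in the current thread."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def scrape_in_worker() -> dict:
    # Hypothetical stand-in for the scraping code: it runs its own event loop,
    # but only inside the worker thread that executes this function.
    async def fake_scrape() -> bool:
        await asyncio.sleep(0)
        return loop_is_running_here()

    inside = asyncio.run(fake_scrape())
    return {"loop_running_inside_worker": inside, "rows": ["row-1", "row-2"]}


with ThreadPoolExecutor(max_workers=1) as pool:
    # Block until the worker thread has finished and hand back plain data.
    result = pool.submit(scrape_in_worker).result()

# Back in the calling thread no loop is running, so sync-only code can proceed.
main_thread_has_loop = loop_is_running_here()
print(result, main_thread_has_loop)
```

The design choice here is to pass only plain data (lists, dicts) out of the worker thread, so that nothing tied to the worker's event loop is touched from the calling thread.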

developer992 commented 20 hours ago

You may delete this thread.