scrapy-plugins / scrapy-playwright

🎭 Playwright integration for Scrapy

How to configure nested PageMethods in a Request? #229

Closed sick-pupil closed 1 year ago

sick-pupil commented 1 year ago
for headline in page.locator("//div[contains(@class, 'headline') and contains(@class, 'clearfix')]").all():
    headline.scroll_into_view_if_needed()
    page.wait_for_timeout(2000)

How can I express these nested method calls in Request(url=..., meta={'playwright_page_methods': [...]})?

sick-pupil commented 1 year ago
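import logging
import os

from scrapy import Selector


async def parse(self, response):
    # 'playwright_page' is only present when the request sets
    # meta={'playwright': True, 'playwright_include_page': True}.
    page = response.meta['playwright_page']

    # Scroll each headline into view, pausing so lazy-loaded content can render.
    for headline in await page.locator("//div[contains(@class, 'headline') and contains(@class, 'clearfix')]").all():
        await headline.scroll_into_view_if_needed()
        await page.wait_for_timeout(2000)

    # Fetch the rendered HTML once and reuse it below.
    html = await page.content()
    logging.info(html)

    filename = '{}_{}_screenshot.html'.format('anime', self.current_timestamp)
    with open(os.path.join(self.screenshot_path, 'anime', filename), mode='w', encoding='utf-8') as f:
        f.write(html)

    # Selector takes the HTML string via text=; page.content() must be awaited first.
    selector = Selector(text=html)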
    ...
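
For anyone who would rather keep the loop inside playwright_page_methods than in the callback: each PageMethod is a single call on the page, so a loop can't be nested there directly. One possible workaround is to move the loop into JavaScript and run it through a single "evaluate" PageMethod. The sketch below is only an approximation of the snippet in the question: it uses the browser's scrollIntoView() rather than Playwright's scroll_into_view_if_needed(), and the helper names build_scroll_script and build_meta are made up for illustration.

```python
import json

# XPath taken from the question above.
HEADLINE_XPATH = "//div[contains(@class, 'headline') and contains(@class, 'clearfix')]"


def build_scroll_script(xpath: str, pause_ms: int) -> str:
    """Return a JS async function that scrolls to every XPath match, pausing after each.

    Hypothetical helper, not part of scrapy-playwright.
    """
    return (
        "async () => {\n"
        # json.dumps produces a safely quoted JavaScript string literal.
        f"  const result = document.evaluate({json.dumps(xpath)}, document, null,\n"
        "    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);\n"
        "  for (let i = 0; i < result.snapshotLength; i++) {\n"
        "    result.snapshotItem(i).scrollIntoView();\n"
        f"    await new Promise(resolve => setTimeout(resolve, {int(pause_ms)}));\n"
        "  }\n"
        "}"
    )


def build_meta(xpath: str = HEADLINE_XPATH, pause_ms: int = 2000) -> dict:
    """Build the Request meta dict with one PageMethod that runs the whole loop.

    Hypothetical helper; the import is done lazily so the script builder
    above can be used without scrapy-playwright installed.
    """
    from scrapy_playwright.page import PageMethod

    return {
        "playwright": True,
        "playwright_page_methods": [
            PageMethod("evaluate", build_scroll_script(xpath, pause_ms)),
        ],
    }
```

It would then be used as Request(url=..., meta=build_meta()). If the page needs Playwright's own actionability checks while scrolling, handling the page in the callback with playwright_include_page, as in the comment above, is probably the safer route.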