miyakogi / pyppeteer

Headless chrome/chromium automation library (unofficial port of puppeteer)

What is the right way to open multiple pages in one browser instance? #85

Closed imfht closed 6 years ago

imfht commented 6 years ago

Hi, I need to crawl a lot of URLs using headless Chrome and Python. I don't think launching a browser for each URL is a good idea.

import asyncio
import uuid
from pyppeteer import launch

async def main(url):
    browser = await launch()
    page = await browser.newPage()
    await page.goto(url)
    # await page.waitFor("body > div.footer-up")

    title = await page.title()
    await page.screenshot(options={'path': "/tmp/%s.png" % str(uuid.uuid5(uuid.NAMESPACE_URL, url))})
    await browser.close()
    print(title)

for url in ['http://www.baidu.com', 'http://www.google.com']:
    asyncio.get_event_loop().run_until_complete(main(url))

I would appreciate any ideas. Thanks so much!

miyakogi commented 6 years ago

Launch the browser once and use browser.newPage() to create a new page instance for each URL.

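import asyncio
import uuid
from pyppeteer import launch

async def main():
    # Launch one browser and reuse it for every URL.
    browser = await launch()
    for url in ['http://...', 'http://...', ...]:
        # Open a new page (tab) in the same browser for each URL.
        page = await browser.newPage()
        await page.goto(url)
        # await page.waitFor("body > div.footer-up")

        title = await page.title()
        await page.screenshot(options={'path': "/tmp/%s.png" % str(uuid.uuid5(uuid.NAMESPACE_URL, url))})
        print(title)

    await browser.close()

asyncio.get_event_loop().run_until_complete(main())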
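
The loop above visits the URLs one after another. If that is too slow for a large crawl, several pages can be kept open in the same browser at once. What follows is only a minimal sketch under assumptions: `fetch_title`, `crawl`, and `max_pages` are hypothetical names made up for this example and are not part of pyppeteer. Only `launch`, `browser.newPage()`, `page.goto()`, `page.title()`, `page.close()`, and `browser.close()` come from the library. Each page is closed when its task finishes, and a semaphore caps how many pages are open at the same time.

```python
import asyncio


async def fetch_title(browser, url, limit):
    """Open one page in the shared browser, return its title, then close it."""
    async with limit:  # wait here if too many pages are already open
        page = await browser.newPage()
        try:
            await page.goto(url)
            return await page.title()
        finally:
            # Close the page even if goto() or title() raised.
            await page.close()


async def crawl(urls, max_pages=5):
    """Launch a single browser and fetch all titles with bounded concurrency."""
    # Imported lazily so fetch_title can be exercised without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch()
    try:
        limit = asyncio.Semaphore(max_pages)  # hypothetical cap on open pages
        tasks = [fetch_title(browser, url, limit) for url in urls]
        # gather() returns results in the same order as `urls`.
        return await asyncio.gather(*tasks)
    finally:
        await browser.close()
```

It could be driven the same way as the snippets in this thread, for example with asyncio.get_event_loop().run_until_complete(crawl(['http://www.baidu.com', 'http://www.google.com'])). A suitable value for `max_pages` depends on the machine and the target sites, so treat the default of 5 as a placeholder.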

marcinzielen77 commented 5 years ago

The code below fails after just a few iterations:

import asyncio
from pprint import pprint
from pyppeteer import launch

async def main():
    browser = await launch({"headless": False})
    for i in range(100):
        pprint(i)
        page = await browser.newPage()
        await page.goto('https://github.com/miyakogi/pyppeteer/issues/85')
        await page.waitFor(1000)
        await page.screenshot(options={'path': "/tmp/" + str(i) + ".png"})
        await page.close()

    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

with the following exception:

websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason

Traceback (most recent call last):
  ...
  File "/usr/local/lib/python3.5/dist-packages/pyppeteer/connection.py", line 176, in send
    method), '): Session closed. Most likely the ', '{}'.format(self._targetType), ' has been closed.']))
pyppeteer.errors.NetworkError: Protocol Error (Target.activateTarget): Session closed. Most likely the page has been closed.

Can you tell me what's wrong?