miyakogi / pyppeteer

Headless chrome/chromium automation library (unofficial port of puppeteer)

Browser network gets stuck when setRequestInterception is on #260

Open getter3 opened 4 years ago

getter3 commented 4 years ago

I have been working on a small project and have used this library for about a week.

I tried to intercept the page's requests for analysis. I want to do this in non-headless mode, but I am seeing strange behavior.

First, after calling Page.setRequestInterception(True), Page.goto never seems to return. There also seems to be no network activity after a while. This matters because I need to interact with the browser manually.

Here is my code. It gets stuck before logging 'finished loading page':

import asyncio
import re
import time
from pyppeteer import launch
import logging
import logging.config

async def onMediaInfoIntercept(self, request):
    request.continue_()

async def main():
    logging.config.fileConfig('./logging.cfg')
    logger = logging.getLogger('root')
    inputurl = 'https://www.youtube.com/watch?v=LNBjMRvOB5M'

    logger.info('creating browser instance....')
    browser = await launch(
        {'headless': False, 'dumpio': True, 'autoClose': False, 'ignoreHTTPSErrors': True,
         'args': ['--no-sandbox', '--enable-features=NetworkService', '--window-size=1080,720', '--disable-setuid-sandbox']})
    logger.info('finished browser init....')

    pages = await browser.pages()
    page = pages[0]

    await page.setRequestInterception(True)
    page.on('request', onMediaInfoIntercept)
    logger.info('finished intercept setup....')

    logger.info('goto page ....')
    await page.goto(inputurl, options={'timeout': 0})
    await page.reload(options={'timeout': 0})
    logger.info('finished loading page')

    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Tren commented 4 years ago

I also encountered this problem and could not solve it!

getter3 commented 4 years ago

> I also encountered this problem and could not solve it!

Well, I have switched to using puppeteer instead XD

playma commented 4 years ago

I am facing the same problem.

byamao1 commented 4 years ago

Interception also works if you don't call page.setRequestInterception(True). Give it a try:

# await page.setRequestInterception(True)
page.on('request', onMediaInfoIntercept)
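
A note on the suggestion above: a 'request' listener attached without page.setRequestInterception(True) can observe requests, but as far as I know pyppeteer raises a "Request interception is not enabled" error if the handler then calls request.continue_() or request.abort(). A minimal sketch of the observe-only variant, assuming page is a pyppeteer Page; log_requests and seen are hypothetical names used only for illustration:

```python
def log_requests(page, seen):
    """Record every request the page makes, without intercepting it.

    Assumes `page` is a pyppeteer Page. Interception is left off, so the
    handler must not call request.continue_() or request.abort().
    """
    def on_request(request):
        # Request.method and Request.url are read-only properties.
        seen.append((request.method, request.url))

    page.on('request', on_request)
    return on_request
```

This is enough for analyzing traffic, but it cannot block or modify anything.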

wonghang commented 4 years ago

Hi, I have the same problem. Is there any update on this issue? Can anyone point me to a fix or a workaround? I am using a headless browser to automate some tasks, and I want to block ads from loading to improve performance.
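
For the ad-blocking use case, a handler along the following lines might work once interception behaves. This is only a sketch: BLOCKED_HOSTS is a tiny illustrative blocklist, and should_block and block_ads are hypothetical names. It relies on Request.url, Request.abort() and Request.continue_() from pyppeteer, the last two being coroutines.

```python
from urllib.parse import urlparse

# Illustrative blocklist only; a real one would be far longer.
BLOCKED_HOSTS = {'doubleclick.net', 'googlesyndication.com'}


def should_block(url):
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = urlparse(url).hostname or ''
    return any(host == h or host.endswith('.' + h) for h in BLOCKED_HOSTS)


async def block_ads(request):
    # Every intercepted request must be either aborted or continued,
    # otherwise it stays pending and the page never finishes loading.
    if should_block(request.url):
        await request.abort()
    else:
        await request.continue_()
```

It would be attached after `await page.setRequestInterception(True)`, for example with `page.on('request', lambda req: asyncio.ensure_future(block_ads(req)))`, so that the coroutine is actually scheduled on the event loop.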

digitalkaoz commented 4 years ago

Same problem here! It seems the page.on('request') callbacks don't get called. I looked into the code, and everything seems correct where those requests are emitted in Page and NetworkManager.
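
Two details in the snippet from the original post may explain the stall, though this is a guess rather than a confirmed diagnosis. The module-level handler is declared with an extra self parameter, so calling it with a single request argument raises a TypeError. In addition, request.continue_() is a coroutine in pyppeteer, so calling it without await never actually continues the request. Any intercepted request that is neither continued nor aborted stays pending. A sketch of a corrected handler under those assumptions; on_request and enable_interception are hypothetical names:

```python
import asyncio


async def on_request(request):
    # No `self` parameter: this is a plain function, not a method.
    # continue_() is a coroutine in pyppeteer, so it has to be awaited.
    await request.continue_()


async def enable_interception(page):
    """Turn on interception and attach the handler (hypothetical helper)."""
    await page.setRequestInterception(True)
    # Wrap the coroutine in a task explicitly, so the handler runs no matter
    # how the underlying event emitter treats coroutine callbacks.
    page.on('request', lambda req: asyncio.ensure_future(on_request(req)))
```

In the original main(), `await enable_interception(page)` would replace the setRequestInterception and page.on lines.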

digitalkaoz commented 4 years ago

@wonghang simply install an adblocker before launching the headless browser! Here is a version from the Node.js implementation (it works the same way with Python):

https://gist.github.com/sindresorhus/bca2f7d0c8b31205fa3c9f328d548c70