Open oldani opened 5 years ago
This would be a good item to get fixed; currently, when rendering, I have to stop using proxy servers.
I will take this on.
cool thanks, I was going to take a look later but I'm not up on the whole async thing yet :)
I am on a very restrictive corporate network and have been experiencing many issues with Python and proxies since I started using requests-html. My goal is to scrape a Cisco site that returns a lot of its HTML via JS, so I have to use the render functionality.
1st (solved manually) The initial Chromium download done by pyppeteer does not use proxies, so I had to download it manually and check where pyppeteer expects it to be:
python -c 'import pyppeteer; print(pyppeteer.chromium_downloader.chromiumExecutable)'
>>'win64': WindowsPath('C:/Users/XXX/AppData/Local/pyppeteer/pyppeteer/local-chromium/575458/chrome-win32/chrome.exe'
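The lookup above prints the whole per-platform dictionary. As a minimal sketch, assuming pyppeteer is installed, the path for the current platform alone can be read with chromium_downloader.chromium_executable(). The helper name expected_chromium_path is hypothetical, and the import is guarded so the snippet still runs where pyppeteer is missing.

```python
from pathlib import Path
from typing import Optional


def expected_chromium_path() -> Optional[Path]:
    """Return where pyppeteer expects Chromium, or None if pyppeteer is not importable.

    expected_chromium_path is a hypothetical helper name used for illustration.
    """
    try:
        from pyppeteer import chromium_downloader
    except ImportError:
        return None
    # chromium_executable() picks the entry for the current platform
    # out of the chromiumExecutable dictionary printed above.
    return chromium_downloader.chromium_executable()


path = expected_chromium_path()
if path is None:
    print("pyppeteer is not installed")
else:
    # exists() tells you whether the manual download landed in the right place.
    print(f"{path} (present: {path.exists()})")
```

If the path is reported as not present, the manually downloaded Chromium probably needs to be unpacked so that the executable sits exactly at that location.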
2nd (solved manually) Chromium does not accept credentials (user + password) given in the --proxy-server="XXX" arg, see here
Now I am starting chromium with
session = HTMLSession(browser_args=['--no-sandbox', '--proxy-pac-url="http://XXX/XXX.pac"'])
while using the Proxy Auto Auth addon for chromium...
Start chrome.exe with the --proxy-pac-url="http://XXX/XXX.pac" argument, enter your credentials and install the Proxy Auto Auth addon. Restart chrome.exe with the same arguments and check that you can use it without being asked for proxy auth.
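Since Chromium ignores credentials embedded in the proxy argument, one way to prepare for that is to separate them from the proxy URL before building any browser argument. The following is a minimal sketch using only the standard library; split_proxy_credentials and the example host are hypothetical, and what you then do with the credentials (an addon as above, or some other mechanism) is outside what this thread confirms.

```python
from urllib.parse import unquote, urlsplit


def split_proxy_credentials(proxy_url):
    """Split 'scheme://user:pass@host:port' into (bare_url, username, password).

    Hypothetical helper: assumes proxy_url includes a scheme such as http://.
    """
    parts = urlsplit(proxy_url)
    host = parts.hostname or ""
    if ":" in host:
        # urlsplit strips the brackets from IPv6 literals, so restore them.
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    bare_url = f"{parts.scheme}://{netloc}"
    # Credentials in URLs are percent-encoded, so decode them when present.
    username = unquote(parts.username) if parts.username is not None else None
    password = unquote(parts.password) if parts.password is not None else None
    return bare_url, username, password


# proxy.example.com is a placeholder host, not taken from the thread.
print(split_proxy_credentials("http://user:p%40ss@proxy.example.com:8080"))
```

Only the bare URL would go into a Chromium flag; the username and password have to reach the browser some other way.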
3rd (not solved yet) The render function does not use my proxy:
req = session.get(url=url, proxies=proxyDict, verify=False)
req.html.render()
pyppeteer.errors.PageError: net::ERR_NAME_NOT_RESOLVED at <URL>
I would be very happy if this can be solved ...
+1 On this being an amazing thing to get resolved.
Is there any news about this issue? Scraping behind corporate proxies is impossible right now... Is any progress planned on this? Thank you
Is there any news on this? I saw this commit but don't know if it is the expected patch: https://github.com/psf/requests-html/pull/396
In my opinion, the best solution would be to be able to use proxies the same way requests does (from env or dict). Is that possible at this time?
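To make the "from env or dict" idea concrete, here is a minimal sketch of how the two sources could be merged, with an explicit dict taking priority over environment variables. It uses only the standard library (urllib.request.getproxies_environment); effective_proxies is a hypothetical name, the proxy addresses are placeholders, and this is not a description of how requests or requests-html resolve proxies internally.

```python
import os
from urllib.request import getproxies_environment


def effective_proxies(explicit=None):
    """Merge proxies from environment variables with an explicit dict.

    Hypothetical helper: entries in `explicit` override the environment.
    """
    merged = dict(getproxies_environment())  # reads http_proxy, https_proxy, ...
    merged.update(explicit or {})
    return merged


# Placeholder value set here only so the demonstration has something to read.
os.environ["https_proxy"] = "http://envproxy.example:3128"

print(effective_proxies())
print(effective_proxies({"https": "socks5://127.0.0.1:1080"}))
```

The resulting dict has the same shape as the proxies argument already passed to session.get, so it could feed both the HTTP request and whatever mechanism configures the browser.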
How is this going? I would like to know how I can use socks5 proxies with requests-html... and the .render() function.
bump? any updates?
bump
bump
any updates?
any updates?
I have used Selenium as an alternative; however, it is a lot slower.
If you're using proxies with requests-html and not rendering JS sites, all is good. Once you render a website, pyppeteer doesn't know about these proxies and will expose your IP. This is undesired behavior when scraping with proxies. The idea is that whenever someone passes proxies to the session object or any method call, pyppeteer should also use these proxies. #265
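Until something like this is built in, a possible workaround is to translate the requests-style proxies dict into Chromium's --proxy-server flag and pass it through browser_args, which HTMLSession already accepts as shown earlier in this thread. This is a minimal sketch, not the library's implementation: proxies_to_browser_args is a hypothetical helper, the addresses are placeholders, proxy URLs are assumed to include a scheme, and credentials are dropped because Chromium does not accept them in this flag.

```python
from urllib.parse import urlsplit


def proxies_to_browser_args(proxies):
    """Build Chromium launch flags from a requests-style proxies dict.

    Hypothetical helper. Only the 'http' and 'https' keys are considered,
    and any user:password part of a proxy URL is discarded.
    """
    rules = []
    for scheme in ("http", "https"):
        proxy_url = proxies.get(scheme)
        if not proxy_url:
            continue
        parts = urlsplit(proxy_url)
        if not parts.hostname:
            continue  # URL without a scheme or host: skip rather than guess
        host = f"[{parts.hostname}]" if ":" in parts.hostname else parts.hostname
        endpoint = host if parts.port is None else f"{host}:{parts.port}"
        # Per-scheme rule in the form <url-scheme>=<proxy-uri>
        rules.append(f"{scheme}={parts.scheme}://{endpoint}")
    return [f"--proxy-server={';'.join(rules)}"] if rules else []


# Placeholder addresses for illustration only.
proxy_dict = {
    "http": "http://user:secret@10.0.0.1:3128",
    "https": "socks5://10.0.0.1:1080",
}
args = proxies_to_browser_args(proxy_dict)
print(args)

# Possible use with requests-html (not executed here):
# session = HTMLSession(browser_args=["--no-sandbox", *args])
# r = session.get(url, proxies=proxy_dict)
# r.html.render()
```

Because the flag is fixed when the browser launches, this approach only covers proxies known when the session is created; proxies passed per method call would still need support inside the library itself.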