Closed scrapenetwork closed 2 years ago
Are you setting the proxies according to https://github.com/scrapy-plugins/scrapy-playwright#proxy-support? I can't really do much without a (minimal) example and the resulting logs. See also https://github.com/scrapy-plugins/scrapy-playwright/issues/56#issuecomment-1033069738.
Yes, I'm setting the proxies correctly, since other proxies work perfectly fine. I'm checking each IP to double-check as well. I tried to reproduce the issue, but I can't without the proxies that give this error. That set of proxies works perfectly with other modules (i.e. scrapy, requests, splash), but with playwright I get 407 (on Firefox it's "ns connection refused").
If you want, I can send you the proxies so you can check it out. I tried to reproduce it and monkey-patch a fix, but I couldn't.
Have you tried the same proxies with plain playwright-python? It is very hard to debug an issue without a code sample and/or execution logs. If you think there is a bug, please supply a minimal, reproducible example (emphasis on minimal). Also, to make sure you're facing an issue within the confines of this package, the example should work correctly by disabling scrapy-playwright.
It looks like you just need my proxy for an example, as any example will produce the same error with that proxy. If you want, I can DM it to you privately and you can confirm it is a strange SSL issue; the same proxy works fine with other modules.
I do not take private inquiries or requests, this is a public issue tracker and I want conversations to remain public. If you want help from the community, you should provide steps to reproduce the issue, only excluding or redacting parts because of privacy concerns or financial restrictions (paid subscriptions, for instance). So far the only thing I've learned is that you're getting 407 responses.
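One way to share a reproducible example publicly without leaking a private proxy is to mask the sensitive parts of the proxy URL before pasting code or logs. The following is only a minimal sketch: `redact_proxy_url` and its `hide_host` parameter are hypothetical names introduced for illustration, not part of scrapy-playwright or Playwright, and it assumes the proxy is written as a single URL such as `http://user:password@host:port`.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_proxy_url(proxy_url: str, hide_host: bool = False) -> str:
    """Return the proxy URL with credentials (and optionally the host) masked.

    Hypothetical helper for illustration only; it keeps the scheme and port
    so readers can still see what kind of proxy is being used.
    """
    parts = urlsplit(proxy_url)
    # Mask the host only when asked to; the port is kept either way.
    host = "***" if hide_host else (parts.hostname or "")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Mask credentials only if the URL actually carried some.
    if parts.username is not None or parts.password is not None:
        host = f"***:***@{host}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(redact_proxy_url("http://user:secret@proxy.example.com:8080"))
    print(redact_proxy_url("http://user:secret@proxy.example.com:8080", hide_host=True))
```

The scheme and port are left visible on purpose, since they are often relevant when comparing how different clients treat the same proxy.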
Sadly, with this error it does not matter which example I draw up, because what's needed seems to be the proxies that trigger it. Any example I send will work, as the problem looks specific to a certain set of proxies. I can't tell those proxies apart from the others myself; on my end they all look the same, and other modules have no issues using the ones that produce the 407 / network denied error with this module.
Apologies, as this is something I cannot draw up an example for. I can send you just the proxy, and you can plug it into any example to get the errors. I just don't want to post the private proxy publicly, that's all. I will keep investigating on my end, and if I get any results I will come back and share them.
Well, in that case I'm afraid I can't do much more. I'd suggest contacting the proxy provider; they might be able to assist you.
For completeness, I'm including the code I'm using to try proxies with authentication.
# scrapy-playwright: proxy with authentication passed via the browser launch options
from scrapy import Spider, Request


class ProxySpider(Spider):
    name = "proxy"
    custom_settings = {
        "PLAYWRIGHT_LAUNCH_OPTIONS": {
            "proxy": {
                "server": "***",
                "username": "***",
                "password": "***",
            },
        }
    }

    def start_requests(self):
        yield Request(url="http://httpbin.org/get", meta={"playwright": True})

    def parse(self, response):
        print(response.text)

# plain playwright-python: the same proxy settings, tried with Firefox and Chromium
import asyncio

from playwright.async_api import async_playwright


async def main():
    async with async_playwright() as p:
        for browser_type in [p.firefox, p.chromium]:
            browser = await browser_type.launch(
                proxy={
                    "server": "***",
                    "username": "***",
                    "password": "***",
                },
            )
            context = await browser.new_context(ignore_https_errors=True)
            page = await context.new_page()
            await page.goto("https://httpbin.org/ip")
            print(await page.content())
            await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
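
As a side note on the 407 responses mentioned in this thread: 407 means "Proxy Authentication Required", and the proxy's reply normally carries a `Proxy-Authenticate` header naming the scheme it expects. The sketch below uses only the Python standard library to show that header for a plain-HTTP request sent through a proxy, which may be useful information to pass on to the proxy provider. It is an illustration under stated assumptions, not part of scrapy-playwright: `probe_proxy` is a hypothetical helper, it assumes an HTTP proxy with Basic authentication, and it does not exercise the `CONNECT` tunnel that browsers use for HTTPS targets.

```python
import base64
import http.client
from typing import Optional, Tuple


def probe_proxy(
    host: str,
    port: int,
    target_url: str = "http://httpbin.org/get",
    username: Optional[str] = None,
    password: Optional[str] = None,
    timeout: float = 10.0,
) -> Tuple[int, Optional[str]]:
    """Send one plain-HTTP GET through the proxy.

    Returns (status code, value of the Proxy-Authenticate header or None).
    Hypothetical helper; assumes Basic auth when credentials are given.
    """
    headers = {}
    if username is not None and password is not None:
        # Basic scheme: base64("username:password")
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        headers["Proxy-Authorization"] = f"Basic {token}"
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # A proxy expects the absolute URL in the request line.
        conn.request("GET", target_url, headers=headers)
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status, response.getheader("Proxy-Authenticate")
    finally:
        conn.close()
```

Calling `probe_proxy(host, port)` without credentials and then again with them shows whether the proxy accepts Basic credentials at all, or asks for a different scheme.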
Hello
Using any example given in example.py, I am getting a 407 error on a batch of valid proxies.
I ran one example with just scrapy and one with just requests; both work with the same proxy that playwright gets a 407 on. Switching browsers or URLs gives the same issue as well.
I think it may be something with SSL?
Anyway, the only way I know to reproduce this issue is by using the proxy, as all sites return 407.
It's strange, because the proxy works everywhere else except with this module. Any help would be appreciated.