Urahara closed this issue 4 years ago.
How are you configuring your proxy? Are you using proxy profiles?
@Gallaecio Hi, I configured it using proxy profiles.
I'm running Splash with Docker and my proxy with Node.js.
```
docker run -v ~/proxy-profiles:/etc/splash/proxy-profiles --network=host scrapinghub/splash
```
default.ini:
```ini
[proxy]
host=127.0.0.1
port=8001
username=test
password=
type=HTTP
```
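With the profile mounted, Splash's HTTP endpoints such as `render.html` take a `proxy` argument that can be either a profile name or a proxy URL, and a `default.ini` profile should be used when no `proxy` argument is passed. The sketch below only builds such a request URL. It assumes Splash listens on its default port 8050 on localhost (with `--network=host` that port is on the host), and `splash_render_url` is a hypothetical helper, not part of Splash.

```python
from urllib.parse import urlencode

# Assumption: Splash is reachable on localhost at its default HTTP port, 8050.
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(target_url: str, proxy: str = "default") -> str:
    """Build a Splash render.html URL (hypothetical helper).

    `proxy` may be a proxy-profile name ("default" selects default.ini)
    or a proxy URL such as "http://test:@127.0.0.1:8001".
    """
    # urlencode percent-encodes ':', '/' and '@' so the values survive as query args.
    query = urlencode({"url": target_url, "proxy": proxy})
    return f"{SPLASH_BASE}/render.html?{query}"


if __name__ == "__main__":
    print(splash_render_url("https://httpbin.org/get"))
    print(splash_render_url("https://httpbin.org/get", "http://test:@127.0.0.1:8001"))
```

Passing the profile name explicitly makes it easier to tell whether a failure comes from profile loading or from the proxy itself.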
```
[AnyProxy Log][2019-11-29 11:43:04]: received https CONNECT request httpbin.org
{
'proxy-connection': 'keep-alive',
host: 'httpbin.org',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/602.1 (KHTML, like Gecko) splash Version/9.0 Safari/602.1'
}
[AnyProxy Log][2019-11-29 11:49:21]: received request to: GET httpbin.org/get
{
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/602.1 (KHTML, like Gecko) splash Version/9.0 Safari/602.1',
Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Proxy-Authorization': 'Basic dGVzdDo=',
Connection: 'Keep-Alive',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en,*',
Host: 'httpbin.org'
}
```
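The `Proxy-Authorization` value in these logs is HTTP Basic credentials, i.e. base64 of `username:password`. With `username=test` and an empty password that string is `test:`, and a quick check confirms it matches the `dGVzdDo=` Splash sends on the GET. So the header itself is well formed; it is just missing from the CONNECT. The helper name `basic_proxy_auth` is illustrative.

```python
import base64


def basic_proxy_auth(username: str, password: str) -> str:
    """Return a Basic Proxy-Authorization header value (illustrative helper)."""
    # Basic auth encodes "username:password"; an empty password still keeps the colon.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    print(basic_proxy_auth("test", ""))  # Basic dGVzdDo=
    # Decoding the value from the log goes the other way:
    print(base64.b64decode("dGVzdDo=").decode("ascii"))  # test:
```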
Making the same requests with cURL works smoothly in both cases.
```
curl -k -x http://test:@localhost:8001 https://httpbin.org/get
```
```
[AnyProxy Log][2019-11-29 12:03:50]: received https CONNECT request httpbin.org
{
host: 'www.google.com.br:443',
'proxy-authorization': 'Basic dGVzdDo=',
'user-agent': 'curl/7.58.0',
'proxy-connection': 'Keep-Alive'
}
```
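For comparison, cURL attaches the credentials to the CONNECT request that opens the HTTPS tunnel. Below is a minimal sketch of that request's wire format. CONNECT uses the `host:port` form as its request target, and the header set is simplified from the log above. `build_connect_request` is a hypothetical helper.

```python
import base64


def build_connect_request(host: str, port: int, username: str, password: str) -> bytes:
    """Build a raw HTTP/1.1 CONNECT request with preemptive proxy auth (hypothetical helper)."""
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
    target = f"{host}:{port}"  # CONNECT uses authority form: host:port, no scheme or path
    lines = [
        f"CONNECT {target} HTTP/1.1",
        f"Host: {target}",
        f"Proxy-Authorization: Basic {credentials}",
        "Proxy-Connection: Keep-Alive",
        "",
        "",  # together with the previous item, yields the blank line ending the headers
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_connect_request("httpbin.org", 443, "test", "").decode("ascii"))
```

Writing these bytes to a socket connected to the proxy (127.0.0.1:8001 in this setup) and reading the status line is one way to check, independently of Splash, whether the proxy accepts credentials on CONNECT.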
I also created a standalone spider to test with plain Scrapy, and it works too:
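```python
# -*- coding: utf-8 -*-
from scrapy import Spider, Request


class ProxyTestSpider(Spider):
    name = 'proxy_test'

    def start_requests(self):
        # Route the request through the local proxy (user "test", empty password).
        yield Request(url='https://httpbin.org/get',
                      meta={'proxy': 'http://test:@localhost:8001'})

    def parse(self, response):
        # Log the response body so the proxied request can be inspected.
        self.logger.warning(response.text)
```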
@Gallaecio Can you take a look at this? Is there any workaround?
I’m busy with other projects at the moment, and I’m not familiar enough with using Splash with proxies to suggest a workaround. If you think this may be a configuration issue on your side, I suggest asking on Stack Overflow or checking existing questions there. Otherwise, we’ll have to wait until this is confirmed to be a bug (e.g. by someone else reproducing the issue).
It seems that Splash (or Qt) follows a different flow from other applications such as cURL: it does not send Proxy-Authorization with the CONNECT request, but it does send it with all other methods (GET, POST, etc.). Other applications do the opposite: they send Proxy-Authorization with CONNECT and not with the other methods.
So I was able to solve this by handling that flow on my side, and my application now works fine with proxies.
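The thread does not say exactly how the flow was handled. A common reason a client omits credentials on CONNECT is challenge-response authentication (RFC 7235). In that scheme, the client waits for a `407` response that carries a `Proxy-Authenticate` header and only then retries with `Proxy-Authorization`. If Qt behaves this way (an assumption, not confirmed here), the proxy must include that header in its 407. The sketch below shows such a check. `check_proxy_auth` is a hypothetical function, not part of AnyProxy's API.

```python
import base64
from typing import Dict, Optional, Tuple


def check_proxy_auth(
    headers: Dict[str, str], username: str, password: str, realm: str = "proxy"
) -> Tuple[int, Dict[str, str]]:
    """Decide how a proxy should answer a request (hypothetical helper).

    Returns (status, extra_response_headers): 200 lets the request through,
    407 challenges the client to retry with credentials.
    """
    # Header names are case-insensitive (the logs show both spellings).
    normalized = {k.lower(): v for k, v in headers.items()}
    expected = "Basic " + base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
    supplied: Optional[str] = normalized.get("proxy-authorization")
    if supplied == expected:
        return 200, {}
    # Without Proxy-Authenticate, a challenge-response client has nothing to answer.
    return 407, {"Proxy-Authenticate": f'Basic realm="{realm}"'}


if __name__ == "__main__":
    # First CONNECT from a challenge-response client: no credentials yet.
    print(check_proxy_auth({"host": "httpbin.org:443"}, "test", ""))
    # Retry after the 407, now carrying credentials.
    print(check_proxy_auth({"Proxy-Authorization": "Basic dGVzdDo="}, "test", ""))
```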
I'm using AnyProxy, and I can't use the proxy for HTTPS endpoints with Splash: the authorization header is not sent with the CONNECT request, so I receive a 407 Proxy Authentication Required error.
The proxy works normally for HTTP endpoints.