scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License
4.09k stars 513 forks

Proxy-Authorization is not being sent on the 'CONNECT' method #969

Closed. Urahara closed this 4 years ago

Urahara commented 4 years ago

I'm using AnyProxy and I can't use the proxy for HTTPS endpoints with Splash. The Proxy-Authorization header is not sent with the CONNECT method, so I receive a 407 Proxy Authentication Required error.

The proxy works normally on HTTP endpoints.

Gallaecio commented 4 years ago

How are you configuring your proxy? Are you using proxy profiles?

Urahara commented 4 years ago

@Gallaecio Hi, I configured it using proxy profiles.

I'm running Splash using Docker and my proxy using Node.

docker run -v ~/proxy-profiles:/etc/splash/proxy-profiles --network=host scrapinghub/splash

default.ini:

[proxy]
host=127.0.0.1
port=8001
username=test
password=
type=HTTP

Logs from AnyProxy using Splash

Making the same requests using cURL works smoothly in both cases.

curl -k -x http://test:@localhost:8001 https://httpbin.org/get

Logs from AnyProxy when using cURL on HTTPS

[AnyProxy Log][2019-11-29 12:03:50]: received https CONNECT request httpbin.org
{
  host: 'www.google.com.br:443',
  'proxy-authorization': 'Basic dGVzdDo=',
  'user-agent': 'curl/7.58.0',
  'proxy-connection': 'Keep-Alive'
}
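
For readers comparing logs, the `proxy-authorization` value above is plain HTTP Basic authentication: the Base64 encoding of `username:password`. The following is a minimal sketch showing that `dGVzdDo=` corresponds to the user `test` with an empty password, as in the proxy profile. The helper names are hypothetical and are not part of Splash, AnyProxy, or cURL.

```python
import base64
from typing import Tuple


def basic_proxy_authorization(username: str, password: str) -> str:
    """Build a Proxy-Authorization header value for HTTP Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic_credentials(header_value: str) -> Tuple[str, str]:
    """Recover (username, password) from a Basic header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authentication header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the username from the password.
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    # Same credentials as in the proxy profile: user "test", empty password.
    print(basic_proxy_authorization("test", ""))
    print(decode_basic_credentials("Basic dGVzdDo="))
```

Comparing this value against what the proxy logs for each request is a quick way to see which requests carry credentials and which do not.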

I also created a standalone spider to test with pure Scrapy, and it also works:

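# -*- coding: utf-8 -*-
from scrapy import Spider, Request


class ProxyTestSpider(Spider):
    """Fetch an HTTPS page through the authenticated proxy."""
    name = 'proxy_test'

    def start_requests(self):
        # User "test" with an empty password, same as the cURL test above.
        yield Request(url='https://httpbin.org/get',
                      meta={'proxy': 'http://test:@localhost:8001'})

    def parse(self, response):
        # Log the response body to confirm the request went through.
        self.logger.warning(response.text)

Urahara commented 4 years ago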

@Gallaecio Can you take a look at this? Is there any workaround?

Gallaecio commented 4 years ago

I’m busy with other projects at the moment, and I’m not familiar enough with using Splash with proxies to suggest a workaround. If you think this may be a configuration issue on your side, I suggest asking on Stack Overflow or checking existing questions there. Otherwise, we’ll have to wait until this is confirmed to be a bug (e.g. by someone else reproducing the issue).

Urahara commented 4 years ago

It seems that Splash (or Qt) follows a different flow from other apps like cURL.

The CONNECT method doesn't send Proxy-Authorization, but all other methods (GET, POST, etc.) do send this header. Other apps do the opposite: they send Proxy-Authorization on CONNECT and not on the other methods.

So I was able to solve this by handling this flow, and now my application is working fine with proxies.
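
The thread does not show the actual fix, so the following is only a sketch of one way a proxy could handle a client that does not send credentials up front. It assumes the standard HTTP challenge flow: a request without valid credentials is answered with 407 and a `Proxy-Authenticate` header, giving the client the chance to retry with `Proxy-Authorization`. The function name and the realm value are hypothetical, and whether Splash/Qt retries a CONNECT after such a challenge is not confirmed here.

```python
import base64
from typing import Dict, Mapping, Tuple


def answer_proxy_request(
    headers: Mapping[str, str], username: str, password: str
) -> Tuple[int, Dict[str, str]]:
    """Return (status, response headers) for an incoming proxy request.

    200 means the request may proceed; 407 asks the client to authenticate.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    expected = "Basic " + token.decode("ascii")

    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    if normalized.get("proxy-authorization") == expected:
        return 200, {}

    # Missing or wrong credentials: challenge instead of failing silently.
    return 407, {"Proxy-Authenticate": 'Basic realm="proxy"'}


if __name__ == "__main__":
    # A CONNECT without credentials, as described for Splash.
    print(answer_proxy_request({"Host": "httpbin.org:443"}, "test", ""))
    # The same request with credentials, as sent by cURL.
    print(answer_proxy_request(
        {"Host": "httpbin.org:443", "Proxy-Authorization": "Basic dGVzdDo="},
        "test", ""))
```

The same check can be applied to every method, so it does not matter whether the credentials arrive on CONNECT or on the requests that follow.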