samuong / alpaca

A local HTTP proxy for command-line tools. Supports PAC scripts and NTLM authentication.
Apache License 2.0
196 stars, 35 forks

URLs are not passed to shExpMatch #58

Closed: camh- closed this issue 2 years ago

camh- commented 4 years ago

The CONNECT method used when proxying usually carries just a hostname:port, not a full URL; i.e. it does not contain a scheme.
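For illustration, here is a small sketch that uses Go's standard net/http parser to read one CONNECT request and one proxied GET request, then prints what ends up in the parsed URL. It is not Alpaca's code. The helper name parseRequest and the example.com requests are assumptions made for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// parseRequest is a hypothetical helper that reads a single raw HTTP/1.1
// request with the standard library parser.
func parseRequest(raw string) (*http.Request, error) {
	return http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	// Authority-form target: only host:port, as used to tunnel HTTPS through a proxy.
	connect, err := parseRequest("CONNECT example.com:443 HTTP/1.1\r\nHost: example.com:443\r\n\r\n")
	if err != nil {
		panic(err)
	}
	// Absolute-form target: a full URL, as sent to a proxy for plain HTTP.
	get, err := parseRequest("GET http://example.com/index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")
	if err != nil {
		panic(err)
	}
	// The CONNECT URL has an empty Scheme; the GET URL has "http".
	fmt.Printf("CONNECT: scheme=%q host=%q\n", connect.URL.Scheme, connect.URL.Host)
	fmt.Printf("GET:     scheme=%q host=%q path=%q\n", get.URL.Scheme, get.URL.Host, get.URL.Path)
}
```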

However, certain code in Alpaca expects this to be a URL. In particular, it is passed to shExpMatch as a URL but without a scheme, so any pattern that includes a scheme will fail to match. From anecdotal evidence, it seems that Chrome does pass a full URL here.
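To make the failure concrete, the sketch below is a hypothetical Go re-implementation of the PAC helper shExpMatch, where '*' matches any run of characters and '?' matches exactly one. It is not Alpaca's implementation (PAC scripts are JavaScript and are evaluated as such); the pattern and hostnames are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// shExpMatch is a hypothetical stand-in for the PAC function of the same name.
// It translates the shell-style glob into an anchored regular expression.
func shExpMatch(str, shexp string) bool {
	var b strings.Builder
	b.WriteString("^")
	for _, r := range shexp {
		switch r {
		case '*':
			b.WriteString(".*") // any sequence of characters
		case '?':
			b.WriteString(".") // exactly one character
		default:
			b.WriteString(regexp.QuoteMeta(string(r))) // everything else is literal
		}
	}
	b.WriteString("$")
	return regexp.MustCompile(b.String()).MatchString(str)
}

func main() {
	pattern := "https://*.example.com*"
	// A bare CONNECT target, with no scheme, as described in this issue.
	fmt.Println(shExpMatch("www.example.com:443", pattern)) // false
	// The same target with the https scheme added.
	fmt.Println(shExpMatch("https://www.example.com:443", pattern)) // true
}
```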

I think we need to add the https scheme if it is missing. I'm not sure it makes sense for any scheme other than https to be used for the tunnel.

samuong commented 3 years ago

Interesting. This would definitely break PAC files that call shExpMatch with a glob starting with http*://, for example.

I agree that Alpaca should add a scheme if the URL does not already contain one; I guess this would happen in either findProxyForRequest() or FindProxyForURL().

Makes sense to go with https as the scheme in all cases, at least initially. But I do wonder whether there are any cases where CONNECT is used for non-TLS tunnels.

samuong commented 3 years ago

Btw, in addition to adding a scheme, does Alpaca need to do anything with the port? E.g. if we get a request like CONNECT example.com:443 HTTP/1.1, should the PAC function get a URL of https://example.com:443 or just https://example.com?

I'm not sure what PAC scripts out in the wild expect, or what browsers like Chrome do...