Match curl's behavior for root domains in 'no_proxy'

pedorro commented 1 year ago

I have an additional update to Issue #120

In curl, the 'no_proxy' env var can include root domains, that will match all sub-domains. I think its semantics are complicated, and not well documented, but here's pretty nice breakdown for it:

https://about.gitlab.com/blog/2021/01/27/we-need-to-talk-no-proxy/

TLDR - curl allows you to declare a root domain in the 'no_proxy' var, and it will apply to any sub-domain of that root as well. It will also drop any preceding dot given in the root domain, so that to bare root also matches.

Ex:

$ export no_proxy=.myroot.com     # converted to myroot.com
$ curl https://mysub.myroot.com    # will bypass any proxy setting

sbstp commented 1 year ago

Seems like the new behaviour would make no_proxy=google.com match docs.google.com even if there are not dots at the beginning. Is that really the expected behaviour? I haven't had time to read the gitlab article yet.

pedorro commented 1 year ago

That is precisely the expected behavior, as demonstrated in this example:

$ curl -fs https://docs.google.com
$ echo $?
0
$ no_proxy="$no_proxy,google.com"
$ curl -fs https://docs.google.com
$ echo $?
28

On my work laptop, internet requests must all be routed through the corporate proxy. As shown, initially the curl request to https://docs.google.com succeeds (exit code 0). But when the root domain "google.com" is added to the 'no_proxy' list, that request then fails. Curl exit code 28 is a timeout, since that request cannot be completed without going through our proxy.

In addition to being curl's standard behavior, I think it also makes sense. It seems to me that the proxy requirements of any sub-domain will almost always be the same as those of the root. In the rare cases where that's not true, each sub-domain can be listed individually as needed. Especially in situations where dynamic sub-domains can come and go (think of a hosting service that uses sub-domains for each account, or something like that), being able to just specify the root domain is actually the right choice.

pedorro commented 1 year ago

Question about the test failures - I noticed that all the other test cases were using capitalized environment variable names (e.g. "NO_PROXY"). But that is not how the actual env var is supposed to be defined. And in fact, the only way I could get any of those tests to pass on my machine (MacOS), was to lower-case them all. I left my one new test case lower-cased, since that seemed correct to me (and was the only way I could get it to pass).

The question is, should that actually be upper-case for some reason, like the others? Is that what is required to get the test builds to succeed here?

Thanks.

sbstp commented 1 year ago

Normally the library should support both lowercase and upper case proxy environment variables. It's worth investigating what's going on with the test.

Replicating curl's behavior is perfect in terms of domain matching.

sbstp commented 1 year ago

Ah yes of course. If you look at the with_reset_proxy_vars function, you'll see that we only remove the variables in capital letter form. Not lowercase. You can add the lowercase version to the function if you want.

pedorro commented 1 year ago

If it's OK with you, I'll leave the other code as-is. I'll just get my new test passing, by using caps. Assuming that works, are you amenable to accepting this PR?

sbstp commented 1 year ago

Yeah sounds good.

sbstp commented 1 year ago

Deployed in 0.23.0

sbstp / attohttpc

Match curl's behavior for root domains in 'no_proxy' #129