Fallback configuration

yaroslavros commented 1 month ago

The result of a proxy lookup in a PAC file may include multiple proxies to fall back to, e.g. `PROXY proxy1.example.com; PROXY proxy2.example.com`. This instructs the client to try proxy1 first, fall back to proxy2 if proxy1 is not available, and never connect directly to the destination, even if both proxies are down. Alternatively, the result can tell the client to go direct if the proxies are not available, e.g. `PROXY proxy1.example.com; PROXY proxy2.example.com; DIRECT`.
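A PAC result string like these is read left to right, with each `;`-separated entry tried in turn. The Python sketch below shows how a client might turn one into an ordered candidate list. It is illustrative only: it handles just the `PROXY host[:port]` and `DIRECT` forms (real PAC results may carry other directives, such as `SOCKS`, which it rejects), `parse_pac_result` is a hypothetical helper name, and `None` is used to stand for `DIRECT`.

```python
from typing import List, Optional


def parse_pac_result(result: str) -> List[Optional[str]]:
    """Return the candidates in try-order; None stands for DIRECT.

    If None is absent, the client must not fall back to a direct connection.
    Only "PROXY host[:port]" and "DIRECT" entries are supported here.
    """
    candidates: List[Optional[str]] = []
    for entry in result.split(";"):
        parts = entry.split()
        if not parts:
            continue  # tolerate empty entries, e.g. a trailing ";"
        kind = parts[0].upper()
        if kind == "DIRECT" and len(parts) == 1:
            candidates.append(None)
        elif kind == "PROXY" and len(parts) == 2:
            candidates.append(parts[1])
        else:
            raise ValueError(f"unsupported PAC entry: {entry.strip()!r}")
    return candidates
```

With that representation, `None in candidates` is exactly the question a PvD proxy list cannot currently answer: whether direct fallback is permitted.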

Proxy fallback could potentially be achieved with a PvD if multiple proxies accept a given destination (though the question of priorities is unclear); however, the PvD does not tell the client whether it may fall back to direct communication.

I think this should be in scope so that PvDs can replace all reasonable use cases of PAC files.

tfpauly commented 1 month ago

The PvD can certainly list multiple proxies that are eligible to be used, but it doesn't directly give a priority order indicating which one to prefer.

We could either leave it up to clients to decide which order to try, or have some flag or number indicating preference.

That said, SVCB records do allow different services to be provided with priorities, so if we are going via DNS, we can fail over that way.

ddragana commented 1 month ago

The PvD well-known URL gives information about other related processes. The client can probably trust this source of information, as the proxy is preconfigured.

When the PvD is obtained from a network, I think the decision should be made by the client. In any case, the client should not trust the proxies, but should rather use this information to choose among preconfigured proxies.

I know that this PAC feature is used, but I do not know how important the order is, or whether it is effectively "here are 3 proxies, use any one you succeed in connecting to".

joshco commented 1 month ago

Could there be a key for "default" meaning "if all else fails...", which could have a value of either a proxy or a token like "DIRECT"?

yaroslavros commented 1 month ago

There are multiple scenarios in which an enterprise wants to signal preferences to the client from the network:

- Geographic proximity of proxies on a global network (steering the client to the proxy closest to it in network terms, for performance and content localisation)
- Regulatory reasons (the need to egress from a certain location, using a backup option only if everything else fails)
- Load balancing across proxies (the order of proxies is shuffled for different clients)
- Capabilities available on specific proxies
- Test/pilot scenarios for a subset of users, with a reasonably seamless way for them to fall back if the primary proxy fails or is no longer available

In my experience, a strict order of proxy preferences is widely relied on in enterprise setups.

This is perhaps worth discussing at IETF 121.

joshco commented 1 month ago

Looks like our future is going to look like this:

```json
{
  "proxies": [
    {
      "protocol": "http-connect",
      "proxy": "proxy.example.org:80",
      "priority": 10
    },
    {
      "protocol": "connect-udp",
      "proxy": "https://proxy.example.org/masque{?target_host,target_port}",
      "priority": 20
    }
  ]
}
```

Developers will have to sort based on the priority key.
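A minimal Python sketch of that client-side sorting and failover, assuming (the comment does not specify it) that a lower `priority` value is preferred, as with SVCB's SvcPriority. `connect_via_first_available` and the `try_connect` callback are hypothetical; `try_connect` stands in for a real connection attempt through the given proxy.

```python
from typing import Any, Callable, Dict, List, Optional

Proxy = Dict[str, Any]


def connect_via_first_available(
    proxies: List[Proxy], try_connect: Callable[[Proxy], bool]
) -> Optional[Proxy]:
    """Try proxies from most to least preferred and return the first that works.

    Assumes a lower "priority" value means more preferred. Returns None when
    every proxy fails; this scheme cannot say whether going direct is allowed.
    """
    for proxy in sorted(proxies, key=lambda p: p["priority"]):
        if try_connect(proxy):
            return proxy
    return None
```

Whether the client may go direct once every proxy has failed is left out, because a per-proxy `priority` has no way to express it.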

DavidSchinazi commented 1 month ago

Can't we rely on the fact that JSON lists are ordered, and just say that the proxies are listed in decreasing priority?

joshco commented 1 month ago

That's a good point. However, does it need to support round robin for load balancing?

That could be specified by giving several proxies the same priority value.

Perhaps the priority keys could be optional: if they are absent, just use the array order. In more complex cases, the priority key can be used.
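This suggestion can be sketched in Python as follows, under assumptions none of which are settled in this thread: lower `priority` values are preferred, entries that share a value are shuffled per client to spread load, and if no entry has a `priority` key the array order is used as-is. Entries missing the key in an otherwise prioritized list are placed last, which is an arbitrary choice. `order_proxies` is a hypothetical name.

```python
import random
from itertools import groupby
from typing import Any, Dict, List, Optional


def order_proxies(
    proxies: List[Dict[str, Any]], rng: Optional[random.Random] = None
) -> List[Dict[str, Any]]:
    """Return proxies in the order a client would try them."""
    if not any("priority" in p for p in proxies):
        return list(proxies)  # no priorities at all: plain array order
    rng = rng or random.Random()

    def key(p: Dict[str, Any]) -> float:
        # A missing priority sorts after every explicit value (assumption).
        return p.get("priority", float("inf"))

    ordered: List[Dict[str, Any]] = []
    # After sorting, each priority value forms one contiguous tier for groupby.
    for _, tier in groupby(sorted(proxies, key=key), key=key):
        members = list(tier)
        rng.shuffle(members)  # equal priority: randomize order to spread load
        ordered.extend(members)
    return ordered
```

Shuffling per client approximates the "same priority means round robin" idea without any shared state; a client that wants strict rotation would need to keep a counter instead.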

yaroslavros commented 1 month ago

I think that for a network-provisioned proxy list, it would make more sense to provide destination-centric proxy priority lists, along the lines of:

```json
{
  "proxies": [
    {
      "identifier": "legacy",
      "protocol": "http-connect",
      "proxy": "proxy.example.org:80"
    },
    {
      "identifier": "masque",
      "protocol": "connect-ip",
      "proxy": "https://proxy.example.org/masque{?target_host,target_port}"
    }
  ],
  "proxy_destinations": [
    {
      "matchPorts": [80, 443],
      "proxies": ["legacy", "DIRECT"]
    },
    {
      "matchDomains": ["*.internal", "*.intranet"],
      "proxies": ["masque", "legacy"]
    }
  ]
}
```

It's not uncommon for enterprise PAC files to carry thousands of match items, and duplicating them across multiple proxy definitions feels like unnecessary bloat that makes processing more complex and the whole structure more error-prone.
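One way a client might evaluate `proxy_destinations` rules like the ones proposed here, sketched in Python. The matching semantics are assumptions the proposal leaves open: rules are checked in array order and the first match wins, a rule matches only if every match key it contains matches, `matchDomains` patterns use shell-style wildcards via the standard `fnmatch` module, and `"DIRECT"` maps to `None`. `select_candidates` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional


def select_candidates(
    config: Dict[str, Any], host: str, port: int
) -> Optional[List[Optional[Dict[str, Any]]]]:
    """Return ordered candidates for host:port (None entries mean DIRECT),
    or None if no rule matches."""
    by_id = {p["identifier"]: p for p in config["proxies"]}
    host = host.lower()
    for rule in config.get("proxy_destinations", []):
        if "matchPorts" in rule and port not in rule["matchPorts"]:
            continue
        if "matchDomains" in rule and not any(
            fnmatchcase(host, pattern.lower()) for pattern in rule["matchDomains"]
        ):
            continue
        # First matching rule wins (assumption); resolve identifiers to proxies.
        return [None if ident == "DIRECT" else by_id[ident] for ident in rule["proxies"]]
    return None  # no rule matched; the proposal does not say what happens then
```

Under first-match-wins, a request to an `*.internal` host on port 443 hits the port rule before the domain rule, so rule order matters just as it does in a PAC `if`/`else` chain. Each match list is also written once and shared by every proxy it references, which is the deduplication argued for here.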

Traditionally, proxy load balancing with PAC files is accomplished with DNS round robin or with separate sets of PAC files randomly distributed to clients. Some people also use hierarchical proxies for load-balancing purposes.

yaroslavros commented 1 month ago

Or use null instead of "DIRECT" to avoid error-prone reserved identifiers.

joshco commented 4 weeks ago

Am I getting this right? "proxy_destinations" is then a set of rules: a rules engine takes a given URL, runs it through the rules, and gets an identifier for a proxy. That's similar to the case statement or if/then/else chain used in a JavaScript PAC file.

However, what about the protocol? Who really decides which protocol should be used? Is that decided based on the destination URL, or by the app/browser, perhaps hinted by the server?

JamesTaft commented 1 week ago

> Looks like our future is going to look like this: (joshco's `proxies` example above, with a `priority` key on each entry)
>
> Developers will have to sort based on the priority key.

May I add one more layer of complexity to this idea? Many enterprises use the client's IP address to determine which group of proxies to prefer, then set failover proxies for when the primaries are unavailable.

Could we add source subnets to the priority, giving a proxy configuration a higher priority for clients in those subnets and a default priority for all other sources? This would let the client know which proxy to prefer based on its location while still allowing a failover list.
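A Python sketch of the source-subnet idea, using the standard `ipaddress` module. The `sourceSubnets` and `subnetPriority` keys and the proxy hostnames are hypothetical, invented for illustration; lower values are assumed to be preferred, and the sketch assumes the client knows the relevant source address, which is not always the case (for example behind NAT).

```python
import ipaddress
from typing import Any, Dict, List


def effective_priority(proxy: Dict[str, Any], source_ip: str) -> int:
    """Use the (hypothetical) subnetPriority if source_ip falls inside one of
    the proxy's sourceSubnets, otherwise its default priority."""
    addr = ipaddress.ip_address(source_ip)
    for subnet in proxy.get("sourceSubnets", []):
        # "in" is simply False when the address and network families differ.
        if addr in ipaddress.ip_network(subnet):
            return proxy["subnetPriority"]
    return proxy["priority"]


def order_for_source(proxies: List[Dict[str, Any]], source_ip: str) -> List[Dict[str, Any]]:
    # Stable sort: proxies with equal effective priority keep their array order,
    # so the remaining entries still form a failover list.
    return sorted(proxies, key=lambda p: effective_priority(p, source_ip))
```

The same list then yields a different preferred proxy per client location while keeping every other proxy available as a fallback.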

yaroslavros commented 1 week ago

I think a per-proxy priority is too crude, as the priority might differ depending on the destination. It also does not let us communicate whether the client may fall back to direct. I plan to submit a PR by the end of this week describing my previous suggestion of a list of proxies per destination group.

I am not a big fan of encoding client IP restrictions in the PvD, for a number of reasons:

- The client could be behind NAT and not know its external IP
- The client could have multiple interfaces, some of which could have overlapping IP addresses
- There are potential privacy and security risks associated with specific client IP hints

If client-specific configuration is needed, the PvD contents should vary based on the client IP as seen by the PvD hosting service, similar to how this is done today for PAC files. I'll submit corresponding text in a PR to clarify this.

JamesTaft commented 1 week ago

I opened a new issue to discuss client IPs, to keep this one on topic. I agree on the priorities.

How about a specific syntax to allow round-robin or other load balancing if needed, for example:

`"proxies": ["proxy1" OR "proxy2", "proxy3"]`: the browser or OS should randomly choose one of the two proxies before the comma (and keep using it, to promote session reuse). If the chosen proxy is unavailable, the other one is attempted before moving on to proxy3.

`"proxies": ["proxy1" AND "proxy2", "proxy3"]`: the browser or OS should round-robin between proxy1 and proxy2, use the remaining one if either is unavailable, and move on to proxy3 only if both are unavailable.
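The proposed `OR`/`AND` notation is not valid JSON as written, so the Python sketch below assumes a hypothetical JSON-compatible encoding (`{"oneOf": [...]}` for `OR`, `{"roundRobin": [...]}` for `AND`) and shows the attempt order each rule would produce. The class name, the keys, and the choice to rotate the round-robin group once per new connection are all illustrative assumptions.

```python
import itertools
import random
from typing import Dict, Iterator, List, Optional, Union

# Hypothetical JSON-compatible encoding of the proposed syntax:
#   ["proxy1" OR "proxy2", "proxy3"]  -> [{"oneOf": ["proxy1", "proxy2"]}, "proxy3"]
#   ["proxy1" AND "proxy2", "proxy3"] -> [{"roundRobin": ["proxy1", "proxy2"]}, "proxy3"]
Entry = Union[str, Dict[str, List[str]]]


class FailoverList:
    def __init__(self, entries: List[Entry], rng: Optional[random.Random] = None):
        self.entries = entries
        self.rng = rng or random.Random()
        self.sticky: Dict[int, str] = {}  # "oneOf" group index -> member picked once
        self.turns: Dict[int, Iterator[int]] = {}  # "roundRobin" group index -> counter

    def attempt_order(self) -> List[str]:
        """Proxies in the order to try them for the next new connection."""
        order: List[str] = []
        for i, entry in enumerate(self.entries):
            if isinstance(entry, str):
                order.append(entry)
            elif "oneOf" in entry:
                members = entry["oneOf"]
                if i not in self.sticky:
                    # Random pick made once and then reused, to promote session reuse.
                    self.sticky[i] = self.rng.choice(members)
                first = self.sticky[i]
                order += [first] + [m for m in members if m != first]
            elif "roundRobin" in entry:
                members = entry["roundRobin"]
                start = next(self.turns.setdefault(i, itertools.count())) % len(members)
                # Rotate so each call leads with the next member; the rest are fallbacks.
                order += members[start:] + members[:start]
            else:
                raise ValueError(f"unknown entry: {entry!r}")
        return order
```

For `OR`, the chosen member stays first across calls, so connections keep landing on the same proxy; for `AND`, successive calls alternate `proxy1` and `proxy2` at the front, and `proxy3` is always tried last.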

tfpauly / privacy-proxy

Fallback configuration #262