libp2p / go-libp2p

libp2p implementation in Go

Support Proxy Protocol #1065

Open · stongo opened this issue 3 years ago

stongo commented 3 years ago

Feature

Include Proxy Protocol support to enable the use of load balancers and reverse proxies in front of libp2p nodes.

Background

The current behaviour of a libp2p node (e.g. lotus) behind a load balancer or reverse proxy is to report the proxy's private IP or the loopback address as the source IP of upstream peer connections established through it, rather than the real address of the upstream peer.

This is a well-known TCP load-balancing issue. The conventional workaround is to use transparent proxies, but that requires kernel and iptables configuration, which creates a high barrier to running libp2p nodes successfully in this scenario.

For Proxy Protocol to be fully supported, the downstream endpoint (the libp2p node) should accept Proxy Protocol versions 1 and 2 so it can establish the real IP address of the client (the upstream libp2p peer).
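
As a rough illustration of what that involves on the node side (a minimal sketch, not an existing go-libp2p API; the helper name is made up): the proxy prepends a single text line before any application bytes, and the endpoint has to consume that line and use the source address it advertises instead of `conn.RemoteAddr()`.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// readProxyV1 reads a PROXY protocol v1 header line such as
//
//	PROXY TCP4 203.0.113.7 10.0.10.233 50202 4001\r\n
//
// and returns the real client (upstream peer) address it advertises.
// Illustrative sketch only; a real implementation also has to handle
// "PROXY UNKNOWN", TCP6, size limits, and the binary v2 format.
func readProxyV1(r *bufio.Reader) (*net.TCPAddr, error) {
	line, err := r.ReadString('\n')
	if err != nil {
		return nil, err
	}
	fields := strings.Fields(strings.TrimSpace(line))
	if len(fields) != 6 || fields[0] != "PROXY" {
		return nil, fmt.Errorf("not a PROXY v1 header: %q", line)
	}
	ip := net.ParseIP(fields[2]) // source (client) IP
	if ip == nil {
		return nil, fmt.Errorf("bad source IP: %q", fields[2])
	}
	port, err := strconv.Atoi(fields[4]) // source (client) port
	if err != nil {
		return nil, err
	}
	return &net.TCPAddr{IP: ip, Port: port}, nil
}

func main() {
	hdr := "PROXY TCP4 203.0.113.7 10.0.10.233 50202 4001\r\n"
	addr, err := readProxyV1(bufio.NewReader(strings.NewReader(hdr)))
	fmt.Println(addr, err) // 203.0.113.7:50202 <nil>
}
```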

Current Behaviour

Behind an AWS Elastic Load Balancer:

$ lotus net peers
12D3KooWxxxxxxxxxxxxxxxxxxxxxxxxx, [/ip4/10.0.10.233/tcp/50202]

Behind a Reverse Proxy e.g. NGINX, HAPROXY:

$ lotus net peers
12D3KooWxxxxxxxxxxxxxxxxxxxxxxxxx, [/ip4/127.0.0.1/tcp/50202]

Desired Behaviour

$ lotus net peers
12D3KooWxxxxxxxxxxxxxxxxxxxxxxxxx, [/ip4/<upstream-peer-public-ip>/tcp/<upstream-peer-port>]

Current Pitfalls without Proxy Protocol support

Stebalien commented 3 years ago

This would likely have to live as a module that plugs into go-libp2p-swarm and translates addresses. A general purpose "connection transformer" (takes a connection, returns a wrapped connection) may do the trick.
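
A minimal sketch of that shape, assuming a hypothetical transformer hook (none of these names exist in go-libp2p today): the transformer takes a freshly accepted connection and returns a wrapped connection that reports a substituted remote address (in a full implementation, the one parsed from the PROXY header of that connection).

```go
package main

import (
	"fmt"
	"net"
)

// ConnTransformer is the hypothetical hook described above: it takes a raw
// inbound connection and returns a (possibly wrapped) connection.
type ConnTransformer func(net.Conn) (net.Conn, error)

// remappedConn forwards everything to the underlying net.Conn but reports a
// different remote address, e.g. one recovered from a PROXY protocol header.
type remappedConn struct {
	net.Conn
	remote net.Addr
}

func (c *remappedConn) RemoteAddr() net.Addr { return c.remote }

// withAdvertisedRemote builds a transformer that substitutes the given
// address. In practice the address would be parsed from the PROXY header of
// each individual inbound connection rather than fixed up front.
func withAdvertisedRemote(addr net.Addr) ConnTransformer {
	return func(c net.Conn) (net.Conn, error) {
		return &remappedConn{Conn: c, remote: addr}, nil
	}
}

func main() {
	a, b := net.Pipe() // stand-in for a proxied inbound connection
	defer a.Close()
	defer b.Close()

	upstream := &net.TCPAddr{IP: net.ParseIP("203.0.113.7"), Port: 50202}
	wrapped, _ := withAdvertisedRemote(upstream)(a)
	fmt.Println(wrapped.RemoteAddr()) // 203.0.113.7:50202
}
```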


However, this seems pretty terrible. I'm surprised that, e.g., AWS doesn't just use a NAT. Do you know the motivation for this approach?

willscott commented 3 years ago

it comes out of http://www.haproxy.org/ where you have an edge that terminates TLS, and passes the unwrapped connection back to the backend application.

Stebalien commented 3 years ago

I see... this seems like a bad fit for libp2p:

  1. Our crypto transports have libp2p-specific requirements/features so we can't really offload this work without a custom proxy.
  2. We have UDP-based transports. Does this work?
  3. Load balancing doesn't really work because every libp2p node has a different peer ID.

Is there no way to just use a NAT? And/or is there anything HAProxy provides us that's actually useful?

Note: Going down the "connection transform function" route is pretty simple and non-invasive so I'm not really against providing this feature, but I want to make sure it's really worth solving first.

stongo commented 3 years ago

An example use case is when a peer needs to be discoverable via DNS.

Having DNS point at a load balancer or reverse proxy allows for instantaneous adaptive routing in the event of node maintenance, disaster recovery or machine upgrade. Having DNS point directly at a node is several orders of magnitude slower to update in these scenarios.

Some libp2p node roles this applies to include bootstrap nodes, hosted APIs, etc.

stongo commented 3 years ago

Also note that in some cases, hot-standby nodes behind a load balancer with a fixed-weighting LB algorithm would be desirable. In general, this pushes libp2p in the direction of High Availability (HA), rather than assuming a single peer-to-machine mapping.

Stebalien commented 3 years ago

Ok, so it looks like I misunderstood how this protocol worked. I assumed every proxied connection would get a new source port and you'd ask the proxy "what's the real IP/port behind this source port". Apparently, the proxy will just prepend a header to the packet.

This is doable in libp2p, just more invasive. We'd need to pass a "proxy" (that supports both UDP and TCP) to all transports on construction, and these transports would need to invoke this proxy on new inbound connections. We can do it but there will need to be significant motivation.
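
For comparison, outside of libp2p this is usually handled in Go by wrapping the TCP listener so the header is consumed before the application sees the connection. A sketch of that pattern, assuming the third-party github.com/pires/go-proxyproto package (this is how an operator might terminate the header in front of an application; it is not something go-libp2p's transports do today):

```go
package main

import (
	"fmt"
	"log"
	"net"

	proxyproto "github.com/pires/go-proxyproto"
)

func main() {
	ln, err := net.Listen("tcp", ":4001")
	if err != nil {
		log.Fatal(err)
	}
	// Wrap the raw listener; accepted connections have the PROXY header
	// stripped, and RemoteAddr() reports the address the header advertised.
	pln := &proxyproto.Listener{Listener: ln}
	defer pln.Close()

	for {
		conn, err := pln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		// Prints the upstream peer's address, not the load balancer's.
		fmt.Println("connection from", conn.RemoteAddr())
		conn.Close()
	}
}
```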

For now, I'd consider something like https://blog.cloudflare.com/mmproxy-creative-way-of-preserving-client-ips-in-spectrum/ or, even better, a packet-based routing system.


In terms of motivation, it sounds like HA proxies were invented to:

  1. Serve as a TLS/SSL gateway.
  2. Work with HTTP servers over TCP.
  3. Provide a nice and easy to deploy solution that didn't require mucking around with routes.

From this I conclude:

  1. If you're at the level where you need an HA proxy, 3 is a very weak motivation. Unless this is the only solution available on your service provider...
  2. Packet-based approaches are simply better.
  3. HAProxy isn't a fit for libp2p as it doesn't support UDP based transports. It looks like Cloudflare has some basic (experimental?) support and AWS also has support, but with a completely different protocol.

I'm frustrated because it sounds like someone came up with a solution targeting HTTP proxying/balancing, then this solution became the "standard" (ish because there are several "standards") for general-purpose load balancing even though it operates on the wrong OSI layer. I'd like to avoid infecting libp2p with this if at all possible.

stongo commented 3 years ago

Proxy Protocol is not tied to HAProxy in any way, nor is it exclusively HTTP. For better or worse, it has become a standard, gaining adoption in all reverse proxies; even databases and other applications now support it. It works 100% with plain TCP.

The particular use case in question is AWS Elastic Load Balancers with Kubernetes, but it would affect any libp2p node on a conventional EC2 machine behind an ELB as well.

To pass upstream client IPs, enabling Proxy Protocol (v1 in ELB Classic, v2 in NLB) is required.

@Stebalien I'm going to push back on your last comment; I don't think it's accurate.

Rejecting this feature pushes the complexities of using very standard ops tools back on to operators.

As mentioned in the original issue, configuring a transparent proxy is a possible work-around, but not a great one.

Running libp2p nodes on a cloud-native platform such as Kubernetes is something that would greatly benefit this project, IMHO.

Stebalien commented 3 years ago

  1. Is there a single case where this is better (technically) than NATs/routing (for libp2p)? From what I can see:
    1. It requires per-connection overhead. I can make a stateless NAT, but not a stateless proxy.
    2. UDP support is spotty, complex (basically app-level NAT), and inefficient (additional per-packet headers).
    3. It duplicates effort (features already exist in kernels/routers).
    4. It bypasses firewalls (unless they explicitly support this protocol).
    5. It needs to be manually configured on a per-application basis and will be broken and/or insecure if misconfigured.
    6. It adds application-level overhead.
  2. How consistent is UDP support? We've been trying to switch over to QUIC as the main transport to help consumers behind consumer routers.

If we decide "it's what people expect so we'll support it", I'll live with that. There's no point in fighting that particular fight.

But I'd still like to know why people seem to like this protocol so much. It seems to fall directly in the "worse is better" category. That way we can at least warn users.

willscott commented 3 years ago

I think the main reason it's being done at the application layer is that there are plenty of cloud cases where the people dealing with the application / fronting don't have access to, or aren't managing, the routing layer (or don't have root privileges on the edge / load balancers).

stongo commented 3 years ago

@willscott hits it on the head, and it's why I say this is a case for cloud-native support. Using AWS EKS (Kubernetes) as an example, a Kubernetes user (with access to the Kubernetes control plane only) can enable Proxy Protocol on a LoadBalancer Service type without any special permissions. In that case, if libp2p supported this protocol, a user could configure their p2p application to accept Proxy Protocol and have a fully functioning node with correct peer IPs.

A Kubernetes user would not be able to configure a transparent proxy, and would need to ask their operations team to set up a very complex system: specialized Kubernetes workers with kernel and iptables configured to support transparent proxies, along with specialized reverse proxies running on those nodes to receive inbound libp2p traffic and transparently proxy the requests to libp2p nodes. This wouldn't be possible at all on GKE (Google Kubernetes Engine), because operators don't have that level of access to the underlying servers.

stongo commented 3 years ago

it's my understanding Proxy Protocol v2 has UDP support

stongo commented 3 years ago

also, AWS Network Load Balancers support UDP and Proxy Protocol v2
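
For reference, the v2 header is a binary format whose family/protocol byte does cover UDP. A small sketch of the relevant constants, based on the published PROXY protocol spec (the names here are illustrative):

```go
package proxyv2

// The PROXY protocol v2 header starts with this fixed 12-byte signature,
// followed by one version/command byte, one family/protocol byte, and a
// 2-byte big-endian length of the address block that follows.
var signature = [12]byte{
	0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A,
}

const (
	cmdProxy = 0x21 // version 2, command PROXY

	// Family (high nibble) | transport protocol (low nibble).
	tcpOverIPv4 = 0x11 // AF_INET,  SOCK_STREAM
	udpOverIPv4 = 0x12 // AF_INET,  SOCK_DGRAM  <- UDP is representable
	tcpOverIPv6 = 0x21 // AF_INET6, SOCK_STREAM
	udpOverIPv6 = 0x22 // AF_INET6, SOCK_DGRAM
)
```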

h4x3rotab commented 2 years ago

Strongly agree with @stongo.

In our case, we'd love to build an easy-to-scale Substrate node cluster in a GKE StatefulSet. The only problem we are facing is the p2p port. In the past we allocated a VM for each node and everything went pretty well. However, after deciding to switch to a Kubernetes stack, we suddenly found it problematic to expose the p2p port of the nodes in a cluster, since there's no good way to allocate public IP addresses to individual nodes in the cluster.

The de facto solution is to set up an LB for the service. However, this violates the model of libp2p. A very typical use case is like this: a peer wants to connect to a target peer, so it looks up the endpoint by its peer ID in the DHT and then establishes a TCP/UDP connection to that endpoint. If we put an LB in front of a cluster, the LB has no choice but to randomly select a node in the cluster to connect to, so there is only a 1/n chance of reaching the correct peer.

jacobhjkim commented 2 years ago

Did anyone find a solution to this issue?

Winterhuman commented 2 years ago

@jacobhjkim https://github.com/mcamou/go-libp2p-kitsune is the closest thing I can think of.