googleforgames / agones

Dedicated Game Server Hosting and Scaling for Multiplayer Games on Kubernetes
https://agones.dev
Apache License 2.0
6.07k stars, 805 forks

Direct connection to a GameServer/Pod without NAT #3804

Closed: daniellee closed this issue 5 months ago

daniellee commented 5 months ago

Is your feature request related to a problem? Please describe.

We have a use case where we want to send traffic directly (via Quilkin) to gameserver instances.

On a single node, we want to run multiple gameservers/pods and send traffic to each one via its publicly routable IP address (IPv4 or IPv6). Every gameserver listens on the same fixed ContainerPort (7777 is the default for Unreal), with no port forwarding or NAT.

A previous issue and PR added the pod IP addresses to GameServer.Status.Addresses, but it is still not possible to communicate directly with a GameServer instance on Agones. The three port policies (Dynamic, Passthrough and Static) all assume that the HostPort is used in some way, and GameServer.Status.Ports always shows the HostPort.

Describe the solution you'd like

To connect directly to a GameServer instance with a publicly routable address and a fixed port, without any NAT, the first item in GameServer.Status.Ports should be the ContainerPort. Ideally no HostPorts would be set, as they are not needed. A new PortPolicy called DirectToPod, NoNAT or similar would be one way to configure this.

The places in the code that would need to change:

Describe alternatives you've considered

We have tried configuring Agones a few different ways to get around this issue.

No PortPolicy and no ports configured. This is the closest we got to getting it working. No ports are defined in GameServer.Spec or GameServer.Status, but if you know that port 7777 is exposed, you can reach the gameserver instance directly. To make this work with Quilkin, Quilkin would have to default/fall back to port 7777, because Agones knows nothing about which ContainerPort is set. Also, the fact that the PortPolicy can be bypassed this way seems like an undocumented, non-obvious feature.
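To make the workaround concrete: when Status.Ports is empty, whatever sits in front of the gameserver has to hard-code the port itself. The Go sketch below is hypothetical and is not Quilkin or Agones code; `statusPort`, `endpointFor` and `fallbackPort` are illustrative names, and `statusPort` is only a trimmed-down stand-in for one entry of GameServer.Status.Ports.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// statusPort is a hypothetical, trimmed-down stand-in for one entry of
// GameServer.Status.Ports (Name and Port only); it is not the Agones type.
type statusPort struct {
	Name string
	Port int32
}

// fallbackPort is the port the proxy has to hard-code when Agones advertises
// nothing. 7777 is assumed here because it is Unreal's default.
const fallbackPort = 7777

// endpointFor joins the gameserver address with the first advertised status
// port, or with fallbackPort when Status.Ports is empty (the "no PortPolicy,
// no ports" workaround). net.JoinHostPort adds brackets around IPv6 hosts.
func endpointFor(address string, ports []statusPort) string {
	port := int32(fallbackPort)
	if len(ports) > 0 {
		port = ports[0].Port
	}
	return net.JoinHostPort(address, strconv.Itoa(int(port)))
}

func main() {
	// Workaround: nothing in Status.Ports, so the hard-coded fallback is used.
	fmt.Println(endpointFor("2001:db8::10", nil)) // [2001:db8::10]:7777

	// Dynamic policy: Status.Ports carries the HostPort, not the ContainerPort.
	fmt.Println(endpointFor("203.0.113.5", []statusPort{{Name: "default", Port: 7133}})) // 203.0.113.5:7133
}
```

The weakness is visible in the first call: the 7777 comes from the proxy's own configuration, not from anything Agones reports.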

Additional context

Here is an example showing that you cannot get around the HostPort once a PortPolicy is set.

In the Gameserver Spec:

Output from running kubectl describe gameservers:

Spec:
  Container:  simple-game-server
...
  Ports:
    Container:       simple-game-server
    Container Port:  7654
    Host Port:       7133
    Name:            default
    Port Policy:     Dynamic
    Protocol:        UDP

The GameServer Status returns the HostPort, not the ContainerPort:

Status:
  Address:  xxx
  Addresses:
    Address:  xxx
    Type:     InternalIP
    Address:  xxx
    Type:     Hostname
    Address:  xxx
    Type:     PodIP
    Address:  xxx
    Type:     PodIP
...
  Ports:
    Name:          default
    Port:          7133

We tested each PortPolicy option, but none of them worked for us:

  1. Dynamic PortPolicy with a ContainerPort configured. Agones assigns a HostPort, and that is the port reported in GameServer.Status.Ports, which breaks communication with Quilkin.
  2. Static PortPolicy with the same port defined for HostPort and ContainerPort. Agones can only run one GameServer instance/pod per node, because no other pod on that node can bind the same HostPort.
  3. Passthrough PortPolicy. Agones assigns a different port to every gameserver instance, so we would need some sort of mapping to make this work with Quilkin and our games.

zmerlynn commented 5 months ago

Hmm, interesting idea, but...

No PortPolicy and no ports configured.

This is legit what I would expect to configure in this case, since you're not asking us to manage the port at all. I'm not certain why we need more configurability here.

daniellee commented 5 months ago

This is legit what I would expect to configure in this case, since you're not asking us to manage the port at all. I'm not certain why we need more configurability here.

There is a port that we are asking Agones to manage: the container port. It is mostly informational, but that information is important. When no PortPolicy is set (or when this new PortPolicy is used), the GameServer resource should include the container's port; otherwise you cannot use the GameServer resource to reach the gameserver.

We will be mixing on-prem clusters with GCP clusters, so the GameServer.Status.Address, GameServer.Status.Addresses and GameServer.Status.Ports fields are what we would use to connect to a GameServer instance.
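A cluster-agnostic client along these lines would read only the status fields. The Go sketch below assumes hypothetical mirror types (`gameServerStatus`, `gameServerPort`) and a hypothetical helper `dialTarget`; it is not the Agones client API, and it looks a port up by name ("default", as in the describe output) rather than assuming it is first in the list.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

// gameServerPort and gameServerStatus are hypothetical, trimmed-down mirrors
// of the GameServer status fields shown above; they are not the Agones types.
type gameServerPort struct {
	Name string
	Port int32
}

type gameServerStatus struct {
	Address string
	Ports   []gameServerPort
}

// dialTarget builds host:port from Status.Address and the status port with
// the given name, so the same code works on on-prem and GCP clusters alike.
func dialTarget(s gameServerStatus, portName string) (string, error) {
	if s.Address == "" {
		return "", errors.New("status has no address yet")
	}
	for _, p := range s.Ports {
		if p.Name == portName {
			return net.JoinHostPort(s.Address, strconv.Itoa(int(p.Port))), nil
		}
	}
	return "", fmt.Errorf("no status port named %q", portName)
}

func main() {
	s := gameServerStatus{
		Address: "198.51.100.7",
		Ports:   []gameServerPort{{Name: "default", Port: 7777}},
	}
	target, err := dialTarget(s, "default")
	if err != nil {
		panic(err)
	}
	fmt.Println(target) // 198.51.100.7:7777
}
```

With today's behaviour this either returns the HostPort or, when no PortPolicy is set, fails with "no status port named", which is exactly the gap described in this comment.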

zmerlynn commented 5 months ago

I am primarily pushing back because OSS projects tend to keep accepting complexity until they eventually become too complicated. I'm wondering if we can come up with a better way to solve this than adding yet another PortPolicy, since there's a lot of complexity around PortPolicy already. I accept that uniformity of management is important, though.

If we do roll with this, DirectToGameServer (from #3807) is a mouthful. Maybe PodPort?

markmandel commented 5 months ago

So just popping over here to discuss design. I was thinking about this over the weekend.

So I'd like to reframe the issue a little, which I think will provide some clarity!

At first I thought "wait, if you know the port, you could configure your systems to go talk to that port! Easy, right? Have no ports configured and then tell your external system to use port ZZZZ", but then I thought about it, and realised:

So really what we're talking about here is a way for GameServers to not expose a hostPort, but still advertise which port internal communications are coming in on. Then it's up to the end user to use that address as they see fit.

Basically the equivalent of https://docs.docker.com/reference/dockerfile/#expose

Assuming that is correct, I'd probably argue for #3807 to actually be named None, because you don't actually get a port; it just advertises that a port exists, and the rest is up to you.

Does that track?

daniellee commented 5 months ago

It does track. I did think about calling it None.

Another alternative is to not add a new PortPolicy and just change the behaviour so that the containerPort is set as the gameserver port. And then document that that is what happens if you only define a containerPort.

markmandel commented 5 months ago

Another alternative is to not add a new PortPolicy and just change the behaviour so that the containerPort is set as the gameserver port. And then document that that is what happens if you only define a containerPort.

Oh, you mean set a containerPort at the GameServer.Spec.Template level? 🤔 That feels a bit "special case" to me. WDYT @zmerlynn?

Also, since it's just advertising, there's no point in writing an e2e test to check connectivity. Connectivity isn't our problem (Agones as a project); we just need to show you can get at the data 😄 Connectivity is the end user's issue.

zmerlynn commented 5 months ago

I think I also like the idea of None, with the structure of #3807, and would prefer it to exposing it in the template, which feels a little funny. Let's stick to #3807 with a name change.

markmandel commented 5 months ago

SGTM! I'll also take a pass through #3807 today as well.

daniellee commented 5 months ago

I didn't mean GameServer.Spec.Template. I meant like I have done in the PR where GameServer.Status.Port is set to the containerPort from the spec instead of the hostPort.

I can change the PortPolicy to be None instead of DirectToGameServer. Does the rest look ok?

markmandel commented 5 months ago

I didn't mean GameServer.Spec.Template. I meant like I have done in the PR where GameServer.Status.Port is set to the containerPort from the spec instead of the hostPort.

Not sure I follow tbh - can you show what it would look like in a YAML example?

I can change the PortPolicy to be None instead of DirectToGameServer. Does the rest look ok?

I'll also do a review shortly 👍🏻

daniellee commented 5 months ago

I have updated the PR and it is ready for review. Though the build seems to be stuck - it's been going for more than 90 minutes now. It was green the first time but hung after I did a rebase and push to synchronize the branch.

Not sure I follow tbh - can you show what it would look like in a YAML example?

If I specify this in the spec for the gameserver:

spec:
  ports:
    - name: default
      portPolicy: None
      containerPort: 7777

Then after deployment, the object status should be:

Status:
  Address:  xxx
  Addresses:
...
  Ports:
    Name:          default
    Port:          7777

Right now the port would not be set if I don't define a PortPolicy.
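The spec-to-status mapping in the YAML above can be modelled in a few lines. This Go sketch is a simplified, hypothetical model of the behaviour described in this thread (every existing policy reports the HostPort; None reports the ContainerPort and allocates no HostPort). `specPort` and `advertisedPort` are illustrative names, not the Agones controller code.

```go
package main

import "fmt"

// portPolicy mirrors the PortPolicy names discussed in this issue.
type portPolicy string

const (
	dynamic     portPolicy = "Dynamic"
	static      portPolicy = "Static"
	passthrough portPolicy = "Passthrough" // ContainerPort is set to the allocated HostPort
	none        portPolicy = "None"        // the policy proposed in #3807
)

// specPort is a hypothetical, simplified GameServer.Spec.Ports entry.
type specPort struct {
	Name          string
	PortPolicy    portPolicy
	ContainerPort int32
	HostPort      int32 // allocated by Agones for Dynamic/Passthrough, set by the user for Static
}

// advertisedPort returns the port that would appear in GameServer.Status.Ports:
// None advertises the ContainerPort without any HostPort, every other policy
// advertises the HostPort.
func advertisedPort(p specPort) int32 {
	if p.PortPolicy == none {
		return p.ContainerPort
	}
	return p.HostPort
}

func main() {
	dyn := specPort{Name: "default", PortPolicy: dynamic, ContainerPort: 7654, HostPort: 7133}
	noNAT := specPort{Name: "default", PortPolicy: none, ContainerPort: 7777}
	fmt.Println(advertisedPort(dyn))   // 7133, matching the describe output earlier in the issue
	fmt.Println(advertisedPort(noNAT)) // 7777, matching the expected status above
}
```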

markmandel commented 5 months ago

I have updated the PR and it is ready for review. Though the build seems to be stuck - it's been going for more than 90 minutes now.

That can happen, especially if it's waiting for its turn on the e2e cluster.

Just looking at the YAML above, that was my understanding of how it would work. The status still gets populated, but with the original containerPort. @zmerlynn, was that your understanding as well?

Although that does make the kubectl get gs a bit weird, since the port is not actually available on the IP address. 🤔 I'd rather have a blank PORT column than a PORT that's not accessible.

Now that I'm thinking about it, maybe None ports shouldn't end up in the status field, since we don't actually do anything with them on the Agones side?

I'm leaning that way, but I'm not 100% sure. What do you all think?

daniellee commented 5 months ago

Although that does make the kubectl get gs a bit weird, since the port is not actually available on the IP address

In our use case, the port is actually available on the IP address.

Using IPv6 and Calico with BGP, we can send traffic directly to a pod via its containerPort. If the port is not set in GameServer.Status.Ports, we can't easily find out what it is.

I know this isn't a common use case yet - but I think it will be once IPv6 gets more popular and the need for NAT or routing via host ports disappears.
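On the client side, this setup comes down to picking the pod's IPv6 address out of Status.Addresses and pairing it with the advertised port; the routing itself is Calico/BGP's job. The Go sketch below assumes a dual-stack pod that lists one IPv4 and one IPv6 `PodIP`, as in the describe output earlier; `gameServerAddress` and `ipv6PodEndpoint` are hypothetical names, not Agones APIs.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// gameServerAddress is a hypothetical mirror of one GameServer.Status.Addresses entry.
type gameServerAddress struct {
	Type    string // e.g. "InternalIP", "Hostname", "PodIP"
	Address string
}

// ipv6PodEndpoint returns "[ipv6]:port" for the first PodIP entry that parses
// as an IPv6 address, and false if the pod has no IPv6 PodIP.
func ipv6PodEndpoint(addrs []gameServerAddress, port int32) (string, bool) {
	for _, a := range addrs {
		if a.Type != "PodIP" {
			continue
		}
		ip := net.ParseIP(a.Address)
		// To4() is nil only for genuine IPv6 addresses.
		if ip != nil && ip.To4() == nil {
			return net.JoinHostPort(ip.String(), strconv.Itoa(int(port))), true
		}
	}
	return "", false
}

func main() {
	addrs := []gameServerAddress{
		{Type: "InternalIP", Address: "10.0.0.4"},
		{Type: "PodIP", Address: "10.8.0.12"},
		{Type: "PodIP", Address: "2001:db8:1::12"},
	}
	// The port would come from GameServer.Status.Ports under the proposed None policy.
	if ep, ok := ipv6PodEndpoint(addrs, 7777); ok {
		fmt.Println(ep) // [2001:db8:1::12]:7777
	}
}
```

Without a port in Status.Ports, the `7777` argument would again have to come from out-of-band configuration.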

markmandel commented 5 months ago

In our use case, the port is actually available on the IP address.

That's also a good point. I then agree - let's keep the port information in GameServer.Status.Ports.

Could always also add more information to kubectl get gs to make things clearer as well if need be.

This is where I say: this is why we have feature flags, so we can move forward with an implementation and see if there is anything to fix over time.

Okay cool - I think that is good enough general consensus! Lemme take a pass at the PR!

stevefan1999-personal commented 3 months ago

I have also written an Agones external controller that watches GameServer resources plus some extra annotations to set up a simple NLB with Cilium, and it could make good use of this new feature, since I can now separate the compute nodes from the DDoS-protected network-serving nodes.

Since it is based on Cilium Node IPAM LB and Egress Policy, everything runs in eBPF and in the kernel (including networking), so there is basically no userspace overhead whatsoever (I don't think there are any context switches back to the Cilium agent?). The performance is smooth and superb, with little to no added tail latency, unlike Quilkin, which significantly increases both jitter and tail latency.

The only downside is that I cannot get the client's origin IP, because the network nodes SNAT (aka masquerade) all traffic back to the compute nodes. We can still ban people via their UID (SteamID, Mojang UUID, whatever), and the origin IP address is not that important at the moment; it just means GeoIP won't work (and it isn't accurate anyway).

I'm still testing it, so I hope I can release it soon.