googleforgames / agones

Dedicated Game Server Hosting and Scaling for Multiplayer Games on Kubernetes
https://agones.dev
Apache License 2.0

Support on-demand manual scaling on Fleets #3175

Open jeremylvln opened 1 year ago

jeremylvln commented 1 year ago

Is your feature request related to a problem? Please describe.

I have a lot of different Fleets. The issue with the buffer-based autoscaler is that I will have empty servers sitting idle until players want to play those game modes.

I would rather have some form of manual scaling that lets me scale up a Fleet only when needed, and ephemerally: when the GameServer shuts down, the Fleet does not try to create another GameServer to replace it.

This can be compared to using Fleets as GameServer templates, with the ability to ask Agones to create a single GameServer from one.

Describe the solution you'd like

Here is a draft of a chart illustrating the flow. It does not represent my use case precisely, but I wanted to keep it simple:

sequenceDiagram
    autonumber

    actor P as Player
    participant C as Controller
    participant A as Agones

    P->>+C: "I want to play on XYZ"
    C->>+A: Scales up a Fleet by 1
    A->>A: Assign the GameServer
    A-->>-C: Returns the info of the GameServer
    C-->>-P: Returns the info of the GameServer

    Note over A: Finishes its game
    A->>A: Scales down the Fleet by 1

Describe alternatives you've considered

I saw that there is a webhook-based autoscaler that can be used to get fine control over autoscaling. However, this would mean that I need a piece of software that keeps a counter, increases it on demand, and decreases it when a GameServer is stopped. It could work, but it is also a bit flaky.
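To make the webhook alternative concrete, here is a minimal sketch of the part that answers the autoscaler. It assumes the webhook receives a FleetAutoscaleReview JSON body with a `request` (carrying `uid` and the Fleet `status`) and replies with a `response` holding `uid`, `scale`, and `replicas`; those field names reflect my reading of the Agones webhook autoscaler documentation and should be checked against the version in use. `desired_replicas` is a hypothetical value that the external counter would supply.

```python
def build_review_response(review: dict, desired_replicas: int) -> dict:
    """Answer a FleetAutoscaleReview with the replica count we want.

    Field names are an assumption based on the Agones webhook docs.
    """
    request = review["request"]
    current = request["status"]["replicas"]
    # A negative counter would be a bug upstream; clamp it to zero.
    target = max(desired_replicas, 0)
    return {
        "request": request,
        "response": {
            "uid": request["uid"],        # must echo the request uid
            "scale": target != current,   # only ask for a change when needed
            "replicas": target,
        },
    }


# Hypothetical payload: the Fleet has 2 replicas, the counter says 3.
example_review = {
    "request": {
        "uid": "abc",
        "name": "my-fleet",
        "namespace": "default",
        "status": {"replicas": 2, "readyReplicas": 0,
                   "reservedReplicas": 0, "allocatedReplicas": 2},
    }
}
example_response = build_review_response(example_review, 3)
```

The function is deliberately free of any HTTP framework, so it can sit behind whatever server the controller already runs.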


roberthbailey commented 1 year ago

Thanks for taking the time to file this feature request (and even provide a diagram!).

It sounds like what you are looking for is a way to more easily create ephemeral game servers. As you mentioned, fleets provide a game server template which makes it easy to say "create another one like this". However, you mentioned that you had a lot of different fleets, so it sounds like there are many different game server configurations that you want to be able to support at any given time.

Another way to think about this would be to have your controller be configured with the game server template (instead of putting it into an Agones CRD) and have your controller create unmanaged game servers. If you don't want game servers replaced when they exit, using a fleet or game server set doesn't necessarily add any value, and you are fighting against the behavior that they are trying to provide.
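A minimal sketch of that idea, assuming the controller keeps one template per game mode and builds a standalone GameServer manifest from it. The `TEMPLATES` contents, the image name, the port, and the `mode` label are all hypothetical; the `agones.dev/v1` GameServer layout (`spec.ports`, `spec.template`) should be checked against the Agones version in use.

```python
import copy
import json

# Hypothetical per-mode templates held by the controller instead of a Fleet.
TEMPLATES = {
    "xyz": {
        "ports": [
            {"name": "default", "portPolicy": "Dynamic", "containerPort": 7777}
        ],
        "template": {
            "spec": {
                "containers": [
                    {"name": "game", "image": "example.com/game-xyz:1.0"}
                ]
            }
        },
    }
}


def build_gameserver(mode: str, namespace: str = "default") -> dict:
    """Build an unmanaged GameServer manifest for one game mode."""
    # Deep copy so later edits to the manifest never mutate the template.
    spec = copy.deepcopy(TEMPLATES[mode])
    return {
        "apiVersion": "agones.dev/v1",
        "kind": "GameServer",
        "metadata": {
            # generateName lets Kubernetes pick a unique suffix per server.
            "generateName": f"{mode}-",
            "namespace": namespace,
            "labels": {"mode": mode},
        },
        "spec": spec,
    }


manifest = build_gameserver("xyz")
manifest_json = json.dumps(manifest, indent=2)
```

The resulting JSON could then be submitted with `kubectl create -f -` or a Kubernetes client. Because nothing owns the GameServer, nothing replaces it when it exits.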

Some questions for you:

  1. What happens if a game session crashes (instead of exiting normally at the end of a play session)?
  2. Do your game servers start quickly? Do you run into cases where starting a game server requires provisioning a new machine (VM)? How long is reasonable for players to wait for the controller to hand them back a game session?
  3. How do you handle updates to the game servers (new binaries, configuration changes, etc)? With fleets we provide rolling update features, but if you are just looking for templates, then I don't think you need this feature of fleets either.

jeremylvln commented 1 year ago

Hi @roberthbailey, thanks for your reply.

I just want to add some more context about why I need something like that. The built-in autoscaling method Agones provides is a "rich issue". Having a buffer of ready GameServers implies having x servers running at any time for every Fleet. Having y fleets implies at least x * y servers in total just waiting for players. It can rapidly imply having dedicated nodes that are just empty. This is not an issue in massive games because these servers will surely be filled in a short amount of time. But that is not true for my use case, so I will waste a lot of money on empty servers - and it's absolutely not sustainable. That's why, in an "early and not famous game", scaling the Fleet manually is a great method for controlling costs.

That being said, you'll understand that for game modes that are viral, having Agones autoscale the Fleet with the buffer method is a good answer to the original issue; but that is not true for the others. On the other hand, having a controller manage Fleets for viral game modes and manually create Pods for the others is not viable, as it would imply having two separate code branches. I want to avoid doing this.

The more I think about it, the more I want to try to set up a "server counter" that can be increased with an API call (from a controller). It will also listen to the GameServer CRD to decrement this counter when a server is destroyed. It will be a prototype, but it could be the easiest way of solving my problem. If conclusive, it could be upstreamed to Agones directly as a new method for a FleetAutoscaler.
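To put rough numbers on that, here is a small worked example. The buffer size, fleet count, and servers-per-node figures are purely hypothetical and only illustrate the x * y arithmetic.

```python
import math


def idle_buffer_servers(buffer_per_fleet: int, fleet_count: int) -> int:
    """Minimum number of Ready servers kept waiting: x * y."""
    return buffer_per_fleet * fleet_count


def idle_nodes(servers: int, servers_per_node: int) -> int:
    """Nodes needed just to host the idle buffer, rounded up."""
    return math.ceil(servers / servers_per_node)


# Hypothetical figures: a buffer of 2 on each of 15 fleets, 8 servers per node.
servers = idle_buffer_servers(buffer_per_fleet=2, fleet_count=15)
nodes = idle_nodes(servers, servers_per_node=8)
```

With these made-up figures, 30 servers sit idle and occupy 4 nodes before a single player has joined.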
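As a rough sketch of what I have in mind, the counter itself could be as small as the class below. All names are hypothetical; the API call and the GameServer watch that would invoke these methods are not shown.

```python
import threading


class OnDemandCounter:
    """Tracks how many GameServers each Fleet should currently have."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._desired: dict[str, int] = {}

    def request_server(self, fleet: str) -> int:
        """Called by the controller when a player needs a new server."""
        with self._lock:
            self._desired[fleet] = self._desired.get(fleet, 0) + 1
            return self._desired[fleet]

    def on_gameserver_deleted(self, fleet: str) -> int:
        """Called when a GameServer of this Fleet is destroyed."""
        with self._lock:
            # Never drop below zero, even if an event is delivered twice.
            self._desired[fleet] = max(self._desired.get(fleet, 0) - 1, 0)
            return self._desired[fleet]

    def desired(self, fleet: str) -> int:
        """Replica count to report for this Fleet."""
        with self._lock:
            return self._desired.get(fleet, 0)


counter = OnDemandCounter()
counter.request_server("xyz")
counter.request_server("xyz")
counter.on_gameserver_deleted("xyz")
```

The state lives only in memory here, so a restart would lose it; a real prototype would probably rebuild the count from the GameServers that currently exist.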

Here are the answers to your questions:

  1. It will be too bad for the players: they will join the matchmaking queue again (automatically or manually) and the whole process will start again.
  2. Let's say 40 seconds at best, and 1 to 2 minutes for some modes. And yes, absolutely, regarding the VMs: I count on my cloud provider's cluster autoscaler to do its best to bring up nodes. The time the players will wait is a "rich issue" once again: as I don't have unlimited funds, having the players wait a bit longer when a node needs to be provisioned is an acceptable trade-off.
  3. I really like the rolling updates of Agones Fleets, which is why I want to avoid doing all of this myself. But as you said, it is not an issue for today, maybe for tomorrow.

Thanks again!

markmandel commented 1 year ago

> The more I think about it, the more I want to try to setup a "server counter", that can be increased with an API call (from a controller). It will also listen to GameServer CRD to decrement this counter when a server is destroyed.

This sounds a lot like a Fleet Autoscaler with a webhook, combined with a Kubernetes Cluster Autoscaler to resize the cluster to fit the Fleet size.

Would that solve your problem?

jeremylvln commented 1 year ago

> > The more I think about it, the more I want to try to setup a "server counter", that can be increased with an API call (from a controller). It will also listen to GameServer CRD to decrement this counter when a server is destroyed.
>
> This sounds a lot like a Fleet Autoscaler with a webhook, combined with a Kubernetes Cluster Autoscaler to resize the cluster to fit the Fleet size.
>
> Would that solve your problem?

That's what I explained in the first post of this issue.

However, I really think this would be quite "easy" to implement directly in Agones and would help a lot of people introduce Agones into their technical stack, especially the kind of people who want to optimize their costs.

I have no issue trying to implement that myself and opening a PR against this repo, if that makes sense to you. Let me know!

Thanks!

markmandel commented 1 year ago

The issue here is - there's no way of Agones knowing when you will need a new GameServer for allocation, so you have to have a buffer (also game servers can often take a while to spin up so they need it for that reason as well).

So without some sort of external input (a webhook in this case), we can't know when to increment or decrement your fleet size -- only you know, since you have access to your auth systems, your matchmakers, etc.

github-actions[bot] commented 1 month ago

This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale' please add 'awaiting-maintainer' label or add a comment. Thank you for your contributions