googleforgames / quilkin

Quilkin is a non-transparent UDP proxy specifically designed for use with large scale multiplayer dedicated game server deployments, to ensure security, access control, telemetry data, metrics and more.
Apache License 2.0

Add datacentre discovery #789

Open XAMPPRocky opened 1 year ago

XAMPPRocky commented 1 year ago

To measure latency between a proxy and a datacentre more accurately, the proxy needs to know which datacentres are available to measure.

As a solution, we were thinking of adding a new xDS resource type, called Datacentre or similar, that a proxy would receive and which would contain the IP address and the QCMP port. The proxy can then use that address for QCMP latency measurements.
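A minimal sketch of what such a resource could look like on the proxy side, purely for illustration. The struct name follows the proposal, but the field names, the `qcmp_addr` helper, and the plain Rust representation (rather than a protobuf message) are assumptions, not Quilkin's actual API.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical shape of the proposed `Datacentre` resource.
/// Field names are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Datacentre {
    /// IP address the proxy should measure latency against.
    pub address: IpAddr,
    /// Port on which the QCMP service listens at that address.
    pub qcmp_port: u16,
}

impl Datacentre {
    /// Socket address a proxy would send QCMP latency probes to.
    pub fn qcmp_addr(&self) -> SocketAddr {
        SocketAddr::new(self.address, self.qcmp_port)
    }
}
```

A proxy holding a list of these could iterate over `qcmp_addr()` to decide where to send its latency probes.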

For relay deployments, it would send all agents that are connected to it; for single-cluster control plane deployments, it would return its own IP and QCMP port.
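The two cases above could be sketched roughly as follows. The `ControlPlane` enum, its variants, and the `datacentres` method are hypothetical names invented for this example; only the behaviour (relay returns every connected agent, single cluster returns itself) comes from the proposal.

```rust
use std::net::IpAddr;

/// Hypothetical `Datacentre` resource: an IP address plus a QCMP port.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Datacentre {
    pub address: IpAddr,
    pub qcmp_port: u16,
}

/// Hypothetical model of the two deployment modes discussed above.
pub enum ControlPlane {
    /// A relay knows about every agent currently connected to it.
    Relay { agents: Vec<Datacentre> },
    /// A single-cluster control plane only knows about itself.
    SingleCluster { own: Datacentre },
}

impl ControlPlane {
    /// The set of datacentres that would be sent to a proxy.
    pub fn datacentres(&self) -> Vec<Datacentre> {
        match self {
            // Relay: one entry per connected agent.
            ControlPlane::Relay { agents } => agents.clone(),
            // Single cluster: just its own IP and QCMP port.
            ControlPlane::SingleCluster { own } => vec![own.clone()],
        }
    }
}
```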

markmandel commented 1 year ago

> For relay deployments it would send all agents that are connected to it, for single cluster control plane deployments, it would return its own IP and QCMP port.

It could also be an optional element: if it's not there, then the proxy isn't going to check latency or keep a metric of it (single cluster, and also if people just don't care 😃, for example if they have separate installs in the same cloud region/datacentre).

XAMPPRocky commented 1 year ago

I'm not sure I see the value of it being optional. Even if you're hosting in the same datacentre, understanding the latency between hops is important, as latency isn't only dictated by distance. If there is an intra-datacentre issue causing latency spikes (as opposed to inter-datacentre), then this would provide that information, whereas if it's optional then you would be in the dark.

If you don't want that metric, it's easier to just add a filter to your grafana_agent to remove it. Having this information is important for Quilkin to be able to build a network topology on top of it, so that we can accurately assign players to the cluster that is closest to their proxy.
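As an illustration of the "closest cluster" idea, here is a small sketch that picks the datacentre with the lowest median round-trip time from a set of latency samples. The function names, the use of a `HashMap` keyed by QCMP address, and the choice of the median as the statistic are all assumptions for the example, not a description of how Quilkin does or will do this.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::Duration;

/// Returns the address with the lowest median round-trip time,
/// or `None` if no datacentre has any samples.
pub fn closest_datacentre(
    samples: &HashMap<SocketAddr, Vec<Duration>>,
) -> Option<SocketAddr> {
    samples
        .iter()
        // Skip datacentres that have not been measured yet.
        .filter_map(|(addr, rtts)| median(rtts).map(|m| (*addr, m)))
        .min_by_key(|(_, m)| *m)
        .map(|(addr, _)| addr)
}

/// Median of the samples (upper median for even-length input).
fn median(rtts: &[Duration]) -> Option<Duration> {
    if rtts.is_empty() {
        return None;
    }
    let mut sorted = rtts.to_vec();
    sorted.sort();
    Some(sorted[sorted.len() / 2])
}
```

The median is used here only so that a single spike does not dominate the choice; any other statistic could be swapped in.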

markmandel commented 1 year ago

> I'm not sure I see the value of it being optional. Even if you're hosting in the same datacentre, understanding the latency between hops is important, as latency isn't only dictated by distance. If there is an intra-datacentre issue causing latency spikes (as opposed to inter-datacentre), then this would provide that information, whereas if it's optional then you would be in the dark.

That is true. But I also wonder if some people won't want the extra traffic (even though it's minimal).

I tend to err on the side of flexibility. Not a super strong opinion, but just something to consider.

XAMPPRocky commented 1 year ago

I can understand that. I'm always wary of adding something as an option without a compelling reason to do so, as it adds another variation to test and adds cognitive overhead (you have to know that the feature exists, and how to turn it on).

I feel like if someone comes and provides a good reason, or we find it adds too much overhead, we should provide a way to disable it. Without that, though, I think it should be included without an option, as it provides you with more insight, and having this work done for you makes Quilkin a more compelling product for operators.