triplea-game / triplea

TripleA is a turn based strategy game and board game engine, similar to Axis & Allies or Risk.
https://triplea-game.org/
GNU General Public License v3.0

RFC: Lobby Version switch using NGINX #10252

Open DanVanAtta opened 2 years ago

DanVanAtta commented 2 years ago

Proposal:

Notable changes:

Notable drawbacks:

Diagram

Here is what the system would look like (all on one box): lobby-version-switch-using-nginx

Other notes:

Comments

RoiEXLab commented 2 years ago

@DanVanAtta I think the proposed change sounds good, however I'm not sure how you're planning the distribution mechanism. IIRC you were thinking about using headers to do the logic, but to be honest that just sounds like a custom version of virtual hosts.

For example making a request to 3-4.lobby.triplea-game.org automatically sets the Host header to the domain, making the configuration pretty trivial.

Pros of this approach

Cons

DanVanAtta commented 2 years ago

We are trying to solve request routing (or arguably service discovery) more than we are trying to achieve virtual hosts.

Routing based on headers is already in place. Note that we could easily redirect requests to arbitrary hosts by templating the 'localhost' portion of the redirect destinations:
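As a minimal sketch of what such a header-based route could look like: the `Triplea-Version` header name, the `$lobby_backend` variable, and the ports below are assumptions made for illustration, not values taken from the actual TripleA config.

```nginx
# Goes inside the http { } context.
# Map the (hypothetical) Triplea-Version request header to a backend address.
map $http_triplea_version $lobby_backend {
    default  "";                 # unknown or missing version: no backend
    "2.5"    127.0.0.1:8025;     # ports are placeholders
    "2.6"    127.0.0.1:8026;
}

server {
    listen 80;
    server_name lobby.triplea-game.org;

    location / {
        # Reject requests whose version has no pinned lobby instance.
        if ($lobby_backend = "") {
            return 400;
        }
        # The host:port part comes from the map result, so it could
        # just as well name another machine instead of localhost.
        proxy_pass http://$lobby_backend;
    }
}
```

Adding or retiring a lobby version would then be a one-line change to the `map` block.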

Previously routing was done entirely client side by first parsing an index file: https://github.com/triplea-game/triplea/blob/master/servers.yml, running the current client version through a switch and then selecting the right host for future requests based on that switch result.

This routing logic is moving server-side (i.e., into the NGINX config), and that routing can be done in NGINX in multiple ways.

DNS Based Routing

We could have multiple DNS entries referring to the same (NGINX) routing machine. NGINX could either have multiple server blocks or could have a switch statement using the host header. Note that a switch statement based on a host header and one based on a version header are almost the same thing.

Though, there are some significant costs to DNS based routing. (1) We require many DNS entries and have to manage these (more on wildcarding later). DNS has limited access and this makes the problem of "I go on vacation and the TripleA project comes to a complete and full stop" worse. (2) Testing of this configuration requires hacking a developer machine's DNS.
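For comparison, a rough sketch of the host-header variant. The versioned hostnames follow the pattern used in this thread; the ports and the `$lobby_backend_by_host` variable name are made up for illustration.

```nginx
# Inside the http { } context. Same idea, keyed on the Host header.
map $host $lobby_backend_by_host {
    default                      "";
    2-5.lobby.triplea-game.org   127.0.0.1:8025;   # placeholder ports
    2-6.lobby.triplea-game.org   127.0.0.1:8026;
}

server {
    listen 80;
    # One server block answers for every versioned name.
    server_name *.lobby.triplea-game.org;

    location / {
        if ($lobby_backend_by_host = "") {
            return 404;
        }
        proxy_pass http://$lobby_backend_by_host;
    }
}
```

The only structural difference from a version-header switch is the variable that feeds the `map`.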

The net effect is it is more difficult for others to work on new lobby versions.

(3) Deployment is not fully programmatic, we would require manual steps to set up new DNS entries and equally manual steps to remove old DNS entries. (4) DNS is slow to change.

Version Header Based Routing

On the other hand, we could have just one DNS entry and the 'version header' be the item that we use in the NGINX switch block when determining the correct final host to service that request.

Pros of DNS based routing vs Header based routing

Simple to implement for clients, nginx config is also straightforward

The header based routing is almost the same regardless of which choice we make. Having multiple server blocks is arguably more complex (but more flexible); either way, this advantage is nearly the same regardless of our choice.

Potentially allows to distribute versions onto several servers if we ever start noticing the load is too heavy without too much effort

This is the same with either approach. We can route requests to arbitrary servers and it's pretty easy to add a load balancer config to NGINX (which would be the same regardless of whether we switch on the host header or a version header). Though, load balancing is not what we are trying to achieve. It turns out we cannot do load balancing because lobby instances must be single instance.

Server cardinality & request routing, benefits

What we want is one lobby instance for each client version. Wildcard DNS entries would not work because 2.6 clients should be routed to the 2.6 lobby, 2.7 clients to the 2.7 lobby, and so on. It would not be the case that any 2.x client goes to the 2.x lobby. (If we did wildcarding, there is also the question of how we split 'beta' traffic from the 'prod' versions, but this question is a bit moot as there is a misunderstanding of lobby instance cardinality.)

Instead, by having the release versions be 'pinned' between client and lobby, we never have to worry about backward compatibility. This is a main benefit. We simply leave the older instances running, and when they start seeing close to zero traffic, we turn them off. To emphasize, this means we never have to worry about a 2.5 client working with a 2.7 lobby; we can completely change the APIs in 2.7 and not worry at all about previous client versions. The only backward compatibility concern we have would be in the database.

Another big benefit is that this configuration is completely programmatic: there is no manual configuration, and it can be done entirely via pull request to modify the 'configuration-as-code'.

Last, clients would only need one stable DNS name forever, for any version. The DNS setup becomes a one-time operation, and if we want to migrate everything to a new server stack, it is just one DNS entry to update.

RoiEXLab commented 2 years ago

(3) Deployment is not fully programmatic, we would require manual steps to set up new DNS entries and equally manual steps to remove old DNS entries

This is only partially true, there's the concept of Wildcard DNS entries where all *.lobby.triplea-game.org hostnames are redirected to the same server.

DanVanAtta commented 2 years ago

@RoiEXLab in such a case we have a single wildcard domain - if we have clients sending requests to URLs like 2-6.lobby.triplea-game.org, and then we have an 'if/else' statement that switches on the host header, is that really much different from having clients sending the version in a header? From a systems perspective, both are switching based on a header value, both require the client to inject the version somewhere - I think the biggest differences would be in local development and test configuration.

Different aspect to this topic, I'm wondering if we even really need a 'beta' DB?

RoiEXLab commented 2 years ago

@DanVanAtta That's precisely my point. It sounds like the same concept. So I was just wondering if we're just reinventing the wheel here. By using virtual hosts we're making use of a well-established concept instead of a custom solution.

Local development and testing is a valid point though. I believe the Host header can be set as usual here to achieve the same thing, but I'm not 100% sure.

Both approaches have their pros and cons, but in the end they're almost identical

DanVanAtta commented 2 years ago

If reinventing the wheel, which tools are there that do what we want?

Virtual hosts map many DNS names to one host; we want the inverse (so this is a lot more akin to routing and load balancing).


DanVanAtta commented 2 years ago

@RoiEXLab also what are your thoughts on the necessity of a preprod database?

CC: @tvleavitt, @bacrossland, y'all might be interested in this topic. Feedback is welcome, particularly regarding any ways we can do this in a simpler manner and any potential pitfalls that you could foresee.

RoiEXLab commented 2 years ago

If reinventing the wheel, which tools are there that do what we want?

I was mainly thinking about nginx's server blocks. I was imagining an nginx config along the lines of this:

server {
  include common.conf;
  server_name 2-4.lobby.triplea-game.org;

  location / {
    proxy_pass http://localhost:1337;
  }
}
# repeat for every instance with different port and server_name

However I just read this article and learned that the map directive exists making this pretty convenient regardless. So it doesn't really matter what kind of header is used. A custom header could be used (like they do in the article) or alternatively the host header could also be used. So it's just a preference if you want to make it seem like there are many servers from the outside or if you want to hide them completely behind a single reverse proxy.

One thing that would also be possible in theory is to have some sort of versioning already built into the URL path. So https://lobby.triplea-game.org/2.4/* would actually forward to http://localhost:8080/*, and likewise for other versions. But I'm sure this approach has its own bunch of problems because now the root of the URL is no longer fixed, so I assume that's why it wasn't considered. Nginx is designed to achieve just this, and using this approach we could just chain location blocks with proxy_pass directives and get a simple but expressive config. Just wanted to mention it for completeness.
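A sketch of how the path-based variant might be written, assuming one location block per version. Port 8080 for 2.4 comes from the example above; the 2.5 entry and its port are hypothetical.

```nginx
server {
    listen 80;
    server_name lobby.triplea-game.org;

    # Because proxy_pass carries a URI part ("/"), nginx replaces the
    # matched location prefix with it: /2.4/foo is forwarded as /foo.
    location /2.4/ {
        proxy_pass http://localhost:8080/;
    }

    # Hypothetical second instance for another version.
    location /2.5/ {
        proxy_pass http://localhost:8081/;
    }
}
```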

RoiEXLab commented 2 years ago

Regarding a preprod database: I think there are some scenarios where it can be useful. This includes being able to find database migration issues that slipped through testing due to a small test set, as well as the possibility to have a playground to identify and debug reported issues that are not reproducible locally (maybe there's an issue that only occurs if the server's and client's time zones differ). So it can be useful if there's actually data in the preprod database, but the better testing, code reviews and QA are, the less useful it becomes, I think.

DanVanAtta commented 2 years ago

I think we are on pretty safe ground in terms of 'reinventing the wheel'. The 'wheel' in this case is using NGINX to do request routing using headers. The overall system design & configuration, though, is the thing we need to build.

Re: version in URL

A rule of thumb I've adopted is to avoid versions in URLs. Essentially a URL is the name of a resource, and it is an extremely long lived part of that entity. Using the URL for something that has a far shorter life span, less than the resource, creates a mismatch.

FWIW, there are a number of discussions/links supporting this perspective:

I listened to a talk (that I am still trying to find) that described, IIRC, both that version values should go into headers and then be routed, and second to pin these versions to older and running instances thereby avoiding the backward compatibility scenario.

Version based header routing, a plus for local development

Version based header routing does make local testing quite feasible: we just spin up a Docker container with nginx and the needed config file for routing, then send requests to localhost and verify they get routed correctly. We are currently able to set up every other component locally, implying we would still be able to simulate the full stack on a local developer machine.
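One way such a local check could be set up is a standalone nginx.conf with stub backends, sketched below. The `Triplea-Version` header name and all ports are assumptions for illustration; the stubs only report which instance answered.

```nginx
# Minimal standalone nginx.conf for a local routing check.
events {}

http {
    # Hypothetical version header mapped to stub backends.
    map $http_triplea_version $lobby_backend {
        default  "";
        "2.5"    127.0.0.1:9025;
        "2.6"    127.0.0.1:9026;
    }

    # Stub "lobbies" standing in for real lobby instances.
    server {
        listen 127.0.0.1:9025;
        location / { return 200 "lobby 2.5\n"; }
    }
    server {
        listen 127.0.0.1:9026;
        location / { return 200 "lobby 2.6\n"; }
    }

    # The router under test.
    server {
        listen 8080;
        location / {
            if ($lobby_backend = "") { return 400; }
            proxy_pass http://$lobby_backend;
        }
    }
}
```

With this loaded into an nginx container that publishes port 8080, a request such as `curl -H 'Triplea-Version: 2.6' http://localhost:8080/` should be answered by the 2.6 stub, and a request without the header should be rejected.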

Preprod - to be or not to be?

Some reasons against a preprod

Risks of no Preprod Database

Seemingly these are pretty minimal since I don't think we did very much with the previous preprod database. The biggest risk I see is that we make some sort of change that injects bad data that a previous (and still running) lobby version cannot deal with. I cannot think of any examples where the preprod database was the thing that prevented a data problem. Generally it is the DB testing and local simulation that find any potential issues.

No Lobby Preprod is a non-issue

The fact that the lobby software can have a newer version running is basically a preprod; the only difference is that we are interacting with prod data. This would probably make feature preview much better, and if we are careful about how we insert & update data, then arguably it's a far better environment for finding problems.

DanVanAtta commented 2 years ago

Latest Proposed System Diagram if there is no preprod

lobby-version-switch-using-nginx

bacrossland commented 2 years ago

My vote is for the versioning through headers. It's not only easier for local development, it's easier to scale when running in production. If you pin the version to a DNS entry and the IP of the destination is updated, you have to wait for that DNS record change to roll through the internet (based on TTL and syncing of DNS servers) before all clients are routing to the proper location. That problem doesn't happen when routing by header option. Once the updated nginx config is deployed, routing happens immediately.

Having the version in the path of the url is similar to passing it as a header option but locks you into maintaining that pathing in future releases. If a change is made from a path of /2.4/ to /pre-release/2.4/ then redirects have to be maintained to ensure the first path is routed to the second path so older clients still work. Those redirects return a 302 response code which then opens up the question of how does that older client react to getting a 302 on a request. Was it only expecting 200? Does it know to follow the redirect? Passing the version as a header option avoids all of that.
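To make the redirect burden concrete, here is a sketch of what keeping the old path alive might look like. It uses the /2.4/ to /pre-release/2.4/ move from the paragraph above; the backend port is a placeholder.

```nginx
server {
    listen 80;
    server_name lobby.triplea-game.org;

    # Old clients still call /2.4/...; the "redirect" flag answers with
    # a 302 pointing at the new path, which the client must then follow.
    rewrite ^/2\.4/(.*)$ /pre-release/2.4/$1 redirect;

    location /pre-release/2.4/ {
        proxy_pass http://localhost:8080/;   # placeholder backend port
    }
}
```

Every past path layout needs a rule like this for as long as any client using it is still in the wild, which is the maintenance cost a version header avoids.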