threefoldtech / grid_deployment

Deploy a full Grid backend with docker-compose and snapshots
Apache License 2.0

Load balancer for tfgrid validators #56

Open Mik-TF opened 6 days ago

Mik-TF commented 6 days ago

Situation

We want to develop a load balancer that will redirect users from dashboard.grid.tf to the closest validator stack. Below is a recap of the project to make sure the load balancer part is clear.

For example, we will have 16 validators running their full grid stack at dashboard.01.grid.tf, dashboard.02.grid.tf, ... dashboard.16.grid.tf

Then a user can go to dashboard.grid.tf and the load balancer points them to any of the working validator stack URLs (e.g. dashboard.04.grid.tf)

Users can also decide to simply go directly to a given validator URL (e.g. dashboard.04.grid.tf)

Full TFGrid Validator Stack Deployment

Phase 1:

Here is a recap of the first phase of the project:

The first phase of the project, Full TFGrid Validator Stack Deployment, is to make it possible for anyone to run the grid independently. This means running the full grid stack, including tfhub and tfbootstrap.

Phase 2:

Once phase 1 is ready, validators will be able to deploy the full grid stack, and each stack will be available at a given URL, e.g. dashboard.03.grid.tf, dashboard.04.grid.tf, etc.

We will be able to share this list of URLs on our websites (e.g. github, threefold.io, etc.). Users will be able to visit these URLs to connect to specific grid instances, which are all independent from one another. This keeps the grid decentralized.

Once we have this, we will need to set up a load balancer with all those URLs, so that when a user goes to dashboard.grid.tf, the load balancer points the user to the closest grid instance (e.g. dashboard.grid.tf points to dashboard.03.grid.tf; if 03 goes down, it points to 04, etc.)

Todo

For this issue, we want to develop a load balancer that redirects from dashboard.grid.tf to one of the validator grid stacks (e.g. dashboard.05.grid.tf).
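To make the intent concrete, here is a minimal sketch of what such a load balancer could look like, assuming a plain nginx reverse proxy in front of the validator stacks. The certificate paths, timeouts, and the assumption that each stack also answers for the shared hostname are illustrative, not decided implementation details:

```nginx
# Hypothetical sketch: nginx as a reverse proxy for dashboard.grid.tf,
# failing over to the next validator stack when one becomes unreachable.
upstream validator_stacks {
    server dashboard.01.grid.tf:443 max_fails=3 fail_timeout=30s;
    server dashboard.02.grid.tf:443 max_fails=3 fail_timeout=30s;
    server dashboard.03.grid.tf:443 max_fails=3 fail_timeout=30s;
    # ... and so on up to dashboard.16.grid.tf
}

server {
    listen 443 ssl;
    server_name dashboard.grid.tf;

    # Certificate for the shared domain; paths are placeholders.
    ssl_certificate     /etc/ssl/grid/fullchain.pem;
    ssl_certificate_key /etc/ssl/grid/privkey.pem;

    location / {
        proxy_pass https://validator_stacks;
        # Each backend stack would need to accept the shared hostname as well;
        # TLS details towards the backends (SNI, verification) are glossed over.
        proxy_set_header Host $host;
        # Try the next stack on connection errors or 5xx responses.
        proxy_next_upstream error timeout http_502 http_503 http_504;
    }
}
```

This only covers failover between stacks; routing users to the geographically closest stack would need GeoIP logic on top of it.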

References and Suggestions

@coesensbert let me know if this is clear! I know you have some suggestions for this load balancer, please write your ideas on this issue.

LeeSmet commented 4 days ago

Reading this issue, I'm a bit confused as to what the goal is here, or what problem is being solved. If I understand correctly, the idea is to have multiple independent stacks, each served both under its own unique domain and under a globally shared domain. This shared domain would host the load balancer/proxy, which proxies to the individual stacks, potentially based on geolocation.

There are some problems with the concept as laid out above. First, this setup is considered decentralized because each stack is hosted individually, and by someone else. However, if we put a centralized proxy in front of it, this removes the achieved decentralization (and makes the proxy the single point of failure in the process). Secondly, it makes no sense to proxy based on geolocation. At this level, the user has already connected to the proxy, and from here on out all that matters is the latency between the proxy and the actual backend. Proxying to a backend stack that is physically close to the user would have adverse effects if the user is far away from the proxy.

Imo a better approach would be to solve this at the DNS level, by running geo-aware authoritative DNS servers, where the DNS reply returns the IP(s) of the nearest backend stack, if that is what we want. That way there is no additional (potentially high) latency cost of going through a centralized proxy first, and the SPOF of the proxy is removed. For this reason it can also be considered more decentralized. Note that the TLS certificate for the shared domain will need to be present on every backend stack, which will require some mechanism to either distribute a certificate from a central location, or some orchestration so these backends can each acquire and renew the certificate on their own.
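As a rough illustration of the DNS-level idea (not a worked-out design), something like the PowerDNS GeoIP backend could map the shared name onto region-specific records. The zone layout, the %cn placeholder semantics, and the stack-to-region mapping below are assumptions and would need to be checked against the PowerDNS documentation:

```yaml
# Hypothetical sketch of a PowerDNS GeoIP backend zones file: clients
# resolving dashboard.grid.tf get an answer chosen by continent.
# The placeholder syntax and layout should be verified against the
# geoip backend docs; the stack-to-region mapping is made up.
domains:
- domain: grid.tf
  ttl: 300
  records:
    grid.tf:
      - soa: ns1.grid.tf hostmaster.grid.tf 2024010101 7200 3600 1209600 300
      - ns: ns1.grid.tf
    # One entry per region, pointing at a stack hosted there.
    eu.stacks.grid.tf:
      - cname: dashboard.03.grid.tf
    na.stacks.grid.tf:
      - cname: dashboard.07.grid.tf
    as.stacks.grid.tf:
      - cname: dashboard.12.grid.tf
  services:
    # %cn would expand to the client's continent code, so the answer
    # is taken from the matching region entry above.
    dashboard.grid.tf: '%cn.stacks.grid.tf'
```

For the shared-domain certificate, one option in the "acquire it themselves" direction would be for every stack to run an ACME DNS-01 challenge, e.g. with certbot and a DNS plugin matching wherever the grid.tf zone is hosted (the RFC 2136 plugin and the deploy hook below are purely illustrative):

```sh
# Hypothetical example: each backend stack obtains and renews the
# certificate for the shared domain itself via DNS-01, so no private
# key has to be copied around from a central location.
certbot certonly \
  --dns-rfc2136 \
  --dns-rfc2136-credentials /etc/letsencrypt/rfc2136.ini \
  -d dashboard.grid.tf \
  -d dashboard.03.grid.tf \
  --deploy-hook "docker compose restart proxy"  # 'proxy' service name is illustrative
```

Note that DNS-01 means every stack operator needs credentials that can create TXT records under grid.tf, which is its own orchestration and trust question.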

Mik-TF commented 3 days ago

Thanks for this great feedback. Always highly appreciated.

Indeed, I realized that the proxy would be a SPOF. I figured that, in the worst case, if it went down, users would need to manually go to the other URLs. It's not ideal, for sure.

The approach you suggest here looks much better, and it sounds feasible. I think @coesensbert may have told me something along those lines recently.

I think we could maybe close this issue and create a new one with what you suggest here.