neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

proxy: grand vision #8663

Open conradludgate opened 1 month ago

conradludgate commented 1 month ago

I have wild ideas.

Write our own DNS load balancing system.

PageServer, for a given tenant, has a set AZ (can change over time with rebalancing). Compute has a preference for the same AZ. Proxy has no preference.

Current flow:

  1. Resolve DNS for ep-foo-bar.region.aws.neon.tech -> returns 3 IP addresses in a random order.
  2. Each IP points to the NLB in one AZ. Random IP = random NLB, which routes to a proxy in its AZ by preference.
  3. Proxy then routes to the compute in the compute AZ.

Worst case:

Customer in us-east-2a connects to the NLB in us-east-2b. The NLB connects to a proxy in us-east-2c (no proxy currently running in us-east-2b). The proxy connects to the compute in us-east-2a.

Suggested improvement:

The DNS server is aware that ep-foo-bar should be mapped to us-east-2a by preference, so it puts that AZ's IP first in the order.
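A rough sketch of what that preference-aware ordering could look like inside a custom DNS responder, assuming a hypothetical mapping from endpoint ID to preferred AZ and one NLB IP per AZ (all names and structures here are illustrative, not existing components):

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical view of one region: the NLB IP for each AZ, plus the AZ each
/// endpoint currently prefers (derived from its pageserver/compute placement).
struct Region {
    nlb_by_az: HashMap<String, IpAddr>,
    preferred_az: HashMap<String, String>, // endpoint id -> AZ
}

impl Region {
    /// Answer an A-record query for ep-foo-bar.region.aws.neon.tech with the
    /// NLB of the endpoint's preferred AZ first and the remaining AZs after it.
    fn ordered_ips(&self, endpoint: &str) -> Vec<IpAddr> {
        let preferred = self.preferred_az.get(endpoint);
        let mut ips: Vec<(bool, IpAddr)> = self
            .nlb_by_az
            .iter()
            .map(|(az, ip)| (Some(az) == preferred, *ip))
            .collect();
        // Sort the preferred AZ to the front; the rest keep arbitrary order.
        ips.sort_by_key(|&(is_preferred, _)| !is_preferred);
        ips.into_iter().map(|(_, ip)| ip).collect()
    }
}
```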

Write our own Load Balancer.

Our NLB setup round-robins between proxies. If we have 100 instances of proxy and 100 million endpoints in a region, we might end up caching all 100 million endpoints across all 100 proxy instances. It would be much better if we could use consistent hashing to pack that cache a lot more efficiently.
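As a sketch of the consistent-hashing idea (purely illustrative, not an existing component): put the proxy instances on a hash ring with virtual nodes and map each endpoint to the first proxy clockwise from its hash, so an endpoint's cache entries concentrate on one instance and only a small fraction of endpoints move when a proxy joins or leaves.

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A simple hash ring: each proxy is inserted at several virtual points so
/// load spreads evenly, and an endpoint maps to the first point clockwise
/// from its own hash.
struct Ring {
    points: BTreeMap<u64, String>, // hash point -> proxy instance id
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

impl Ring {
    fn new(proxies: &[&str], vnodes: u32) -> Self {
        let mut points = BTreeMap::new();
        for proxy in proxies {
            for v in 0..vnodes {
                points.insert(hash_of(&(proxy, v)), proxy.to_string());
            }
        }
        Ring { points }
    }

    /// Pick the proxy responsible for this endpoint.
    fn proxy_for(&self, endpoint: &str) -> &str {
        let h = hash_of(&endpoint);
        self.points
            .range(h..)
            .next()
            .or_else(|| self.points.iter().next()) // wrap around the ring
            .map(|(_, proxy)| proxy.as_str())
            .expect("ring is not empty")
    }
}
```

With round-robin, any proxy can end up caching any endpoint; with the ring, ep-foo-bar keeps landing on the same instance until the fleet changes.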

Additionally, we have issues with long-lived connections. Currently the pipeline is as listed above, e.g.

client -> load balancer -> proxy -> compute.

Because of this, we keep old proxy instances alive for a week after each rollout.

It might be better if we have

client -> load balancer -> authenticator
                        -> compute

This means we can deploy proxy (the authentication system) without interrupting our long-lived connections.
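A minimal sketch of how the data path could look once authentication is split out, assuming a hypothetical `authenticate` call out to the separately deployed authenticator that returns the compute address; after that the load balancer is just a byte pipe, so redeploying the authenticator doesn't touch established sessions:

```rust
use tokio::io::copy_bidirectional;
use tokio::net::TcpStream;

/// Hypothetical per-connection handler in the dumb load balancer.
async fn handle_client(mut client: TcpStream) -> std::io::Result<()> {
    // Ask the (separately deployable) authenticator where this session goes.
    let compute_addr = authenticate(&mut client).await?;

    // From here on the LB just copies bytes in both directions; restarting
    // the authenticator has no effect on this established session.
    let mut compute = TcpStream::connect(compute_addr).await?;
    copy_bidirectional(&mut client, &mut compute).await?;
    Ok(())
}

/// Placeholder for the handshake with the authenticator service; out of
/// scope for this sketch.
async fn authenticate(_client: &mut TcpStream) -> std::io::Result<std::net::SocketAddr> {
    unimplemented!("stand-in for the call to the authenticator")
}
```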

This load balancer would need to be postgres- and TLS-aware in both cases: it needs to read the SNI, which for now is unencrypted. If it wants to talk to compute directly, then it also needs the TLS keys.
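To make the "postgres- and TLS-aware" part concrete: a Postgres client that wants TLS first sends an 8-byte SSLRequest (length 8, code 80877103), the server replies with a single 'S', and only then does the TLS ClientHello, which carries the SNI, arrive. A sketch of that preamble, with the SNI parsing itself left to a TLS library:

```rust
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

/// Postgres SSLRequest code from the wire protocol.
const SSL_REQUEST_CODE: u32 = 80877103;

/// Consume the Postgres SSLRequest and agree to TLS, so the TLS ClientHello
/// (and its SNI, which identifies the endpoint) is the next thing on the wire.
async fn accept_ssl_request(client: &mut TcpStream) -> std::io::Result<()> {
    let len = client.read_u32().await?;
    let code = client.read_u32().await?;
    if len != 8 || code != SSL_REQUEST_CODE {
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidData,
            "expected SSLRequest",
        ));
    }
    // 'S' tells the client to proceed with the TLS handshake.
    client.write_all(b"S").await?;
    // The next bytes are the TLS ClientHello; the SNI can be read from it
    // before deciding whether to terminate TLS here or route onward.
    Ok(())
}
```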

This load balancer should have much stronger network stack integration. For instance, it should not handle TCP keepalives itself but should forward them through to the compute directly.

We would need to keep the LB dumb so that we don't need to redeploy it as often.

lassizci commented 4 weeks ago

> Our NLB setup round-robins between proxies.

https://www.linkedin.com/pulse/hash-flow-algorithm-aws-network-load-balancer-nlb-in-depth-mishra/

conradludgate commented 4 weeks ago

> Our NLB setup round-robins between proxies.
>
> https://www.linkedin.com/pulse/hash-flow-algorithm-aws-network-load-balancer-nlb-in-depth-mishra/

Cool, although it looks like it's not public yet, and I doubt it would work out of the box for postgres with TLS SNI.

lassizci commented 4 weeks ago

It doesn't go into that level of detail, but it's likely better than plain round-robin from a caching perspective.

The next rather big thing would be the DNS. There are surprisingly strict limits on the number of records across providers, and a fixed order isn't even possible with all of them, so it's another thing to implement and manage one way or another. Then I suppose it comes down to the potential savings compared to the cost of running such a thing.