shadowsocks / shadowsocks-rust

A Rust port of shadowsocks
https://shadowsocks.org/
MIT License

[Feature proposal] Load balancer improvements and customisation #395

Open spyophobia opened 3 years ago

spyophobia commented 3 years ago

Rationale

The current load balancer is hard-coded to be "pseudo-ping"-based. This solution has several limitations:

  1. Ping does not accurately reflect server "quality". For example, RTT is heavily dependent on geographical location. A server might be physically closer to the client and therefore have better ping, but actually offer very limited throughput. (This is actually my case: my VPS in San Jose has significantly higher bandwidth than my VPS in Taiwan, but the current load balancer prefers the latter 😓.)
  2. The current implementation is not a "true" ping, as it includes the RTT between the server and known endpoints (TCP: Firefox portal detection URL; UDP: Google DNS). This could potentially make the RTT less accurate, although admittedly it's an unlikely scenario.
  3. Total bandwidth usage is not taken into account. One common use of load balancing is to distribute the data evenly (more or less evenly) across multiple endpoints (servers), or to set bandwidth caps for each endpoint.
  4. General lack of configurability and extensibility.

What I want to accomplish in this proposal

Load balancing is a very complex subject, so there's a literal ton of things that we need to go through and decide on.

Therefore it is not my goal in this proposal (at least initially) to discuss any implementation details. Instead I would like to ask the community to brainstorm ideas regarding:

  1. What sub-features of load balancing do we need to support?
  2. How do we structure those sub-features in a way that is user-customisable (configurability) and somewhat future proof (extensibility)?

Of course we should still keep implementation in mind during the brainstorm, i.e. don't propose schemes that take an entire Microsoft to develop.

General structure of the load balancing system I am envisioning

I guess the best way to illustrate this is to write some pseudo-Rust code.

This is very much my own early-stage idea, so feel free to disagree.

// Each variant is one factor (metric) that the ranking system can consider.
// Each variant carries that metric's score (and potentially other data).
enum Metric {
  ClientToServerPing(u32),
  ServerToEndpointPing(u32),
  DataRate(u64),
  DataCapReached(bool),
  // ...
}

// Returns the best server choice for this moment.
//
// The load balancer calls this function every 5 seconds (or whatever interval the user
// specifies), and uses the returned ServerProfile for all requests until the next call.
fn get_best_server<'a>(servers: &'a [ServerProfile], opts: &load_balancer::UserOptions) -> &'a ServerProfile {
  // For each server profile, check its scores for all metrics chosen by the user
  let servers_metrics: Vec<Vec<Metric>> = servers
    .iter()
    .map(|sp| opts.get_chosen_metric_suite().check_server(sp))
    .collect();

  // For each server's metrics, synthesise an overall score
  let servers_final_score: Vec<SynthesisedScore> = servers_metrics
    .iter()
    .map(|metrics| opts.get_chosen_synthesiser().gen_score(metrics))
    .collect();

  // Pick the server with the highest synthesised score
  let best_idx = servers_final_score
    .iter()
    .enumerate()
    .max_by(|(_, a), (_, b)| a.cmp(b))
    .map(|(i, _)| i)
    .unwrap();
  &servers[best_idx]
}

Potential ranking metrics we can consider supporting

  1. Ping between client and each server
  2. Ping between each server and known endpoints
  3. Total data usage for each server
  4. Current data rate (inbound & outbound) for each server
  5. User-assigned priority for each server

Feel free to suggest other potential metrics I missed.

How to build the score synthesiser?

Naive single-metric ranker

Quite self-explanatory. Only lets the user choose one metric, and just uses that metric's score to rank.

This could also be used as a baby-step towards a more comprehensive system.
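
As a rough sketch of what this could look like (reusing the Metric enum from the pseudo-code above; SingleMetricChoice and the scoring convention are made up for illustration):

// Illustration only: the user picks one metric, and the synthesised score is
// just that metric's value, mapped so that "higher score = better".
enum SingleMetricChoice {
  ClientToServerPing,
  ServerToEndpointPing,
  DataRate,
}

fn single_metric_score(metrics: &[Metric], choice: &SingleMetricChoice) -> i64 {
  metrics
    .iter()
    .find_map(|m| match (choice, m) {
      // Lower ping is better, so negate it
      (SingleMetricChoice::ClientToServerPing, Metric::ClientToServerPing(ms)) => Some(-(*ms as i64)),
      (SingleMetricChoice::ServerToEndpointPing, Metric::ServerToEndpointPing(ms)) => Some(-(*ms as i64)),
      // Higher data rate is better, so use it directly
      (SingleMetricChoice::DataRate, Metric::DataRate(bps)) => Some(*bps as i64),
      _ => None,
    })
    // A server that didn't report the chosen metric ranks last
    .unwrap_or(i64::MIN)
}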

A more comprehensive, user-configurable ranker

Honestly, I haven't made up my mind yet here. There are so many possibilities.

Maybe this last step can wait. Suggestions are more than welcome, of course.

zonyitoo commented 3 years ago

Shadowsocks' protocol doesn't support a ping command, which is why I chose to send requests to some known destinations to roughly estimate the real RTT between local and the servers. With this method, we can also learn the connectability and functionality of the servers, in other words, whether both inbound and outbound are working well.

Considering the application scenario of shadowsocks, I think the goal is to "choose the best server and use it forever" instead of "distribute the data evenly (more or less evenly) across multiple endpoints (servers)". The latter was my original idea, which is why this component is called a "load balancer": it was originally designed for sharing connections across all available servers. But I soon found out:

  1. Not all servers are in good condition: blocked, slow, packet loss occasionally.
  2. Browsers may open lots of independent connections for one website; if these connections are sent through different servers, which are mostly in different locations around the world, some weird things may happen.

So I think the design goal of this so-called "load balancer" is to choose a server that is connectable, fast, and stable.

Back to your proposal, I totally agree about adding more metrics for describing a server. The more metrics you have, the more accurate you can get. But some of your proposed metrics are hard to obtain:

  1. Ping between client and each server: well, we don't have a ping command in the protocol. Maybe you can set the target address to be the server itself, but that would become an obvious security flaw (shadowboom).
  2. Ping between each server and known endpoints: the current strategy.
  3. Total data usage for each server: if our goal is to choose and use only the best server, then the other servers will not have a lot of "data usage". Your goal is to calculate the amount of data each server has transferred, but you cannot get an accurate number without a database (otherwise everything is lost when sslocal is restarted).
  4. Current data rate (inbound & outbound) for each server: same as above.
  5. User-assigned priority for each server: that's a good point.

spyophobia commented 3 years ago

I think the goal is to "choose the best server and use it forever"

Is this the current implementation? I'm a bit confused because I run my client with -v and I see the load balancer updating server scores every few seconds.

Not all servers are in good condition: blocked, slow, packet loss occasionally.

These can all become individual metrics, can't they? Maybe some can act as a filter in the "disqualification layer" I mentioned.

Browsers may open lots of independent connections for one website; if these connections are sent through different servers, which are mostly in different locations around the world, some weird things may happen.

That's a very fair point and honestly this might be this proposal's greatest obstacle.

I don't think Shadowsocks has a concept of a "session", does it? If not, are there other ways we can keep track of which requests belong to a single session ("session" in the non-technical sense of the word)?

Well, we don't have a ping command in the protocol.

I know of a program named tcping, but I haven't studied how it works. It seems to work on any address and port with an active TCP listener, even Shadowsocks servers (with or without plugin). This is perhaps one way of implementing a "pure" ping without modifying the Shadowsocks protocol or compromising security.

We can still keep the current "pseudo-ping" implementation as a test for the server's "functionality".

Maybe you can set the target address to be the server itself, but that would become an obvious security flaw (shadowboom).

That's interesting. Can you please explain this security flaw in a bit more detail?


If our goal is to choose and use only the best server, then the other servers will not have a lot of "data usage". Your goal is to calculate the amount of data each server has transferred, but you cannot get an accurate number without a database (otherwise everything is lost when sslocal is restarted).

This comes back to "configurability" - some users might prefer to always use the best server, while some might prefer to prioritise total bandwidth usage.

So my idea on this is actually to have the user set a "target data ratio" between the servers. For example, Server A gets weight 2, Server B gets weight 5, and Server C gets weight 1. And sslocal can try to balance using those weights, so that the ratio of data going through each server will be approximately 2:5:1.

Since only the ratio matters, the counters can simply live in memory. Therefore a database on disk, while nice, is not strictly necessary.

Similar logic applies to data rate.
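
To make this concrete, here is a rough sketch with made-up types purely for illustration (nothing here exists in sslocal today): keep an in-memory byte counter per server and always pick the server that is furthest below its target share.

// Hypothetical sketch of ratio-based selection; counters live in memory only,
// so restarting sslocal simply resets the bookkeeping.
struct WeightedServer {
  name: String,
  weight: u64,      // user-configured target share, e.g. 2, 5, 1
  bytes_used: u64,  // bytes relayed since sslocal started
}

// Pick the server whose actual share of traffic lags its target share the most.
// Compared without floats: a lags more than b when
// a.bytes_used / a.weight < b.bytes_used / b.weight,
// i.e. a.bytes_used * b.weight < b.bytes_used * a.weight.
fn pick_by_ratio(servers: &[WeightedServer]) -> usize {
  servers
    .iter()
    .enumerate()
    .min_by(|(_, a), (_, b)| {
      (a.bytes_used as u128 * b.weight as u128)
        .cmp(&(b.bytes_used as u128 * a.weight as u128))
    })
    .map(|(i, _)| i)
    .expect("at least one server configured")
}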

zonyitoo commented 3 years ago

That's interesting. Can you please explain this security flaw in a bit more detail?

https://github.com/Qv2ray/rc4md5cry

Is this the current implementation? I'm a bit confused because I run my client with -v and I see the load balancer updating server scores every few seconds.

Yes. It checks all servers' connectivity and chooses the best server. All subsequent connections will go to that best server. In most cases, switching between servers is an infrequent action.

And sslocal can try to balance using those weights, so that the ratio of data going through each server will be approximately 2:5:1.

Hmm, that's not a good strategy, to be honest. You can only do this if Servers A, B, and C are in the same condition, so that connections sent through any of them show no difference in user experience. But what if these 3 servers are located in different countries all over the world? For example, if A is in Japan and C is in the US, connections sent to C will experience significantly higher latency than those sent to A.

So if we are going to introduce weights for servers, that weight should be taken into account in each server's score so that it affects the server's rank. A simple way is:

Score = Connectivity_Score * Weight

It seems to work on any address and port with an active TCP listener, even Shadowsocks servers (with or without plugin).

How can it work if you are using a QUIC plugin?

This comes back to "configurability" - some users might prefer to always use the best server, while some might prefer to prioritise total bandwidth usage.

That would be a totally different balancing strategy. I would suggest making another independent balancer for this case. This balancer would try to distribute connections across all servers instead of choosing the best one.

I don't think Shadowsocks has a concept of a "session", does it?

No. Shadowsocks' protocol is stateless.

zonyitoo commented 3 years ago

I had a glance at tcping:

  1. It has different modes: TCP, HTTP, ICMP, ...
    • TCP: measures the time of connect()
    • HTTP: measures the RTT to the target URL
  2. The strategy we are using is the same as the HTTP mode of tcping.
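
For reference, the TCP mode boils down to timing a plain connect(); a minimal sketch (the address and timeout are just placeholder inputs):

use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

// Sketch of tcping's TCP mode: the "ping" is simply how long the TCP handshake
// (connect()) takes. This measures reachability and RTT to the server's listener,
// but says nothing about whether the proxy behind it actually works.
fn tcp_connect_rtt(addr: SocketAddr, timeout: Duration) -> std::io::Result<Duration> {
  let start = Instant::now();
  let _stream = TcpStream::connect_timeout(&addr, timeout)?;
  Ok(start.elapsed())
}
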
IceCodeNew commented 3 years ago

How about having a minimum server switching interval? Or having a "tolerance" factor in case several servers' latencies are roughly the same? I just encountered such a situation, where sslocal switches among servers at a frankly high frequency.

2021-08-03T 16:38:27.648638600+08:00 INFO  switched best TCP server from C.domain.tld:* to B.domain.tld:*
2021-08-03T 16:38:40.122831500+08:00 INFO  switched best TCP server from B.domain.tld:* to C.domain.tld:*
2021-08-03T 16:40:02.669645500+08:00 INFO  switched best TCP server from C.domain.tld:* to B.domain.tld:*
2021-08-03T 16:40:28.396512900+08:00 INFO  switched best TCP server from B.domain.tld:* to C.domain.tld:*
2021-08-03T 16:40:41.167336100+08:00 INFO  switched best TCP server from C.domain.tld:* to B.domain.tld:*
2021-08-03T 16:40:53.677215100+08:00 INFO  switched best TCP server from B.domain.tld:* to A.domain.tld:*
2021-08-03T 16:43:33.978495700+08:00 INFO  switched best TCP server from A.domain.tld:* to B.domain.tld:*
2021-08-03T 16:45:29.592901100+08:00 INFO  switched best TCP server from B.domain.tld:* to C.domain.tld:*
2021-08-03T 16:45:42.379495+08:00 INFO  switched best TCP server from C.domain.tld:* to A.domain.tld:*
2021-08-03T 16:48:20.282044700+08:00 INFO  switched best TCP server from A.domain.tld:* to B.domain.tld:*
2021-08-03T 16:48:32.781551900+08:00 INFO  switched best TCP server from B.domain.tld:* to A.domain.tld:*
2021-08-03T 16:49:25.430900500+08:00 INFO  switched best TCP server from A.domain.tld:* to B.domain.tld:*
2021-08-03T 16:50:28.129373100+08:00 INFO  switched best TCP server from B.domain.tld:* to A.domain.tld:*
2021-08-03T 16:53:13.740076200+08:00 INFO  switched best TCP server from A.domain.tld:* to B.domain.tld:*
2021-08-03T 16:55:51.749668400+08:00 INFO  switched best TCP server from B.domain.tld:* to A.domain.tld:*
2021-08-03T 16:56:57.626669500+08:00 INFO  switched best TCP server from A.domain.tld:* to B.domain.tld:*
2021-08-03T 17:01:00.713989900+08:00 INFO  switched best TCP server from B.domain.tld:* to C.domain.tld:*
2021-08-03T 17:02:05.907808+08:00 INFO  switched best TCP server from C.domain.tld:* to A.domain.tld:*
2021-08-03T 17:03:23.549524500+08:00 INFO  switched best TCP server from A.domain.tld:* to B.domain.tld:*
2021-08-03T 17:03:36.099284900+08:00 INFO  switched best TCP server from B.domain.tld:* to C.domain.tld:*
2021-08-03T 17:06:14.222675700+08:00 INFO  switched best TCP server from C.domain.tld:* to A.domain.tld:*
2021-08-03T 17:06:54.571325600+08:00 INFO  switched best TCP server from A.domain.tld:* to C.domain.tld:*
2021-08-03T 17:07:09.592625900+08:00 INFO  switched best TCP server from C.domain.tld:* to A.domain.tld:*
2021-08-03T 17:08:27.358105600+08:00 INFO  switched best TCP server from A.domain.tld:* to C.domain.tld:*
2021-08-03T 17:16:32.664660500+08:00 INFO  switched best TCP server from C.domain.tld:* to A.domain.tld:*
2021-08-03T 17:18:42.771705800+08:00 INFO  switched best TCP server from A.domain.tld:* to B.domain.tld:*
2021-08-03T 17:21:20.709584400+08:00 INFO  switched best TCP server from B.domain.tld:* to A.domain.tld:*
2021-08-03T 17:23:46.808590200+08:00 INFO  switched best TCP server from A.domain.tld:* to B.domain.tld:*
2021-08-03T 17:26:27.010577100+08:00 INFO  switched best TCP server from B.domain.tld:* to C.domain.tld:*
2021-08-03T 17:26:39.640947400+08:00 INFO  switched best TCP server from C.domain.tld:* to A.domain.tld:*

@zonyitoo

database64128 commented 3 years ago

@IceCodeNew Switching between servers won't break existing connections. You might want to group servers based on the same VPS (e.g. direct, relayed, via IPv4, via IPv6, etc) together, so that no matter how frequently it switches, your "actual" IP won't change.

IceCodeNew commented 3 years ago

Switching between servers won't break existing connections.

The real problem here is that connecting to another server takes time, during which existing connections or newly established ones will have to wait.

You might want to group servers based on the same VPS (e.g. direct, relayed, via IPv4, via IPv6, etc) together, so that no matter how frequently it switches, your "actual" IP won't change.

That's exactly what I did. And again, the problem I am concerned about is different from what your answer addresses.


By proposing ideas like a minimum server switching interval or a "tolerance" factor, I'm trying to solve the problem that makes the shadowsocks relay sluggish. Whether my hunch is right or not, I do think switching servers at such a frequency is unreasonable.
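
To illustrate the two knobs, something roughly like this is what I have in mind (made-up types and thresholds, not existing sslocal code; lower score = better here):

use std::time::{Duration, Instant};

struct SwitchPolicy {
  min_switch_interval: Duration, // e.g. 60s: never switch more often than this
  tolerance: f64,                // e.g. 0.1: the candidate must be >10% better
  last_switch: Instant,
}

impl SwitchPolicy {
  // Decide whether the balancer should actually switch away from the current
  // best server, given both servers' scores (lower score = better).
  fn should_switch(&mut self, current_score: f64, candidate_score: f64) -> bool {
    if self.last_switch.elapsed() < self.min_switch_interval {
      return false;
    }
    if candidate_score >= current_score * (1.0 - self.tolerance) {
      // The candidate is not meaningfully better: stay put.
      return false;
    }
    self.last_switch = Instant::now();
    true
  }
}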

zonyitoo commented 3 years ago

If your servers were switching so frequently, that means your servers had very similar scores. So I think spreading connections throughout all your best servers is completely reasonable.

This is why it is called a "balancer".

To achieve your goal, you could manually apply weights to your servers and adjust their ranks that way.

IceCodeNew commented 3 years ago

If your servers were switching so frequently, that means your servers had very similar scores. So I think spreading connections throughout all your best servers is completely reasonable.

This is why it is called a "balancer".

To achieve your goal, you could manually apply weights to your servers and adjust their ranks that way.

You have a point here.

nirui commented 2 years ago

With all due respect, I think the solutions mentioned above may be too complex. Please allow me to share some of my thoughts on this.

Before I begin, let me address this real quick:

I think the goal is to "choose the best server and use it forever"

This was never an option. Even though it simplifies development, many Shadowsocks instances in real life run for months at a time without restarting, meaning the server chosen as best at the start will very likely no longer be the actual best in the end. Dynamic probing and selection is the only reasonable choice here (the other option is to abandon auto balancing completely, like how it was done before, and that's horrible).

Now back to the topic. Ideally, you want to make such a ranking system passive (as opposed to active, such as pinging), meaning it should not generate additional traffic. This is especially important when traffic privacy is a priority.

Let me start by asking: how do we quantify "quality"? Based on my personal experience (I have worked on my own proxy program before), the factors are Latency, Longevity, and Throughput.

All three factors can be measured passively (as well as actively), without creating any extra traffic (though doing it passively will increase the probe/warm-up time):

First, let's talk about latency measurement: local knows when it received the request from a client, and it also knows when it received the first response from a server. Thus the latency can be measured simply as Latency = FirstMeaningfulResponseReceived - RequestStartTime; this works for both TCP and UDP proxy requests.

As for longevity measurement: local knows when a connection is established and when it is closed, so it can create a connection lifespan record for each connection to each server. Note that the lifespan does not indicate the maximum capability of a server (to sustain a long connection); instead it tells local that a server has successfully hosted a connection that lived at least x seconds in the past. Based on that information, we can make better assumptions about the network connectivity between local and a server.

And finally, throughput measurement: local knows the transfer rate of a connection between local and a server. Since connections can be inactive at times, just like in the longevity measurement above, this does not tell us the accurate maximum transfer rate of each server, but it indicates that a server is capable of hosting a connection that can reach at least x bytes per second. We can then use that measurement to make better assumptions about the network throughput.
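
A rough sketch of the per-connection bookkeeping described above (the types are invented for illustration only):

use std::time::{Duration, Instant};

// One record per proxied connection, filled in passively from events that
// local already observes; no extra traffic is generated.
struct ConnectionRecord {
  started: Instant,
  first_response: Option<Instant>, // when the first meaningful reply arrived
  closed: Option<Instant>,
  bytes_transferred: u64,
}

impl ConnectionRecord {
  // Latency = FirstMeaningfulResponseReceived - RequestStartTime
  fn latency(&self) -> Option<Duration> {
    self.first_response.map(|t| t - self.started)
  }

  // Longevity: this server successfully hosted a connection that lived this
  // long (a lower bound, not the server's maximum).
  fn longevity(&self) -> Option<Duration> {
    self.closed.map(|t| t - self.started)
  }

  // Throughput: average bytes per second over the connection's lifetime,
  // again a lower bound on what the server can sustain.
  fn throughput_bps(&self) -> Option<f64> {
    self.longevity()
      .map(|d| self.bytes_transferred as f64 / d.as_secs_f64().max(1e-9))
  }
}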

Of course, in real life, the situation is much more complex. For one, records created for load balancing purposes must expire after a certain amount of time so the load balancer always routes to the server that currently offers better quality. Another thing is that the connection quality between server A and remote target A might differ from the quality between server B and remote target A, so the connection quality record must be created per remote target instead of per server.

(local means ss-local or equivalent, server means ss-server or equivalent, "client" means a socks5 client, and remote means the target that the socks5 client is trying to connect to)

Just my two cents.

Fundamentally, the protocol Shadowsocks employs is too simple to be optimized. I wish there will be a Shadowsocks2 in the future that supports connection multiplexing and resumption. I can only hope :)

database64128 commented 2 years ago

First, let's talk about latency measurement: local knows when it received the request from a client, and it also knows when it received the first response from a server. Thus the latency can be measured simply as Latency = FirstMeaningfulResponseReceived - RequestStartTime; this works for both TCP and UDP proxy requests.

User-initiated connections are not good indicators of latency, unless you operate the TCP stack yourself, and can see how the remote peer ACKs your outgoing sequences. Your strategy won't work with UDP at all. For example, DNS servers might take up to a few seconds to send a response.

As for longevity measurement: local knows when a connection is established and when it is closed, so it can create a connection lifespan record for each connection to each server. Note that the lifespan does not indicate the maximum capability of a server (to sustain a long connection); instead it tells local that a server has successfully hosted a connection that lived at least x seconds in the past. Based on that information, we can make better assumptions about the network connectivity between local and a server.

Measuring how long connections last does not guarantee fair competition. The longer sslocal stays on a server, the more long-lasting connections sslocal is going to observe.

And finally, throughput measurement: local knows the transfer rate of a connection between local and a server. Since connections can be inactive at times, just like in the longevity measurement above, this does not tell us the accurate maximum transfer rate of each server, but it indicates that a server is capable of hosting a connection that can reach at least x bytes per second. We can then use that measurement to make better assumptions about the network throughput.

sslocal does not automatically know the transfer rate, unless you implement it, which is not cheap. And again, just like your "longevity measurement", this also seems more like a server lottery.

database64128 commented 2 years ago

Fundamentally, the protocol Shadowsocks employs is too simple to be optimized. I wish there will be a Shadowsocks2 in the future that supports connection multiplexing and resumption. I can only hope :)

Quite the opposite: simplicity is how you gain performance and leave room for optimizations. Connection multiplexing is not an optimization. Connection resumption sounds like reinventing the wheel.