Open brandond opened 5 days ago
The loadbalancer server list is a bit of a mess. Its behavior has been tinkered with a lot over the last year, but it's still hard to reason about. This has caused a spate of issues.
From a code perspective, the loadbalancer state is directly accessed by a number of functions that all poke at various index vars, current and default server name vars, a list of server addresses, another RANDOM list of server addresses, and a map of addresses to structs that hold state: https://github.com/k3s-io/k3s/blob/cd4ddedbc9782cbe9b5dcc411df2addae7b2f3b4/pkg/agent/loadbalancer/loadbalancer.go#L43-L53
The DialContext function is called whenever a new connection comes in, and holds a read lock while iterating (possibly twice) over the random server list. Meanwhile, servers may be added or removed at any time. The code is VERY hard to read and understand, given the number of variables involved: https://github.com/k3s-io/k3s/blob/cd4ddedbc9782cbe9b5dcc411df2addae7b2f3b4/pkg/agent/loadbalancer/loadbalancer.go#L162-L208
We should simplify the loadbalancer behavior so that it works more reliably and is easier to understand and explain.