lablup / backend.ai

Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing/ML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, TPU, IPU and other NPUs.
https://www.backend.ai
GNU Lesser General Public License v3.0
519 stars 153 forks source link

Improvement for multi-endpoint support in client SDK #230

Closed achimnol closed 2 years ago

achimnol commented 4 years ago

Currently it randomize the selection of endpoints upon startup and rotates the endpoints only when there are connection failures. Let's extend it to support endpoint rotation based on timing and connection counts.

achimnol commented 3 years ago

There are two load balancing points in the client SDK:

  1. When creating the API session, random.shuffle the endpoint list.
  2. When repeating requests using a single API session, rotate the endpoint list. The first one is already implemented, and lablup/backend.ai-client-py#150 is about the second one referred in this issue.

Currently the console-server uses the first method because it instantiates and discards API session objects per HTTP request taking the credential information from the HTTP cookie. So we are postponing this issue as a future work with no deadline.

achimnol commented 3 years ago

Instead, we decided to improve the wsproxy related components to spread their connections to multiple manager endpoints, while keeping its the cookie-based session handlers interacting with the console server.

achimnol commented 2 years ago

Closing it as stale.