Closed CarsonCook closed 2 years ago
Keep in mind we need to support the HA setup here.
Depends on #1359
We heard from multiple extenders and from core services that they have a use case: "I want to talk to one particular instance." We heard different motivations for this:
- Because that instance is the one they need to talk to (a specific system, a specific console, ...)
- Because there may be state that does not get distributed to other instances (a session)
Joe's Tomcat server with 100 Java threads: when a user logs in, a thread is kept in anticipation of the user coming back. One user can hold many threads across multiple instances. When the user logs off, the thread can possibly be freed. z/OSMF actually has the same issue: the TSO session is long-lived and uses the same optimization. Mainframe workloads related to development or CI/CD tend to be isolated to particular LPARs.
- Load balancing (LB/DLB)
- Rate Limiting (RL/DRL)
- User Limiting - we need more information
I would argue that Rate Limiting, while useful for obvious reasons, should not be part of our MVP. Load balancing and rate limiting each have value on their own; one does not require the other. It seems we are in agreement that Rate Limiting can be sacrificed.
Based on a token that represents the client (the API ML auth cookie), we can recognize a client and provide deterministic routing. That means we would store where the client was routed in the past and distribute this knowledge between Gateways. Upon the next request from the client, we recognize them by the token and route the request to the previously chosen server.

Positives:
- No client interaction required

Negatives:
- We have to store, resolve, and lifecycle the session
- The client must be identifiable
When any client gets routed, we would return (as a cookie or header) information about where the request was routed. On the next call, the client can provide a token (cookie or header) and request the same instance as last time. The Gateway will see this request and route as requested.
Benefits:
- Less code to break
- Does not suffer from synchronization issues across Gateways
- Works also for unauthenticated requests
- The client can choose what instance it wants

Negatives:
- The client has to take action (could be alleviated by using cookies)
- Does not carry the rate limiting capabilities
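A minimal sketch of this client-driven variant. All names here (`ClientHintRouter`, `selectInstance`, the idea of an instance "hint" carried in a cookie or header) are illustrative assumptions, not actual API ML code:

```java
import java.util.List;

// Sketch of the client-driven option above. Names are illustrative,
// not actual API ML code.
public class ClientHintRouter {

    /**
     * If the client supplied an instance hint (e.g. via a cookie or header
     * set on a previous response) and that instance is still registered,
     * honor it; otherwise fall back to plain round robin.
     */
    public static String selectInstance(List<String> instances, String hint, int roundRobinCounter) {
        if (hint != null && instances.contains(hint)) {
            return hint; // client asked for an instance it was told about earlier
        }
        return instances.get(Math.floorMod(roundRobinCounter, instances.size()));
    }
}
```

On each response the Gateway would echo the chosen instance id back to the client, which is what makes the "client has to take action" negative largely disappear when cookies are used.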
When a user authenticates against an instance, the token will dictate where the user gets routed, in a predictable fashion.
Positives:
- The client does not need to take action

Negatives:
- How do we manage changing services?
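The predictable, token-driven routing described above can be sketched as a stateless hash of the user's identity (e.g. the subject of the API ML auth token) over the instance list. This is an illustrative approach, not the actual implementation:

```java
import java.util.List;

// Illustrative sketch only: deterministic routing derived from the user's
// identity, with no stored session state at all.
public class TokenHashRouter {

    /** Map a user id onto an instance by hashing; the same user always lands
     *  on the same instance as long as the instance list does not change. */
    public static String selectInstance(List<String> instances, String userId) {
        int index = Math.floorMod(userId.hashCode(), instances.size());
        return instances.get(index);
    }
}
```

This makes the "changing services" negative concrete: when an instance is added or removed, the modulo mapping shifts and most users are silently re-routed; a consistent-hashing ring would reduce that churn.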
Identifying the user/session:
- Token
- IP
Transferring the session:
- Cookie
- Header
Configuration:
- Default (off)
- A service can say what it wants
- How deep do we want to load balance? (serviceId <-> path, composite APIs like zosmf)
Transferability of solution to SC Gateway
Model rejection strategy:
- Reject
Security of headers:
- Header spoofing
@jandadav the extender has confirmed the client-based solution will work for them.
Proposal for follow-up stories to finish the load balancing implementation
As a Zowe conformant application developer
I can configure the load balancer for my service with predefined load balancing schemas
So that I can achieve the load balancing scheme that is desirable for my application
This will mean implementing:
- A PredicateFactory that is aware of the service's registration metadata
- Enhancing the context's Environment with the metadata
- Constructing the load balancing beans conditionally
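The conditional construction step could look something like the following sketch. The metadata key `apiml.lb.type` and the scheme names are assumptions for illustration, not a confirmed part of the registration metadata contract:

```java
import java.util.Map;

// Hypothetical sketch: decide which load-balancing scheme to wire up for a
// service based on its registration metadata. Key and values are assumed.
public class BalancerSelection {

    /** Default to plain round robin when the service declares no preference. */
    public static String chooseScheme(Map<String, String> metadata) {
        String type = metadata.getOrDefault("apiml.lb.type", "roundRobin");
        switch (type) {
            case "authentication":
                return "sticky-by-user";  // deterministic per-user routing
            case "headerRequest":
                return "client-hint";     // client requests a specific instance
            default:
                return "round-robin";
        }
    }
}
```

In a real PredicateFactory this decision would gate which load balancing beans get constructed for the service's route.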
As a Zowe conformant application developer
I can call my application's API with Zowe authentication through a single instance of API Gateway and always get to the same instance of my service for a given period of time
So that I can protect against additional user-related address spaces spawned by my application without changing its code
This will mean implementing a balancing bean that:
- Recognizes requests by Zowe authentication (the user). A user can have multiple JWTs, so we have to understand who is calling.
- Unauthenticated requests? Not sure if it's universal; Carson will check with the extender (pervasive or restrictive).
- If there is no preference, routes the request round robin and stores the preference.
- If there is a preference, routes the request to the same instanceId as the preference.
- Lifecycle: the preference expires after a configurable time period has elapsed since the last request.
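A self-contained sketch of such a balancing bean, with illustrative names (`StickyBalancer`, `choose`) and an in-memory preference map standing in for the shared store:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch, not the actual implementation: an in-memory map of
// user -> preferred instance with a configurable expiry, falling back to
// round robin when no valid preference exists.
public class StickyBalancer {

    private record Preference(String instanceId, long lastUsed) {}

    private final Map<String, Preference> preferences = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();
    private final long expiryMillis;

    public StickyBalancer(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /** Reuse the stored preference if it is fresh and the instance is still
     *  registered; otherwise pick round robin and remember the choice. */
    public String choose(String user, List<String> instances, long nowMillis) {
        Preference p = preferences.get(user);
        if (p != null
                && nowMillis - p.lastUsed() <= expiryMillis
                && instances.contains(p.instanceId())) {
            preferences.put(user, new Preference(p.instanceId(), nowMillis)); // refresh expiry
            return p.instanceId();
        }
        String chosen = instances.get(Math.floorMod(counter.getAndIncrement(), instances.size()));
        preferences.put(user, new Preference(chosen, nowMillis));
        return chosen;
    }
}
```

The in-memory `preferences` map is exactly the piece that would have to move into a shared store for the multi-Gateway story so that all Gateway instances agree.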
As a Zowe conformant application developer
I can call my application's API with Zowe authentication through any instance of API Gateway and consistently get to the same instance of my service for a given period of time
So that I can protect against additional user-related address spaces spawned by my application without changing its code, and I can do that against any Gateway instance and get consistent behavior
This will mean that whatever was developed for the previous story will have to be stored in the Caching Service.
The current state of implemented infrastructure looks like this:
Is your feature request related to a problem? Please describe.
Services want to handle user requests from the same instance of that service, but the Gateway only balances workloads in a round robin fashion.
Describe the solution you'd like
Deterministic routing for user requests, where each request with a user ID goes to the same service instance it was originally routed to. Additionally, a configurable limit for the number of users each instance can handle.
MVP: see the outline in the discussion. The following is outdated.
Extensions
Additional context @dmcknight