Add CLI options for the Hot Rod client

pruivo commented 1 month ago

Description

Most of the Hot Rod client is hardcoded in the Keycloak source code, and the only available CLI options are cache-remote-host, cache-remote-port, cache-remote-username, cache-remote-password, and cache-remote-tls-enabled. This GHI proposes to add a couple more options.

Host List

Configure multiple Infinispan servers. The client uses this list for the initial discovery exception when client intelligence is set to BASIC.

Approved?	Option	Type	Default	Description
:question:	`cache-remote-servers`	List	--	It configures the list of Infinispan server hosts in the format `host[:port]`.

Connection Pool

Configuring the connection pool allows the Hot Rod client to control the resource usage.

Approved?	Option	Type	Default	Description
:question:	`cache-remote-connection-pool-max-active`	Integer	4	It configures the maximum number of connections per each Infinispan server.
:question:	`cache-remote-connection-pool-exhausted-action`	Enum `ExhaustedAction`	--	It configures the action when the pool has become exhausted. Infinispan default is `WAIT`
:question:	`cache-remote-connection-pool-max-wait`	Integer	5000	It configures the time in milliseconds to wait for a connection to become available when the exhausted action is `WAIT`.
:x:	`cache-remote-connect-timeout`	Integer	--	It configures the maximum socket connect timeout in milliseconds before giving up connecting to the Infinispn server. Infinispan default is 2s.
:white_check_mark:	`cache-remote-socket-timeout`	Integer	5000	It configures the maximum socket read timeout in milliseconds before giving up waiting for bytes from the Infinispan server.

Notes:

ExhaustedAction Enum Javadoc
Setting max-active to 4 based on the nightly cross-site runs, which use between 3 and 5 connections.
The max-wait is set to 5000 seconds to avoid waiting forever.
Increased the socket-timeout to at least 5 seconds to match the cross-site remote timeout (currently at 5 seconds); it prevents the client from failing prematurely.
The remaining options use the defaults from Infinispan.
Other Infinispan configuration are minIdle=1, minEvictableIdleTime=30min, maxPendingRequests=5. Those defaults are fine for Keycloak and I don't have a strong motivation to make them configurable.

Client Intelligence

The client intelligence configures the information about the Infinispan cluster that is sent back to the Hot Rod client. It can be nothing, which means the client only uses the hostname/port configured or full information where the Infinispan server IPs and data location are sent to the client.

Approved?	Option	Type	Default	Description
:white_check_mark:	`cache-remote-client-inteligence`	Enum `ClientIntelligence`	--	It configures the Hot Rod client intelligence. Infinispan default is `HASH_DISTRIBUTION_AWARE`

ClientIntelligence Enum Javadoc

Discussion

No response

Motivation

The client intelligence is strongly dependent on the deployment architecture. For example, if the Infinispan cluster is behind a load balancer or a proxy, it is currently impossible to connect Keycloak to it. It requires BASIC intelligence.

The connection pool option allows the user to fine-tune the resources and response times, which directly impacts the Keycloak response time in failover scenarios. From the list above, the weakest is the connect-timeout, and it may be dropped.

Details

No response

pruivo commented 1 month ago

@ryanemerson @mhajas @ahus1 @keycloak/cloud-native please provide feedback. @wburns, if you have the chance, can you review the connection pool defaults, please?

ryanemerson commented 1 month ago

+1 on allowing the Client intelligence to be configurable via the CLI as, like you say, that setting really does depend on the deployment architecture.

I'm a little more hesitant about adding the other options. Adding these means that there's more options to document and potentially confuse users, so we have the trade-off between usability and tunability.

Do we have any specific use-cases, architectures or customer cases where we know that the default values are not sufficient?

pruivo commented 1 month ago

Only a support case where we found out that the TLS handshake was causing tail latencies and it was better to wait for a connection to be available.

There is another reason for not adding the options for the connection pool. Will's new client that will be available in ISPN 16 does not use any connection pool and just a single connection per server.

Our current default is an unbounded connection pool (that's my fault).

https://github.com/keycloak/keycloak/blob/464fc90519088e289c84c5a8da248490665414f3/quarkus/runtime/src/main/java/org/keycloak/quarkus/runtime/storage/infinispan/CacheManagerFactory.java#L154

I am still making pressure for 2 things:

Changing the default values in the code, 4 connections top, wait with a maximum of 5 seconds.
An option to configure the timeout (cache-remote-socket-timeout or any other name). It must not be smaller than the one configured in the cache: https://www.keycloak.org/high-availability/deploy-infinispan-kubernetes-crossdc

mhajas commented 1 month ago

Personally I don't have any strong opinion on the connection pool configs. I am not sure how often people want to change that but sounds like something that should be configurable. On the other hand as mentioned by @ryanemerson, we need to then properly document when and how people should be setting those.

I have one question though. When HASH_DISTRIBUTION_AWARE is used, Hot Rod client distributes the keys across all Infinispan nodes based on a hash function.

However, from where Keycloak knows about all Infinispan nodes? Is it possible to use cache-remote-host to provide some list of hosts/ip addresses? If not, does the client load the hosts automatically from the node that cache-remote-host points to?

How does it work in current multi-site setup? We specify Infinispan host and this host is pointing to some load balancer on Kubernetes level. Should we then use BASIC instead of the default?

ahus1 commented 1 month ago

+1 for adding these options, I see use cases for them, and (as Pedro mentioned) with cache-remote-connect-timeout being the least used one. I'm also fine with the naming.

I'm a bit skeptic to set cache-remote-connection-pool-max-active to 4. I'd think that 16 would be more on the safe side? I understand such a connection would be blocked for while an xsite request is being processed.

I would guess identifying a bottleneck here would be hard to find. Do we have any metrics from our current tests on KCB (I kind of remember we didn't have metrics on the client side today, but maybe we have on the server side)? If we can't provide any good metrics or sizing for it, I would assume that ExhaustedAction with CREATE_NEW might still be valid in our setup.

ryanemerson commented 1 month ago

There is another reason for not adding the options for the connection pool. Will's new client that will be available in ISPN 16 does not use any connection pool and just a single connection per server.

My understanding was the plan is to drop the old client entirely in 16.0.x, is that the case? Even if it isn't, I assume the intention would be to migrate to the new offering due to the various perf benefits. If so, then it's not worth adding the various configuration pool options given that they won't have any effect in the future.

Changing the default values in the code, 4 connections top, wait with a maximum of 5 seconds.

I don't have strong views on 4 vs 16, but +1 to a sensible default.

An option to configure the timeout (cache-remote-socket-timeout or any other name). It must not be smaller than the one configured in the cache: https://www.keycloak.org/high-availability/deploy-infinispan-kubernetes-crossdc

+1

pruivo commented 1 month ago

@mhajas The key distribution is assigned by the servers and not the clients. Each request sent by the client has the topology ID with it and, when it doesn't match the server's topology Id, the server appends the new topology in the response. The topology response contains the pods IP, hashing function name, and segment ownership.

In our deployment, the first connection goes through the Kubernetes load balancer but is closed after getting the response, together with the Pods IP. The client then skips the load balancer and directs the requests to the key owners using the Pod IP.

The BASIC must be used when the Keycloak does not have access to the Pod IP (for example, it is outside the cluster).

mhajas commented 1 month ago

Thanks for the explanation @pruivo!

Does this line: https://github.com/keycloak/keycloak/blob/42251b3a13b0952e6c36dcadd7469f373575a81e/quarkus/runtime/src/main/java/org/keycloak/quarkus/runtime/storage/infinispan/CacheManagerFactory.java#L432

support specifying more hosts then? Or we need to call addServer for each host when the BASIC intelligence is used? If the latter we should make it part of this issue. Not sure how we will then match host and port if they differ.

pruivo commented 1 month ago

@ahus1 The connection is not blocked for the entire request. The request fetches one connection from the pool, writes and flushes the channel, and returns it to the pool. We also found out that it is better to wait than create a new one (it is costly to create a new TCP connection + a new SSL session).

pruivo commented 1 month ago

@mhajas unfortunately, no. That method only accepts a single hostname. For multiple servers, you can invoke in a loop builder.addServer()

builder.addServer().host("host").port(11222)

Or use the method builder.addServers(String)

pruivo commented 1 month ago

@ryanemerson that's the plan. When the new client is available, test it and use it. We should be able to use it in Keycloak 27, if the product's timeline does not screw us :)

ahus1 commented 1 month ago

The request fetches one connection from the pool, writes and flushes the channel, and returns it to the pool.

How does the response from the server reaches the client if it doesn't hold on to the channel?

We also found out that it is better to wait than create a new one (it is costly to create a new TCP connection + a new SSL session).

I can see why this is reasonable to do.

ahus1 commented 1 month ago

When we're adding BASIC, it could make sense to enhance the current option to specific multiple hostnames in a commaseparated way.

I see that HASH_DISTRIBUTION_AWARE really makes a difference here, so I'm not keep to drop down to BASIC, although it would be the safe default. What kind of error message would you see if HASH_DISTRIBUTION_AWARE wouldn't work? We could reference the error message in our docs, so people would know what to do.

I assume it would fail pretty fast (during startup)?

pruivo commented 1 month ago

@ahus1 Each request has a unique ID generated by the client. So, when the client receives a response with id=x, it knows it belongs to the request with id=x.

What kind of error message would you see if HASH_DISTRIBUTION_AWARE wouldn't work?

Connection refused. The Pod IP is not accessible from the internet.

When we're adding BASIC, it could make sense to enhance the current option to specific multiple hostnames in a commaseparated way.

When deploying in Kubernetes/Openshift, you get a single hostname for the NLB or the Route. If a user deploys on premised with multiple load balancers, then we need to change our code. I'm not sure how frequently this architecture is used.

ahus1 commented 1 month ago

When deploying in Kubernetes/Openshift, you get a single hostname for the NLB or the Route.

... or the service. I know. Using a single hostname works for me here.

If a user deploys on premised with multiple load balancers, then we need to change our code. I'm not sure how frequently this architecture is used.

On premise, there could be the following setup: Three Infinispan nodes that create a cluster, with no LB in front of them and the option to directly connect to them. Which of the three IPs should I place in Keycloak's config? If I could place all three IPs in Keycloak's configuration, I would hope the HotRod client would try to connect to each of them until it is successful and then retrieves the list of active nodes. Still, we could say "always use a LB even for on-premise deployments".

pruivo commented 1 month ago

On premise, there could be the following setup: Three Infinispan nodes that create a cluster, with no LB in front of them and the option to directly connect to them. Which of the three IPs should I place in Keycloak's config? If I could place all three IPs in Keycloak's configuration, I would hope the HotRod client would try to connect to each of them until it is successful and then retrieves the list of active nodes. Still, we could say "always use a LB even for on-premise deployments".

Having a single IP configured would be sufficient. Having all of them would make it more robust though.

pruivo commented 1 month ago

Multi-values are supported: https://github.com/keycloak/keycloak/blob/f0162db56f4cf7219361467c0de1a325dac9d094/quarkus/config-api/src/main/java/org/keycloak/config/OptionBuilder.java#L30

Should we create a new option? cache-remote-servers that accepts a list of servers in the format host[:port]?

ahus1 commented 1 month ago

Should we create a new option? cache-remote-servers that accepts a list of servers in the format host[:port]?

+1 for having an option to add multiple hosts for on-premise setups

I assume when the connection is lost to all known hosts, the HotRod client will restart the discovery with the configured values?

pruivo commented 1 month ago

I assume when the connection is lost to all known hosts, the HotRod client will restart the discovery with the configured values?

Yes, it discards the topology information and fallbacks to the configured servers until one of them replies and sends the up-to-date topology information.

pruivo commented 1 month ago

Updated Description

Added cache-remote-servers to the table
Added a column with approved based on the SRE team feedback. Both cache-remote-socket-timeout and cache-remote-client-inteligence agreed and required The cache-remote-connect-timeout is not required and will not be added The cache-remote-connection-pool-* is unknown but the sentiment is to drop them The cache-remote-servers requires feedback from the SRE team.

A final approval from the cloud-native team is required too.

ryanemerson commented 1 month ago

The cache-remote-servers requires feedback from the SRE team.

I think it makes sense to add. Do we deprecate cache-remote-host and cache-remote-port with a view to remove in 27?

pruivo commented 1 month ago

The cache-remote-servers requires feedback from the SRE team.

I think it makes sense to add. Do we deprecate cache-remote-host and cache-remote-port with a view to remove in 27?

I don't have a strong opinion. I'm fine either way.

pruivo commented 1 month ago

For the record, this is the total number of connections in the Infinispan server for this nightly performance testing. Since we have 3 KC pods, each KC pod has an average of 3 to 4 connections. Screenshot 2024-10-16 at 12-54-28 Explore - Prometheus - Grafana

keycloak / keycloak