Closed ahus1 closed 1 year ago
Thanks for reporting this issue @ahus1 !
I checked locally and I can reproduce. A few considerations:

- KUBE_PING resolves the issue (should we try it out?)
- Even when this message appears, Infinispan eventually recovers and the cluster is always formed, so I'd tend to say that this is not causing any user-facing issue but is just an internal "retry mechanism" and can be considered "normal" behavior.

Waiting for confirmation (or not) from the people pinged in this message.
@andreaTP @ahus1 I can confirm this does not only happen on Minikube but also on Azure AKS. A typical picture of the pod state after around five minutes of scaling up from 1 to 10 is this: After a while all pods started up successfully, but I don't think this is desirable or should be ignored. Please also note that in all my experiments the caches were empty. I assume the behaviour/errors get much worse when Infinispan is actually transferring data to balance caches.
@belaban Is this warning safe to ignore and how can it be avoided?
@ryanemerson What happens is that JGroups tries to send a message to B (logical name). To do this, it does a lookup with key=B in the uuids hashmap in the transport, returning IP address 1.2.3.4. It then uses this IP address to establish a TCP connection to 1.2.3.4 and sends the message. The warning you're seeing is because there is no key/value pair for B in the uuids hashmap. This hashmap is populated by the discovery protocol, IIUIC DNS_PING in this case. I've experienced slowness with DNS_PING under Kubernetes before, providing information (IP address) of newly (re-)started pods. IIRC, this can be configured (tolerate-unready-endpoints?) in the YAML... In any case, this warning should disappear once the DNS lookup for B does indeed return 1.2.3.4...
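The failure mode described above can be pictured as a plain map lookup. The sketch below is illustrative only (class, field, and method names are mine, not JGroups' actual internals): the discovery protocol fills the map, and sending to a member whose entry is still missing produces the "no physical address" warning.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the transport's uuids cache, NOT the real JGroups code.
// Discovery (DNS_PING here) populates the map; sending to a member whose entry
// is still missing corresponds to the "no physical address ... dropping message" warning.
class UuidCache {
    private final Map<String, String> uuidToPhysical = new HashMap<>();

    // Called when discovery (e.g. a DNS lookup) learns a member's IP:port.
    void discovered(String uuid, String ipAndPort) {
        uuidToPhysical.put(uuid, ipAndPort);
    }

    // Returns the physical address, or null -> the message is dropped and retried later.
    String resolve(String uuid) {
        return uuidToPhysical.get(uuid);
    }
}
```

Until discovery has populated the entry for B, every send to B hits the null branch, which matches the observation that the warning disappears once the DNS lookup for B returns its address.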
Reading @belaban's comment, I assume those are the things to try out:
publishNotReadyAddresses indicates that any agent which deals with endpoints for this Service should disregard any indications of ready/not-ready. The primary use case for setting this field is for a StatefulSet's Headless Service to propagate SRV DNS records for its Pods for the purpose of peer discovery. The Kubernetes controllers that generate Endpoints and EndpointSlice resources for Services interpret this to mean that all endpoints are considered "ready" even if the Pods themselves are not. Agents which consume only Kubernetes generated endpoints through the Endpoints or EndpointSlice resources can safely assume this behavior.
Although this is not a StatefulSet, we're behaving like one with service discovery. Infinispan spins up well before Keycloak is ready, so it makes sense to me. I'm quite surprised that it works so well without this.
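For reference, a minimal headless Service carrying that flag might look like this (name, labels, and port are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: keycloak-discovery         # placeholder name
spec:
  clusterIP: None                  # headless, used only for peer discovery
  publishNotReadyAddresses: true   # publish DNS records before pods are ready
  selector:
    app: keycloak
  ports:
    - name: jgroups
      port: 7800                   # JGroups TCP port
```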
Besides that, I'd also be happy to give KUBE_PING a try if we can get it running with a realmroles and not clusterroles as described in the https://github.com/jgroups-extras/jgroups-kubernetes repo.
I've been thinking about @belaban's comment again, and I wonder if it doesn't explain all of the behavior I see:
What I understand: The b4432256-a5bd-4023-9992-6da854e9e973 is the ID (?) of a node that the other nodes try to resolve to an IP address using DNS. This leads to the "no physical address" message on any node that tries to resolve it.
What I don't yet understand: How does this ID that looks random and like a UUID end up in the list of nodes in the first place? For all other nodes it is the pod name ("keycloak-84b48bc8dc-h9dms-15950"). That UUID is nothing that would come up via DNS. If it is something that the node itself published to the other peers as its own host name, then something seems to be broken or at least fragile there.
To figure out its own hostname, there is the option of passing the pod name down to the pod as an environment variable using the "downward API": https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/
That would always be a local operation to determine one's hostname, and might be less prone to failure.
Would that help here? WDYT?
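For illustration, the downward API part would be a small addition to the container spec (the variable name POD_NAME is just an example; wiring it into JGroups would be a separate step):

```yaml
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name   # the pod's own name, resolved locally
```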
What I'm getting is that DNS (especially on Kubernetes) has some latency, and this affects Infinispan.
Other than trying to fight the DNS settings etc. (it's likely to be a rabbit hole and possibly dependent on the host/provider), I'm more inclined to give KUBE_PING a spin, which has been reported to be more stable.
@ahus1 can you provide a link for "with a realmroles and not clusterroles"? All I can see in the README are ClusterRoles.
@andreaTP - I see that I mixed up the wording here: it is Role and not RealmRole - some wording from Keycloak sneaked in here.
This is all about minimal privileges, and I would assume that as long as it stays within the same namespace, it wouldn't require cluster permissions.
Therefore I suggest the following: try KUBE_PING with a namespace-scoped Role instead of a ClusterRole. If that doesn't work as expected, I'd ask the Infinispan team for help. If it does work, create a PR for KubePing's README :-)
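As a sketch of what a namespace-scoped setup for KUBE_PING could look like (resource names are placeholders; KUBE_PING needs to list pods in its own namespace):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jgroups-kube-ping          # placeholder name
  namespace: keycloak-test
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jgroups-kube-ping
  namespace: keycloak-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jgroups-kube-ping
subjects:
  - kind: ServiceAccount
    name: default                  # or a dedicated service account
    namespace: keycloak-test
```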
I agree that delays with DNS_PING might depend on the DNS service being used inside the cluster. We never had issues with it on Azure AKS; that's why I probably also did not see the errors @ahus1 saw when scaling up a stateful set. But using KUBE_PING instead might be worth a try.
It was before my time working on the Infinispan Operator, but I believe the need for a ClusterRole with KUBE_PING was why DNS_PING was favoured for our implementation (@rigazilla might remember the exact details). In general we haven't had many issues using DNS_PING.
This bug is not restricted to the Keycloak Operator. I am seeing it on Keycloak 19.0.2 without the operator, running on a Nomad cluster using DNS ping.
When I bring up a cluster, e.g. with 8 nodes, often only 6 nodes enter the cluster. When I see the error, there is a general problem, because all the nodes are added to the service's DNS name once they pass their health checks (basically a curl to the health endpoint and a check that port 7800 is open). So the 2 nodes not in the cluster are also reachable by clients, which causes problems.
Yes, b4432256-a5bd-4023-9992-6da854e9e973 is the address of a member. It is a UUID and is used for all communication. "keycloak-84b48bc8dc-h9dms-15950" is the logical name associated with a member, and is not required for communication; it acts as mere syntactic sugar to get more legible logs.
There are 2 caches: one between UUIDs and IP addresses/ports and one between UUIDs and logical names. Both are populated on startup of a member, and when new members join. The former is essential and communication between members cannot happen without it. The latter is not needed for communication, but your logs will look terrible until it is populated.
If you have a JGroups JChannel, you can always get the local address, the IP address of any member, etc., by fetching a reference to TP (the transport):

```java
TP tp = channel.getProtocolStack().getTransport();
```
I'm running into issues with using DNS_PING on my Kubernetes cluster; how do I switch the new Quarkus distribution to use KUBE_PING instead?
When my nodes get rolled/pods get restarted, only two out of three Keycloak pods join the cluster when using DNS_PING.
I tried setting the headless service to ignore the health check, thereby allowing JGroups to see the full cluster state, but that does not help: one of the pods continues to spin and never joins the cluster.
These are the messages seen in the pod logs:
2023-01-04 20:42:46,677 WARN [org.jgroups.protocols.pbcast.GMS] (VERIFY_SUSPECT.TimerThread-38,keycloak-774bdfb494-js4sw-43683) keycloak-774bdfb494-js4sw-43683: I'm not the coordinator (or next-in-line); dropping LEAVE request
2023-01-04 20:42:51,740 WARN [org.jgroups.protocols.TCP] (TQ-Bundler-7,keycloak-774bdfb494-js4sw-43683) JGRP000032: keycloak-774bdfb494-js4sw-43683: no physical address for f968cc16-cd7e-4c7c-bbe3-756454a72129, dropping message
2023-01-04 20:42:55,679 WARN [org.jgroups.protocols.TCP] (TQ-Bundler-7,keycloak-774bdfb494-js4sw-43683) JGRP000032: keycloak-774bdfb494-js4sw-43683: no physical address for f968cc16-cd7e-4c7c-bbe3-756454a72129, dropping message
Here are the YAML files defining the Keycloak cluster:
```yaml
apiVersion: v1
kind: Service
metadata:
  namespace: keycloak-test
  name: keycloak
  labels:
    app: keycloak
spec:
  publishNotReadyAddresses: true
  ports:
    - name: http
      port: 8080
      targetPort: 8080
  selector:
    app: keycloak
  clusterIP: None
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: keycloak-test
  name: keycloak
  labels:
    app: keycloak
spec:
  replicas: 3
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak:20.0.2
          args: ["start"]
          env:
            - name: KEYCLOAK_ADMIN
              value: "admin"
            - name: KEYCLOAK_ADMIN_PASSWORD
              value: "badadminpassword"
            - name: KC_PROXY
              value: "edge"
            - name: JAVA_OPTS_APPEND
              value: "-Djava.net.preferIPv4Stack=false -Djava.net.preferIPv6Addresses=true -Djgroups.dns.query=keycloak.keycloak-test.svc.cluster.local"
            - name: KC_DB
              value: "postgres"
            - name: KC_DB_PASSWORD
              value: "simpletestpassword"
            - name: KC_DB_URL
              value: jdbc:postgresql://someuri.rds.amazonaws.com/keycloak
            - name: KC_DB_USERNAME
              value: "keycloak"
            - name: KC_FEATURES
              value: "admin-fine-grained-authz,client-secret-rotation,recovery-codes,token-exchange"
            - name: KC_HEALTH_ENABLED
              value: "true"
            - name: KC_HOSTNAME
              value: "sso.test.example.com"
            - name: KC_CACHE_STACK
              value: "kubernetes"
          ports:
            - name: http
              containerPort: 8080
          readinessProbe:
            httpGet:
              path: /realms/master
              port: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: keycloak-external
  namespace: keycloak-test
  annotations:
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    alb.ingress.kubernetes.io/ip-address-type: dualstack
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443},{"HTTP":80}]'
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/group.name: keycloak-test
spec:
  ingressClassName: alb
  tls:
    - hosts:
        - sso.test.example.com
  rules:
    - host: sso.test.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keycloak
                port:
                  number: 8080
```
I'm coming back to this issue because I was able to fix the cause of the issue on my end... I no longer have the following log entries:
2023-01-04 20:42:55,679 WARN [org.jgroups.protocols.TCP] (TQ-Bundler-7,keycloak-774bdfb494-js4sw-43683) JGRP000032: keycloak-774bdfb494-js4sw-43683: no physical address for f968cc16-cd7e-4c7c-bbe3-756454a72129, dropping message
What I did is force Infinispan to bind to the global IPs only by adding -Djgroups.bind_addr=global to JAVA_OPTS_APPEND. I am running in an all-IPv6 cluster on AWS, so all the pods receive global IPv6 addresses; this change forces JGroups to ignore the link-local IPv6 address and instead bind to the global IPv6 address.
I assume that upon starting up, it attempts to advertise itself to the remote system, but the IP address in that advertisement did not match the IP that was published in the k8s service, and that is why the "no physical address" message would appear. It appears there is some sort of race condition where a remote node would sometimes reach out before it tried to self-advertise? I am not sure, actually... but this fixed the issue for me.
Now my cluster comes alive and joins without issues at all, I was able to roll my entire cluster without having any container restarts, and haven't had any more issues with Keycloak failing to start due to the cache not coming online.
See the JGroups/Infinispan documentation here: http://jgroups.org/manual4/index.html#Transport, which specifies the special values you can use for the bind_addr variable.
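In terms of the Deployment above, the change amounts to extending the existing JAVA_OPTS_APPEND entry (a sketch; the other flags are carried over from the earlier manifest):

```yaml
- name: JAVA_OPTS_APPEND
  value: "-Djgroups.bind_addr=global -Djava.net.preferIPv4Stack=false -Djava.net.preferIPv6Addresses=true -Djgroups.dns.query=keycloak.keycloak-test.svc.cluster.local"
```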
@ahus1 I was unable to reproduce anymore with 21.1.1 (using IPv4-only cluster, IPv6/dualstack is a different story). Not sure what fixed it, possibly some ISPN upgrade. Do you think it's still valid?
I also tested this with nightly, can't reproduce it. Also tried to force it by re-configuring the StatefulSet without startup probes, so that pods start more concurrently.
So I'm closing it as it is no longer reproducible.
Describe the bug
When scaling Keycloak on Minikube with the new operator, I receive a warning about "no physical address" like this, and the scaling is delayed. Eventually it settles, but I am not sure if that node is really a healthy member of the cluster.
Version
999-SNAPSHOT
Expected behavior
Smooth scaling/up-down, no error or warning messages from Infinispan
Actual behavior
Delayed scaling up/down, warning messages from Infinispan
How to Reproduce?
Approx. 1 out of 10 newly started pod instances will trigger that error message. The error message appears on that pod, and also on the other pods that are part of the Infinispan cluster being formed.
Anything else?
Previous discussions and PRs: #12447 and #11763