Closed ahus1 closed 1 year ago
Thanks for reporting this issue @ahus1 !
I checked locally and I can reproduce. A few considerations:

- KUBE_PING resolves the issue (should we try it out?)
- Even when this message appears, Infinispan eventually recovers and the cluster is always formed, so I'd tend to say that this is not causing any user-facing issue but is just an internal "retry mechanism" and can be considered "normal" behavior.

Waiting for confirmation (or not) from the people pinged in this message.
@andreaTP @ahus1 I can confirm this does not only happen on Minikube but also on Azure AKS. A typical picture of the pod state after around five minutes of scaling up from 1 to 10 is this: After a while all pods started up successfully, but I don't think this is desirable or should be ignored. Please also note that in all my experiments the caches were empty. I assume the behaviour/errors get much worse when Infinispan is actually transferring data to balance caches.
@belaban Is this warning safe to ignore and how can it be avoided?
@ryanemerson What happens is that JGroups tries to send a message to B (logical name). To do this, it does a lookup with key=B in the uuids hashmap in the transport, returning IP address 1.2.3.4. It then uses this IP address to establish a TCP connection to 1.2.3.4 and sends the message. The warning you're seeing is because there is no key/value pair for B in the uuids hashmap. This hashmap is populated by the discovery protocol, IIUIC DNS_PING in this case. I've experienced slowness with DNS_PING under Kubernetes before, providing information (IP address) of newly (re-)started pods. IIRC, this can be configured (tolerate-unready-endpoints?) in the YAML... In any case, this warning should disappear once the DNS lookup for B does indeed return 1.2.3.4...
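The failure mode described above can be pictured as a plain map lookup. The sketch below is illustrative only (class, field, and method names are mine, not JGroups' actual internals): the discovery protocol fills the map, and sending to a member whose entry is still missing produces the "no physical address" warning.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the transport's uuids cache, NOT the real JGroups code.
// Discovery (DNS_PING here) populates the map; sending to a member whose entry
// is still missing corresponds to the "no physical address ... dropping message" warning.
class UuidCache {
    private final Map<String, String> uuidToPhysical = new HashMap<>();

    // Called when discovery (e.g. a DNS lookup) learns a member's IP:port.
    void discovered(String uuid, String ipAndPort) {
        uuidToPhysical.put(uuid, ipAndPort);
    }

    // Returns the physical address, or null -> the message is dropped and retried later.
    String resolve(String uuid) {
        return uuidToPhysical.get(uuid);
    }
}
```

Until discovery has populated the entry for B, every send to B hits the null branch, which matches the observation that the warning disappears once the DNS lookup for B returns its address.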
Reading @belaban's comment, I assume those are the things to try out:
publishNotReadyAddresses indicates that any agent which deals with endpoints for this Service should disregard any indications of ready/not-ready. The primary use case for setting this field is for a StatefulSet's Headless Service to propagate SRV DNS records for its Pods for the purpose of peer discovery. The Kubernetes controllers that generate Endpoints and EndpointSlice resources for Services interpret this to mean that all endpoints are considered "ready" even if the Pods themselves are not. Agents which consume only Kubernetes generated endpoints through the Endpoints or EndpointSlice resources can safely assume this behavior.
Although this is not a StatefulSet, we're behaving like one with service discovery. Infinispan spins up well before Keycloak is ready, so it makes sense to me. I'm quite surprised that it works so well without this.
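For reference, a minimal headless Service carrying that flag might look like this (name, labels, and port are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: keycloak-discovery         # placeholder name
spec:
  clusterIP: None                  # headless, used only for peer discovery
  publishNotReadyAddresses: true   # publish DNS records before pods are ready
  selector:
    app: keycloak
  ports:
    - name: jgroups
      port: 7800                   # JGroups TCP port
```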
Besides that, I'd also be happy to give KUBE_PING a try if we can get it running with a realmroles and not clusterroles as described in the https://github.com/jgroups-extras/jgroups-kubernetes repo.
I've been thinking about @belaban's comment again, and I wonder if it doesn't explain all of the behavior I see:
What I understand: The b4432256-a5bd-4023-9992-6da854e9e973 is the ID (?) of a node that the other nodes try to resolve to an IP address using DNS. This leads to the "no physical address" message on any node that tries to resolve it.
What I don't yet understand: How does this ID that looks random and like a UUID end up in the list of nodes in the first place? For all other nodes it is the pod name ("keycloak-84b48bc8dc-h9dms-15950"). That UUID is nothing that would come up via DNS. If it is something that the node itself published to the other peers as its own host name, then something seems to be broken or at least fragile there.
To figure out its own hostname, there is the option of passing the pod name down to the pod as an environment variable using the "downward API": https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/
That would always be a local operation to determine one's hostname, and might be less prone to failure.
Would that help here? WDYT?
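For illustration, the downward API part would be a small addition to the container spec (the variable name POD_NAME is just an example; wiring it into JGroups would be a separate step):

```yaml
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name   # the pod's own name, resolved locally
```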
What I'm getting is that DNS (especially on Kubernetes) has some latency, and this affects Infinispan.
Other than trying to fight the DNS settings etc. (it's likely to be a rabbit hole and possibly dependent on the host/provider), I'm more inclined to give KUBE_PING a spin, which has been reported to be more stable.
@ahus1 can you provide a link for "with a realmroles and not clusterroles"? All I can see in the README are ClusterRoles.
@andreaTP - I see that I mixed up the wording here: it is Role and not RealmRole - some wording from Keycloak sneaked in here.
This is all about minimal privileges, and I would assume that as long as it stays within the same namespace, it wouldn't require cluster permissions.
Therefore I suggest the following: try KUBE_PING with a namespace-scoped Role instead of a ClusterRole. If that doesn't work as expected, I'd ask the Infinispan team for help. If it does work, create a PR for KubePing's README :-)
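As a sketch of what a namespace-scoped setup for KUBE_PING could look like (resource names are placeholders; KUBE_PING needs to list pods in its own namespace):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jgroups-kube-ping          # placeholder name
  namespace: keycloak-test
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jgroups-kube-ping
  namespace: keycloak-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jgroups-kube-ping
subjects:
  - kind: ServiceAccount
    name: default                  # or a dedicated service account
    namespace: keycloak-test
```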
I agree that delays with DNS_PING might depend on the DNS service being used inside the cluster. We never had issues with it on Azure AKS; that's why I probably also did not see the errors @ahus1 saw when scaling up a stateful set. But using KUBE_PING instead might be worth a try.
It was before my time working on the Infinispan Operator, but I believe the need for a ClusterRole with KUBE_PING was why DNS_PING was favoured for our implementation (@rigazilla might remember the exact details). In general we haven't had many issues using DNS_PING.
This bug is not restricted to the Keycloak Operator. I am seeing it on Keycloak 19.0.2 without the operator, running on a Nomad cluster using DNS ping.
When I bring up a cluster, e.g. with 8 nodes, often only 6 nodes enter the cluster. When I see the error, there is a general problem, because all the nodes are added to the service's DNS name once they pass their health checks (basically a curl to the health endpoint and a check that port 7800 is open). So the 2 nodes not in the cluster are also reachable by clients, which causes problems.
Yes, b4432256-a5bd-4023-9992-6da854e9e973 is the address of a member. It is a UUID and is used for all communication. "keycloak-84b48bc8dc-h9dms-15950" is the logical name associated with a member, and is not required for communication; it acts as mere syntactic sugar to get more legible logs.
There are 2 caches: one between UUIDs and IP addresses/ports and one between UUIDs and logical names. Both are populated on startup of a member, and when new members join. The former is essential and communication between members cannot happen without it. The latter is not needed for communication, but your logs will look terrible until it is populated.
If you have a JGroups JChannel, you can always get the local address, the IP address of any member, etc., by fetching a reference to TP (the transport):

```java
TP tp = channel.getProtocolStack().getTransport();
```
I'm running into issues with using DNS_PING on my Kubernetes cluster; how do I switch the new Quarkus distribution to use KUBE_PING instead?
When my nodes get rolled/pods get restarted, only two out of three Keycloak pods join the cluster when using DNS_PING.
I tried setting the headless service to ignore the health check, thereby allowing JGroups to see the full cluster state, but that does not help: one of the pods continues to spin and never joins the cluster.
These are the messages seen in the pod logs:
2023-01-04 20:42:46,677 WARN [org.jgroups.protocols.pbcast.GMS] (VERIFY_SUSPECT.TimerThread-38,keycloak-774bdfb494-js4sw-43683) keycloak-774bdfb494-js4sw-43683: I'm not the coordinator (or next-in-line); dropping LEAVE request
2023-01-04 20:42:51,740 WARN [org.jgroups.protocols.TCP] (TQ-Bundler-7,keycloak-774bdfb494-js4sw-43683) JGRP000032: keycloak-774bdfb494-js4sw-43683: no physical address for f968cc16-cd7e-4c7c-bbe3-756454a72129, dropping message
2023-01-04 20:42:55,679 WARN [org.jgroups.protocols.TCP] (TQ-Bundler-7,keycloak-774bdfb494-js4sw-43683) JGRP000032: keycloak-774bdfb494-js4sw-43683: no physical address for f968cc16-cd7e-4c7c-bbe3-756454a72129, dropping message
Here are the YAML files defining the Keycloak cluster:
```yaml
apiVersion: v1
kind: Service
metadata:
  namespace: keycloak-test
  name: keycloak
  labels:
    app: keycloak
spec:
  publishNotReadyAddresses: true
  ports:
    - name: http
      port: 8080
      targetPort: 8080
  selector:
    app: keycloak
  clusterIP: None
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: keycloak-test
  name: keycloak
  labels:
    app: keycloak
spec:
  replicas: 3
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak:20.0.2
          args: ["start"]
          env:
            - name: KEYCLOAK_ADMIN
              value: "admin"
            - name: KEYCLOAK_ADMIN_PASSWORD
              value: "badadminpassword"
            - name: KC_PROXY
              value: "edge"
            - name: JAVA_OPTS_APPEND
              value: "-Djava.net.preferIPv4Stack=false -Djava.net.preferIPv6Addresses=true -Djgroups.dns.query=keycloak.keycloak-test.svc.cluster.local"
            - name: KC_DB
              value: "postgres"
            - name: KC_DB_PASSWORD
              value: "simpletestpassword"
            - name: KC_DB_URL
              value: jdbc:postgresql://someuri.rds.amazonaws.com/keycloak
            - name: KC_DB_USERNAME
              value: "keycloak"
            - name: KC_FEATURES
              value: "admin-fine-grained-authz,client-secret-rotation,recovery-codes,token-exchange"
            - name: KC_HEALTH_ENABLED
              value: "true"
            - name: KC_HOSTNAME
              value: "sso.test.example.com"
            - name: KC_CACHE_STACK
              value: "kubernetes"
          ports:
            - name: http
              containerPort: 8080
          readinessProbe:
            httpGet:
              path: /realms/master
              port: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: keycloak-external
  namespace: keycloak-test
  annotations:
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    alb.ingress.kubernetes.io/ip-address-type: dualstack
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443},{"HTTP":80}]'
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/group.name: keycloak-test
spec:
  ingressClassName: alb
  tls:
    - hosts:
        - sso.test.example.com
  rules:
    - host: sso.test.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keycloak
                port:
                  number: 8080
```
I'm coming back to this issue because I was able to fix the cause of the issue on my end... I no longer have the following log entries:
2023-01-04 20:42:55,679 WARN [org.jgroups.protocols.TCP] (TQ-Bundler-7,keycloak-774bdfb494-js4sw-43683) JGRP000032: keycloak-774bdfb494-js4sw-43683: no physical address for f968cc16-cd7e-4c7c-bbe3-756454a72129, dropping message
What I did is force Infinispan to bind to the global IPs only by adding -Djgroups.bind_addr=global to JAVA_OPTS_APPEND. I am running in an all-IPv6 cluster on AWS, so all the pods receive global IPv6 addresses; this change forces JGroups to ignore the link-local IPv6 address and instead bind to the global IPv6 address.
I assume that upon starting up, it attempts to advertise itself to the remote system, but the IP address in that advertisement did not match the IP that was published in the k8s service, and that is why the "no physical address" message would appear. It appears there is some sort of race condition where a remote node would sometimes reach out before it tried to self-advertise? I am not sure, actually... but this fixed the issue for me.
Now my cluster comes alive and joins without issues at all, I was able to roll my entire cluster without having any container restarts, and haven't had any more issues with Keycloak failing to start due to the cache not coming online.
See the JGroups/Infinispan documentation here: http://jgroups.org/manual4/index.html#Transport, which specifies the special values you can use for the bind_addr variable.
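In terms of the Deployment above, the change amounts to extending the existing JAVA_OPTS_APPEND entry (a sketch; the other flags are carried over from the earlier manifest):

```yaml
- name: JAVA_OPTS_APPEND
  value: "-Djgroups.bind_addr=global -Djava.net.preferIPv4Stack=false -Djava.net.preferIPv6Addresses=true -Djgroups.dns.query=keycloak.keycloak-test.svc.cluster.local"
```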
@ahus1 I was unable to reproduce anymore with 21.1.1 (using IPv4-only cluster, IPv6/dualstack is a different story). Not sure what fixed it, possibly some ISPN upgrade. Do you think it's still valid?
I also tested this with nightly, can't reproduce it. Also tried to force it by re-configuring the StatefulSet without startup probes, so that pods start more concurrently.
So I'm closing it as it is no longer reproducible.
Describe the bug
When scaling Keycloak on Minikube with the new operator, I receive a warning about "no physical address" like this, and the scaling is delayed. Eventually it settles, but I am not sure if that node is really a healthy member of the cluster.
Version
999-SNAPSHOT
Expected behavior
Smooth scaling/up-down, no error or warning messages from Infinispan
Actual behavior
Delayed scaling up/down, warning messages from Infinispan
How to Reproduce?
Approx. 1 out of 10 newly started pod instances will trigger that error message. The error message appears on that pod, and also on the other pods that are part of the Infinispan cluster being formed.
Anything else?
Previous discussions and PRs: #12447 and #11763