With this setup, all pods take ~60s to start up.
The root cause is a bad interaction between these components, which creates a circular dependency. When a pod starts, istio-cni is invoked. At this point, the Pod has been created but no containers are running yet. istio-cni more or less calls nsenter -- iptables-restore. Our iptables commands use the xt_owner module (-m owner), which in turn calls getpwnam (https://git.netfilter.org/iptables/tree/extensions/libxt_owner.c#n150). This triggers PAM for authentication. When OSLogin is enabled on the GCE machine, a PAM module is loaded: https://github.com/GoogleCloudPlatform/guest-oslogin. When triggered, it calls the metadata server (with a request like /computeMetadata/v1/oslogin/users?username=...).
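To check whether a given environment hits the slow path, it may help to time a single getpwnam lookup, since that is the call xt_owner ends up making. The following is a minimal sketch, not part of Istio: the helper name time_user_lookup is made up for illustration, and the lookup value "1337" is only an assumed example of a UID that could be passed to -m owner. It relies on Python's standard pwd module, which goes through the same libc lookup. To reproduce the delay it would presumably need to run in the same context istio-cni uses (for example, via nsenter into the pod's network namespace); run directly on the host, it will likely return quickly.

```python
import pwd
import time


def time_user_lookup(name):
    """Return (found, seconds) for a single getpwnam lookup of `name`.

    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    try:
        pwd.getpwnam(name)  # same libc lookup path that xt_owner triggers
        found = True
    except KeyError:
        # The name does not resolve to a user; the lookup still had to
        # consult every configured user database before failing.
        found = False
    return found, time.monotonic() - start


if __name__ == "__main__":
    # "1337" is an assumed example value, not taken from this report.
    found, seconds = time_user_lookup("1337")
    print(f"found={found} elapsed={seconds:.2f}s")
```

An elapsed time in the tens of seconds for a name that does not exist would be consistent with the behavior described here; a few milliseconds would suggest the lookup is not reaching the metadata server.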
The GKE Metadata Server treats this as a request coming from the pod. The pod, however, has not yet started, so the request is denied. The PAM module retries the request a number of times before giving up, and these retries account for the startup delay. Once it gives up, execution continues as usual.
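The retry-then-give-up behavior can be modeled roughly as below. This is only an illustrative sketch of the pattern, not the guest-oslogin implementation: the function name, the attempt count, and the delay between attempts are all hypothetical, since the actual values are not stated here.

```python
import time


def lookup_with_retries(request, attempts, delay_seconds, sleep=time.sleep):
    """Call `request` up to `attempts` times, pausing between failures.

    Hypothetical model: `request` returns True on success and False when
    the metadata server denies the call. Returns True on the first
    success, or False once every attempt has failed.
    """
    for attempt in range(attempts):
        if request():
            return True
        # Pause before the next try, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


if __name__ == "__main__":
    # A request that is always denied, as for a pod that has not started.
    denied = lambda: False
    # attempts and delay_seconds are made-up values for demonstration.
    print(lookup_with_retries(denied, attempts=3, delay_seconds=0.1))
```

Because every attempt is denied until the pod is running, and the pod cannot start until the lookup returns, the caller always pays the full retry budget before iptables-restore can proceed.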
This affects all Istio versions, but only recent GKE versions (later patch releases in GKE 1.25+). It does not affect Autopilot, which cannot use OSLogin.
What happened?
We are basically susceptible to the same issue as described in https://github.com/istio/istio/issues/48416