jtafurth opened this issue 5 years ago
Yeah, I'm having a similar situation with Jenkins + k8s agents. Sometimes the pod gets the correct assumed role, sometimes it doesn't. Not sure why.
Also experiencing this issue. Most of the time the correct role gets assumed, but it intermittently falls back to the worker group role. If anyone needs more info, please feel free to reach out.
Based on our findings so far, the root cause mostly comes down to using a randomly generated pod label. When we let Jenkins generate the label randomly, the error rate (the pod failing to get the correct credentials) was very high, up to 50%. Once we switched to a static label in each Jenkins pod spec per job, the error rate dropped dramatically.
It is not a complete solution, since we still hit the problem occasionally, just at a much lower rate. Normal workloads (like application deployments) usually work fine because they can tolerate a few seconds of failure while fetching IAM credentials before retrying. I suspect there is nothing we can do about this on our side since it depends on kube2iam. For now we just retry the job when it fails.
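In case it helps anyone, below is a rough sketch of the kind of pod spec we ended up with. The label key/value, role ARN, and image are placeholders rather than our real values; the point is just the static per-job label plus the pod-level kube2iam annotation.

```yaml
# Sketch only - label value, role ARN, and image are placeholders (assumptions).
apiVersion: v1
kind: Pod
metadata:
  labels:
    # Static, per-job label instead of the random label Jenkins generates by default
    jenkins/label: my-job-agent
  annotations:
    # kube2iam reads this pod-level annotation to decide which role to assume
    iam.amazonaws.com/role: arn:aws:iam::111111111111:role/jenkins-build-role
spec:
  containers:
    - name: jnlp
      image: jenkins/inbound-agent:latest
```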
A GitLab runner pod is running on a Kubernetes cluster with kube2iam, and it spins up build pods with two containers, "build" and "helper".
AWS calls from the "build" container are intercepted correctly and the correct role is assumed, but calls from the "helper" container either are not intercepted or the annotation is not recognized, and kube2iam seems to fall back to the default role. This eventually causes GitLab's cache functionality to return a 403 because the wrong role is assumed.
Has anyone experienced this issue before?
kube2iam logs for build pod:
kube2iam logs for helper pod:
The pod configuration:
As you can see, the annotation is there and the pod has two containers.
The IP 10.137.4.156 in the logs corresponds to the parent runner IP (the one that launches the child pod with the two containers).
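For reference, this is roughly the shape of the build pod in question; the names, images, and role ARN below are placeholders rather than our actual configuration. The annotation sits at the pod level, so both the build and helper containers share it (and share the pod IP that kube2iam matches requests against).

```yaml
# Illustrative sketch only - pod name, images, and role ARN are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: runner-example-project-1-concurrent-0
  annotations:
    # Pod-level annotation: kube2iam should serve this role to every container in the pod
    iam.amazonaws.com/role: arn:aws:iam::111111111111:role/gitlab-ci-role
spec:
  containers:
    - name: build
      image: example/build-image:latest
    - name: helper
      image: gitlab/gitlab-runner-helper:latest
```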