Open Jaykah opened 5 years ago
Are you using the CNI from Amazon or a different one? Was this working and just broke with a cluster upgrade or the first time attempting the deployment?
@jrnt30 using the EKS Ubuntu AMI (https://aws.amazon.com/blogs/opensource/optimized-support-amazon-eks-ubuntu-1804/) and have not made any CNI changes. This is a new deployment.
Just tried deploying https://github.com/aws/amazon-vpc-cni-k8s, then redeploying kube2iam and my deployment; everything is still the same.
In the kube2iam container or on the host, can you run a few commands so we can get you figured out?
kubectl get daemonset -o yaml -n kube-system kube2iam
: The pod spec seems fine, but just to double check
ifconfig
: We are looking to see if your interfaces are in fact named eni+
iptables-save
: Just looking to see if the links are set up properly
Also a question: do you see debug log entries like the following?
Ex:
kube2iam-z57bq kube2iam time="2019-01-18T12:27:27Z" level=debug msg="Pod OnUpdate" pod.iam.role="arn:aws:iam::XXXKKKKDDDD:role/kube2iam-external-dns" pod.name=external-dns-85b957cc5f-pwbtk pod.namespace=kube-system pod.status.ip=10.0.21.175 pod.status.phase=Running
/home/ubuntu# kubectl get daemonset -o yaml -n kube-system kube2iam
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  creationTimestamp: 2019-01-18T02:03:52Z
  generation: 1
  labels:
    app: kube2iam
  name: kube2iam
  namespace: kube-system
  resourceVersion: "1179001"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/daemonsets/kube2iam
  uid: 4d6281b9-1ac5-11e9-8b5c-12b6e77f5ea2
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: kube2iam
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: kube2iam
    spec:
      containers:
      - args:
        - --iptables=true
        - --host-interface=eni+
        - --host-ip=$(HOST_IP)
        - --auto-discover-default-role=true
        - --node=$(NODE_NAME)
        - --debug
        - --verbose
        - --auto-discover-base-arn
        env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: AWS_REGION
          value: us-east-1
        image: jtblin/kube2iam:latest
        imagePullPolicy: Always
        name: kube2iam
        ports:
        - containerPort: 8181
          hostPort: 8181
          name: http
          protocol: TCP
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      hostNetwork: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube2iam
      serviceAccountName: kube2iam
      terminationGracePeriodSeconds: 30
  templateGeneration: 1
  updateStrategy:
    type: OnDelete
status:
  currentNumberScheduled: 2
  desiredNumberScheduled: 2
  numberAvailable: 2
  numberMisscheduled: 0
  numberReady: 2
  observedGeneration: 1
  updatedNumberScheduled: 2
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 0.0.0.0
ether 02:62:c0:85:ab:00 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eni239df5d4736: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::7c41:6bff:fe3d:4bf1 prefixlen 64 scopeid 0x20<link>
ether 7e:39:6b:3f:4b:f1 txqueuelen 0 (Ethernet)
RX packets 226966 bytes 18386501 (18.3 MB)
RX errors 0 dropped 2 overruns 0 frame 0
TX packets 267442 bytes 104958394 (104.9 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enif5722505ef5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::66b4:abff:feb6:6648 prefixlen 64 scopeid 0x20<link>
ether 66:b4:ac:b6:66:48 txqueuelen 0 (Ethernet)
RX packets 226420 bytes 18350777 (18.3 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 267363 bytes 104947144 (104.9 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 10.10.10.10 netmask 255.255.255.0 broadcast 10.2.0.255
inet6 fe80::103b:68fd:fe5d:16ae prefixlen 64 scopeid 0x20<link>
ether 12:3c:68:5d:16:ae txqueuelen 1000 (Ethernet)
RX packets 1726767 bytes 718253732 (718.2 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1449418 bytes 140060004 (140.0 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet6 fe80::14aa:10ff:fe6a:fa8a prefixlen 64 scopeid 0x20<link>
ether 12:aa:10:4a:fa:8a txqueuelen 1000 (Ethernet)
RX packets 8512 bytes 357628 (357.6 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 60 bytes 4296 (4.2 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 325 bytes 36118 (36.1 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 325 bytes 36118 (36.1 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
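The + in --host-interface=eni+ is an iptables-style prefix wildcard, so the eni... interfaces above should match while eth0 and docker0 should not. A minimal sketch of that matching rule (a hypothetical helper for illustration, not kube2iam's actual code):

```python
def iface_matches(pattern: str, name: str) -> bool:
    """iptables treats a trailing '+' as 'any interface starting with this prefix'."""
    if pattern.endswith("+"):
        return name.startswith(pattern[:-1])
    return name == pattern

# The interfaces from the ifconfig output above:
assert iface_matches("eni+", "eni239df5d4736")
assert iface_matches("eni+", "enif5722505ef5")
assert not iface_matches("eni+", "eth0")
assert not iface_matches("eni+", "docker0")
```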
And the iptables:
# Generated by iptables-save v1.6.1 on Fri Jan 18 16:10:22 2019
*mangle
:PREROUTING ACCEPT [2036329:741745145]
:INPUT ACCEPT [1389681:535852374]
:FORWARD ACCEPT [646646:205892101]
:OUTPUT ACCEPT [1333531:116178407]
:POSTROUTING ACCEPT [1980177:322070508]
-A PREROUTING -i eth0 -m comment --comment "AWS, primary ENI" -m addrtype --dst-type LOCAL --limit-iface-in -j CONNMARK --set-xmark 0x80/0x80
-A PREROUTING -i eni+ -m comment --comment "AWS, primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
COMMIT
# Completed on Fri Jan 18 16:10:22 2019
# Generated by iptables-save v1.6.1 on Fri Jan 18 16:10:22 2019
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [26:1560]
:POSTROUTING ACCEPT [3:180]
:AWS-SNAT-CHAIN-0 - [0:0]
:AWS-SNAT-CHAIN-1 - [0:0]
:DOCKER - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SEP-4SE5NWE5RK7SG27W - [0:0]
:KUBE-SEP-B4U2U62NREYAJOZR - [0:0]
:KUBE-SEP-H25NROOSKFH236BM - [0:0]
:KUBE-SEP-QGETDQVKLMWZPUGG - [0:0]
:KUBE-SEP-XZ3FSTCUDVIESQ2M - [0:0]
:KUBE-SEP-YII6FGRXH4XM4AA2 - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A PREROUTING -d 169.254.169.254/32 -i eni+ -p tcp -m tcp --dport 80 -j DNAT --to-destination 10.10.10.10:8181
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -m comment --comment "AWS SNAT CHAN" -j AWS-SNAT-CHAIN-0
-A AWS-SNAT-CHAIN-0 ! -d 10.2.0.0/16 -m comment --comment "AWS SNAT CHAN" -j AWS-SNAT-CHAIN-1
-A AWS-SNAT-CHAIN-1 -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 10.10.10.10
-A DOCKER -i docker0 -j RETURN
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-4SE5NWE5RK7SG27W -s 10.10.10.159/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-4SE5NWE5RK7SG27W -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination 10.10.10.159:443
-A KUBE-SEP-B4U2U62NREYAJOZR -s 10.10.10.208/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-B4U2U62NREYAJOZR -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.10.10.208:53
-A KUBE-SEP-H25NROOSKFH236BM -s 10.10.10.208/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-H25NROOSKFH236BM -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.10.10.208:53
-A KUBE-SEP-QGETDQVKLMWZPUGG -s 10.10.10.198/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-QGETDQVKLMWZPUGG -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.10.10.198:53
-A KUBE-SEP-XZ3FSTCUDVIESQ2M -s 10.10.10.198/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-XZ3FSTCUDVIESQ2M -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.10.10.198:53
-A KUBE-SEP-YII6FGRXH4XM4AA2 -s 10.10.10.28/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-YII6FGRXH4XM4AA2 -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination 10.10.10.28:443
-A KUBE-SERVICES -d 172.20.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES -d 172.20.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -d 172.20.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-QGETDQVKLMWZPUGG
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-H25NROOSKFH236BM
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-4SE5NWE5RK7SG27W
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-YII6FGRXH4XM4AA2
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-XZ3FSTCUDVIESQ2M
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-B4U2U62NREYAJOZR
COMMIT
# Completed on Fri Jan 18 16:10:22 2019
# Generated by iptables-save v1.6.1 on Fri Jan 18 16:10:22 2019
*filter
:INPUT ACCEPT [214:50333]
:FORWARD ACCEPT [63:19508]
:OUTPUT ACCEPT [205:19148]
:DOCKER - [0:0]
:DOCKER-ISOLATION - [0:0]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
COMMIT
# Completed on Fri Jan 18 16:10:22 2019
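Of the rules above, the one that matters for kube2iam is the nat PREROUTING DNAT: TCP traffic arriving on an eni+ interface for 169.254.169.254:80 is redirected to kube2iam on 10.10.10.10:8181. A rough model of just that one rule (simplified sketch; assumes no other rules apply):

```python
def apply_metadata_dnat(in_iface: str, dst_ip: str, dst_port: int):
    """Model of: -A PREROUTING -d 169.254.169.254/32 -i eni+ -p tcp
    --dport 80 -j DNAT --to-destination 10.10.10.10:8181"""
    if (in_iface.startswith("eni")
            and dst_ip == "169.254.169.254"
            and dst_port == 80):
        return ("10.10.10.10", 8181)   # rewritten to kube2iam
    return (dst_ip, dst_port)          # rule does not match

# A pod whose traffic enters via an eni+ interface is intercepted...
assert apply_metadata_dnat("eni239df5d4736", "169.254.169.254", 80) == ("10.10.10.10", 8181)
# ...but traffic arriving via eth0 (e.g. a hostNetwork pod) is not:
assert apply_metadata_dnat("eth0", "169.254.169.254", 80) == ("169.254.169.254", 80)
```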
Regarding the debug output: those messages are not displaying for some reason, with both --debug and --verbose set. No idea what might be causing that.
Thanks for your help!
Any thoughts on how I can troubleshoot this further?
Hate to ping this, but I'm completely stuck. Does anyone know why the logs are not showing anything even at their highest verbosity?
Thanks again.
Sorry for the delay, does the ServiceAccount you are using have permissions to view the pods across your namespaces? https://github.com/jtblin/kube2iam#rbac-setup outlines a ClusterRole & ClusterRoleBinding that should give you what you need in that respect.
@jrnt30 yes, the service account was in place, and I have just double-checked the permissions and they seem fine as well:
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"kube2iam","namespace":"kube-system"}}
  creationTimestamp: 2019-01-17T00:55:22Z
  name: kube2iam
  namespace: kube-system
  resourceVersion: "1052054"
  selfLink: /api/v1/namespaces/kube-system/serviceaccounts/kube2iam
  uid: 915c8ea7-19f2-11e9-8b5c-12b6e77f5ea2
secrets:
- name: kube2iam-token-f4mrh

apiVersion: v1
items:
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"name":"kube2iam","namespace":""},"rules":[{"apiGroups":[""],"resources":["namespaces","pods"],"verbs":["get","watch","list"]}]}
    creationTimestamp: 2019-01-17T00:55:54Z
    name: kube2iam
    namespace: ""
    resourceVersion: "1052099"
    selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/kube2iam
    uid: a46a254c-19f2-11e9-a519-0ac71ff3204c
  rules:
  - apiGroups:
    - ""
    resources:
    - namespaces
    - pods
    verbs:
    - get
    - watch
    - list
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"name":"kube2iam","namespace":""},"roleRef":{"apiGroup":"rbac.authorization.k8s.io","kind":"ClusterRole","name":"kube2iam"},"subjects":[{"kind":"ServiceAccount","name":"kube2iam","namespace":"kube-system"}]}
    creationTimestamp: 2019-01-17T00:55:54Z
    name: kube2iam
    namespace: ""
    resourceVersion: "1052100"
    selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/kube2iam
    uid: a46b9ed2-19f2-11e9-a519-0ac71ff3204c
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: kube2iam
  subjects:
  - kind: ServiceAccount
    name: kube2iam
    namespace: kube-system
kind: List
metadata: {}
If I curl http://169.254.169.254/latest/meta-data/iam/security-credentials/ from inside the container that should be trying to assume another role, I just get its default worker instance role
So one thing I have missed myself before: --debug only enables the debug endpoint. Add --log-level debug to get those additional logs.
Added to the daemonset
spec:
  containers:
  - args:
    - --iptables=true
    - --host-interface=eni+
    - --host-ip=$(HOST_IP)
    - --auto-discover-default-role=true
    - --node=$(NODE_NAME)
    - --debug
    - --verbose
    - --auto-discover-base-arn
    - --log-level=debug
Still not seeing any extra verbosity. Should I get rid of --verbose?
Tried that as well; still the same amount of logs :(
Have you manually deleted the pods? You can tell by looking at their age via kubectl get pods. Your updateStrategy is set to "OnDelete", so they will stay until you manually roll them.
Hrmmm, and exec'ing in, do you see the flag being passed via ps -ef? And does the pod spec for a newly created pod have that flag?
Tried deleting the pods and re-creating them; same thing, no logs.
Did not quite get the part about the flag; could you please elaborate?
Thanks again for the help!
Oh boy, just looked closer at the pod spec you were trying to get the credentials for. If you use hostNetwork: true on the target pod, then it is not going to be using your eni+ interfaces (it will use the host network's interface) and will not be intercepted by kube2iam.
Try removing hostNetwork: true from the Deployment in testns.
@jrnt30 so is there no way to use host networking in Docker for that deployment with kube2iam? The workload in this container is a SIP application, so host networking is a must.
Looks like that did it, though. I'm getting another error now; assuming it's unrelated:
NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
That's what I'm getting from the metadata endpoint:
docker exec 45 curl 169.254.169.254/latest/meta-data/iam/security-credentials/
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 33 100 33 0 0 13 0 0:00:02 0:00:02 --:--:-- 13
pod with specificed IP not found
That error is coming from kube2iam (my pods are not terminating and are in a Running state):
https://github.com/jtblin/kube2iam/blob/42bea9880c50e88fc9fc544320c64573f66086c8/k8s/k8s.go#L96
Kube2iam currently is unable to intercept the calls for host networking by default. You may be able to hack in some iptables rules to force it to do so, but I have not attempted that as of yet.
In regards to the issue you mentioned, can you outline what you changed or post your new configurations? The docker exec is throwing me for a bit of a loop.
At this point you should check the /debug endpoint of kube2iam to ensure that it has the cache entry for the pod IP in question.
I do see activity in kube2iam:
time="2019-01-29T05:22:38Z" level=debug msg="Proxy ec2 metadata request" metadata.url=169.254.169.254 req.method=GET req.path=/latest/meta-data/placement/availability-zone req.remote=10.2.2.66
time="2019-01-29T05:22:38Z" level=info msg="GET /latest/meta-data/placement/availability-zone (200) took 814348.000000 ns" req.method=GET req.path=/latest/meta-data/placement/availability-zone req.remote=10.2.2.66 res.duration=814348 res.status=200
time="2019-01-29T05:22:41Z" level=info msg="GET /latest/meta-data/iam/security-credentials (404) took 2575162258.000000 ns" req.method=GET req.path=/latest/meta-data/iam/security-credentials req.remote=10.2.2.66 res.duration=2.5751622579999995e+09 res.status=404
However, as to the debug endpoint - there is no data there:
{"namespaceByIP":{},"rolesByIP":{},"rolesByNamespace":{"default":null,"testns":null,"kube-public":null,"kube-system":null}}
It looks like the ip is not being determined for some reason.
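For anyone debugging this later, the /debug store payload can be checked mechanically for a given pod IP; a minimal sketch (using the empty store shown above, with a hypothetical lookup helper):

```python
import json

# The /debug payload quoted above:
store = json.loads(
    '{"namespaceByIP":{},"rolesByIP":{},'
    '"rolesByNamespace":{"default":null,"testns":null,'
    '"kube-public":null,"kube-system":null}}'
)

def role_for_ip(store: dict, ip: str):
    """Mimics the cache lookup: a missing rolesByIP entry is what
    produces the 'pod with specificed IP not found' response."""
    return store["rolesByIP"].get(ip)

assert role_for_ip(store, "10.2.2.66") is None  # hence the 404
```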
Did you remove the hostNetwork: true then? Note that the debug endpoint by default only caches the pods that are on the same physical host, to reduce memory pressure. If you haven't done so yet, you should check the /debug endpoint info.
Correct, hostNetwork: true had been removed prior to testing this, and the logs were gathered by docker exec on the same host where the workload is located (the same applies to the debug endpoint).
Also, I've noticed that the eni interface on the host does not have an IPv4 address (eth0 does, however). Is that expected behavior, or is that causing the aforementioned error?
@Jaykah I believe the eni1234567890 interfaces created by Calico do not require IP addresses, as my (very limited) understanding is that Calico operates off the MAC address, making the host the next hop at the MAC layer (to apply policy), rather than acting as an IP router.
Having the same issue. If you need any information about the case, just tell me and I'll post all the info here.
I have the same problem with 0.10.8, but only sporadically on random nodes, and recreating the bad kube2iam pod usually fixes the problem.
I noticed that the bad node is not logging the HTTP calls the way the good node does.
Any help in debugging this would be appreciated.
Bad Node
time="2019-10-09T04:25:45Z" level=debug msg="Pod OnAdd" pod.iam.role= pod.name=dns-client pod.namespace=default pod.status.ip= pod.status.phase=Pending
time="2019-10-09T04:25:45Z" level=debug msg="Pod OnUpdate" pod.iam.role= pod.name=dns-client pod.namespace=default pod.status.ip= pod.status.phase=Pending
time="2019-10-09T04:25:46Z" level=debug msg="Pod OnUpdate" pod.iam.role= pod.name=dns-client pod.namespace=default pod.status.ip=10.1.49.31 pod.status.phase=Running

dnstools# curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
pod with specificed IP not found
dnstools# curl -s http://169.254.169.254/debug/store | jq '.rolesByIP' | grep $(hostname -i)
dnstools#

Good Node
time="2019-10-09T05:04:30Z" level=debug msg="Pod OnAdd" pod.iam.role= pod.name=dns-client pod.namespace=default pod.status.ip= pod.status.phase=Pending
time="2019-10-09T05:04:30Z" level=debug msg="Pod OnUpdate" pod.iam.role= pod.name=dns-client pod.namespace=default pod.status.ip= pod.status.phase=Pending
time="2019-10-09T05:04:31Z" level=debug msg="Pod OnUpdate" pod.iam.role= pod.name=dns-client pod.namespace=default pod.status.ip=10.1.69.25 pod.status.phase=Running
time="2019-10-09T05:04:48Z" level=info msg="GET /latest/meta-data/iam/security-credentials/ (404) took 3157708446.000000 ns" req.method=GET req.path=/latest/meta-data/iam/security-credentials/ req.remote=10.1.69.25 res.duration=3.157708446e+09 res.status=404
time="2019-10-09T05:04:58Z" level=info msg="GET /debug/store (200) took 106885.000000 ns" req.method=GET req.path=/debug/store req.remote=10.1.69.25 res.duration=106885 res.status=200

dnstools# curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
unable to find role for IP 10.1.69.25
dnstools# curl -s http://169.254.169.254/debug/store | jq '.rolesByIP' | grep $(hostname -i)
"10.1.69.25": "",
dnstools#
Hey, I have installed kube2iam in a kops cluster where the networking is Weave. It works perfectly fine when the test pods run without hostNetwork set to true: I get the expected access-denied errors, and curl shows the IAM role passed in the pod annotation. But when a pod runs with hostNetwork: true, it bypasses the IAM role annotation and gets the host EC2 instance's IAM role, able to query any AWS services that role allows. Long story short: with hostNetwork: true the pod inherits the host IAM role and its permissions, and cannot use the restricted IAM role passed in the pod annotation. Is that the intended behavior? Does hostNetwork: true simply not use the pod's IAM role annotation?
This could also open a security concern when pods use the host Docker engine: if you run a container inside a pod with --net=host, that container picks up the host IAM role and its permissions. Even though the pod itself runs without hostNetwork and honors the annotation, a container started inside it with --net=host bypasses the annotation and inherits the EC2 IAM role.
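The behavior described above follows from how kube2iam identifies callers: a hostNetwork pod's IP is the node IP and its traffic egresses via the host interface rather than an eni+ interface, so the metadata DNAT never fires and the real EC2 metadata service answers with the node role. A simplified single-node model (hypothetical helper, not kube2iam code):

```python
NODE_IP = "10.10.10.10"

def metadata_role(pod_ip: str, host_network: bool,
                  roles_by_ip: dict, node_role: str) -> str:
    # hostNetwork traffic leaves via the host interface, so the
    # eni+ DNAT rule never fires and the real EC2 metadata
    # service answers with the node's instance role.
    if host_network or pod_ip == NODE_IP:
        return node_role
    # Otherwise kube2iam answers, using its pod-IP cache
    # (falling back to the default role here for simplicity).
    return roles_by_ip.get(pod_ip, node_role)

roles = {"10.0.21.175": "Test-Limited"}
assert metadata_role("10.0.21.175", False, roles, "worker-node-role") == "Test-Limited"
assert metadata_role(NODE_IP, True, roles, "worker-node-role") == "worker-node-role"
```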
kubectl exec -it jdk12 bash
(this pod does not run with hostNetwork: true)
Once we exec inside the pod and run a container as:
apache@jdk12:/home/apache$ docker run -d -it --net=host fstab/aws-cli:latest
and then exec into it, it straight away inherits the host IAM role:
apache@jdk12:/home/apache$ docker exec -it d13dc46a83a7 bash -l
Pod YAML (it straight away inherits the host IAM role when started with hostNetwork: true; even without hostNetwork, following the steps above still gains the host IAM permissions). When run with hostNetwork: true it does not make use of iam.amazonaws.com/role: Test-Limited and gets the IAM role of the host, but it works fine when hostNetwork: true is commented out:
apiVersion: v1
kind: Pod
metadata:
  name: aws-cli
  labels:
    name: aws-cli
  annotations:
    iam.amazonaws.com/role: Test-Limited
spec:
  hostNetwork: true
  containers:
  - image: fstab/aws-cli:latest
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 9200000; done;" ]
    name: aws-cli
    env:
    - name: AWS_DEFAULT_REGION
      value: us-east-1
kube2iam DS yaml:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  generation: 2
  labels:
    app.kubernetes.io/instance: kube2
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: kube2iam
    helm.sh/chart: kube2iam-2.1.0
  name: kube2-kube2iam
  namespace: kube-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: kube2
      app.kubernetes.io/name: kube2iam
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: kube2
        app.kubernetes.io/name: kube2iam
    spec:
      containers:
      - args:
        - --host-interface=weave
        - --node=$(NODE_NAME)
        - --auto-discover-base-arn
        - --auto-discover-default-role=true
        - --use-regional-sts-endpoint
        - --host-ip=$(HOST_IP)
        - --iptables=true
        - --verbose
        - --debug
        - --app-port=8181
        - --metrics-port=8181
        env:
        - name: AWS_DEFAULT_REGION
          value: us-east-1
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: HTTPS_PROXY
          value: http://test-proxy.us-east-1.aws:4438
        - name: HTTP_PROXY
          value: http://test-proxy.us-east-1.aws:4438
        - name: NO_PROXY
          value: .dist.kope.io,ec2.us-east-1.amazonaws.com,.s3.amazonaws.com,127.0.0.1,localhost,.k8s.local,.elb.amazonaws.com,100.96.9.0/11,.ec2.internal,api.qacd.k8s.local,api.internal.,internal.,.elb.us-east-1.amazonaws.com,elasticloadbalancing.us-east-1.amazonaws.com,autoscaling.us-east-1.amazonaws.com,178.28.0.1
        - name: http_proxy
          value: http://test-proxy.us-east-1.aws:4438
        - name: https_proxy
          value: http://test-proxy.us-east-1.aws:4438
        - name: no_proxy
          value: .dist.kope.io,ec2.us-east-1.amazonaws.com,.s3.amazonaws.com,127.0.0.1,localhost,.k8s.local,.elb.amazonaws.com,100.96.9.0/11,.ec2.internal,api.qacd.k8s.local,api.internal.,internal.,.elb.us-east-1.amazonaws.com,elasticloadbalancing.us-east-1.amazonaws.com,autoscaling.us-east-1.amazonaws.com,178.28.0.1
        image: jtblin/kube2iam:0.10.7
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8181
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        name: kube2iam
        ports:
        - containerPort: 8181
          hostPort: 8181
          name: http
          protocol: TCP
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      hostNetwork: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube2-kube2iam
      serviceAccountName: kube2-kube2iam
      terminationGracePeriodSeconds: 30
  templateGeneration: 2
  updateStrategy:
    type: OnDelete
status:
  currentNumberScheduled: 2
  desiredNumberScheduled: 2
  numberAvailable: 2
  numberMisscheduled: 0
  numberReady: 2
  observedGeneration: 2
  updatedNumberScheduled: 2
OK, this is already a pretty long thread, but I ran into the same issue and spent a long time getting this to work with my EKS cluster. Here are some of my learnings. I install the kube2iam helm chart (with helm3) like so:
helm upgrade \
kube2iam \
./ \
--install \
--values values.yaml \
--set extraArgs.log-level="debug" \
--set extraArgs.base-role-arn=arn:aws:iam::$account:role/ \
--set host.iptables=true \
--set host.interface=eni+ \
--set aws.region=$AWS_REGION \
--namespace kube-system \
--debug
where $account is the AWS account number and $AWS_REGION is the region (e.g. us-east-1). I found that I needed to set host.iptables to true and host.interface to eni+ for things to have a chance of working on my 1.21 EKS cluster. Older EKS clusters use Docker as the container runtime, so this applies to 1.21+.
Next is the worker node role assigned to each worker node in the cluster. To find it, get a shell on the node in question (I use the krew plugin node-shell) with something like:
kubectl node-shell ip-10-128-48-155.us-east-2.compute.internal
(you can run kubectl get nodes to get the names). Once in the shell, use:
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/ && echo
to get the name of the role that all the workers have.
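That endpoint is the first step of a two-step lookup: appending the role name to the same URL returns a JSON credentials document (AccessKeyId, SecretAccessKey, Token, Expiration). A sketch of parsing that document (the parse helper is hypothetical and the sample values fabricated; only the field names come from the metadata format):

```python
import json

def parse_credentials(doc: str) -> dict:
    """Extract the fields the AWS SDK credential chain needs from the
    per-role metadata credentials document."""
    creds = json.loads(doc)
    return {k: creds[k] for k in ("AccessKeyId", "SecretAccessKey", "Token")}

# Fabricated sample with the same shape as the real document:
sample = ('{"AccessKeyId":"ASIAEXAMPLE","SecretAccessKey":"secret",'
          '"Token":"token","Expiration":"2019-01-29T06:00:00Z"}')
c = parse_credentials(sample)
assert c["AccessKeyId"].startswith("ASIA")
```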
Next go to the AWS console (IAM->roles), and inspect the role, noting its name. In the permission tab, it should have the following policies attached:
AmazonEKSWorkerNodePolicy
AmazonEC2ContainerRegistryReadOnly
AmazonEKS_CNI_Policy
Click on the trust relationships tab - it should have a trusted entity that looks like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EKSWorkerAssumeRole",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
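Before creating or updating the role, a quick local sanity check of the trust document can catch JSON typos early. This is just an illustrative sketch; the file path is arbitrary:

```shell
# Write the trust policy to a placeholder path and sanity-check it.
cat > /tmp/trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EKSWorkerAssumeRole",
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
# json.tool fails loudly on malformed JSON; then confirm the key field exists.
python3 -m json.tool /tmp/trust-policy.json > /dev/null && echo "valid JSON"
grep -q '"sts:AssumeRole"' /tmp/trust-policy.json && echo "allows AssumeRole"
```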
This trust policy allows the worker nodes (EC2 instances) to assume roles, IF the role we are trying to assume allows it. Any containers running on those nodes inherit this privilege, including the kube2iam pod running on each worker, which is what allows kube2iam to call AWS STS and assume the role on behalf of a pod.
Next we have to consider the role that is to be assumed by the pod. It consists of two parts: the attached policy and the trust relationship. The trust relationship basically says this role can be assumed by X, where X is a user, a role, or a service. My trust relationship looks like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::AWS_ACCOUNT_NO:role/noted_name_from_above",
        "Service": [
          "codedeploy.amazonaws.com",
          "ec2.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Here we're saying: if you are CodeDeploy, or EC2, or if you have the worker node role noted above, you can assume this role. The role itself does nothing if no policy is attached. My example here is for an nginx reverse proxy that must be able to create ELB load balancers, so the policy looks like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "elasticloadbalancing:*",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeAccountAttributes",
        "ec2:DescribeAddresses",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "ec2:DescribeVpcClassicLink",
        "ec2:DescribeInstances",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeClassicLinkInstances",
        "ec2:DescribeRouteTables",
        "ec2:DescribeCoipPools",
        "ec2:GetCoipPoolUsage",
        "ec2:DescribeVpcPeeringConnections",
        "cognito-idp:DescribeUserPoolClient"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "iam:CreateServiceLinkedRole",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "iam:AWSServiceName": "elasticloadbalancing.amazonaws.com"
        }
      }
    }
  ]
}
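The same kind of local pre-flight check works for the permissions policy: parse it and confirm the statements are what you expect before attaching it in IAM. An illustrative sketch with a trimmed-down copy of the policy (the file path is arbitrary):

```shell
# Trimmed placeholder copy of the ELB policy for a local sanity check.
cat > /tmp/nginx-elb-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "elasticloadbalancing:*", "Resource": "*" },
    { "Effect": "Allow", "Action": "iam:CreateServiceLinkedRole", "Resource": "*",
      "Condition": { "StringEquals": { "iam:AWSServiceName": "elasticloadbalancing.amazonaws.com" } } }
  ]
}
EOF
# Parse the document and report what the statements grant.
python3 - <<'PY'
import json
doc = json.load(open("/tmp/nginx-elb-policy.json"))
actions = [s["Action"] for s in doc["Statement"]]
print("statements:", len(doc["Statement"]))
print("has ELB wildcard:", "elasticloadbalancing:*" in actions)
PY
```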
When all this is done, you still need the pod annotation. So my helm deployment looks like this:
helm3 upgrade \
nginx-ingress-$env \
./ \
--install \
--values ./values.yaml \
--set env=$env \
--set controller.ingressClassResource.name=$ingress_class \
--set controller.publishService.enabled=true \
--set controller.ingressClass=$ingress_class \
--set controller.nodeSelector.eks_namespace=$env \
--set controller.podAnnotations.'iam\.amazonaws\.com/role'=$ingress_role_arn \
--set controller.service.annotations.'kubernetes\.io/ingress\.class'=$ingress_class \
--set controller.service.annotations.'service\.beta\.kubernetes\.io/aws-load-balancer-ssl-cert'=$cert_arn \
--set controller.service.annotations.'service\.beta\.kubernetes\.io/aws-load-balancer-backend-protocol'="http" \
--set controller.service.annotations.'service\.beta\.kubernetes\.io/aws-load-balancer-ssl-ports'="443" \
--set controller.service.annotations.'service\.beta\.kubernetes\.io/aws-load-balancer-connection-idle-timeout'="60" \
--set controller.service.annotations.'service\.beta\.kubernetes\.io/aws-load-balancer-extra-security-groups'=$elb_sg_id \
--set-string controller.autoscaling.enabled="true" \
--set-string controller.metrics.enabled="false" \
--set-string controller.metrics.serviceMonitor.enabled="false" \
--set-string controller.prometheus.create="false" \
--set-string controller.prometheus.port="8080" \
--set-string controller.prometheus.scheme="http" \
--set-string controller.enable-prometheus-metrics="false" \
--set-string controller.metrics.service.servicePort="10254" \
--set-string controller.metrics.service.type="ClusterIP" \
--namespace $env \
--debug
Note the line --set controller.podAnnotations.'iam\.amazonaws\.com/role'=$ingress_role_arn -- this is where the annotation is set. In plain YAML, look at the deployment example in the kube2iam GitHub README; the line of interest is
iam.amazonaws.com/role: role-arn
where role-arn is the ARN of the role we just created (not the worker-node role ARN).
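In plain YAML the annotation sits in the pod metadata (for a Deployment, in the pod template's metadata). A minimal hypothetical example, with a made-up name and ARN:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: aws-check                 # illustrative name
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::123456789012:role/my-pod-role  # placeholder ARN
spec:
  containers:
    - name: aws-cli
      image: amazon/aws-cli
      args: ["sts", "get-caller-identity"]
```

If kube2iam is intercepting metadata calls correctly, a pod like this should report the annotated role's identity rather than the worker node's.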
To test if the role was actually assumed, I do this:
pod=$(kubectl get pod -n dev | grep nginx | cut -f1 -d ' ')
kubectl exec -it $pod -- sh
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/ && echo
This should display the desired role name in the terminal. If you want to be absolutely sure that the pod has the correct AWS permissions, you can install the AWS CLI into the pod. Without providing ~/.aws/credentials, any aws command permitted by the attached policy should work. For example, if the role allows full access to S3, aws s3 ls should show a list of buckets.
Running kube2iam latest on AWS EKS Ubuntu AMI, I am unable to assume any roles
The debug page only displays the following:
kube2iam logs are not even showing any activity:
The daemon is running:
One of the pods described:
Here is the deployment:
Have tried removing/re-creating pods, iptables, creating a new key, deploying the roles from scratch.
Manual impersonation with AWS cli works, so no issues with delegation.
Any tips on troubleshooting this?