weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0

Rbac Error while installing weave-net on GKE #3111

Closed. philicious closed 6 years ago

philicious commented 7 years ago

What you expected to happen?

Installation works and weave-net available

What happened?

I see errors when installing, and the weave-net route/eth interface is missing in pods.

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"                                          
serviceaccount "weave-net" configured
clusterrolebinding "weave-net" configured
daemonset "weave-net" configured
Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "weave-net" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["namespaces"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["namespaces"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["namespaces"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["networkpolicies"], APIGroups:["extensions"], Verbs:["get"]} PolicyRule{Resources:["networkpolicies"], APIGroups:["extensions"], Verbs:["list"]} PolicyRule{Resources:["networkpolicies"], APIGroups:["extensions"], Verbs:["watch"]}] user=&{ops@moqops.com  [system:authenticated] map[]} ownerrules=[PolicyRule{Resources:["selfsubjectaccessreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]}] ruleResolutionErrors=[]

Anything else we need to know?

Versions:

$ weave version
kubectl exec -n kube-system weave-net-5kglk -c weave -- /home/weave/weave --local status                                                           
        Version: 2.0.4 (up to date; next check at 2017/09/04 17:28:43)

        Service: router
       Protocol: weave 1..2
           Name: a2:cc:c8:f2:2e:70(gke-production-default-pool-60a75030-dxkl)
     Encryption: disabled
  PeerDiscovery: enabled
        Targets: 3
    Connections: 3 (2 established, 1 failed)
          Peers: 3 (with 6 established connections)
 TrustedSubnets: none

        Service: ipam
         Status: ready
          Range: 10.32.0.0/12
  DefaultSubnet: 10.32.0.0/12

$ kubectl version
kubectl version                                                                                                                                   
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T07:00:21Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T06:43:48Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Logs:

$ kubectl logs -n kube-system <weave-net-pod> weave

modprobe: can't load module ip_set (kernel/net/netfilter/ipset/ip_set.ko): Operation not permitted
Ignore the error if "xt_set" is built-in in the kernel
INFO: 2017/09/01 13:56:27.195307 Command line options: map[port:6783 db-prefix:/weavedb/weave-net docker-api: host-root:/host ipalloc-init:consensus=3 ipalloc-range:10.32.0.0/12 nickname:gke-production-default-pool-60a75030-dxkl no-dns:true status-addr:0.0.0.0:6782 conn-limit:30 datapath:datapath expect-npc:true http-addr:127.0.0.1:6784]
INFO: 2017/09/01 13:56:27.195399 weave  2.0.4
WARN: 2017/09/01 13:56:27.199564 Skipping bridge creation of "bridged_fastdp" due to: : bridge not supported
INFO: 2017/09/01 13:56:27.290374 Bridge type is bridge
INFO: 2017/09/01 13:56:27.290401 Communication between peers is unencrypted.
INFO: 2017/09/01 13:56:27.394665 Our name is a2:cc:c8:f2:2e:70(gke-production-default-pool-60a75030-dxkl)
INFO: 2017/09/01 13:56:27.394756 Launch detected - using supplied peer list: [10.132.0.5 10.132.0.7 10.132.0.6]
INFO: 2017/09/01 13:56:27.394787 Checking for pre-existing addresses on weave bridge
INFO: 2017/09/01 13:56:27.622863 [allocator a2:cc:c8:f2:2e:70] No valid persisted data
INFO: 2017/09/01 13:56:27.629105 [allocator a2:cc:c8:f2:2e:70] Initialising via deferred consensus
INFO: 2017/09/01 13:56:27.629171 Sniffing traffic on vethwe-pcap (via pcap)
INFO: 2017/09/01 13:56:27.654477 ->[10.132.0.7:6783] attempting connection
INFO: 2017/09/01 13:56:27.655110 ->[10.132.0.6:6783] attempting connection
INFO: 2017/09/01 13:56:27.655233 ->[10.132.0.5:6783] attempting connection
INFO: 2017/09/01 13:56:27.655506 ->[10.132.0.7:38832] connection accepted
INFO: 2017/09/01 13:56:27.656208 ->[10.132.0.7:38832|a2:cc:c8:f2:2e:70(gke-production-default-pool-60a75030-dxkl)]: connection shutting down due to error: cannot connect to ourself
INFO: 2017/09/01 13:56:27.656330 ->[10.132.0.5:6783] error during connection attempt: dial tcp4 :0->10.132.0.5:6783: getsockopt: connection refused
INFO: 2017/09/01 13:56:27.656546 ->[10.132.0.7:6783|a2:cc:c8:f2:2e:70(gke-production-default-pool-60a75030-dxkl)]: connection shutting down due to error: cannot connect to ourself
INFO: 2017/09/01 13:56:27.656613 ->[10.132.0.6:6783] error during connection attempt: dial tcp4 :0->10.132.0.6:6783: getsockopt: connection refused
INFO: 2017/09/01 13:56:27.659845 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2017/09/01 13:56:27.660315 Listening for metrics requests on 0.0.0.0:6782
INFO: 2017/09/01 13:56:27.774798 ->[10.132.0.6:32842] connection accepted
INFO: 2017/09/01 13:56:27.777221 ->[10.132.0.6:32842|82:07:d7:3b:b6:61(gke-production-default-pool-60a75030-zzkg)]: connection ready; using protocol version 2
INFO: 2017/09/01 13:56:27.777396 overlay_switch ->[82:07:d7:3b:b6:61(gke-production-default-pool-60a75030-zzkg)] using sleeve
INFO: 2017/09/01 13:56:27.777566 ->[10.132.0.6:32842|82:07:d7:3b:b6:61(gke-production-default-pool-60a75030-zzkg)]: connection added (new peer)
INFO: 2017/09/01 13:56:27.782615 ->[10.132.0.6:32842|82:07:d7:3b:b6:61(gke-production-default-pool-60a75030-zzkg)]: connection fully established
INFO: 2017/09/01 13:56:27.783019 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2017/09/01 13:56:27.785418 sleeve ->[10.132.0.6:6783|82:07:d7:3b:b6:61(gke-production-default-pool-60a75030-zzkg)]: Effective MTU verified at 1398
mkdir: can't create directory '/host/opt/cni/': Read-only file system
INFO: 2017/09/01 13:56:27.866023 ->[10.132.0.5:37555] connection accepted
INFO: 2017/09/01 13:56:27.868264 ->[10.132.0.5:37555|56:0f:41:73:61:70(gke-production-default-pool-60a75030-4h0t)]: connection ready; using protocol version 2
INFO: 2017/09/01 13:56:27.868931 overlay_switch ->[56:0f:41:73:61:70(gke-production-default-pool-60a75030-4h0t)] using sleeve
INFO: 2017/09/01 13:56:27.869065 ->[10.132.0.5:37555|56:0f:41:73:61:70(gke-production-default-pool-60a75030-4h0t)]: connection added (new peer)
10.40.0.0
INFO: 2017/09/01 13:56:28.348805 Discovered local MAC a2:cc:c8:f2:2e:70
INFO: 2017/09/01 13:56:28.371912 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2017/09/01 13:56:28.372056 ->[10.132.0.5:37555|56:0f:41:73:61:70(gke-production-default-pool-60a75030-4h0t)]: connection fully established
INFO: 2017/09/01 13:56:28.373135 sleeve ->[10.132.0.5:6783|56:0f:41:73:61:70(gke-production-default-pool-60a75030-4h0t)]: Effective MTU verified at 1398
INFO: 2017/09/01 13:56:28.491258 Discovered remote MAC 82:07:d7:3b:b6:61 at 82:07:d7:3b:b6:61(gke-production-default-pool-60a75030-zzkg)
INFO: 2017/09/01 13:56:28.633075 Discovered remote MAC 56:0f:41:73:61:70 at 56:0f:41:73:61:70(gke-production-default-pool-60a75030-4h0t)
INFO: 2017/09/01 14:07:27.398798 Expired MAC 82:07:d7:3b:b6:61 at 82:07:d7:3b:b6:61(gke-production-default-pool-60a75030-zzkg)
INFO: 2017/09/01 14:07:27.398843 Expired MAC 56:0f:41:73:61:70 at 56:0f:41:73:61:70(gke-production-default-pool-60a75030-4h0t)
INFO: 2017/09/01 14:07:27.399054 Expired MAC a2:cc:c8:f2:2e:70 at a2:cc:c8:f2:2e:70(gke-production-default-pool-60a75030-dxkl)
ERRO: 2017/09/04 10:33:31.065215 template: dnsEntries:1:30: executing "dnsEntries" at <.DNS.Domain>: can't evaluate field Domain in type *nameserver.Status

Network:

In the pods, the weave-net route is missing:

kubectl exec -it XXX sh                                                                                            
/usr/local/tomcat # ip route show
default via 10.4.1.1 dev eth0
10.4.1.0/24 dev eth0  src 10.4.1.205

Extra info

excerpt of kubectl describe nodes

System Info:
 Machine ID:                 df894e482a2fc05ee3feacbe057e0b1f
 System UUID:                DF894E48-2A2F-C05E-E3FE-ACBE057E0B1F
 Boot ID:                    22c984cf-4976-47ce-ac68-2775b8418b2c
 Kernel Version:             4.4.52+
 OS Image:                   Container-Optimized OS from Google
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.11.2
 Kubelet Version:            v1.7.2
 Kube-Proxy Version:         v1.7.2

My best guess would be that these are problematic

philicious commented 7 years ago

btw I'm seeing a somewhat similar error with weave-scope

kubectl apply --namespace kube-system -f "https://cloud.weave.works/k8s/scope.yaml?k8s-version=$(kubectl version | base64 | tr -d '\n')"          
serviceaccount "weave-scope" configured
clusterrolebinding "weave-scope" configured
deployment "weave-scope-app" configured
service "weave-scope-app" configured
daemonset "weave-scope-agent" configured
Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "weave-scope" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["*"], APIGroups:["*"], Verbs:["*"]} PolicyRule{NonResourceURLs:["*"], Verbs:["*"]}] user=&{ops@moqops.com  [system:authenticated] map[]} ownerrules=[PolicyRule{Resources:["selfsubjectaccessreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]}] ruleResolutionErrors=[]

However, weave-scope seems to work, i.e. I can open the dashboard and it shows my pods, containers, processes etc.

bboreham commented 7 years ago

@philicious are you running kubelet configured to call CNI? That 10.4.1.0/24 range is nothing to do with Weave Net, would be consistent with kubelet using non-CNI networking.
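The range argument can be checked with a little arithmetic. The sketch below is only an illustration: it tests whether an address falls inside a CIDR block, using the Weave range 10.32.0.0/12 from the ipam status and the pod address 10.4.1.205 from the `ip route` output in this issue. The function names `ip_to_int` and `ip_in_cidr` are made up for this example, and it assumes a shell with 64-bit arithmetic.

```shell
# Convert a dotted-quad IPv4 address to its 32-bit integer value.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit 0) if address $1 lies inside CIDR block $2, fail otherwise.
ip_in_cidr() {
  net=${2%/*}          # network part, e.g. 10.32.0.0
  bits=${2#*/}         # prefix length, e.g. 12
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

With these definitions, `ip_in_cidr 10.4.1.205 10.32.0.0/12` fails while `ip_in_cidr 10.40.0.0 10.32.0.0/12` succeeds, which matches the observation that the pod address was not handed out by Weave Net.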

The rbac error is a bit cryptic, but I think it means the user you are calling kubectl as does not have the required permissions.
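Since the message is hard to read as one line, a small filter can help. This is a rough sketch that assumes the error text has the same shape as the one pasted above: it drops everything from ` user=` onward (which also removes the `ownerrules` list) and prints each rule the API server refused to grant on its own line. `denied_rules` is a hypothetical helper name, and `grep -o` is assumed to be available.

```shell
# Read a "Forbidden ... attempt to grant extra privileges" message on stdin
# and print one refused PolicyRule per line.
denied_rules() {
  # Cut off the trailing user/ownerrules part, then extract each rule.
  sed 's/ user=.*//' | grep -o 'PolicyRule{[^}]*}'
}
```

Piping the stderr of the failing `kubectl apply` into `denied_rules` would list the pods, namespaces, nodes and networkpolicies rules separately.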

How is your Kubernetes installed? E.g from kubeadm or from instructions found somewhere?

philicious commented 7 years ago

@bboreham it's GKE, and it was installed as simply as gcloud container clusters create production --zone europe-west1-d --machine-type n1-standard-2 --num-nodes 3 --cluster-version 1.7.2

The RBAC error also surprised me, as the user I'm using is the owner of the GCP project, so it should have all possible rights.
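Project-level ownership and in-cluster RBAC permissions are separate things, so it can be worth asking the API server directly. The following is a sketch, assuming a kubectl that provides the `auth can-i` subcommand. `can_manage_rbac` and the `KUBECTL` override variable are names invented for this example.

```shell
# KUBECTL can be overridden (e.g. for a dry run); defaults to plain kubectl.
KUBECTL=${KUBECTL:-kubectl}

# Succeed only if the current user may create both ClusterRoles and
# ClusterRoleBindings, judged by the yes/no answer that kubectl prints.
can_manage_rbac() {
  for resource in clusterroles clusterrolebindings; do
    answer=$("$KUBECTL" auth can-i create "$resource" 2>/dev/null)
    [ "$answer" = "yes" ] || return 1
  done
  return 0
}
```

Note that being allowed to create a ClusterRole is not sufficient on its own: the error above comes from the escalation check, which also requires the user to already hold every permission the role would grant.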

philicious commented 7 years ago

@bboreham yes, so the reason the CNI plugin is not being picked up is that --network-plugin is a k8s alpha feature, as I just noticed, and that cluster doesn't have alpha enabled. Thanks for the hint.

So what's left of this issue is the RBAC errors. As weave-net won't work for me in this scenario, we could also close this issue if you are not "interested" in the RBAC errors?

bboreham commented 7 years ago

I don't know much about GKE but I see lots of versions of the same answer, e.g. https://www.weave.works/docs/tutorials/kubernetes/cloud-on-gke/ https://coreos.com/operators/prometheus/docs/latest/troubleshooting.html

bboreham commented 7 years ago

Can I ask what you were looking to get from Weave Net, since GKE already provides a container network managed by Google? (There are lots of valid answers, just interested to know which one(s) apply here)

philicious commented 7 years ago

I was hoping to get support for multicast. The project I'm currently working for uses the vert.x microservices framework, which uses e.g. multicast for discovery. The devs are struggling to get the discovery methods designed for running on k8s to work, so I was evaluating a more platform-side solution to this problem.

philicious commented 7 years ago

@bboreham I can confirm that running kubectl create clusterrolebinding myname-cluster-admin-binding --clusterrole=cluster-admin --user=myname@example.org before running the weave-scope or weave-net install scripts fixes the RBAC error, i.e. it no longer occurs. So maybe the install doc needs to be updated with that precious info.
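The workaround can be wrapped in a small helper. This is a minimal sketch, not an official install step: `grant_cluster_admin` and the `KUBECTL` override variable are hypothetical names, and the binding name is simply derived from the part of the e-mail address before the `@`, mirroring the `myname-cluster-admin-binding` pattern used in the command above.

```shell
# KUBECTL can be overridden (e.g. set to "echo" to preview the command).
KUBECTL=${KUBECTL:-kubectl}

# Bind the cluster-admin ClusterRole to the given user (an e-mail address).
grant_cluster_admin() {
  user=$1
  binding="${user%%@*}-cluster-admin-binding"
  "$KUBECTL" create clusterrolebinding "$binding" \
    --clusterrole=cluster-admin --user="$user"
}
```

On GKE the active account can be passed in with `grant_cluster_admin "$(gcloud config get-value account)"`, after which the `kubectl apply` commands from the top of this issue can be re-run.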

philicious commented 7 years ago

btw I never saw that page you linked https://www.weave.works/docs/tutorials/kubernetes/cloud-on-gke/ and only saw the docs from here https://www.weave.works/docs/net/latest/kubernetes/kube-addon/ as I haven't used Weave Cloud before, only the community stuff.

Otherwise I would have found that command myself, I guess :P

murali-reddy commented 6 years ago

This is documented as a prerequisite in https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control

You must grant your user the ability to create roles in Kubernetes by running the following command. [USER_ACCOUNT] is the user's email address:

kubectl create clusterrolebinding cluster-admin-binding \
  --clusterrole cluster-admin --user [USER_ACCOUNT]

murali-reddy commented 6 years ago

Necessary documentation is added in #3412