Closed — rwlove closed this issue 4 years ago
It is likely that rbac is preventing ovnkube from accessing kube API server resources.
kubectl create -f vagrant/ovnkube-rbac.yaml
And then try starting node service again.
You can also just follow the scripts in vagrant/provisioning/setup-master.sh and vagrant/provisioning/setup-minion.sh to get things to work.
Thanks, shettyg.
I am now creating the ovnkube-rbac on the master before running ovnkube, but I still get errors:
E0528 13:59:29.661441 47598 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.NetworkPolicy: networkpolicies.networking.k8s.io is forbidden: User "system:node:ae11-13-wp" cannot list resource "networkpolicies" in API group "networking.k8s.io" at the cluster scope
E0528 13:59:29.663650 47598 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Namespace: namespaces is forbidden: User "system:node:ae11-13-wp" cannot list resource "namespaces" in API group "" at the cluster scope
E0528 13:59:29.663816 47598 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:node:ae11-13-wp" cannot list resource "endpoints" in API group "" at the cluster scope
I'd also like to mention that in this run I am supplying the '-k8s-kubeconfig /etc/kubernetes/kubelet.conf' option to ovnkube. I am not sure whether I need to supply -k8s-kubeconfig, or where I should be getting that config file for the nodes. I use kubeadm to initialize my cluster. kubeadm generates an admin.conf that I copy into ~/.kube/config on the master. However, I'm not really sure what to supply for -k8s-kubeconfig on the nodes. kubelet.conf is in /etc/kubernetes on each node and seemed to have the relevant information, like 'server: https://10.10.3.9:6443'. However, the port is ignored, so I also have to provide '-k8s-apiserver="http://$CENTRAL_IP:6443"'. So I've been trying sometimes with -k8s-kubeconfig=kubelet.conf and sometimes without that option.
@rwlove
The vagrant uses kubeadm too, and the minion there uses tokens and no -k8s-kubeconfig. See here:
https://github.com/ovn-org/ovn-kubernetes/blob/master/vagrant/provisioning/setup-minion.sh#L169
@shettyg
Got it. No need for the k8s-kubeconfig. I've removed it. Thank you.
I looked through setup-minion.sh and I don't see anything that would help with the error messages above. It seems that ovnkube is trying to query the API server as user "system:node:ae11-13-wp", while ovnkube-rbac.yaml configures the user 'ovnkube'.
I am not well versed in Kubernetes RBAC, and not quite sure how the ovnkube user is becoming "system:node:ae11-13-wp".
@dcbw @girishmg Any ideas?
I noticed that ./go-controller/vendor/k8s.io/client-go/util/certificate/csr/csr.go:54 is the only place that "system:node:" + string(nodeName) is set in function RequestNodeCertificate, but I don't see anything calling that code other than csr_test.go:64. I'm unsure where to go from there; maybe my grep'ing was wrong.
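The `system:node:<hostname>` identity most likely comes from the client certificate embedded in the kubeconfig passed via -k8s-kubeconfig (kubelet.conf), rather than from anything in ovnkube itself. A sketch for checking which subject such a certificate carries (the path is an assumption based on a stock kubeadm layout):

```shell
# Sketch: print the subject of the client certificate embedded in a
# kubeconfig, to see which Kubernetes user it authenticates as.
# The kubelet.conf path is an assumption from a stock kubeadm install.
cert_subject() {
    # reads base64-encoded PEM certificate data on stdin
    base64 -d | openssl x509 -noout -subject
}

if [ -f /etc/kubernetes/kubelet.conf ]; then
    grep 'client-certificate-data' /etc/kubernetes/kubelet.conf \
        | awk '{print $2}' | cert_subject
    # typically prints something like:
    #   subject=O = system:nodes, CN = system:node:<hostname>
fi
```

If the subject's CN is `system:node:<hostname>`, the API server authenticates ovnkube as that node user, which would explain the forbidden errors above.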
Also, since RequestNodeCertificate is dealing with certs I added the following line to my ovnkube options:
-k8s-cacert=/etc/kubernetes/pki/ca.crt
The current ovnkube commands are:
(note hostnames/CIDR-blocks have changed and are not consistent with previous comments)
master:
#!/bin/bash
CENTRAL_IP=10.10.3.12
CLUSTER_IP_SUBNET=2.2.0.0/16
NODE_NAME=ae11-18-wp
SERVICE_IP_SUBNET=6.6.0.0/16
TOKEN=abcdef.0123456789abcdef
LOG_DIR=/var/log/ovn-kubernetes
NEXTHOP=`ip route | grep enp24s0f0 | grep via | cut -d ' ' -f 3`
[ -d "${LOG_DIR}" ] || mkdir -p ${LOG_DIR}
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
# To open up Geneve port. Not sure this is necessary,
# remove after OVN-Kubernetes is working
/usr/share/openvswitch/scripts/ovs-ctl \
--protocol=udp \
--dport=6081 \
enable-protocol
kubectl create -f /tmp/ovn/vagrant/ovnkube-rbac.yaml
nohup ovnkube \
-k8s-kubeconfig /root/.kube/config \
-net-controller \
-logfile="/var/log/ovn-kubernetes/ovnkube.log" \
-loglevel=5 \
-k8s-apiserver="http://$CENTRAL_IP:6443" \
-k8s-cacert=/etc/kubernetes/pki/ca.crt \
-init-master=$NODE_NAME -init-node=$NODE_NAME \
-cluster-subnet="$CLUSTER_IP_SUBNET" \
-service-cluster-ip-range=$SERVICE_IP_SUBNET \
-nodeport \
-init-gateways -gateway-interface=enp24s0f0 -gateway-nexthop="$NEXTHOP" \
-k8s-token="$TOKEN" \
-nb-address="tcp://$CENTRAL_IP:6641" \
-sb-address="tcp://$CENTRAL_IP:6642" 2>&1 &
minion/node:
#!/bin/bash
CENTRAL_IP=10.10.3.12
CLUSTER_IP_SUBNET=2.2.0.0/16
NODE_NAME=ae11-18-wp
SERVICE_IP_SUBNET=6.6.0.0/16
TOKEN=abcdef.0123456789abcdef
LOG_DIR=/var/log/ovn-kubernetes
NEXTHOP=`ip route | grep enp24s0f0 | grep via | cut -d ' ' -f 3`
[ -d "${LOG_DIR}" ] || mkdir -p ${LOG_DIR}
# To open up Geneve port. Not sure this is necessary,
# remove after OVN-Kubernetes is working
/usr/share/openvswitch/scripts/ovs-ctl \
--protocol=udp \
--dport=6081 \
enable-protocol
nohup ovnkube \
-loglevel=5 \
-logfile="/var/log/ovn-kubernetes/ovnkube.log" \
-k8s-apiserver="http://$CENTRAL_IP:6443" \
-k8s-cacert=/etc/kubernetes/pki/ca.crt \
-init-node="$NODE_NAME" \
-nodeport \
-nb-address="tcp://$CENTRAL_IP:6641" \
-sb-address="tcp://$CENTRAL_IP:6642" \
-k8s-token="$TOKEN" \
-init-gateways -gateway-interface=enp24s0f0 -gateway-nexthop="$NEXTHOP" \
-service-cluster-ip-range=$SERVICE_IP_SUBNET \
-cluster-subnet=$CLUSTER_IP_SUBNET 2>&1 &
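Both scripts derive NEXTHOP by cutting a fixed space-separated field out of `ip route` output, which breaks if the route line's field order differs. A sketch of a keyword-based extraction (the interface name is taken from the scripts above):

```shell
# Sketch: pull the default-route gateway for an interface out of
# `ip route` output by keyword, instead of cutting a fixed field.
next_hop() {
    # $1 = interface name; reads `ip route` output on stdin
    awk -v dev="$1" '$1 == "default" {
        via = ""; d = ""
        for (i = 2; i < NF; i++) {
            if ($i == "via") via = $(i + 1)
            if ($i == "dev") d = $(i + 1)
        }
        if (d == dev && via != "") { print via; exit }
    }'
}

# enp24s0f0 matches the gateway interface used in the scripts above
NEXTHOP=$(ip route 2>/dev/null | next_hop enp24s0f0)
```

Unlike the grep/cut pipeline, this only matches the default route on the named interface, which is what the gateway next hop is meant to be.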
@rwlove
It still does not work, correct?
I updated the README to use daemonsets instead of doing this manually. The current README became README_MANUAL.md. You can probably try daemonsets?
About doing this manually, can you look at setup-master.sh to see that you are using kubeadm the same way too? It is probably something to do with kubernetes setup itself.
OK, I'll revisit my kubeadm commands.
A few data points: I am using the same kubeadm initialization, via Ansible, for Calico (IPIP), Calico (BGP peering with switches), and Cilium. My assumption was that my base Kubernetes install is sound and I'm just swapping out the network solution. Below is my kubeadm configuration file.
apiVersion: kubeadm.k8s.io/v1beta1
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: '{{ master_ip }}'
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: '{{ ansible_hostname }}'
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta1
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: ""
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.14.0
networking:
  dnsDomain: cluster.local
  podSubnet: '{{ pod_cidr }}'
  serviceSubnet: '{{ service_cidr }}'
scheduler: {}
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: iptables
I've spent the day working on setup-master.sh. I am not running Vagrant, so I am just executing it as a script run by root on the master node. Here are the problems I encountered and my workarounds. I feel that I'm at the same state I was with my Ansible scripts, just with less error handling (setup-master.sh doesn't do any).
1) ifconfig commands fail on Ubuntu 18.04. For example, the following command fails because there is no line containing 'inet addr' in the ifconfig output.
MASTER1=`ifconfig enp0s8 | grep 'inet addr' | cut -d: -f2 | awk '{print $1}'`
2) fails to install some packages
sudo apt-get install -y linux-image-extra-4.15.0-34-generic linux-image-extra-virtual
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package linux-image-extra-4.15.0-34-generic
E: Couldn't find any package by glob 'linux-image-extra-4.15.0-34-generic'
E: Couldn't find any package by regex 'linux-image-extra-4.15.0-34-generic'
3) uses absolute path for location of ovnkube-rbac.yaml (maybe this is a Vagrant thing)
- sudo kubectl create -f /vagrant/ovnkube-rbac.yaml
+ sudo kubectl create -f /root/ovnkube-rbac.yaml
4) had to hardcode token, again using /vagrant path
- TOKEN=`kubectl get secret/$SECRET -o yaml |grep "token:" | cut -f2 -d ":" | sed 's/^ *//' | base64 -d`
- echo $TOKEN > /vagrant/token
-
+ TOKEN=abcdef.0123456789abcdef
5) I don't see the point of the setup_master_args.sh script being generated. Where is it used?
6) Getting keys times-out
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
7) kubelet fails to start because the Docker API version is too old.
setup-master.sh script installs 'Docker version 1.11.2, build b9f10c9'
My Ansible installs 'Docker version 17.12.1-ce, build 7390fc6'
kubeadm states that the latest validated version is 18.09
Follow [these instructions](https://docs.docker.com/install/linux/docker-ce/ubuntu/) to get 'Docker version 18.09.6, build 481bc77'
8) WARNING about docker cgroup driver
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
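For item 1, the 'inet addr' pattern is gone from ifconfig's output on newer distributions; a sketch using `ip -4 -o addr`, whose one-line-per-address output is stable across versions (the interface name is assumed from setup-master.sh):

```shell
# Sketch: extract an interface's IPv4 address from `ip -4 -o addr show`
# output instead of grepping ifconfig's version-dependent formatting.
iface_ipv4() {
    # $1 = interface name; reads `ip -4 -o addr show` output on stdin
    awk -v dev="$1" '$2 == dev { split($4, a, "/"); print a[1]; exit }'
}

MASTER1=$(ip -4 -o addr show 2>/dev/null | iface_ipv4 enp0s8)
```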
After working through or ignoring these issues, my Kubernetes master is installed and:
Your Kubernetes control-plane has initialized successfully!
However, when ovnkube runs I get the following results:
root@ae11-09-wp:~# tail -n 6 nohup.out
E0529 13:50:02.091186 54261 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.NetworkPolicy: Unauthorized
E0529 13:50:02.092202 54261 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Node: Unauthorized
E0529 13:50:02.093102 54261 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Service: Unauthorized
E0529 13:50:02.094348 54261 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Namespace: Unauthorized
E0529 13:50:02.095381 54261 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Endpoints: Unauthorized
E0529 13:50:02.096490 54261 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Pod: Unauthorized
root@ae11-09-wp:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-fb8b8dccf-62849 0/1 Pending 0 21m
kube-system coredns-fb8b8dccf-vtlrd 0/1 Pending 0 21m
kube-system etcd-ae11-09-wp 1/1 Running 0 20m
kube-system kube-apiserver-ae11-09-wp 1/1 Running 0 20m
kube-system kube-controller-manager-ae11-09-wp 1/1 Running 0 20m
kube-system kube-proxy-b57wb 1/1 Running 0 21m
kube-system kube-scheduler-ae11-09-wp 1/1 Running 0 20m
root@ae11-09-wp:~# kubectl -n kube-system logs kube-apiserver-ae11-09-wp | tail -n 1
E0529 20:45:12.651598 1 authentication.go:65] Unable to authenticate the request due to an error: invalid bearer token
What version of kubernetes?
On the master node, the ovnkube running there tries to access the same kube-apiserver resources too. And you don't see this error from ovnkube on the master? It is just on the nodes?
The vagrant uses Ubuntu 16.04 and installs Kubernetes v1.14.2. I tried it just now on my Mac laptop with the following diff applied and it worked fine.
diff --git a/vagrant/provisioning/setup-master.sh b/vagrant/provisioning/setup-master.sh
index 84bb1b60..5b14da11 100755
--- a/vagrant/provisioning/setup-master.sh
+++ b/vagrant/provisioning/setup-master.sh
@@ -32,7 +32,7 @@ OVN_EXTERNAL=$OVN_EXTERNAL
EOL
# Comment out the next line if you don't prefer daemonsets.
-DAEMONSET="true"
+#DAEMONSET="true"
# Comment out the next line, if you prefer TCP instead of SSL.
SSL="true"
diff --git a/vagrant/provisioning/setup-minion.sh b/vagrant/provisioning/setup-minion.sh
index b81ceb65..bc5f87eb 100755
--- a/vagrant/provisioning/setup-minion.sh
+++ b/vagrant/provisioning/setup-minion.sh
@@ -29,7 +29,7 @@ OVN_EXTERNAL=$OVN_EXTERNAL
EOL
# Comment out the next line if you don't prefer daemonsets.
-DAEMONSET="true"
+#DAEMONSET="true"
# Comment out the next line if you prefer TCP instead of SSL.
SSL="true"
Is it possible that your token is expired? It expires in 24 hrs or so.
Getting keys times out:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
This does happen sometimes. A repeat after a few minutes generally gets it to work.
setup-master.sh script installs 'Docker version 1.11.2, build b9f10c9'
My Ansible installs 'Docker version 17.12.1-ce, build 7390fc6'
kubeadm states that the latest validated version is 18.09
That likely happened because of the failed keyserver. Otherwise, I see 17.05.0-ce when I just ran the vagrant.
I am just making wild guesses here. Your token looks short and is likely used just for bootstrapping? The token that I get is of the form:
vagrant@k8smaster:~$ cat /vagrant/token
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im92bmt1YmUtdG9rZW4tdzZ4ZG4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoib3Zua3ViZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImJkMTM1YTkzLTgyNTctMTFlOS1hMWNkLTAyNWJjMGVlNzUwYSIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpkZWZhdWx0Om92bmt1YmUifQ.SEWTwHTcD9gSJEoSIzau8_Oa496KE78Hh74s-htWTZqrQRonprvnoBbnbjwdikoXbR0_1LUvSmwwe88v-V9OWuz4pqBipawMGlm8p9awe4lvwPxcUvfOVHgPX9wlyDyWkMqBT6vcAPbKgfxrFZePg1npIXazGuvjMz_6PVz_rRfAjoovn-VZUVEGpodXg6RFWa-eYJBmhZXkMB-LCmS6nJSsRntUwoPi7KtU_wQMRek3k241EbzPkLXjc8q1qxnBeGW1ji2kT-CQoriTPhAMIQn5yaTXlJcmKlsagboFNt2d7DYKstvifmKlxMZmmfw-n-UY_eNSHR4Hil6vWBNZrg
Since the kube-apiserver logs show that there is an 'invalid bearer token', I added the following to my ovnkube command:
TOKEN=abcdef.0123456789abcdef
-k8s-token="$TOKEN" \
Which results in the following nohup.out output:
E0529 15:00:11.860485 61319 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Pod: pods is forbidden: User "system:bootstrap:abcdef" cannot list resource "pods" in API group "" at the cluster scope
E0529 15:00:11.862715 61319 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Namespace: namespaces is forbidden: User "system:bootstrap:abcdef" cannot list resource "namespaces" in API group "" at the cluster scope
E0529 15:00:11.865634 61319 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.NetworkPolicy: networkpolicies.networking.k8s.io is forbidden: User "system:bootstrap:abcdef" cannot list resource "networkpolicies" in API group "networking.k8s.io" at the cluster scope
E0529 15:00:11.866234 61319 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:bootstrap:abcdef" cannot list resource "endpoints" in API group "" at the cluster scope
E0529 15:00:11.867179 61319 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Node: nodes is forbidden: User "system:bootstrap:abcdef" cannot list resource "nodes" in API group "" at the cluster scope
E0529 15:00:11.867215 61319 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Service: services is forbidden: User "system:bootstrap:abcdef" cannot list resource "services" in API group "" at the cluster scope
I feel like I have some problem with my tokens...
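For reference, a token of the form `abcdef.0123456789abcdef` is a kubeadm bootstrap token and authenticates as `system:bootstrap:*`, while the RBAC rules in ovnkube-rbac.yaml are bound to the `ovnkube` service account. setup-master.sh decodes that account's secret roughly like this (the kubectl lookup is shown as a comment, since it needs a live cluster):

```shell
# Sketch of how setup-master.sh derives the ovnkube service-account token.
# On a live cluster you would first look up the secret name, e.g.:
#   SECRET=$(kubectl get secret | grep ovnkube | awk '{print $1}')
#   TOKEN=$(kubectl get secret/$SECRET -o yaml | decode_token)
decode_token() {
    # pull the base64-encoded "token:" field out of a Secret dumped as YAML
    grep "token:" | cut -f2 -d ":" | sed 's/^ *//' | base64 -d
}
```

The decoded value is a long JWT like the one shettyg pasted above, not the short bootstrap token.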
Shouldn't the k8s-apiserver below use https, for both the ovnkube master and node invocations?
-k8s-apiserver="http://$CENTRAL_IP:6443" \
@shettyg
I think I missed a few of your messages today. I'll try with daemonsets as soon as possible. I won't be able to work on this tomorrow, but I should have some time on Friday and certainly on Monday.
ovn-kubernetes is now working in my environment using daemonsets.
Thank you very much for your assistance, @shettyg!
@rwlove
You can delete kube-proxy as it is not needed.
Getting a bit off topic, but I have a few questions:
1) Previously I specified the gateway interface and the external gateway. Is there something I should be doing here? Everything seems to be working, so maybe this is fine.
-init-gateways -gateway-interface=enp24s0f0 -gateway-nexthop="$NEXTHOP" \
2) VxLAN - can I configure this via ovn-kubernetes, or do I need to do some hacking?
3) Is there a mailing list where these questions should be asked? Do you use the openvswitch lists for this codebase?
Previously I specified the gateway interface and the external gateway. Is there something I should be doing here? Everything seems to be working, so maybe this is fine.
The daemonsets use a gateway mode where we no longer take over a physical interface. Instead, we create a standalone OVS bridge and let iptables bridge external traffic to OVN/OVS. Not super efficient, but good enough to get started, and more flexible.
The ideal situation is to use OVS outside the daemonsets and OVN inside them, but the daemonsets currently do not have an option to specify physical gateways.
VxLAN - can I configure this via ovn-kubernetes, or do I need to do some hacking?
OVN does not use VXLAN. We use a next-generation tunneling protocol called Geneve. There are NICs that allow Geneve offloads (basically UDP offloads) to boost tunneling throughput, similar to VXLAN-offload NICs. Geneve suits advanced network virtualization use cases that need more header space.
Is there a mailing list where these questions should be asked? Do you use the openvswitch lists for this codebase?
Issues here are a good place to ask ovn-kubernetes-specific questions. Generic OVN questions should go to the mailing list: discuss@openvswitch.org
But daemonsets currently do not have option to specify physical gateways.
@shettyg we have a way to do it using daemonsets. See https://github.com/ovn-org/ovn-kubernetes/commit/3acdfa593657593498bfa3c5ec931057f5ecb394
@girishmg
Thanks. If you have time, can you please update README to include the additional information?
@shettyg will do
I moved on to something else, and when I came back, ovn-kubernetes was no longer working. I am using Ansible to provision. Here are the problems I encountered:
1) The README suggests running 'sudo apt-get build-dep dkms' when installing OVS; however, there are no instructions to add source URIs, so this command fails.
root@ae11-28-wp:~# sudo apt-get build-dep dkms
Reading package lists... Done
E: You must put some 'source' URIs in your sources.list
2) When installing OVS according to the README, the package install starts the openvswitch-switch service. As a result, ovnkube.sh fails with: "another process is currently managing ovs"
3) In my Ansible script, after installing OVS, I 'systemctl stop openvswitch-switch'. The next problem is:
kubectl -n ovn-kubernetes logs ovnkube-node-jlgp6 -c ovn-controller
...
=============== ovn-controller - (wait for ovs)
=============== ovn-controller - (wait for ready_to_start_node)
info: Waiting for ready_to_start_node to come up, waiting 1s ...
info: Waiting for ready_to_start_node to come up, waiting 5s ...
...
I'll be poking around on this today...
For #3, ready_to_start_node suggested that the OVN DB needed to be running. It was not, because no nodes matched the node selector.
The OVN DB pod required the following node selectors:
Node-Selectors: beta.kubernetes.io/os=linux
node-role.kubernetes.io/master=
My master node had the following labels:
kubernetes.io/os=linux,node-role.kubernetes.io/master=true
To resolve my label/selection problem I ran the following command:
kubectl label nodes ae11-28-wp node-role.kubernetes.io/master= --overwrite
- When installing OVS according to the README, the package install starts the openvswitch-switch service. As a result, ovnkube.sh fails with: "another process is currently managing ovs"
- In my Ansible script, after installing OVS, I 'systemctl stop openvswitch-switch'. The next problem is
This was user error. I had some code in my Ansible scripts that was installing OVS outside of the daemonset.
I'm going to close this; neither vagrant nor ansible is supported for installation any more. If there are problems with the current installation methods / documentation, then people can file new bugs about them (and maybe already have).
I have followed the instructions in the ovn-kubernetes README but I have either made a mistake or the README is out of date.
I followed the OVS install instructions and from what I can tell it is installed correctly.
I followed the ovn-kubernetes README. The master seems to be initialized as there are no obvious errors in the logs and nohup.out is empty.
When I run ovnkube on my nodes I get the following output (repeated) in my nohup.out file:
I don't see any errors/warnings/failures in ovn-kubernetes.log on the nodes.
I am using the following script to initialize the nodes:
I followed the debugging.md document and everything looks fine until "Sanity check cross host ping." which fails.