ovn-org / ovn-kubernetes

A robust Kubernetes networking platform
https://ovn-kubernetes.io/
Apache License 2.0

README does not install ovn-kubernetes #709

Closed: rwlove closed this issue 4 years ago

rwlove commented 5 years ago

I have followed the instructions in the ovn-kubernetes README but I have either made a mistake or the README is out of date.

I followed the OVS install instructions and from what I can tell it is installed correctly.

I followed the ovn-kubernetes README. The master seems to be initialized as there are no obvious errors in the logs and nohup.out is empty.

When I run ovnkube on my nodes I get the following output (repeated) in my nohup.out file:

E0528 09:16:35.282844 8643 reflector.go:205] github.com/openvswitch/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list v1.Node: the server rejected our request for an unknown reason (get nodes)
E0528 09:16:35.282864 8643 reflector.go:205] github.com/openvswitch/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list v1.Pod: the server rejected our request for an unknown reason (get pods)
E0528 09:16:35.282865 8643 reflector.go:205] github.com/openvswitch/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list v1.NetworkPolicy: the server rejected our request for an unknown reason (get networkpolicies.networking.k8s.io)
E0528 09:16:35.283070 8643 reflector.go:205] github.com/openvswitch/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list v1.Namespace: the server rejected our request for an unknown reason (get namespaces)
E0528 09:16:35.283087 8643 reflector.go:205] github.com/openvswitch/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list v1.Endpoints: the server rejected our request for an unknown reason (get endpoints)
E0528 09:16:35.283106 8643 reflector.go:205] github.com/openvswitch/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list v1.Service: the server rejected our request for an unknown reason (get services)

I don't see any errors/warnings/failures in ovn-kubernetes.log on the nodes.

I am using the following script to initialize the nodes:

#!/bin/bash

CENTRAL_IP=10.10.3.9
CLUSTER_IP_SUBNET=1.1.0.0/16
NODE_NAME=ae11-13-wp
SERVICE_IP_SUBNET=5.5.0.0/16
TOKEN=abcdef.0123456789abcdef
LOG_DIR=/var/log/ovn-kubernetes
NEXTHOP=`ip route | grep enp24s0f0 | grep via | cut -d ' ' -f 3`

[ -d "${LOG_DIR}" ] || mkdir -p ${LOG_DIR}

# To open up Geneve port. Not sure this is necessary,
#  remove after OVN-Kubernetes is working
/usr/share/openvswitch/scripts/ovs-ctl \
--protocol=udp \
--dport=6081 \
enable-protocol

nohup ovnkube -loglevel=5 \
  -logfile="/var/log/ovn-kubernetes/ovnkube.log" \
  -k8s-apiserver="http://$CENTRAL_IP:6443" \
  -init-node="$NODE_NAME"  \
  -nodeport \
  -nb-address="tcp://$CENTRAL_IP:6641" \
  -sb-address="tcp://$CENTRAL_IP:6642" \
  -k8s-token="$TOKEN" \
  -init-gateways -gateway-interface=enp24s0f0 -gateway-nexthop="$NEXTHOP" \
  -service-cluster-ip-range=$SERVICE_IP_SUBNET \
  -cluster-subnet=$CLUSTER_IP_SUBNET 2>&1 &

I followed the debugging.md document and everything looks fine until "Sanity check cross host ping," which fails.

shettyg commented 5 years ago

It is likely that RBAC is preventing ovnkube from accessing kube API server resources.

kubectl create -f vagrant/ovnkube-rbac.yaml

Then try starting the node service again.

You can also just follow the scripts in vagrant/provisioning/setup-master.sh and vagrant/provisioning/setup-minion.sh to get things to work.
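If RBAC is the problem, one quick check is to ask the API server what the ovnkube service account is allowed to do. A minimal sketch, assuming ovnkube-rbac.yaml creates a service account named ovnkube in the default namespace:

kubectl auth can-i list pods --as=system:serviceaccount:default:ovnkube                                   # should answer "yes"
kubectl auth can-i list networkpolicies.networking.k8s.io --as=system:serviceaccount:default:ovnkube      # should also answer "yes"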

rwlove commented 5 years ago

Thanks, shettyg.

I am now creating the ovnkube RBAC resources on the master before running ovnkube, but I still get errors:

E0528 13:59:29.661441 47598 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list v1.NetworkPolicy: networkpolicies.networking.k8s.io is forbidden: User "system:node:ae11-13-wp" cannot list resource "networkpolicies" in API group "networking.k8s.io" at the cluster scope
E0528 13:59:29.663650 47598 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list v1.Namespace: namespaces is forbidden: User "system:node:ae11-13-wp" cannot list resource "namespaces" in API group "" at the cluster scope
E0528 13:59:29.663816 47598 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:node:ae11-13-wp" cannot list resource "endpoints" in API group "" at the cluster scope

I'd also like to mention that in this run I am supplying the '-k8s-kubeconfig /etc/kubernetes/kubelet.conf' option to ovnkube. I am not sure whether I need to supply -k8s-kubeconfig or where I should get that config file for the nodes. I use kubeadm to initialize my cluster; kubeadm generates an admin.conf that I copy into ~/.kube/config on the master. However, I'm not really sure what to supply for -k8s-kubeconfig on the nodes. kubelet.conf was in /etc/kubernetes on each node and seemed to have the relevant information, like 'server: https://10.10.3.9:6443'; however, the port is ignored, so I also have to provide '-k8s-apiserver="http://$CENTRAL_IP:6443"'. So I've been trying sometimes with -k8s-kubeconfig=kubelet.conf and sometimes without that option.

shettyg commented 5 years ago

@rwlove

The vagrant setup uses kubeadm too, and the minion there uses tokens and no -k8s-kubeconfig. See here:

https://github.com/ovn-org/ovn-kubernetes/blob/master/vagrant/provisioning/setup-minion.sh#L169
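For reference, what that minion script does is look up the secret behind the ovnkube service account and pass the decoded JWT as -k8s-token. A minimal sketch of the same lookup, assuming the service account is named ovnkube and lives in the default namespace:

SECRET=$(kubectl get serviceaccount ovnkube -o jsonpath='{.secrets[0].name}')
TOKEN=$(kubectl get secret "$SECRET" -o jsonpath='{.data.token}' | base64 -d)
echo "$TOKEN"   # this long JWT, not the short kubeadm bootstrap token, is what -k8s-token expects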

rwlove commented 5 years ago

@shettyg

Got it. No need for the k8s-kubeconfig. I've removed it. Thank you.

I looked through setup-minion.sh and I don't see anything that would help with the error messages above. It seems that ovnkube is querying the API server as user "system:node:ae11-13-wp", while ovnkube-rbac.yaml configures the user 'ovnkube'.

shettyg commented 5 years ago

I am not well versed in Kubernetes RBAC, and not quite sure how the ovnkube user is becoming "system:node:ae11-13-wp".

@dcbw @girishmg Any ideas?

rwlove commented 5 years ago

I noticed that ./go-controller/vendor/k8s.io/client-go/util/certificate/csr/csr.go:54 is the only place where "system:node:" + string(nodeName) is set (in function RequestNodeCertificate), but I don't see anything calling that code other than csr_test.go:64. I'm unsure where to go from there; maybe my grepping was wrong.

Also, since RequestNodeCertificate deals with certs, I added the following option to my ovnkube invocation:

-k8s-cacert=/etc/kubernetes/pki/ca.crt

The current ovnkube commands are:

(note hostnames/CIDR-blocks have changed and are not consistent with previous comments)

master:

#!/bin/bash

CENTRAL_IP=10.10.3.12
CLUSTER_IP_SUBNET=2.2.0.0/16
NODE_NAME=ae11-18-wp
SERVICE_IP_SUBNET=6.6.0.0/16
TOKEN=abcdef.0123456789abcdef
LOG_DIR=/var/log/ovn-kubernetes
NEXTHOP=`ip route | grep enp24s0f0 | grep via | cut -d ' ' -f 3`

[ -d "${LOG_DIR}" ] || mkdir -p ${LOG_DIR}

ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642

# To open up Geneve port. Not sure this is necessary,
#  remove after OVN-Kubernetes is working
/usr/share/openvswitch/scripts/ovs-ctl \
--protocol=udp \
--dport=6081 \
enable-protocol

kubectl create -f /tmp/ovn/vagrant/ovnkube-rbac.yaml

nohup ovnkube \
  -k8s-kubeconfig /root/.kube/config \
  -net-controller \
  -logfile="/var/log/ovn-kubernetes/ovnkube.log" \
  -loglevel=5 \
  -k8s-apiserver="http://$CENTRAL_IP:6443" \
  -k8s-cacert=/etc/kubernetes/pki/ca.crt \
  -init-master=$NODE_NAME -init-node=$NODE_NAME \
  -cluster-subnet="$CLUSTER_IP_SUBNET" \
  -service-cluster-ip-range=$SERVICE_IP_SUBNET \
  -nodeport \
  -init-gateways -gateway-interface=enp24s0f0 -gateway-nexthop="$NEXTHOP" \
  -k8s-token="$TOKEN" \
  -nb-address="tcp://$CENTRAL_IP:6641" \
  -sb-address="tcp://$CENTRAL_IP:6642" 2>&1 &

minion/node:

#!/bin/bash

CENTRAL_IP=10.10.3.12
CLUSTER_IP_SUBNET=2.2.0.0/16
NODE_NAME=ae11-18-wp
SERVICE_IP_SUBNET=6.6.0.0/16
TOKEN=abcdef.0123456789abcdef
LOG_DIR=/var/log/ovn-kubernetes
NEXTHOP=`ip route | grep enp24s0f0 | grep via | cut -d ' ' -f 3`

[ -d "${LOG_DIR}" ] || mkdir -p ${LOG_DIR}

# To open up Geneve port. Not sure this is necessary,
#  remove after OVN-Kubernetes is working
/usr/share/openvswitch/scripts/ovs-ctl \
--protocol=udp \
--dport=6081 \
enable-protocol

nohup ovnkube \
  -loglevel=5 \
  -logfile="/var/log/ovn-kubernetes/ovnkube.log" \
  -k8s-apiserver="http://$CENTRAL_IP:6443" \
  -k8s-cacert=/etc/kubernetes/pki/ca.crt \
  -init-node="$NODE_NAME"  \
  -nodeport \
  -nb-address="tcp://$CENTRAL_IP:6641" \
  -sb-address="tcp://$CENTRAL_IP:6642" \
  -k8s-token="$TOKEN" \
  -init-gateways -gateway-interface=enp24s0f0 -gateway-nexthop="$NEXTHOP" \
  -service-cluster-ip-range=$SERVICE_IP_SUBNET \
  -cluster-subnet=$CLUSTER_IP_SUBNET 2>&1 &
shettyg commented 5 years ago

@rwlove

It still does not work, correct?

I updated the README to use daemonsets instead of doing this manually; the previous README became README_MANUAL.md. Could you try the daemonsets?

shettyg commented 5 years ago

As for doing this manually, can you look at setup-master.sh to check whether you are using kubeadm the same way? It is probably something to do with the Kubernetes setup itself.

rwlove commented 5 years ago

OK, I'll revisit my kubeadm commands.

A few data points: I am using the same kubeadm initialization, via Ansible, for Calico (IPIP), Calico (BGP peering with switches), and Cilium, so my assumption was that my base Kubernetes install was sound and I'm just swapping out the network solution. Below is my kubeadm configuration file.

apiVersion: kubeadm.k8s.io/v1beta1
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: '{{ master_ip }}'
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: '{{ ansible_hostname }}'
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta1
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: ""
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.14.0
networking:
  dnsDomain: cluster.local
  podSubnet: '{{ pod_cidr }}'
  serviceSubnet: '{{ service_cidr }}'
scheduler: {}
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: iptables
rwlove commented 5 years ago

I've spent the day working on setup-master.sh. I am not running Vagrant, so I am just executing it as a script run by root on the master node. Here are the problems I encountered and my workarounds. I feel that I'm in the same state I was in with my Ansible scripts, just with less error handling (setup-master.sh doesn't do any error handling).

1) ifconfig commands fail on Ubuntu 18.04. For example, the following command fails because there is no line with 'inet addr' in the ifconfig output (a possible ip-based replacement is sketched after this list).

    MASTER1=`ifconfig enp0s8 | grep 'inet addr' | cut -d: -f2 | awk '{print $1}'`

2) It fails to install some packages:

    sudo apt-get install -y linux-image-extra-4.15.0-34-generic linux-image-extra-virtual
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    E: Unable to locate package linux-image-extra-4.15.0-34-generic
    E: Couldn't find any package by glob 'linux-image-extra-4.15.0-34-generic'

    E: Couldn't find any package by regex 'linux-image-extra-4.15.0-34-generic'

3) It uses an absolute path for the location of ovnkube-rbac.yaml (maybe this is a Vagrant thing):

    -  sudo kubectl create -f /vagrant/ovnkube-rbac.yaml
    +  sudo kubectl create -f /root/ovnkube-rbac.yaml

4) I had to hardcode the token, since the script again uses a /vagrant path:

    -  TOKEN=`kubectl get secret/$SECRET -o yaml |grep "token:" | cut -f2  -d ":" | sed 's/^  *//' | base64 -d`
    -  echo $TOKEN > /vagrant/token
    -
    +  TOKEN=abcdef.0123456789abcdef

5) I don't see the point of the setup_master_args.sh script being generated. Where is it used?

6) Getting keys times out:

    sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D

7) Kubelet fails to start because the Docker API version is too old.

The setup-master.sh script installs 'Docker version 1.11.2, build b9f10c9'.
My Ansible installs 'Docker version 17.12.1-ce, build 7390fc6'.
kubeadm states that the latest validated version is 18.09.
I followed [these instructions](https://docs.docker.com/install/linux/docker-ce/ubuntu/) to get 'Docker version 18.09.6, build 481bc77'.

8) Warning about the Docker cgroup driver:

    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
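For point 1, one workaround is to derive the address with ip instead of ifconfig, since the net-tools output format changed on Ubuntu 18.04. A minimal sketch, keeping the enp0s8 interface name from the script:

MASTER1=$(ip -4 -o addr show dev enp0s8 | awk '{print $4}' | cut -d/ -f1)   # first IPv4 address on enp0s8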

After working through or ignoring these issues, my Kubernetes master is installed and:

Your Kubernetes control-plane has initialized successfully!

However, when ovnkube runs I get the following results:

root@ae11-09-wp:~# tail -n 6 nohup.out
E0529 13:50:02.091186   54261 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.NetworkPolicy: Unauthorized
E0529 13:50:02.092202   54261 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Node: Unauthorized
E0529 13:50:02.093102   54261 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Service: Unauthorized
E0529 13:50:02.094348   54261 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Namespace: Unauthorized
E0529 13:50:02.095381   54261 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Endpoints: Unauthorized
E0529 13:50:02.096490   54261 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Pod: Unauthorized

root@ae11-09-wp:~# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
kube-system   coredns-fb8b8dccf-62849              0/1     Pending   0          21m
kube-system   coredns-fb8b8dccf-vtlrd              0/1     Pending   0          21m
kube-system   etcd-ae11-09-wp                      1/1     Running   0          20m
kube-system   kube-apiserver-ae11-09-wp            1/1     Running   0          20m
kube-system   kube-controller-manager-ae11-09-wp   1/1     Running   0          20m
kube-system   kube-proxy-b57wb                     1/1     Running   0          21m
kube-system   kube-scheduler-ae11-09-wp            1/1     Running   0          20m

root@ae11-09-wp:~# kubectl -n kube-system logs kube-apiserver-ae11-09-wp | tail -n 1
E0529 20:45:12.651598       1 authentication.go:65] Unable to authenticate the request due to an error: invalid bearer token
shettyg commented 5 years ago

What version of Kubernetes?

shettyg commented 5 years ago

On the master node, the ovnkube running there tries to access the same kube-apiserver resources too. Do you not see this error from ovnkube on the master? Is it just on the nodes?

The vagrant setup uses Ubuntu 16.04 and installs Kubernetes v1.14.2. I tried it just now on my Mac laptop with the following diff applied and it worked fine.

diff --git a/vagrant/provisioning/setup-master.sh b/vagrant/provisioning/setup-master.sh
index 84bb1b60..5b14da11 100755
--- a/vagrant/provisioning/setup-master.sh
+++ b/vagrant/provisioning/setup-master.sh
@@ -32,7 +32,7 @@ OVN_EXTERNAL=$OVN_EXTERNAL
 EOL

 # Comment out the next line if you don't prefer daemonsets.
-DAEMONSET="true"
+#DAEMONSET="true"

 # Comment out the next line, if you prefer TCP instead of SSL.
 SSL="true"
diff --git a/vagrant/provisioning/setup-minion.sh b/vagrant/provisioning/setup-minion.sh
index b81ceb65..bc5f87eb 100755
--- a/vagrant/provisioning/setup-minion.sh
+++ b/vagrant/provisioning/setup-minion.sh
@@ -29,7 +29,7 @@ OVN_EXTERNAL=$OVN_EXTERNAL
 EOL

 # Comment out the next line if you don't prefer daemonsets.
-DAEMONSET="true"
+#DAEMONSET="true"

 # Comment out the next line if you prefer TCP instead of SSL.
 SSL="true"
shettyg commented 5 years ago

Is it possible that your token has expired? It expires in 24 hours or so.
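If it is the kubeadm bootstrap token that expired, it can be checked and re-issued on the master; a minimal sketch:

kubeadm token list              # shows the remaining TTL for each bootstrap token
kubeadm token create --ttl 0    # mints a new token that never expires (use with care)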

shettyg commented 5 years ago

Getting keys times out: sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D

This does happen sometimes. A repeat after a few minutes generally gets it to work.

setup-master.sh script installs 'Docker version 1.11.2, build b9f10c9'. My Ansible installs 'Docker version 17.12.1-ce, build 7390fc6'. kubeadm states that the latest validated version is 18.09.

That likely happened because of the failed keyserver. Otherwise, I see 17.05.0-ce when I just ran the vagrant.

shettyg commented 5 years ago

I am just making wild guesses here. Your token looked short and is likely used just for bootstrapping? The token that I get is of the form:

vagrant@k8smaster:~$ cat /vagrant/token
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im92bmt1YmUtdG9rZW4tdzZ4ZG4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoib3Zua3ViZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImJkMTM1YTkzLTgyNTctMTFlOS1hMWNkLTAyNWJjMGVlNzUwYSIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpkZWZhdWx0Om92bmt1YmUifQ.SEWTwHTcD9gSJEoSIzau8_Oa496KE78Hh74s-htWTZqrQRonprvnoBbnbjwdikoXbR0_1LUvSmwwe88v-V9OWuz4pqBipawMGlm8p9awe4lvwPxcUvfOVHgPX9wlyDyWkMqBT6vcAPbKgfxrFZePg1npIXazGuvjMz_6PVz_rRfAjoovn-VZUVEGpodXg6RFWa-eYJBmhZXkMB-LCmS6nJSsRntUwoPi7KtU_wQMRek3k241EbzPkLXjc8q1qxnBeGW1ji2kT-CQoriTPhAMIQn5yaTXlJcmKlsagboFNt2d7DYKstvifmKlxMZmmfw-n-UY_eNSHR4Hil6vWBNZrg
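That string is a service-account JWT rather than a kubeadm bootstrap token; decoding its payload shows which service account it maps to. A minimal sketch, assuming the token was saved to /vagrant/token as above (the middle dot-separated segment is base64url-encoded and may need padding before base64 -d accepts it):

PAYLOAD=$(cut -d '.' -f 2 /vagrant/token | tr '_-' '/+')
case $(( ${#PAYLOAD} % 4 )) in 2) PAYLOAD="${PAYLOAD}==" ;; 3) PAYLOAD="${PAYLOAD}=" ;; esac
echo "$PAYLOAD" | base64 -d     # prints the kubernetes.io/serviceaccount/* claims, e.g. service-account.name: ovnkube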
rwlove commented 5 years ago

Since the kube-apiserver logs show an 'invalid bearer token', I added the following to my ovnkube command:

TOKEN=abcdef.0123456789abcdef

-k8s-token="$TOKEN" \

Which results in the following nohup.out output:

E0529 15:00:11.860485   61319 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Pod: pods is forbidden: User "system:bootstrap:abcdef" cannot list resource "pods" in API group "" at the cluster scope
E0529 15:00:11.862715   61319 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Namespace: namespaces is forbidden: User "system:bootstrap:abcdef" cannot list resource "namespaces" in API group "" at the cluster scope
E0529 15:00:11.865634   61319 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.NetworkPolicy: networkpolicies.networking.k8s.io is forbidden: User "system:bootstrap:abcdef" cannot list resource "networkpolicies" in API group "networking.k8s.io" at the cluster scope
E0529 15:00:11.866234   61319 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:bootstrap:abcdef" cannot list resource "endpoints" in API group "" at the cluster scope
E0529 15:00:11.867179   61319 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Node: nodes is forbidden: User "system:bootstrap:abcdef" cannot list resource "nodes" in API group "" at the cluster scope
E0529 15:00:11.867215   61319 reflector.go:205] github.com/ovn-org/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Service: services is forbidden: User "system:bootstrap:abcdef" cannot list resource "services" in API group "" at the cluster scope

I feel like I have some problem with my tokens...

girishmg commented 5 years ago

Shouldn't the k8s-apiserver below use https for both the ovnkube master and node invocations?

-k8s-apiserver="http://$CENTRAL_IP:6443" \

rwlove commented 5 years ago

@shettyg

I think I missed a few of your messages today. I'll try with daemonsets as soon as possible. I won't be able to work on this tomorrow, but I should have some time on Friday and certainly on Monday.

rwlove commented 5 years ago

ovn-kubernetes is now working in my environment using daemonsets.

Thank you very much for your assistance, @shettyg!

shettyg commented 5 years ago

@rwlove

You can delete kube-proxy, as it is not needed.
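With kubeadm, kube-proxy runs as a DaemonSet in kube-system, so removing it is a single command; a minimal sketch (the iptables rules it installed on each node can then be flushed with kube-proxy --cleanup or a reboot):

kubectl -n kube-system delete daemonset kube-proxy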

rwlove commented 5 years ago

Getting a bit off topic, but I have a few questions:

1) Previously I specified the gateway interface and the external gateway. Is there something I should be doing here? Everything seems to be working, so maybe this is fine.

-init-gateways -gateway-interface=enp24s0f0 -gateway-nexthop="$NEXTHOP" \

2) VxLAN: can I configure this via ovn-kubernetes, or do I need to do some hacking?

3) Is there a mailing list where these questions should be asked? Do you use the openvswitch lists for this codebase?

shettyg commented 5 years ago

Previously I specified the gateway interface and the external gateway. Is there something I should be doing here? Everything seems to be working, so maybe this is fine.

The daemonsets use a gateway mode where we no longer take over a physical interface. Instead, we create a standalone OVS bridge and let iptables bridge external traffic to OVN/OVS. It is not super efficient, but it is good enough to get started and more flexible.

The ideal situation is to use OVS outside the daemonsets and OVN inside them, but the daemonsets currently do not have an option to specify physical gateways.

VxLAN - can I configure this via ovn-kubernetes, or do I need to do some hacking?

OVN does not use VXLAN. We use a newer tunneling protocol called "Geneve". There are NICs that allow Geneve offloads (basically UDP offloads) to boost tunneling throughput, similar to VXLAN offload NICs. Geneve also makes advanced network virtualization use cases that need more header space easier.
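To confirm the Geneve tunnels are actually in place, both the OVN southbound database and OVS can be inspected; a minimal sketch, run wherever the OVN/OVS CLIs can reach the databases (on a node, or inside the relevant daemonset pods):

ovn-sbctl show                          # each chassis (node) should list "Encap geneve" with its tunnel IP
ovs-vsctl find interface type=geneve    # the tunnel interfaces ovn-controller created on this node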

Is there a mailing list where these questions should be asked? Do you use the openvswitch lists for this codebase?

Issues here are a good place to ask ovn-kubernetes-specific questions. Generic OVN questions should be asked on the mailing list: discuss@openvswitch.org

girishmg commented 5 years ago

But the daemonsets currently do not have an option to specify physical gateways.

@shettyg we have a way to do it using daemonsets. See https://github.com/ovn-org/ovn-kubernetes/commit/3acdfa593657593498bfa3c5ec931057f5ecb394

shettyg commented 5 years ago

@girishmg

Thanks. If you have time, can you please update the README to include the additional information?

girishmg commented 5 years ago

@shettyg will do

rwlove commented 5 years ago

I moved on to something else, and when I came back, ovn-kubernetes was not working. I am using Ansible to provision. Here are the problems I encountered:

1) The README suggests running 'sudo apt-get build-dep dkms' when installing OVS; however, there are no instructions to add source URIs, so the command fails (a sketch for enabling them follows at the end of this comment).

root@ae11-28-wp:~# sudo apt-get build-dep dkms
Reading package lists... Done
E: You must put some 'source' URIs in your sources.list

2) When installing OVS according to the README, the package install starts the openvswitch-vswitch process. As a result, ovnkube.sh fails with: "another process is currently managing ovs"

3) In my Ansible script, after installing OVS, I run 'systemctl stop openvswitch-switch'. The next problem is:

kubectl -n ovn-kubernetes logs ovnkube-node-jlgp6 -c ovn-controller
...
=============== ovn-controller - (wait for ovs)
=============== ovn-controller - (wait for ready_to_start_node)
info: Waiting for ready_to_start_node  to come up, waiting 1s ...
info: Waiting for ready_to_start_node  to come up, waiting 5s ...
...

I'll be poking around on this today...
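For point 1 above, the build-dep failure is usually because the deb-src lines in sources.list are commented out by default on Ubuntu; a minimal sketch of enabling them, assuming the stock /etc/apt/sources.list layout:

sudo sed -i 's/^# deb-src/deb-src/' /etc/apt/sources.list
sudo apt-get update
sudo apt-get build-dep -y dkms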

rwlove commented 5 years ago

For #3, ready_to_start_node suggested that the OVN DB needed to be running. It was not running because no nodes matched its node selector.

The OVN DB required the following labels:

Node-Selectors:  beta.kubernetes.io/os=linux
                 node-role.kubernetes.io/master=

My master node had the following labels: kubernetes.io/os=linux,node-role.kubernetes.io/master=true

To resolve my label/selection problem I ran the following command:

kubectl label nodes ae11-28-wp node-role.kubernetes.io/master= --overwrite
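A quick way to check whether a selector will now match is to compare the node's labels against it directly; a minimal sketch using the selector quoted above:

kubectl get nodes --show-labels
kubectl get nodes -l 'node-role.kubernetes.io/master='    # should include the master after relabeling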

rwlove commented 5 years ago
  1. When installing OVS according to the README, the package install starts the openvswitch-vswitch process. As a result, ovnkube.sh fails at: "another process is currently managing ovs"
  2. In my Ansible script, after installing OVS, I 'systemctl stop openvswitch-switch'. The next problem is

This was user error. I had some code in my Ansible scripts that was installing OVS outside of the daemonset.

danwinship commented 4 years ago

I'm going to close this; neither Vagrant nor Ansible is supported for installation any more. If there are problems with the current installation methods or documentation, people can file new bugs about them (and maybe already have).