Closed Fillamug closed 5 years ago
Well, you do actually need the features of the new version, because what you describe as a bug is how the system was designed to work until: https://github.com/nokia/danm/issues/108
If you are using an older version, kindly read the documentation related to that version: https://github.com/nokia/danm/tree/v3.3.0#creating-the-configuration-for-delegated-cni-operations
One final note: even with the referenced feature you will never be able to integrate Flannel with central IPAM, simply because Flannel is not CNI compliant. The Flannel CNI completely ignores the "ipam" section of its CNI config, and instead uses the CIDR mounted by the Flannel DaemonSet under /var/run/flannel.scok
On the other hand I'm happy to assist you in installing DANM from master, if you would detail the exact error :)
Ah I see, thank you for your answer. I must've missed the part about the non-CNI standard plugins.
Since it seems I'll have to upgrade to v4.0.0, I'll share the problem I had with the webhook (which is why I went back to v3.3.0).
First of all I encountered a problem when building the docker image. On line 17 of /integration/docker/webhook/Dockerfile it tries to clone a branch named "webhook", which it does not find so it stops. I assumed this is because the branch has already been merged, so I edited the Dockerfile by changing the following lines:
&& git clone -b 'webhook' --depth 1 https://github.com/nokia/danm.git $GOPATH/src/github.com/nokia/danm \
&& cd $GOPATH/src/github.com/nokia/danm \
To this:
&& git clone https://github.com/nokia/danm.git $GOPATH/src/github.com/nokia/danm \
&& cd $GOPATH/src/github.com/nokia/danm \
&& git checkout -b 'webhook' 0195b555ea4dc8768528efae593af044132577c9 \
After this modification it seemed to work and the image built successfully.
The next problem I had was when I was trying to create the danmnets, the webhook component threw these errors:
Error from server (InternalError): error when creating "external_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": the server could not find the requested resource
Error from server (InternalError): error when creating "vnf_external_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": the server could not find the requested resource
Error from server (InternalError): error when creating "vnf_internal_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": the server could not find the requested resource
Error from server (InternalError): error when creating "vnf_management_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": the server could not find the requested resource
I fail to see what could have caused this problem. If you have an idea, please tell me!
I modified the deployment in the webhook.yaml file a bit, as I provided the required certificates with a kubernetes secret. This is how it looks:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: danm-webhook-deployment
  namespace: kube-system
  labels:
    danm: webhook
spec:
  selector:
    matchLabels:
      danm: webhook
  template:
    metadata:
      annotations:
        # Adapt to your own network environment!
        danm.k8s.io/interfaces: |
          [
            {
              "network":"flannel"
            }
          ]
      name: danm-webhook
      labels:
        danm: webhook
    spec:
      serviceAccountName: danm-webhook
      containers:
        - name: danm-webhook
          image: fillamug/webhook:latest
          command: [ "/usr/local/bin/webhook", "-tls-cert-bundle=/etc/webhook/certs/cert.pem", "-tls-private-key-file=/etc/webhook/certs/key.pem", "bind-port=8443" ]
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: webhook-certs
              mountPath: /etc/webhook/certs
              readOnly: true
      # Configure the directory holding the Webhook's server certificates
      volumes:
        - name: webhook-certs
          secret:
            secretName: danm-webhook-certs
Dockerfile: yeah, I copied one of my earlier versions to the repo, but you are right, it definitely needs to be adjusted! I will correct it. It should simply check out the latest master, or build from the user's checkout.
Regarding the error: you are actually getting that error from the K8s API server, not directly from the webhook. I got this kind of error when my webhook configuration was not entirely correct. What does your MutatingWebhookConfiguration look like? Is the danm-webhook-svc Service also created? What happens when you manually contact the webhook? E.g. on my cluster:
[cloudadmin@controller-1 ~]$ curl https://danm-webhook-svc.kube-system.svc.nokia.net:443/netvalidation
curl: (52) Empty reply from server
[cloudadmin@controller-1 ~]$ curl https://danm-webhook-svc.kube-system.svc.nokia.net:443/netvalidation2
404 page not found
Oh, sorry, those error messages from the webhook were from before I downgraded to v3.3.0. Now, after upgrading to v4.0.0 again, I get different errors:
Error from server (InternalError): error when creating "danmnets/external_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": Post https://danm-webhook-svc.kube-system.svc:443/netvalidation?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Error from server (InternalError): error when creating "danmnets/vnf_external_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": Post https://danm-webhook-svc.kube-system.svc:443/netvalidation?timeout=30s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Error from server (InternalError): error when creating "danmnets/vnf_internal_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": Post https://danm-webhook-svc.kube-system.svc:443/netvalidation?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Error from server (InternalError): error when creating "danmnets/vnf_management_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": Post https://danm-webhook-svc.kube-system.svc:443/netvalidation?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Which I find strange, since I think I did everything the same as before.
Nonetheless, here is my MutatingWebhookConfiguration that you asked for, I didn't change anything in it compared to /integration/manifests/webhook/webhook.yaml, other than filling in the caBundles:
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: danm-webhook-config
  namespace: kube-system
webhooks:
  - name: danm-netvalidation.nokia.k8s.io
    clientConfig:
      service:
        name: danm-webhook-svc
        namespace: kube-system
        path: "/netvalidation"
      # Configure your pre-generated certificate matching the details of your environment
      caBundle: <Filled in via a script>
    rules:
      - operations: ["CREATE","UPDATE"]
        apiGroups: ["danm.k8s.io"]
        apiVersions: ["v1"]
        resources: ["danmnets","clusternetworks","tenantnetworks"]
    failurePolicy: Fail
  - name: danm-configvalidation.nokia.k8s.io
    clientConfig:
      service:
        name: danm-webhook-svc
        namespace: kube-system
        path: "/confvalidation"
      # Configure your pre-generated certificate matching the details of your environment
      caBundle: <Filled in via a script>
    rules:
      - operations: ["CREATE","UPDATE"]
        apiGroups: ["danm.k8s.io"]
        apiVersions: ["v1"]
        resources: ["tenantconfigs"]
    failurePolicy: Fail
  - name: danm-netdeletion.nokia.k8s.io
    clientConfig:
      service:
        name: danm-webhook-svc
        namespace: kube-system
        path: "/netdeletion"
      # Configure your pre-generated certificate matching the details of your environment
      caBundle: <Filled in via a script>
    rules:
      - operations: ["DELETE"]
        apiGroups: ["danm.k8s.io"]
        apiVersions: ["v1"]
        resources: ["tenantnetworks"]
    failurePolicy: Fail
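For reference, the error messages in this thread show the API server posting to https://danm-webhook-svc.kube-system.svc:443/netvalidation, which is the Service name, namespace, port and path from the clientConfig put together. A minimal sketch of that composition follows; the helper name `webhook_url` is hypothetical and only illustrates which address has to be reachable from the API server.

```python
def webhook_url(service_name: str, namespace: str, path: str, port: int = 443) -> str:
    """Compose the in-cluster URL for a webhook Service reference.

    Hypothetical helper for illustration; it mirrors the URL format seen
    in the API server error messages quoted in this thread.
    """
    return f"https://{service_name}.{namespace}.svc:{port}{path}"


# Service name, namespace and paths as listed in the configuration above.
for path in ("/netvalidation", "/confvalidation", "/netdeletion"):
    print(webhook_url("danm-webhook-svc", "kube-system", path))
```

If any of these addresses cannot be resolved or reached from the node running the API server, every create, update or delete of the matching resources fails, because failurePolicy is set to Fail.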
Also, the danm-webhook-svc Service did get created as well:
NAMESPACE     NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE    SELECTOR
kube-system   danm-webhook-svc   ClusterIP   10.96.199.114   <none>        443/TCP   128m   danm=webhook
I created this one exactly as it is in the /integration/manifests/webhook/webhook.yaml file.
Also, when I tried to manually connect to the webhook I got the following answer:
vagrant@k8s-master:~$ curl http://10.96.199.114:443/netvalidation
curl: (7) Failed to connect to 10.96.199.114 port 443: Connection timed out
I figured out why there was a different error this time.
The previous error message I sent appears when the webhook pod is deployed on the master node:
Error from server (InternalError): error when creating "external_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": the server could not find the requested resource
Error from server (InternalError): error when creating "vnf_external_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": the server could not find the requested resource
Error from server (InternalError): error when creating "vnf_internal_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": the server could not find the requested resource
Error from server (InternalError): error when creating "vnf_management_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": the server could not find the requested resource
In this case curl gives the following warning:
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
So I tried curling into a file, which then contained this:
^U^C^A^@^B^B
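One possible reading of those bytes, assuming the file really contained the control characters shown in caret notation: they look like the start of a TLS alert record. That would be consistent with a plain http:// request reaching a port where the webhook serves TLS, as in the curl command quoted earlier in this thread. The sketch below decodes the caret notation and parses the record header; the helper names are hypothetical.

```python
# Sketch only: interprets the six bytes shown above, nothing more.

TLS_CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}
ALERT_LEVELS = {1: "warning", 2: "fatal"}


def caret_to_bytes(text: str) -> bytes:
    """Convert caret notation such as '^U^C' into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "^" and i + 1 < len(text):
            # '^X' stands for the control character ord('X') XOR 0x40.
            out.append(ord(text[i + 1]) ^ 0x40)
            i += 2
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


def describe_tls_record(data: bytes) -> dict:
    """Parse the 5-byte TLS record header and, for alerts, the alert level."""
    if len(data) < 5:
        raise ValueError("need at least 5 bytes for a TLS record header")
    info = {
        "content_type": TLS_CONTENT_TYPES.get(data[0], "unknown"),
        "version": (data[1], data[2]),
        "length": int.from_bytes(data[3:5], "big"),
    }
    payload = data[5:5 + info["length"]]
    if info["content_type"] == "alert" and payload:
        info["alert_level"] = ALERT_LEVELS.get(payload[0], "unknown")
    return info


raw = caret_to_bytes("^U^C^A^@^B^B")
print(raw.hex(" "))
print(describe_tls_record(raw))
```

The bytes decode to 15 03 01 00 02 02: content type 21 (alert), record version 3.1, payload length 2, alert level 2 (fatal). The alert description byte is not part of the output shown, so the exact reason cannot be read from it. Repeating the request with https:// would be the obvious next check.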
And the latter error I sent occurs when the webhook pod deploys on a worker node:
Error from server (InternalError): error when creating "danmnets/external_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": Post https://danm-webhook-svc.kube-system.svc:443/netvalidation?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Error from server (InternalError): error when creating "danmnets/vnf_external_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": Post https://danm-webhook-svc.kube-system.svc:443/netvalidation?timeout=30s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Error from server (InternalError): error when creating "danmnets/vnf_internal_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": Post https://danm-webhook-svc.kube-system.svc:443/netvalidation?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Error from server (InternalError): error when creating "danmnets/vnf_management_net.yaml": Internal error occurred: failed calling webhook "danm-netvalidation.nokia.k8s.io": Post https://danm-webhook-svc.kube-system.svc:443/netvalidation?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Where curl gave the following answer as mentioned before:
curl: (7) Failed to connect to 10.96.199.114 port 443: Connection timed out
So from this I gather that the pod should probably always be deployed on the master node, but that still leaves the warning curl gives there to deal with. Could you help me with that?
It is not a requirement to deploy on the master nodes. It is a requirement that the K8s API server needs to be able to access it through the provided Service.
In both cases it kind of looks like you have some issue with your cluster's network setup. You should be able to reach the webserver from the host through its service IP so either there is a connectivity issue, or the webserver itself is actually not running/serving the endpoints, or both.
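To tell a connectivity problem apart from a server that is not listening, a plain TCP probe from the host running the API server can help. This is a minimal sketch using only the Python standard library; the function name `probe_tcp` is hypothetical, and the Service IP and port have to be taken from your own cluster.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No answer at all: packets are likely dropped or not routed.
        return "timed out"
    except ConnectionRefusedError:
        # The address is reachable, but nothing accepts connections on the port.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
```

For example, calling probe_tcp("10.96.199.114", 443) on the master node checks the Service IP reported earlier in this thread. A "timed out" result, like the curl error (7) above, points towards the cluster network or kube-proxy rules rather than the webhook itself, while "refused" suggests that no endpoint is serving behind the Service.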
Thanks, I will look into it and see what I can do.
I did a complete reinstall of the kubernetes cluster on the VMs and now when I create the danmnets, the webhook doesn't throw any errors, so I assume it works properly.
cool! anything else I can help you with? does the feature you need work as expected?
I got around to testing the feature you were interested in! I needed to make some corrections (see the next PR), but otherwise the concept works quite well with a standard CNI plugin, such as bridge. Given the following standard bridge CNI config:
[cloudadmin@controller-1 ~]$ sudo cat /etc/cni/net.d/bridge_l3.conf
{
  "name": "mynet",
  "type": "bridge",
  "bridge": "mynet0",
  "isDefaultGateway": true,
  "forceAddress": false,
  "ipMasq": true,
  "hairpinMode": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.10.0.0/16"
  },
  "cniVersion": "0.3.1"
}
And the DANM ClusterNetwork manifest:
[cloudadmin@controller-1 ~]$ kubectl describe cnet bridge | grep -B7 Cidr
Network ID: bridge_l3
Network Type: bridge
Options:
  Alloc: gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAE=
  allocation_pool:
    End: 10.100.50.30
    Start: 10.100.50.10
  Cidr: 10.100.50.0/24
this is what happens when:
[cloudadmin@controller-1 ~]$ kubectl exec test-deployment-848cb89697-zk5mp -n kube-system ip a | grep bridge
5: test_bridge1@if71: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    inet 10.10.0.2/16 scope global test_bridge1
[cloudadmin@controller-1 ~]$ kubectl describe cnet bridge | grep Alloc
Alloc: gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAE=
[cloudadmin@controller-1 ~]$ kubectl exec test-deployment-85cbb96c6c-qwgwj -n kube-system ip a | grep bridge
5: test_bridge1@if547: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    inet 10.100.50.10/24 brd 10.100.50.255 scope global test_bridge1
[cloudadmin@controller-1 ~]$ kubectl describe cnet bridge | grep Alloc
Alloc: gCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAE=
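The change in the Alloc value can be read as one more address being reserved. The sketch below assumes that Alloc is a base64-encoded bitmap with one bit per address of the CIDR, most significant bit first. That layout is inferred from the two values shown here, not taken from the DANM source, and the helper names are hypothetical.

```python
import base64


def encode_alloc(num_addresses: int, used_offsets: set) -> str:
    """Build a base64 bitmap with one bit per address, most significant bit first."""
    bitmap = bytearray((num_addresses + 7) // 8)
    for offset in used_offsets:
        bitmap[offset // 8] |= 0x80 >> (offset % 8)
    return base64.b64encode(bytes(bitmap)).decode("ascii")


def decode_alloc(alloc: str) -> set:
    """Return the offsets whose bits are set in a base64 bitmap."""
    bitmap = base64.b64decode(alloc)
    return {
        byte_index * 8 + bit
        for byte_index, byte in enumerate(bitmap)
        for bit in range(8)
        if byte & (0x80 >> bit)
    }


# A /24 has 256 addresses; offsets 0 and 255 are the network and broadcast addresses.
before = encode_alloc(256, {0, 255})
after = encode_alloc(256, {0, 10, 255})
print(before)
print(after)
print(decode_alloc(after) - decode_alloc(before))
```

Under that assumption, the first string starts with "gA" and the second with "gC", matching the two Alloc values above, and the only newly set bit is offset 10, which corresponds to 10.100.50.10 in 10.100.50.0/24. In the first case the pod received 10.10.0.2 from the host-local subnet and the bitmap stayed unchanged, so only the second pod's address was accounted for by DANM.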
So I consider the issue closed, feel free to follow up if something is still not clear
Thank you very much for your help! Sorry for answering so late, I had other things to attend to in the past couple days, but I will check out what you did as soon as I'm able.
Is this a BUG REPORT or FEATURE REQUEST?:
What happened: I tried to set up the svcwatcher demo, with small modifications to the danmnet yaml files to fit my environment. The danmnets succeeded in setting up the interfaces as shown in the demo video:
However, when creating the deployments, the pods got their IP addresses from my container networking provider (which is flannel at the moment, but I've tried with calico and weavenet as well and ran into the same issue) instead of from the CIDR specified in the danmnets' yaml files:
I assume this also causes the following problem: when I step into one of the load-balancer pods, for example, the interfaces are not connected correctly, unlike in the example video:
Nor do the services have their correct endpoints:
What you expected to happen: I expected to have the same outcome as is shown in the example video, because I believe I followed all steps correctly, but it seems I probably did not.
How to reproduce it: I have set up a kubernetes system in vagrant with three nodes, one of them is the master and the other two are workers:
I also made the following modifications to the danmnets' yaml files found in the demo:
external_net.yaml:
vnf_external_net.yaml:
vnf_internal_net.yaml:
Environment:
(kubectl version):
danmc.yml:
(uname -a):