avarf opened this issue 5 years ago
The only error that I can see in weave net is:
Please don't do this. Post the whole log, or at least the first 50KB.
22: flannel.1 inet 10.1.56.0/32 scope global flannel.1\ valid_lft forever preferred_lft forever
You are running Flannel at the same time as Weave Net?
Can you post the kubelet log please?
After running the command that you suggested (ip -4 -o addr) I saw the Flannel interface myself, and no, we are not running Flannel and have never used it. I have to find out where it came from.
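For reference, this is roughly how I plan to check where that flannel address comes from; the CNI paths below are the standard locations, not something I have confirmed on our nodes yet:
# Look for a lingering flannel interface on each node
ip -4 -o addr | grep flannel
# Check the CNI config directory for an old flannel config file
ls -l /etc/cni/net.d/
# flannel usually leaves its runtime subnet file here if it ever ran
ls -l /run/flannel/ 2>/dev/null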
Please find the logs for the weave-net pod of the master node attached (kubectl logs -n kube-system weave-net-8mdp5 -c weave > k8s-master-weave.log).
I deleted Weave and installed it again via the command below, but I am still facing the same problem.
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
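A quick way to confirm that the re-applied DaemonSet actually rolled out on all nodes; the DaemonSet name and pod label below are the ones used in the standard Weave Net manifest, so adjust them if yours differ:
# Wait for the re-applied weave-net DaemonSet to finish rolling out
kubectl -n kube-system rollout status ds/weave-net
# Confirm one weave-net pod is Running on each node
kubectl -n kube-system get pods -l name=weave-net -o wide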
INFO: 2019/09/10 13:27:05.423705 ->[10.203.0.18:45065|86:03:7c:d8:f9:1a(cutie)]: connection shutting down due to error: Received update for IP range I own at 10.32.0.0 v3: incoming message says owner 02:01:5b:b9:8e:fd v14
This indicates an inconsistency in the data used by Weave Net. However, the message does not repeat after this time, so maybe the source went away.
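If it helps, Weave Net can report its own view of the IPAM data; something along these lines, using the pod name from your log (repeat per node with the matching pod name), should show whether the stale owner is still listed:
# Show Weave Net's IPAM view from inside the weave container
kubectl exec -n kube-system weave-net-8mdp5 -c weave -- /home/weave/weave --local status ipam
# List the peers this node currently knows about
kubectl exec -n kube-system weave-net-8mdp5 -c weave -- /home/weave/weave --local status peers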
Let's try to break the problem down a little:
From yesterday
You mean it was working fine before with Weave Net? What changed on the 26th? Nothing much changes in your log after the 26th.
if I want to go to the webpage I receive
This site can’t be reached
What webpage? From where?
Can you try curl -v to the webpage address on the host where your webserver pod runs and post the response here?
Can you reach one pod from another inside the cluster, using curl? By pod IP? By service IP? By DNS name?
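For example, something like this from inside one of your pods; the namespace, pod and service names below are placeholders, so substitute your own:
# Directly by pod IP
kubectl exec -n <namespace> <client-pod> -- curl -sv --max-time 5 http://<pod-ip>:80
# Via the service's cluster (virtual) IP
kubectl exec -n <namespace> <client-pod> -- curl -sv --max-time 5 http://<cluster-ip>:80
# Via the service DNS name, which also exercises cluster DNS
kubectl exec -n <namespace> <client-pod> -- curl -sv --max-time 5 http://<service>.<namespace>.svc.cluster.local:80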
Yes, everything was working properly, but late on the 26th and early on the 27th we had some internal networking problems with the company DNS and some other minor issues.
We are running a platform consisting of different components. We have a reverse proxy using Nginx that redirects requests to another component that hosts the static HTML pages, and this is the easiest way to see whether a request arrived or not. When the traffic does not go through, I see no logs in Nginx at all and the curl just gets stuck:
curl -v -k http://10.203.20.164
* Rebuilt URL to: http://10.203.20.164/
* Trying 10.203.20.164...
* TCP_NODELAY set
* Connected to 10.203.20.164 (10.203.20.164) port 80 (#0)
> GET / HTTP/1.1
> Host: 10.203.20.164
> User-Agent: curl/7.58.0
> Accept: */*
>
But when the traffic does go through, I can see the static HTML as the response to the curl, and there are also info logs in our Nginx.
I ran a small test: I used the command below 100 times from inside a pod and it was 100% successful:
kubectl exec -n 164 -ti gateway-7cf68998b6-phwkf -- curl -v -k http://proxy:80
proxy is our Nginx that I mentioned, and when I tried to access it at the same time the test was running, I was not able to and got similar results: This site can’t be reached, 10.203.20.164 took too long to respond.
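The test was essentially a loop like the following (reconstructed from memory; namespace and pod name as above):
# Repeat the in-cluster request 100 times and count successes
ok=0
for i in $(seq 1 100); do
  kubectl exec -n 164 gateway-7cf68998b6-phwkf -- curl -s -o /dev/null -k --max-time 5 http://proxy:80 && ok=$((ok+1))
done
echo "$ok/100 requests succeeded"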
What sort of address is 10.203.20.164? (e.g. host IP, cluster (virtual) IP, pod IP)
That is a virtual IP; we defined 40 virtual IPs so that we can use each of them for one namespace. That IP is in the range of our physical machines' IPs, and we defined them on our K8s master node.
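To show how one of those virtual IPs is wired up on the master, the following checks report where the address lives and whether kube-proxy has rules for it (using the IP from above):
# Which interface on the master carries the virtual IP
ip -4 -o addr | grep 10.203.20.164
# Whether any Service uses it (e.g. as an external IP)
kubectl get svc --all-namespaces -o wide | grep 10.203.20.164
# Whether kube-proxy programmed iptables rules for it
sudo iptables-save | grep 10.203.20.164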
What happened?
We have a small in-house cluster consisting of 5 nodes on which we run our platform. Our platform consists of different components that communicate via HTTP or AMQP, both among themselves and with the outside of the cluster.
From yesterday no traffic goes to the components and they have become unreachable even though they are up. There is no error, neither in our components nor in the k8s components (dns, proxy, etc.), BUT I can access the cluster and the components via kubectl, and all of the kubectl commands work properly. What I mean is that I can run kubectl exec, kubectl logs, helm install, etc., but if I want to go to the webpage I receive "This site can’t be reached", yet there are no logs in either the nginx pod or any of the k8s components, which means they haven't received the request and no traffic goes through. The only error that I can see in weave net is:
How to reproduce it?
Because I don't know what caused this problem, I don't know how to reproduce it.
Versions:
Environment:
- kubectl version:
- uname -a:
- Weave Net (./weave --local status):
Network:
On K8s master: