hi @glazychev-art and other developers
I really like the NSM project; it feels like an exciting project and I'm moving to production this week. I deployed deployments-k8s version 1.11.0 and then deployed the officially provided Kernel2IP2Kernel example, but the alpine pod cannot start. The error message is as follows:
My environment is roughly as follows. The namespaces starting with cncp are developed by ourselves; the others are open source. cncp should have no impact on NSM.
Try just deploying spire. Don't deploy NSM yet. Wait until spire is Running.
hi @glazychev-art
my environment already has spire, NSM, and kernel2ethernet2kernel deployed
my current forward.yaml is:
with the image artgl/cmd-forwarder-vpp:cilium_test, but not with hostNetwork: true
I'm a little unsure. Do you mean to redeploy spire directly, without redeploying NSM and kernel2ethernet2kernel? And does forward.yaml need to use cmd-forwarder-vpp:cilium_test and hostNetwork: true?
thanks
Please delete:
- kernel2ethernet2kernel
- NSM
- spire
Then redeploy spire and wait for the agents and the server to reach the Running state (a sketch of the flow is below).
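A minimal sketch of that delete/redeploy flow, assuming the kustomize example paths of deployments-k8s v1.11.1 and the default example namespace name (both are assumptions; adjust to how you originally deployed):

# Example namespace and paths below are assumptions.
kubectl delete ns ns-kernel2ethernet2kernel
kubectl delete -k https://github.com/networkservicemesh/deployments-k8s/examples/basic?ref=v1.11.1
kubectl delete -k https://github.com/networkservicemesh/deployments-k8s/examples/spire/single_cluster?ref=v1.11.1

kubectl apply -k https://github.com/networkservicemesh/deployments-k8s/examples/spire/single_cluster?ref=v1.11.1
# app=spire-agent is the label from your describe output; app=spire-server is assumed.
kubectl -n spire wait --for=condition=ready pod -l app=spire-server --timeout=120s
kubectl -n spire wait --for=condition=ready pod -l app=spire-agent --timeout=120s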
hi @glazychev-art
i did it, with the image artgl/cmd-forwarder-vpp:cilium_test and hostNetwork: true
At the moment "with the image artgl/cmd-forwarder-vpp:cilium_test and hostNetwork: true" doesn't matter to us, because we haven't deployed NSM yet.
Could you please share the output of kubectl describe ds -n spire?
hi @glazychev-art

[root@CNCP-MS-01 deployments-k8s-release-v1.11.1]# kubectl describe ds -n spire
Name:           spire-agent
Selector:       app=spire-agent
Node-Selector:  <none>
Labels:         app=spire-agent
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 3
Number of Nodes Misscheduled: 0
Pods Status:  3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=spire-agent
  Service Account:  spire-agent
  Init Containers:
   init:
    Image:      gcr.io/spiffe-io/wait-for-it
    Port:       <none>
    Host Port:  <none>
    Args:       -t 30 spire-server:8081
    Environment:  <none>
    Mounts:       <none>
   init-bundle:
    Image:      gcr.io/spiffe-io/wait-for-it
    Port:       <none>
    Host Port:  <none>
    Command:    sh -c t=0; until [ -f /run/spire/bundle/bundle.crt 2>&1 ] || [ $t -eq 60 ]; do t=`expr $t + 1`; sleep 1; done
    Environment:  <none>
    Mounts:
      /run/spire/bundle from spire-bundle (rw)
  Containers:
   spire-agent:
    Image:      ghcr.io/spiffe/spire-agent:1.6.1
    Port:       <none>
    Host Port:  <none>
    Args:       -config /run/spire/config/agent.conf
    Liveness:   exec [/opt/spire/bin/spire-agent healthcheck -socketPath /run/spire/sockets/agent.sock] delay=15s timeout=3s period=60s #success=1 #failure=2
    Readiness:  exec [/opt/spire/bin/spire-agent healthcheck -socketPath /run/spire/sockets/agent.sock --shallow] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /run/spire/bundle from spire-bundle (rw)
      /run/spire/config from spire-config (ro)
      /run/spire/sockets from spire-agent-socket (rw)
      /var/run/secrets/tokens from spire-token (rw)
  Volumes:
   spire-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      spire-agent
    Optional:  false
   spire-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      spire-bundle
    Optional:  false
   spire-agent-socket:
    Type:          HostPath (bare host directory volume)
    Path:          /run/spire/sockets
    HostPathType:  DirectoryOrCreate
   spire-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  7200
Events:
  Type    Reason            Age    From                  Message
  ----    ------            ----   ----                  -------
  Normal  SuccessfulCreate  4m40s  daemonset-controller  Created pod: spire-agent-6cwrq
  Normal  SuccessfulCreate  4m39s  daemonset-controller  Created pod: spire-agent-hf6lv
  Normal  SuccessfulCreate  4m39s  daemonset-controller  Created pod: spire-agent-84rx8
Thanks!
Please don't forget the spire ClusterSPIFFEID template:
kubectl apply -f https://raw.githubusercontent.com/networkservicemesh/deployments-k8s/v1.11.1/examples/spire/single_cluster/clusterspiffeid-template.yaml
Now, please deploy NSM with the image artgl/cmd-forwarder-vpp:cilium_test and hostNetwork: true on the forwarder (a sketch is below).
Is spire still Running?
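A minimal sketch of that forwarder change, assuming the default deployments-k8s names (DaemonSet forwarder-vpp in namespace nsm-system; both names are assumptions):

# DaemonSet and namespace names below are assumptions from the default layout.
kubectl -n nsm-system patch daemonset forwarder-vpp --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/image",
   "value": "artgl/cmd-forwarder-vpp:cilium_test"},
  {"op": "add", "path": "/spec/template/spec/hostNetwork", "value": true}
]'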
hi @glazychev-art
yes, spire is running
deploying NSM with the image artgl/cmd-forwarder-vpp:cilium_test and hostNetwork: true is done
Ok, cool. Please deploy Kernel2Ethernet2Kernel (a sketch is below).
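A minimal sketch, assuming the kustomize use-case path in deployments-k8s v1.11.1 and the example namespace name (both are assumptions):

kubectl apply -k https://github.com/networkservicemesh/deployments-k8s/examples/use-cases/Kernel2Ethernet2Kernel?ref=v1.11.1
# Namespace name below is an assumption; watch until the pods settle.
kubectl -n ns-kernel2ethernet2kernel get pods --watch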
hi @glazychev-art done
alpine starts, but the nsm-1 interface is periodically created and deleted, like this:
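One way to observe that interface flapping from inside the NSC pod (the pod and namespace names below are assumptions from the example):

kubectl -n ns-kernel2ethernet2kernel exec alpine -- \
  sh -c 'while true; do date; ip a | grep -q nsm-1 && echo "nsm-1 present" || echo "nsm-1 absent"; sleep 2; done'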
Okay, got it, thank you very much! This is strange to me.
In that case, it would be great if you could share your environment. Can I somehow create the same cluster as yours?
hi @glazychev-art I think if you need anything, I can try my best to help, as long as I know how to do it
my cilium file (there is nothing else in the k8s cluster): cilium.txt. Change the suffix to yaml.
No, I mean k8s-cluster. You sent me a cilium CNI config.
Which k8s provider are you using? For example, AWS, AKS, GKE?
hi @glazychev-art I installed the k8s cluster locally on three physical servers; I did not use Kubernetes from a cloud service provider.
The core configuration is as follows:

1. linux kernel:
- kernel-4.19.94-300.el7.x86_64.rpm
- kernel-core-4.19.94-300.el7.x86_64.rpm
- kernel-devel-4.19.94-300.el7.x86_64.rpm
- kernel-headers-4.19.94-300.el7.x86_64.rpm
- kernel-modules-4.19.94-300.el7.x86_64.rpm

2. disable firewalld and NetworkManager:
systemctl disable firewalld && systemctl stop firewalld
systemctl disable NetworkManager && systemctl stop NetworkManager

3. selinux, /etc/selinux/config:
SELINUX=disabled

4. swap:
echo "vm.swappiness=0" >> /etc/sysctl.conf
sed -i 's$/dev/mapper/centos-swap$#/dev/mapper/centos-swap$g' /etc/fstab

5. sysctl.conf:
net.ipv6.conf.eth0.accept_dad = 0
net.ipv6.conf.eth0.accept_ra = 1
net.ipv6.conf.eth0.accept_ra_defrtr = 1
net.ipv6.conf.eth0.accept_ra_rtr_pref = 1
net.ipv6.conf.eth0.accept_ra_rt_info_max_plen = 1

6. time:
yum install -y chrony
sed -i.bak '3,6d' /etc/chrony.conf && sed -i '3c server ntp1.aliyun.com iburst' /etc/chrony.conf
systemctl enable chronyd --now && systemctl restart chronyd
chronyc sources
mv /etc/localtime /etc/localtime.bak && ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

7. /etc/sysctl.d/k8s.conf:
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
modprobe br_netfilter && sysctl -p /etc/sysctl.d/k8s.conf

8. ipvs:
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules
yum -y install ipset ipvsadm

9. docker:
yum install -y ebtables socat ipset conntrack
yum install -y yum-utils
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install -y docker-ce
systemctl enable docker && systemctl restart docker
sed -i.bak "s#^ExecStart=/usr/bin/dockerd.*#ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd#g" /usr/lib/systemd/system/docker.service
systemctl daemon-reload && systemctl restart docker && systemctl status docker

10. kubernetes repo:
cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
yum makecache fast

11. kubernetes packages:
yum install -y kubelet-1.23.10 kubectl-1.23.10 kubeadm-1.23.10
systemctl enable kubelet
Thank you very much for the script, but unfortunately I cannot reproduce your problem.
I can suggest the following:
1. Follow this script, but please skip steps 3, 7, 8, 9, 10.
2. On the node where the NSE is deployed, please run tcpdump. We are interested in frames with dst <node IP> (for example, 172.16.102.11) and port 4789 (see the sketch below).
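A tcpdump sketch for that capture (the interface name and address are examples; adjust them to the NSE node):

# ens3 and 172.16.102.11 are example values.
tcpdump -i ens3 -nn 'dst host 172.16.102.11 and udp port 4789' -w vxlan_to_nse.pcap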
hi @glazychev-art see it: a.cap.txt. Check whether the information you want is in there (change the file suffix).
Yes, thank you. As I can see, the required traffic does not arrive at the NSE node.
Can we collect the same but from the NSC node?
hi @glazychev-art
no packets there either
@316953425 Actually, we are interested in UDP. Then can we collect frames from the NSE node again?
hi @glazychev-art
nsc node:
nse node:
I have posted the detailed information. Can you help me check whether it is correct? UDP also has no data packets, whether on the nsc node or the nse node. thanks
Doesn't this tcpdump flag help either?
tcpdump -i ens3 udp port 4789 --immediate-mode
hi @glazychev-art no packets
Ok, thanks. Correct me if I'm wrong: this example worked for you on NSM v1.6.1, right?
hi @glazychev-art v1.6.1, v1.11.0, v1.11.1, and master all do not work properly
hi @glazychev-art I have built three environments, but none of them work if I use cilium. Have you reproduced the problem locally using cilium?
Got it. So far I'm having problems with cilium on my servers.
I was only able to reproduce the problem without hostNetwork: true, so we fixed that.
Question: have you tried other CNI plugins? For example flannel? https://github.com/flannel-io/flannel
hi @glazychev-art i have not tried another CNI
It would be great if you could check another CNI. This way we could determine whether the problem is really in the cilium CNI plugin or somewhere in the configuration of your servers.
hi @glazychev-art
On the server where you use cilium, is there any problem if you use hostNetwork: true?
hi @glazychev-art I found that the IP prefix of the nsm-1 interface on the nsc and nse is /32. How can I get the nsc or nse to receive a /24 prefix on the interface?
This is a separate topic. In short, NSM allocates /32 addresses for P2P connections. You can look at these chain elements in more detail:
https://github.com/networkservicemesh/cmd-nse-icmp-responder/blob/main/main.go#L208
https://github.com/networkservicemesh/sdk/tree/main/pkg/networkservice/ipam/groupipam
https://github.com/networkservicemesh/sdk/tree/main/pkg/networkservice/ipam/point2pointipam
On my local cluster, cilium works fine with hostNetwork: true. But maybe my configuration is different.
I believe that the problem may also be in the configuration of your servers.
So to find out, can you please remove cilium and try another CNI plugin (e.g. flannel)? A sketch is below.
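A minimal sketch of that swap, assuming cilium was applied from a local cilium.yaml and using flannel's released manifest (both are assumptions):

# Remove cilium (assuming it was installed with kubectl apply -f cilium.yaml):
kubectl delete -f cilium.yaml
# Install flannel from its released manifest:
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
# Existing pods may need to be recreated to pick up the new CNI.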
hi @glazychev-art I tried calico: no problems, it works well. thanks
env:
Do you need my cilium configuration?
Yes, please
hi @glazychev-art
see it
cilium.yaml.txt
So, I've tested cilium on standalone servers with Ubuntu 20.04 on which kubernetes is installed, using this cilium guide:
https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/
cilium install --version 1.14.4
And it works well for me with hostNetwork: true on forwarders.
Edit: works on CentOS 7 too
Perhaps checking your cilium installation will help you: https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/#validate-the-installation
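The validation commands from that guide may help compare installations (these are the documented cilium CLI checks):

cilium status --wait
cilium connectivity test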
hi @glazychev-art Can you send me a copy of your cilium configuration so that I can check the difference from my configuration? If you don't mind, you can send it to my email (316953425@qq.com). thanks a lot
I used only cilium install --version 1.14.4, without any changes.
Config (I removed a few Secrets from there):
cilium_config.txt
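If you want to diff that against your own running config, one way is to dump cilium's ConfigMap (cilium-config in kube-system is its default name):

kubectl -n kube-system get configmap cilium-config -o yaml > my_cilium_config.yaml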
It seems like this one is fixed by https://github.com/networkservicemesh/cmd-forwarder-vpp/pull/1003.
If it's still reproducing, feel free to reopen.
(Also, I've changed the ticket label from 'Urgent' to 'ASAP' because we want to add this to the release notes)