openyurtio / openyurt

OpenYurt - Extending your native Kubernetes to edge(project under CNCF)
https://openyurt.io
Apache License 2.0

[Question] Unable to do "kubectl logs" for pods running in edge node. #1838

Open chunfungintel opened 9 months ago

chunfungintel commented 9 months ago

What happened: "kubectl logs" fails for pods running on the edge node.

What you expected to happen: "kubectl logs" shows the logs of pods on the edge node.

How to reproduce it (as minimally and precisely as possible):

Control-plane setup:

Kubernetes version:

kubeadm version: &version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.17", GitCommit:"953be8927218ec8067e1af2641e540238ffd7576", GitTreeState:"clean", BuildDate:"2023-02-22T13:33:14Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}

Kubernetes initialization:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

OpenYurt installation:

helm upgrade --install yurt-manager -n kube-system openyurt/yurt-manager --version 1.3.4
helm upgrade --install yurt-hub -n kube-system --set kubernetesServerAddr=https://${KUBERNETES_SERVER_ADDRESS}:6443 openyurt/yurthub --version 1.3.4
helm upgrade --install raven-agent -n kube-system openyurt/raven-agent

Edge node:

Installation:

sudo rm $(which kubelet kubeadm kubectl)
wget https://github.com/openyurtio/openyurt/releases/download/v1.3.4/yurtadm-v1.3.4-linux-amd64.zip
unzip yurtadm-v1.3.4-linux-amd64.zip
sudo cp linux-amd64/yurtadm /usr/local/bin/yurtadm && \
sudo chmod +x /usr/local/bin/yurtadm

Joining:

sudo yurtadm join \
${CONTROL_PANEL_ADDRESS}:6443 \
--token=${JOIN_TOKEN} --node-type=edge \
--cri-socket=unix:///run/containerd/containerd.sock \
--discovery-token-ca-cert-hash=${CA_HASH} --v=5

kubectl logs -n kube-system raven-agent-ds-r7lbl
Error from server: Get "https://192.168.0.111:10250/containerLogs/kube-system/raven-agent-ds-r7lbl/raven-agent": dial tcp 192.168.0.111:10250: i/o timeout
NAMESPACE      NAME                                               READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-ncq42                              1/1     Running   0          21h
kube-flannel   kube-flannel-ds-pjl2w                              1/1     Running   0          21h
kube-system    coredns-bd6b6df9f-4bwgn                            1/1     Running   0          21h
kube-system    coredns-bd6b6df9f-f9dwj                            1/1     Running   0          21h
kube-system    etcd-adl-control                                   1/1     Running   0          21h
kube-system    kube-apiserver-adl-control                         1/1     Running   0          21h
kube-system    kube-controller-manager-adl-control                1/1     Running   0          21h
kube-system    kube-proxy-4ct9t                                   1/1     Running   0          21h
kube-system    kube-proxy-55vls                                   1/1     Running   0          21h
kube-system    kube-scheduler-adl-control                         1/1     Running   0          21h
kube-system    raven-agent-ds-2hvct                               1/1     Running   0          21h
kube-system    raven-agent-ds-r7lbl                               1/1     Running   0          21h
kube-system    yurt-hub-ubuntu-platform                           1/1     Running   0          21h
kube-system    yurt-manager-7f5bbb5744-fp5m8                      1/1     Running   0          21h

Anything else we need to know?: The control-plane node is in subnet 10.226.76.0/23, while the edge node is in 192.168.0.0/24. I can join the edge node and deploy workloads to it, but I cannot view their logs. Which steps did I miss?
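
The timeout in the error above is consistent with this network layout: for "kubectl logs" the API server dials the kubelet at the node's own address (192.168.0.111:10250 in the error), and that address is a private one outside the control-plane subnet. A minimal Python sketch of that check, using only the standard library and the addresses from this report. The function name is hypothetical, and "same subnet" is only a rough heuristic for reachability, since routing or a tunnel can change the picture.

```python
import ipaddress

def kubelet_directly_reachable(node_ip: str, control_plane_subnet: str) -> bool:
    """Rough heuristic: True only if the node IP lies inside the control-plane subnet."""
    ip = ipaddress.ip_address(node_ip)
    return ip in ipaddress.ip_network(control_plane_subnet)

edge_ip = "192.168.0.111"      # kubelet address from the kubectl logs error
cp_subnet = "10.226.76.0/23"   # control-plane subnet from this report

# The edge node is outside the control-plane subnet ...
print(kubelet_directly_reachable(edge_ip, cp_subnet))  # False
# ... and its address is private, so it is not routable across networks by itself.
print(ipaddress.ip_address(edge_ip).is_private)        # True
```

If both checks come out this way, some form of cross-network tunnel or proxy between the two sides is likely needed before "kubectl logs" can work.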

Environment:

others /kind question

YTGhost commented 9 months ago

@chunfungintel Hi, I think you should deploy Raven like this to enable node IP forwarding:

helm upgrade --install raven-agent -n kube-system openyurt/raven-agent --set vpn.forwardNodeIP=true

After that, you need to create the Gateway CR, see here

chunfungintel commented 9 months ago

Hi,

Thank you for your suggestion.

I modified my steps as below:

  1. Raven deployment change (note: the Raven 0.4.0 image is still not available):

    helm upgrade --install raven-agent -n kube-system openyurt/raven-agent --set vpn.forwardNodeIP=true \
    --set image.tag=latest --version 0.4.0 
  2. Nodes labelling:

    # Edge node
    kubectl label nodes adl-edge-node raven.openyurt.io/gateway=gw-edge
    # Cloud node
    kubectl label nodes adl-cloud-node raven.openyurt.io/gateway=gw-cloud
  3. Gateway settings:

    cat <<EOF | kubectl apply -f -
    apiVersion: raven.openyurt.io/v1alpha1
    kind: Gateway
    metadata:
      name: gw-edge
    spec:
      endpoints:
        - nodeName: adl-edge-node
          underNAT: true
    ---
    apiVersion: raven.openyurt.io/v1alpha1
    kind: Gateway
    metadata:
      name: gw-cloud
    spec:
      endpoints:
        - nodeName: adl-cloud-node
          underNAT: false
    EOF
  4. Modified the Raven agent according to the tutorial (https://github.com/openyurtio/raven/blob/main/docs/raven-agent-tutorial.md#install-raven-agent). I can see the Raven pods restart after the deploy.

    make deploy
    bash hack/gen-yaml.sh openyurt/raven-agent:latest libreswan false ":8080"
    ==== create raven-agent.yaml in /home/chunfung/Github/raven/_output/yamls ====
    # Warning: 'bases' is deprecated. Please use 'resources' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
    # Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
    kubectl apply -f _output/yamls/raven-agent.yaml
    Warning: resource serviceaccounts/raven-agent-account is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
    serviceaccount/raven-agent-account configured
    Warning: resource clusterroles/raven-agent-role is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
    clusterrole.rbac.authorization.k8s.io/raven-agent-role configured
    Warning: resource clusterrolebindings/raven-agent-role-binding is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
    clusterrolebinding.rbac.authorization.k8s.io/raven-agent-role-binding configured
    Warning: resource configmaps/raven-agent-config is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
    configmap/raven-agent-config configured
    Warning: resource secrets/raven-agent-secret is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
    secret/raven-agent-secret configured
    Warning: resource daemonsets/raven-agent-ds is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
    daemonset.apps/raven-agent-ds configured

Unfortunately, I am still unable to run 'kubectl logs' against the edge node successfully. Any ideas? :)

YTGhost commented 9 months ago

@chunfungintel I think you should use v0.3.2 rather than v0.4 for raven's image version if you are still deploying OpenYurt v1.3.

chunfungintel commented 9 months ago

Hi @YTGhost

Actually, these are the only versions available in helm:

helm search repo raven-agent --versions
NAME                    CHART VERSION   APP VERSION     DESCRIPTION
openyurt/raven-agent    0.4.0           0.4.0           A Helm chart for Kubernetes
openyurt/raven-agent    0.1.1           0.2.0           A Helm chart for Kubernetes
openyurt/raven-agent    0.1.0           0.2.0           A Helm chart for Kubernetes

I do not specifically need OpenYurt v1.3. Is there a version you would recommend I try?

It seems that in some version the Raven controller was merged into yurt-manager (correct me if I am wrong). Was that a version before 1.3?

YTGhost commented 9 months ago

I do not specifically need OpenYurt v1.3. Is there a version you would recommend I try?

@chunfungintel The earlier versions of raven's Chart don't look very well maintained. I think you can use OpenYurt v1.4, since raven v0.4 upgraded the CRD. You can also stay on OpenYurt v1.3, but you may have to modify raven's Chart package manually, for example by using Chart version 0.1.1 and manually setting the raven-agent image version to v0.3.2.

It seems that in some version the Raven controller was merged into yurt-manager (correct me if I am wrong). Was that a version before 1.3?

We merged raven-controller-manager into yurt-manager in v1.3, so in v1.3 and beyond, you only need to install yurt-manager.

chunfungintel commented 9 months ago

Revised steps:

Control-plane initialization:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
mkdir -p $HOME/.kube && sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config && sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
kubectl taint nodes --all node-role.kubernetes.io/master-

Using OpenYurt 1.4.0 + Raven agent 0.4.0

helm upgrade --install yurt-manager -n kube-system openyurt/yurt-manager --version 1.4.0 --set image.tag=latest
helm upgrade --install yurt-hub -n kube-system --set kubernetesServerAddr=https://${KUBERNETES_SERVER_ADDRESS}:6443 openyurt/yurthub --version 1.4.0
helm upgrade --install raven-agent -n kube-system openyurt/raven-agent --set vpn.forwardNodeIP=true \
--set image.tag=0.4.0 --version 0.4.0

Install OpenYurt 1.4 in Edge

wget https://github.com/openyurtio/openyurt/releases/download/v1.4.0/yurtadm-v1.4.0-linux-amd64.tar.gz
tar -xvf yurtadm-v1.4.0-linux-amd64.tar.gz
sudo cp linux-amd64/yurtadm /usr/local/bin/yurtadm && sudo chmod +x /usr/local/bin/yurtadm

Edge node joining:

sudo yurtadm join \
${CONTROL_PANEL_ADDRESS}:6443 \
--token=${JOIN_TOKEN} --node-type=edge \
--cri-socket=unix:///run/containerd/containerd.sock \
--discovery-token-ca-cert-hash=${CA_HASH} --v=5

Gateway configuration:

kubectl label nodes adl-edge-node raven.openyurt.io/gateway=gw-edge; \
kubectl label nodes adl-cloud-node raven.openyurt.io/gateway=gw-cloud

cat <<EOF | kubectl apply -f -
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-edge
spec:
  endpoints:
    - nodeName: adl-edge-node
      underNAT: true
---
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-cloud
spec:
  endpoints:
    - nodeName: adl-cloud-node
      underNAT: false
EOF

git clone https://github.com/openyurtio/raven.git
cd raven && git checkout v0.4.0
make deploy

Results: Still unable to do 'kubectl logs'

Anything still missing?

YTGhost commented 9 months ago

@chunfungintel Hi, could you please provide the logs of raven-agent?

chunfungintel commented 9 months ago

@YTGhost These are the raven-agent logs from the control-plane node only:

W1208 03:07:36.262826 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I1208 03:07:36.283129 1 start.go:61] Start raven agent
I1208 03:07:36.283810 1 engine.go:69] RavenEngine: engine successfully start
I1208 03:07:36.385317 1 engine.go:107] "RavenEngine: adding gateway gw-edge"
I1208 03:07:36.385426 1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
I1208 03:07:36.385472 1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
I1208 03:07:36.385541 1 engine.go:107] "RavenEngine: adding gateway gw-cloud"
I1208 03:07:36.385567 1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
I1208 03:07:36.385594 1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
I1208 03:07:36.385897 1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
I1208 03:07:36.390257 1 tunnel.go:80] RavenEngine: route driver vxlan initialized
I1208 03:07:36.393717 1 libreswan.go:363] starting pluto
Initializing NSS database

I1208 03:07:37.395489 1 libreswan.go:385] start pluto successfully
I1208 03:07:37.395594 1 tunnel.go:89] RavenEngine: VPN driver libreswan initialized
E1208 03:09:07.398446 1 tunnelagent.go:92] "error config gateway public ip" err="error get public ip by any of the apis: [https://api.ipify.org https://api.my-ip.io/ip https://ip4.seeip.org]" gateway="gw-cloud"
I1208 03:09:07.398573 1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
I1208 03:09:07.398598 1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-cloud"
I1208 03:09:07.398698 1 tunnelagent.go:113] "applying network" localEndpoint= remoteEndpoint=map[]
I1208 03:09:07.398723 1 libreswan.go:102] Tunnel: no local gateway or remote gateway is found, cleaning vpn connections
I1208 03:09:07.420646 1 vxlan.go:77] Tunnel: no local gateway or remote gateway is found, cleaning up route setting
I1208 03:09:07.467244 1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
E1208 03:10:37.470153 1 tunnelagent.go:92] "error config gateway public ip" err="error get public ip by any of the apis: [https://api.ipify.org https://api.my-ip.io/ip https://ip4.seeip.org]" gateway="gw-cloud"
I1208 03:10:37.470231 1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
I1208 03:10:37.470261 1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-cloud"
I1208 03:10:37.470303 1 tunnelagent.go:109] network not changed, skip to process

It seems to me the configuration failed because I am behind a corporate proxy?

YTGhost commented 9 months ago

@chunfungintel I think so. Raven goes out to the public network to look up its public IP, and in your network environment that request is probably failing.

If the public IP cannot be obtained automatically, you can look it up manually and set the PublicIP field directly in the Gateway CR.
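
As a rough illustration of setting the public IP by hand, here is a minimal Python sketch that builds a Gateway object with the address filled in and prints it as JSON. It reuses the gw-cloud definition from earlier in this thread. The endpoint field name `publicIP` is an assumption about how the PublicIP field is spelled in the CRD, so please check it against the installed Gateway CRD. The helper name `gateway_manifest` is hypothetical and 203.0.113.10 is a documentation-only placeholder address.

```python
import json

def gateway_manifest(name: str, node_name: str, under_nat: bool, public_ip: str) -> dict:
    """Build a Gateway object with an explicitly set public IP (field name assumed)."""
    return {
        "apiVersion": "raven.openyurt.io/v1alpha1",
        "kind": "Gateway",
        "metadata": {"name": name},
        "spec": {
            "endpoints": [
                {
                    "nodeName": node_name,
                    "underNAT": under_nat,
                    # Assumed field name; verify against the installed CRD.
                    "publicIP": public_ip,
                }
            ]
        },
    }

# 203.0.113.10 is a placeholder; substitute the address looked up manually.
manifest = gateway_manifest("gw-cloud", "adl-cloud-node", False, "203.0.113.10")
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output can be piped into kubectl apply -f - in the same way as the heredoc used earlier.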

YTGhost commented 9 months ago

@chunfungintel Hi, has this been resolved or any progress made?

chunfungintel commented 8 months ago

@YTGhost Actually, I was collecting logs when you asked :)

What I currently do is inject http_proxy, https_proxy and no_proxy into the raven-agent DaemonSet with

kubectl edit daemonsets.apps -n kube-system raven-agent-ds
        env:
        - name: http_proxy
          value: http://PROXY_NAME:PORT
        - name: https_proxy
          value: http://PROXY_NAME:PORT
        - name: no_proxy
          value: 169.254.2.1/32,10.0.0.0/8,192.168.0.0/16,localhost,.local,127.0.0.0/8,172.16.0.0/12,134.134.0.0/16,10.226.76.0/23,.svc,kube-system.svc,192.168.0.0/24
        - name: HTTP_PROXY
          value: http://PROXY_NAME:PORT
        - name: HTTPS_PROXY
          value: http://PROXY_NAME:PORT
        - name: NO_PROXY
          value: 169.254.2.1/32,10.0.0.0/8,192.168.0.0/16,localhost,.local,127.0.0.0/8,172.16.0.0/12,134.134.0.0/16,10.226.76.0/23,.svc,kube-system.svc,192.168.0.0/24
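
For this setup to help, the public-IP lookup hosts from the earlier error (for example api.ipify.org) have to go through the proxy, while node and cluster addresses have to bypass it. The Python sketch below is a simplified model of no_proxy matching for sanity-checking the list above. It is not the matching logic raven-agent itself uses, and it only handles CIDR blocks, exact names, and leading-dot suffixes. The function name is hypothetical.

```python
import ipaddress

NO_PROXY = (
    "169.254.2.1/32,10.0.0.0/8,192.168.0.0/16,localhost,.local,127.0.0.0/8,"
    "172.16.0.0/12,134.134.0.0/16,10.226.76.0/23,.svc,kube-system.svc,192.168.0.0/24"
)

def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if host matches any entry of a comma-separated no_proxy list."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # host is a name, not an IP literal
    for entry in (e.strip() for e in no_proxy.split(",")):
        if not entry:
            continue
        if ip is not None and "/" in entry:
            # CIDR entry: match by network membership.
            if ip in ipaddress.ip_network(entry, strict=False):
                return True
        elif entry.startswith("."):
            # Leading-dot entry: match as a domain suffix.
            if host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False

print(bypasses_proxy("10.226.76.105", NO_PROXY))  # True: control-plane node stays direct
print(bypasses_proxy("192.168.0.111", NO_PROXY))  # True: edge node stays direct
print(bypasses_proxy("api.ipify.org", NO_PROXY))  # False: lookup goes via the proxy
```

Under this model, 10.226.76.0/23 and 192.168.0.0/24 are already covered by 10.0.0.0/8 and 192.168.0.0/16, so those two entries are redundant but harmless.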

Raven's logs from the control-plane node:

W1219 05:49:53.701467       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1219 05:49:53.711849       1 start.go:61] Start raven agent
I1219 05:49:53.711976       1 engine.go:69] RavenEngine: engine successfully start
I1219 05:56:08.179137       1 engine.go:107] "RavenEngine: adding gateway gw-edge"
I1219 05:56:08.179159       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
I1219 05:56:08.179169       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
I1219 05:56:08.179210       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
I1219 05:56:08.185324       1 engine.go:107] "RavenEngine: adding gateway gw-cloud"
I1219 05:56:08.185340       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
I1219 05:56:08.185348       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
I1219 05:56:08.185681       1 engine.go:121] "RavenEngine: updating gateway, gw-edge"
I1219 05:56:08.185696       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
I1219 05:56:08.185703       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
I1219 05:56:08.185762       1 tunnel.go:80] RavenEngine: route driver vxlan initialized
I1219 05:56:08.186634       1 libreswan.go:363] starting pluto
I1219 05:56:08.191901       1 engine.go:121] "RavenEngine: updating gateway, gw-cloud"
I1219 05:56:08.191916       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
I1219 05:56:08.191924       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
Initializing NSS database

I1219 05:56:09.187474       1 libreswan.go:385] start pluto successfully
I1219 05:56:09.187628       1 tunnel.go:89] RavenEngine: VPN driver libreswan initialized
I1219 05:56:11.682578       1 engine.go:121] "RavenEngine: updating gateway, gw-cloud"
I1219 05:56:11.682643       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
I1219 05:56:11.682680       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
I1219 05:56:11.684039       1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-cloud"
I1219 05:56:11.684187       1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
I1219 05:56:11.684374       1 tunnelagent.go:113] "applying network" localEndpoint=<nil> remoteEndpoint=map[]
I1219 05:56:11.684473       1 libreswan.go:102] Tunnel: no local gateway or remote gateway is found, cleaning vpn connections
I1219 05:56:11.700547       1 engine.go:121] "RavenEngine: updating gateway, gw-cloud"
I1219 05:56:11.700624       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
I1219 05:56:11.700666       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
I1219 05:56:11.709070       1 vxlan.go:77] Tunnel: no local gateway or remote gateway is found, cleaning up route setting
I1219 05:56:11.746785       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
I1219 05:56:11.746981       1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
I1219 05:56:11.747062       1 tunnelagent.go:113] "applying network" localEndpoint="10.226.76.105" remoteEndpoint=map[]
I1219 05:56:11.747082       1 libreswan.go:102] Tunnel: no local gateway or remote gateway is found, cleaning vpn connections
I1219 05:56:11.762037       1 vxlan.go:77] Tunnel: no local gateway or remote gateway is found, cleaning up route setting
I1219 05:56:11.784486       1 engine.go:121] "RavenEngine: updating gateway, gw-edge"
I1219 05:56:11.784538       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
I1219 05:56:11.784566       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
I1219 05:56:11.797713       1 engine.go:121] "RavenEngine: updating gateway, gw-edge"
I1219 05:56:11.797737       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
I1219 05:56:11.797754       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
I1219 05:56:11.818785       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
I1219 05:56:11.819122       1 tunnelagent.go:113] "applying network" localEndpoint="10.226.76.105" remoteEndpoint=map[gw-edge:192.168.0.111]
I1219 05:56:11.822511       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 10.226.76.105-192.168.0.111-10.226.76.105/32-10.244.1.0/24 --id @10.226.76.105-10.226.76.105/32-10.244.1.0/24 --host 10.226.76.105 --client 10.226.76.105/32 --ikeport 4500 --to --id @192.168.0.111-10.244.1.0/24-10.226.76.105/32 --host %any --client 10.244.1.0/24] output="002 \"10.226.76.105-192.168.0.111-10.226.76.105/32-10.244.1.0/24\": added IKEv2 connection\n"
I1219 05:56:11.835984       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 10.226.76.105-192.168.0.111-10.226.76.105/32-192.168.0.111/32 --id @10.226.76.105-10.226.76.105/32-192.168.0.111/32 --host 10.226.76.105 --client 10.226.76.105/32 --ikeport 4500 --to --id @192.168.0.111-192.168.0.111/32-10.226.76.105/32 --host %any --client 192.168.0.111/32] output="002 \"10.226.76.105-192.168.0.111-10.226.76.105/32-192.168.0.111/32\": added IKEv2 connection\n"
I1219 05:56:11.842535       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 10.226.76.105-192.168.0.111-10.244.0.0/24-10.244.1.0/24 --id @10.226.76.105-10.244.0.0/24-10.244.1.0/24 --host 10.226.76.105 --client 10.244.0.0/24 --ikeport 4500 --to --id @192.168.0.111-10.244.1.0/24-10.244.0.0/24 --host %any --client 10.244.1.0/24] output="002 \"10.226.76.105-192.168.0.111-10.244.0.0/24-10.244.1.0/24\": added IKEv2 connection\n"
I1219 05:56:11.849240       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 10.226.76.105-192.168.0.111-10.244.0.0/24-192.168.0.111/32 --id @10.226.76.105-10.244.0.0/24-192.168.0.111/32 --host 10.226.76.105 --client 10.244.0.0/24 --ikeport 4500 --to --id @192.168.0.111-192.168.0.111/32-10.244.0.0/24 --host %any --client 192.168.0.111/32] output="002 \"10.226.76.105-192.168.0.111-10.244.0.0/24-192.168.0.111/32\": added IKEv2 connection\n"
I1219 05:56:11.853424       1 vxlan.go:81] Tunnel: only gateway node exist in current gateway, cleaning up route setting
I1219 05:56:11.910735       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
I1219 05:56:11.911057       1 tunnelagent.go:109] network not changed, skip to process
I1219 05:56:11.911077       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
I1219 05:56:11.911198       1 tunnelagent.go:109] network not changed, skip to process
I1219 05:56:11.911224       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
I1219 05:56:11.911335       1 tunnelagent.go:109] network not changed, skip to process
I1219 05:56:11.911349       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
I1219 05:56:11.911477       1 tunnelagent.go:109] network not changed, skip to process
I1219 05:56:11.911495       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
I1219 05:56:11.911600       1 tunnelagent.go:109] network not changed, skip to process
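
The "whacking with" lines are the ones that show which local and remote hosts each IPsec connection was configured with. A small Python sketch for pulling those values out of a log line, so the two sides can be compared, might look like this. It assumes the arguments before --to describe the local side and those after it the remote side, which is how the lines above read. The function name `parse_whack` is hypothetical, and the sample is the first such line from the log above.

```python
import re

def parse_whack(line: str) -> dict:
    """Extract --host and --client for both sides of a 'whacking with' log line."""
    match = re.search(r"args=\[(.*?)\]", line)
    if match is None:
        raise ValueError("no whack args found in line")
    # Assumption: arguments before ' --to ' are the local side, after it the remote side.
    local, _, remote = match.group(1).partition(" --to ")

    def side(text: str) -> dict:
        host = re.search(r"--host (\S+)", text)
        client = re.search(r"--client (\S+)", text)
        return {
            "host": host.group(1) if host else None,
            "client": client.group(1) if client else None,
        }

    return {"local": side(local), "remote": side(remote)}

sample = (
    'I1219 05:56:11.822511       1 libreswan.go:316] "whacking with" '
    "args=[--psk --encrypt --forceencaps "
    "--name 10.226.76.105-192.168.0.111-10.226.76.105/32-10.244.1.0/24 "
    "--id @10.226.76.105-10.226.76.105/32-10.244.1.0/24 --host 10.226.76.105 "
    "--client 10.226.76.105/32 --ikeport 4500 --to "
    "--id @192.168.0.111-10.244.1.0/24-10.226.76.105/32 --host %any "
    "--client 10.244.1.0/24]"
)

result = parse_whack(sample)
print(result["local"])   # {'host': '10.226.76.105', 'client': '10.226.76.105/32'}
print(result["remote"])  # {'host': '%any', 'client': '10.244.1.0/24'}
```

Running the same function over the agent log from the other node gives the matching pair of endpoints to compare against.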

Raven's logs from the edge node (grabbed from /var/log/pods/kube-system_raven-agent-ds):

2023-12-19T13:55:18.916883189+08:00 stderr F W1219 05:55:18.916769       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2023-12-19T13:55:19.024656536+08:00 stderr F I1219 05:55:19.024571       1 start.go:61] Start raven agent
2023-12-19T13:55:19.024698613+08:00 stderr F I1219 05:55:19.024656       1 engine.go:69] RavenEngine: engine successfully start
2023-12-19T13:56:08.181413498+08:00 stderr F I1219 05:56:08.181226       1 engine.go:107] "RavenEngine: adding gateway gw-edge"
2023-12-19T13:56:08.181434171+08:00 stderr F I1219 05:56:08.181236       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
2023-12-19T13:56:08.181436027+08:00 stderr F I1219 05:56:08.181241       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
2023-12-19T13:56:08.181437719+08:00 stderr F I1219 05:56:08.181271       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
2023-12-19T13:56:08.187248523+08:00 stderr F I1219 05:56:08.187019       1 engine.go:107] "RavenEngine: adding gateway gw-cloud"
2023-12-19T13:56:08.187261786+08:00 stderr F I1219 05:56:08.187029       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
2023-12-19T13:56:08.187263583+08:00 stderr F I1219 05:56:08.187034       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
2023-12-19T13:56:08.18743989+08:00 stderr F I1219 05:56:08.187357       1 engine.go:121] "RavenEngine: updating gateway, gw-edge"
2023-12-19T13:56:08.187445895+08:00 stderr F I1219 05:56:08.187368       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
2023-12-19T13:56:08.187447777+08:00 stderr F I1219 05:56:08.187376       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
2023-12-19T13:56:08.193960361+08:00 stderr F I1219 05:56:08.193744       1 engine.go:121] "RavenEngine: updating gateway, gw-cloud"
2023-12-19T13:56:08.193973497+08:00 stderr F I1219 05:56:08.193752       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
2023-12-19T13:56:08.193975305+08:00 stderr F I1219 05:56:08.193758       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
2023-12-19T13:56:08.200998585+08:00 stderr F I1219 05:56:08.200839       1 tunnel.go:80] RavenEngine: route driver vxlan initialized
2023-12-19T13:56:08.201821126+08:00 stderr F I1219 05:56:08.201771       1 libreswan.go:363] starting pluto
2023-12-19T13:56:08.377037145+08:00 stdout F Initializing NSS database
2023-12-19T13:56:08.377049455+08:00 stdout F
2023-12-19T13:56:09.204237114+08:00 stderr F I1219 05:56:09.204026       1 libreswan.go:385] start pluto successfully
2023-12-19T13:56:09.204253142+08:00 stderr F I1219 05:56:09.204076       1 tunnel.go:89] RavenEngine: VPN driver libreswan initialized
2023-12-19T13:56:11.684339366+08:00 stderr F I1219 05:56:11.684084       1 engine.go:121] "RavenEngine: updating gateway, gw-cloud"
2023-12-19T13:56:11.684369557+08:00 stderr F I1219 05:56:11.684099       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
2023-12-19T13:56:11.684370606+08:00 stderr F I1219 05:56:11.684108       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
2023-12-19T13:56:11.70151363+08:00 stderr F I1219 05:56:11.701442       1 engine.go:121] "RavenEngine: updating gateway, gw-cloud"
2023-12-19T13:56:11.701544862+08:00 stderr F I1219 05:56:11.701452       1 engine.go:95] RavenEngine: enqueue gateway gw-cloud to tunnel queue
2023-12-19T13:56:11.701545796+08:00 stderr F I1219 05:56:11.701466       1 engine.go:100] RavenEngine: enqueue gateway gw-cloud to proxy queue
2023-12-19T13:56:11.786530579+08:00 stderr F I1219 05:56:11.786351       1 engine.go:121] "RavenEngine: updating gateway, gw-edge"
2023-12-19T13:56:11.786537547+08:00 stderr F I1219 05:56:11.786368       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
2023-12-19T13:56:11.786538571+08:00 stderr F I1219 05:56:11.786378       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
2023-12-19T13:56:11.786548476+08:00 stderr F I1219 05:56:11.786414       1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-cloud"
2023-12-19T13:56:11.786549519+08:00 stderr F I1219 05:56:11.786421       1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
2023-12-19T13:56:11.786550289+08:00 stderr F I1219 05:56:11.786446       1 tunnelagent.go:113] "applying network" localEndpoint=<nil> remoteEndpoint=map[]
2023-12-19T13:56:11.786551073+08:00 stderr F I1219 05:56:11.786450       1 libreswan.go:102] Tunnel: no local gateway or remote gateway is found, cleaning vpn connections
2023-12-19T13:56:11.7949514+08:00 stderr F I1219 05:56:11.794852       1 vxlan.go:77] Tunnel: no local gateway or remote gateway is found, cleaning up route setting
2023-12-19T13:56:11.799016888+08:00 stderr F I1219 05:56:11.798999       1 engine.go:121] "RavenEngine: updating gateway, gw-edge"
2023-12-19T13:56:11.799023241+08:00 stderr F I1219 05:56:11.799011       1 engine.go:95] RavenEngine: enqueue gateway gw-edge to tunnel queue
2023-12-19T13:56:11.799025304+08:00 stderr F I1219 05:56:11.799020       1 engine.go:100] RavenEngine: enqueue gateway gw-edge to proxy queue
2023-12-19T13:56:11.851155499+08:00 stderr F I1219 05:56:11.851014       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
2023-12-19T13:56:11.851166507+08:00 stderr F I1219 05:56:11.851054       1 tunnelagent.go:113] "applying network" localEndpoint="192.168.0.111" remoteEndpoint=map[gw-cloud:10.226.76.105]
2023-12-19T13:56:11.854136169+08:00 stderr F I1219 05:56:11.853981       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 192.168.0.111-10.226.76.105-10.244.1.0/24-10.226.76.105/32 --id @192.168.0.111-10.244.1.0/24-10.226.76.105/32 --host 192.168.0.111 --client 10.244.1.0/24 --to --id @10.226.76.105-10.226.76.105/32-10.244.1.0/24 --host 192.198.146.186 --client 10.226.76.105/32 --ikeport 4500] output="002 \"192.168.0.111-10.226.76.105-10.244.1.0/24-10.226.76.105/32\": added IKEv2 connection\n"
2023-12-19T13:56:11.867305679+08:00 stderr F I1219 05:56:11.867173       1 libreswan.go:316] "whacking with" args=[--route --name 192.168.0.111-10.226.76.105-10.244.1.0/24-10.226.76.105/32] output=""
2023-12-19T13:56:11.867637563+08:00 stderr F I1219 05:56:11.867580       1 libreswan.go:316] "whacking with" args=[--initiate --asynchronous --name 192.168.0.111-10.226.76.105-10.244.1.0/24-10.226.76.105/32] output="181 \"192.168.0.111-10.226.76.105-10.244.1.0/24-10.226.76.105/32\" #1: initiating IKEv2 connection\n"
2023-12-19T13:56:11.868168601+08:00 stderr F I1219 05:56:11.868097       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 192.168.0.111-10.226.76.105-10.244.1.0/24-10.244.0.0/24 --id @192.168.0.111-10.244.1.0/24-10.244.0.0/24 --host 192.168.0.111 --client 10.244.1.0/24 --to --id @10.226.76.105-10.244.0.0/24-10.244.1.0/24 --host 192.198.146.186 --client 10.244.0.0/24 --ikeport 4500] output="002 \"192.168.0.111-10.226.76.105-10.244.1.0/24-10.244.0.0/24\": added IKEv2 connection\n"
2023-12-19T13:56:11.874707837+08:00 stderr F I1219 05:56:11.874657       1 libreswan.go:316] "whacking with" args=[--route --name 192.168.0.111-10.226.76.105-10.244.1.0/24-10.244.0.0/24] output=""
2023-12-19T13:56:11.875043389+08:00 stderr F I1219 05:56:11.875026       1 libreswan.go:316] "whacking with" args=[--initiate --asynchronous --name 192.168.0.111-10.226.76.105-10.244.1.0/24-10.244.0.0/24] output="181 \"192.168.0.111-10.226.76.105-10.244.1.0/24-10.244.0.0/24\" #2: initiating IKEv2 connection\n"
2023-12-19T13:56:11.875626108+08:00 stderr F I1219 05:56:11.875599       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 192.168.0.111-10.226.76.105-192.168.0.111/32-10.226.76.105/32 --id @192.168.0.111-192.168.0.111/32-10.226.76.105/32 --host 192.168.0.111 --client 192.168.0.111/32 --to --id @10.226.76.105-10.226.76.105/32-192.168.0.111/32 --host 192.198.146.186 --client 10.226.76.105/32 --ikeport 4500] output="002 \"192.168.0.111-10.226.76.105-192.168.0.111/32-10.226.76.105/32\": added IKEv2 connection\n"
2023-12-19T13:56:11.875853109+08:00 stderr F I1219 05:56:11.875843       1 libreswan.go:316] "whacking with" args=[--route --name 192.168.0.111-10.226.76.105-192.168.0.111/32-10.226.76.105/32] output=""
2023-12-19T13:56:11.876065435+08:00 stderr F I1219 05:56:11.876056       1 libreswan.go:316] "whacking with" args=[--initiate --asynchronous --name 192.168.0.111-10.226.76.105-192.168.0.111/32-10.226.76.105/32] output="181 \"192.168.0.111-10.226.76.105-192.168.0.111/32-10.226.76.105/32\" #3: initiating IKEv2 connection\n"
2023-12-19T13:56:11.876528375+08:00 stderr F I1219 05:56:11.876490       1 libreswan.go:316] "whacking with" args=[--psk --encrypt --forceencaps --name 192.168.0.111-10.226.76.105-192.168.0.111/32-10.244.0.0/24 --id @192.168.0.111-192.168.0.111/32-10.244.0.0/24 --host 192.168.0.111 --client 192.168.0.111/32 --to --id @10.226.76.105-10.244.0.0/24-192.168.0.111/32 --host 192.198.146.186 --client 10.244.0.0/24 --ikeport 4500] output="002 \"192.168.0.111-10.226.76.105-192.168.0.111/32-10.244.0.0/24\": added IKEv2 connection\n"
2023-12-19T13:56:11.876711446+08:00 stderr F I1219 05:56:11.876701       1 libreswan.go:316] "whacking with" args=[--route --name 192.168.0.111-10.226.76.105-192.168.0.111/32-10.244.0.0/24] output=""
2023-12-19T13:56:11.876973883+08:00 stderr F I1219 05:56:11.876935       1 libreswan.go:316] "whacking with" args=[--initiate --asynchronous --name 192.168.0.111-10.226.76.105-192.168.0.111/32-10.244.0.0/24] output="181 \"192.168.0.111-10.226.76.105-192.168.0.111/32-10.244.0.0/24\" #4: initiating IKEv2 connection\n"
2023-12-19T13:56:11.876979496+08:00 stderr F I1219 05:56:11.876947       1 vxlan.go:81] Tunnel: only gateway node exist in current gateway, cleaning up route setting
2023-12-19T13:56:11.939109516+08:00 stderr F I1219 05:56:11.938954       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
2023-12-19T13:56:11.939132288+08:00 stderr F I1219 05:56:11.939043       1 tunnelagent.go:109] network not changed, skip to process
2023-12-19T13:56:11.939133533+08:00 stderr F I1219 05:56:11.939051       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
2023-12-19T13:56:11.939134365+08:00 stderr F I1219 05:56:11.939089       1 tunnelagent.go:109] network not changed, skip to process
2023-12-19T13:56:11.939135243+08:00 stderr F I1219 05:56:11.939094       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
2023-12-19T13:56:11.939136024+08:00 stderr F I1219 05:56:11.939122       1 tunnelagent.go:109] network not changed, skip to process
2023-12-19T13:56:11.939136804+08:00 stderr F I1219 05:56:11.939127       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-cloud
2023-12-19T13:56:11.939254413+08:00 stderr F I1219 05:56:11.939165       1 tunnelagent.go:109] network not changed, skip to process
2023-12-19T13:56:11.939255434+08:00 stderr F I1219 05:56:11.939170       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
2023-12-19T13:56:11.93925623+08:00 stderr F I1219 05:56:11.939197       1 tunnelagent.go:109] network not changed, skip to process
2023-12-19T13:56:11.939257033+08:00 stderr F I1219 05:56:11.939201       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-edge
2023-12-19T13:56:11.939257895+08:00 stderr F I1219 05:56:11.939229       1 tunnelagent.go:109] network not changed, skip to process
2023-12-19T13:57:03.556793336+08:00 stderr F I1219 05:57:03.556605       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF

My guess, based on the last line of the edge Raven log, is that the request is being routed through a proxy that no_proxy does not cover:

2023-12-19T13:57:03.556793336+08:00 stderr F I1219 05:57:03.556605 1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF

I have another observation too, which I will share in a separate post.

chunfungintel commented 8 months ago

Another thing I noticed after setting up the gateway is that the edge node became "NotReady" shortly afterwards:

NAME             STATUS     ROLES                  AGE    VERSION
adl-cloud-node   Ready      control-plane,master   127m   v1.23.17
adl-edge-node    NotReady   <none>                 119m   v1.23.17

From the YurtHub logs, it failed to connect to the control plane:

2023-12-19T22:36:59.056936515+08:00 stderr F E1219 14:36:59.056667 1 prober.go:97] failed to probe: backoff ensure lease error: Get "https://10.226.76.105:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/adl-edge-node?timeout=2s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), remote server https://10.226.76.105:6443

chunfungintel commented 8 months ago

@YTGhost Could you tell me how to do this?

If there is no way to get it automatically, you can also get it manually and set the PublicIP field directly in the CR.

YTGhost commented 8 months ago

@chunfungintel Hi, sorry, I've been very busy the last couple of days. I'll check it out tonight. @River-sh Can you help with this issue?

YTGhost commented 8 months ago

As for how to get the public IP manually: you can query a public API such as https://ifconfig.me/. Once you have the public IP, look at the Gateway CR, where you will find the publicIP field to set.
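
A minimal sketch of the manual route, assuming the gateway node can reach https://ifconfig.me (possibly through a proxy). The helper name is_ipv4 is hypothetical and not part of OpenYurt; it only guards against copying a proxy error page into the CR instead of an address.

```shell
# Hypothetical helper: succeed only if $1 looks like a dotted-quad IPv4 address.
is_ipv4() {
  case $1 in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;   # empty, non-numeric, or empty octet
  esac
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its octets
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    if [ "${#octet}" -gt 3 ] || [ "$octet" -gt 255 ]; then
      return 1
    fi
  done
  return 0
}

# Example use (needs network access, so it is left commented out):
#   ip=$(curl -fsS --max-time 5 https://ifconfig.me)
#   is_ipv4 "$ip" && echo "publicIP candidate: $ip"
```

If the check passes, the value can be entered in the publicIP field of the matching endpoint, for example with kubectl edit gw followed by the Gateway name.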

River-sh commented 8 months ago

Please refer to the document at https://openyurt.io/zh/docs/next/user-manuals/network/raven. You can set the field spec.endpoints.publicIP, for example to 129.xxx.xxx.xxx.

chunfungintel commented 8 months ago

Hi @River-sh @YTGhost

This is my test topology and gateway configuration; please advise. For the edge node, which PublicIP should I use?

graph
    B("Control-Panel (adl-cloud-node)")
    B ---|10.226.xx.xx/23| C{Router}
    C ---|192.168.1.100/24| D["Edge (adl-edge-node)"]
    C ---|192.168.1.200/24| E["Edge (adl-edge-node-2)"]

kubectl label nodes adl-cloud-node raven.openyurt.io/gateway=gw-cloud
cat <<EOF | kubectl apply -f -
apiVersion: raven.openyurt.io/v1beta1
kind: Gateway
metadata:
  name: gw-cloud
spec:
  exposeType: PublicIP
  proxyConfig:
    Replicas: 1
    proxyHTTPPort: 10255,9445
    proxyHTTPSPort: 10250,9100
  tunnelConfig:
    Replicas: 1
  endpoints:
    - nodeName: adl-cloud-node
      underNAT: false
      port: 10262
      type: proxy
      publicIP: 10.226.xx.xx
    - nodeName: adl-cloud-node
      underNAT: false
      port: 4500
      type: tunnel
      publicIP: 10.226.xx.xx
EOF
kubectl label nodes adl-edge-node raven.openyurt.io/gateway=gw-edge
cat <<EOF | kubectl apply -f -
apiVersion: raven.openyurt.io/v1beta1
kind: Gateway
metadata:
  name: gw-edge
spec:
  proxyConfig:
    Replicas: 1
  tunnelConfig:
    Replicas: 1
  endpoints:
    - nodeName: adl-edge-node
      underNAT: true
      port: 4500
      type: tunnel
EOF

Logs from Raven on the edge node:

2024-01-02T22:33:07.857685277+08:00 stderr F I0102 14:33:07.857475       1 tunnelagent.go:203] "no public IP for gateway, waiting for sync" gateway="gw-edge"
2024-01-02T22:33:07.857689261+08:00 stderr F I0102 14:33:07.857520       1 tunnelagent.go:113] "applying network" localEndpoint=<nil> remoteEndpoint=map[gw-cloud:10.226.76.105 gw-rbf:10.107.249.110]
2024-01-02T22:33:07.857690312+08:00 stderr F I0102 14:33:07.857531       1 libreswan.go:102] Tunnel: no local gateway or remote gateway is found, cleaning vpn connections
2024-01-02T22:33:07.86175371+08:00 stderr F I0102 14:33:07.861604       1 vxlan.go:77] Tunnel: no local gateway or remote gateway is found, cleaning up route setting
2024-01-02T22:33:07.901832865+08:00 stderr F I0102 14:33:07.901692       1 tunnel.go:55] RavenEngine: update raven l3 tunnel config for gateway gw-rbf
2024-01-02T22:33:30.581124834+08:00 stderr F I0102 14:33:30.580844       1 engine.go:121] "RavenEngine: updating gateway, gw-rbf"
2024-01-02T22:33:30.581136463+08:00 stderr F I0102 14:33:30.580854       1 engine.go:95] RavenEngine: enqueue gateway gw-rbf to tunnel queue
2024-01-02T22:33:30.58114176+08:00 stderr F I0102 14:33:30.580860       1 engine.go:100] RavenEngine: enqueue gateway gw-rbf to proxy queue
2024-01-02T22:33:32.20392066+08:00 stderr F I0102 14:33:32.203602       1 engine.go:121] "RavenEngine: updating gateway, gw-rbf"
2024-01-02T22:33:32.203930395+08:00 stderr F I0102 14:33:32.203611       1 engine.go:95] RavenEngine: enqueue gateway gw-rbf to tunnel queue
2024-01-02T22:33:32.203931555+08:00 stderr F I0102 14:33:32.203616       1 engine.go:100] RavenEngine: enqueue gateway gw-rbf to proxy queue
2024-01-02T22:34:15.668335876+08:00 stderr F I0102 14:34:15.668216       1 engine.go:121] "RavenEngine: updating gateway, gw-rbf"
2024-01-02T22:34:15.668344997+08:00 stderr F I0102 14:34:15.668226       1 engine.go:95] RavenEngine: enqueue gateway gw-rbf to tunnel queue
2024-01-02T22:34:15.668346022+08:00 stderr F I0102 14:34:15.668232       1 engine.go:100] RavenEngine: enqueue gateway gw-rbf to proxy queue
2024-01-02T22:34:37.903910881+08:00 stderr F E0102 14:34:37.903642       1 tunnelagent.go:92] "error config gateway public ip" err="error get public ip by any of the apis: [https://api.ipify.org https://api.my-ip.io/ip https://ip4.seeip.org]" gateway="gw-edge"

Apparently my corporate network blocks STUN. Checking with pystun3:

pystun3
NAT Type: Blocked
External IP: None
External Port: None
Press any key to continue

chunfungintel commented 7 months ago

Update: I managed to get "kubectl logs" working by using the configuration below:

apiVersion: raven.openyurt.io/v1beta1
kind: Gateway
metadata:
  name: gw-cloud
spec:
  exposeType: PublicIP
  endpoints:
  - nodeName: adl-cloud-node
    port: 4500
    type: tunnel
    publicIP: LOCAL_NETWORK_IP
  proxyConfig:
    Replicas: 1
  tunnelConfig:
    Replicas: 1
---
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-edge
spec:
  endpoints:
    - nodeName: adl-edge-node
      underNAT: true

and by setting the correct proxy settings on the raven-agent-ds DaemonSet:

kubectl set env -n kube-system daemonset raven-agent-ds http_proxy=${http_proxy}
kubectl set env -n kube-system daemonset raven-agent-ds https_proxy=${https_proxy}
kubectl set env -n kube-system daemonset raven-agent-ds no_proxy=${no_proxy}
kubectl set env -n kube-system daemonset raven-agent-ds HTTP_PROXY=${HTTP_PROXY}
kubectl set env -n kube-system daemonset raven-agent-ds HTTPS_PROXY=${HTTPS_PROXY}
kubectl set env -n kube-system daemonset raven-agent-ds NO_PROXY=${NO_PROXY}

Thanks a lot for your support!
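
The six kubectl set env calls above can also be collapsed into one, so the DaemonSet is patched only once. This is a minimal sketch, assuming the proxy variables are already set in the shell where kubectl runs; the function name set_raven_proxy_env is hypothetical.

```shell
# Hypothetical helper: pass every proxy variable that is set in the current
# shell to the raven-agent DaemonSet in a single kubectl call.
set_raven_proxy_env() {
  set --   # start with an empty argument list
  for name in http_proxy https_proxy no_proxy HTTP_PROXY HTTPS_PROXY NO_PROXY; do
    eval "value=\${$name-}"
    if [ -n "$value" ]; then
      set -- "$@" "$name=$value"
    fi
  done
  if [ "$#" -eq 0 ]; then
    echo "no proxy variables are set in this shell" >&2
    return 1
  fi
  kubectl set env -n kube-system daemonset/raven-agent-ds "$@"
}
```

Calling set_raven_proxy_env with no arguments applies whatever proxy variables are present. Whether no_proxy needs to include the API server and node addresses depends on the environment.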

River-sh commented 7 months ago

You don't need such a complicated configuration. Just enable Raven's tunnel mode and configure the Gateway CR correctly (see https://openyurt.io/zh/docs/user-manuals/network/raven/); yurt-manager will then elect the active endpoints and record them in Gateway.Status.ActiveEndpoints.

You can run kubectl get gw gw-cloud -o yaml to verify that the gateway node has been elected.
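
To look only at the election result rather than the whole object, a jsonpath query can be used. This is a sketch: the field path .status.activeEndpoints is an assumption derived from the Gateway.Status.ActiveEndpoints name mentioned in this comment, and the wrapper function name is hypothetical.

```shell
# Hypothetical wrapper: print the elected endpoints of one Gateway.
# The jsonpath below is an assumption, not confirmed against the CRD schema.
gateway_active_endpoints() {
  # $1: Gateway name, e.g. gw-cloud
  kubectl get gw "$1" -o jsonpath='{.status.activeEndpoints}'
}

# Example use (requires cluster access, so it is left commented out):
#   gateway_active_endpoints gw-cloud
#   gateway_active_endpoints gw-edge
```

If the assumed path is right, an empty result would suggest that no endpoint has been elected for that Gateway yet.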

chunfungintel commented 7 months ago

@River-sh Thank you, I will try and let you know.

qpanpony commented 4 months ago

I have been troubled by the same problem for several days, in a slightly different network environment from @chunfungintel's. Both my control-plane nodes and edge nodes are behind NAT. I can join edge nodes successfully (using the command: yurtadm join k8s-api-server-PublicIP:Port --token xxxxx --discovery-token-ca-cert-hash xxxxxx --node-type=edge, where k8s-api-server-PublicIP:Port is mapped to PrivateIP:6443 in the cloud). I can also deploy a busybox workload to the edge nodes, but I cannot run "kubectl exec/logs" for pods running on the edge nodes.

How should I configure the Gateway CR when both the control-plane nodes and the edge nodes are behind NAT?

River-sh commented 4 months ago

You can expose the control plane's gateway node on the public network (configure DNAT on the NAT device so that UDP port 4500 of the gateway node is reachable) and set underNAT=false in the Gateway. Alternatively, set underNAT=true and test whether NAT traversal can establish a VPN between the two gateway nodes. raven-agent can enable NAT traversal, but not every NAT can be traversed.
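
As a rough illustration of the DNAT option, assuming the NAT device is a Linux box managed with iptables (the thread does not say what it is). The helper below only prints the rule so it can be reviewed before anyone runs it as root; the function name, the interface name and the addresses are all hypothetical.

```shell
# Hypothetical helper: print (do not run) an iptables DNAT rule that forwards
# UDP traffic arriving on the NAT device's WAN interface to the gateway node.
print_dnat_rule() {
  wan_if=$1        # WAN-facing interface on the NAT device, e.g. eth0
  gateway_ip=$2    # private IP of the cloud gateway node
  port=${3:-4500}  # tunnel port; 4500 is the port used in this thread
  printf 'iptables -t nat -A PREROUTING -i %s -p udp --dport %s -j DNAT --to-destination %s:%s\n' \
    "$wan_if" "$port" "$gateway_ip" "$port"
}

# Example use:
#   print_dnat_rule eth0 192.168.1.10
```

With such a rule in place, the cloud Gateway's publicIP would be the NAT device's public address and underNAT would be false, as described above.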

qpanpony commented 4 months ago

I followed the same revised steps, except that I used raven-agent 0.4.1.

Revised steps:

Control-plane initialization:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
mkdir -p $HOME/.kube && sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config && sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
kubectl taint nodes --all node-role.kubernetes.io/master-

Using OpenYurt 1.4.0 + Raven agent 0.4.0

helm upgrade --install yurt-manager -n kube-system openyurt/yurt-manager --version 1.4.0 --set image.tag=latest
helm upgrade --install yurt-hub -n kube-system --set kubernetesServerAddr=https://${KUBERNETES_SERVER_ADDRESS}:6443 openyurt/yurthub --version 1.4.0
helm upgrade --install raven-agent -n kube-system openyurt/raven-agent --set vpn.forwardNodeIP=true \
--set image.tag=0.4.0 --version 0.4.0

Install OpenYurt 1.4 on the edge node:

wget https://github.com/openyurtio/openyurt/releases/download/v1.4.0/yurtadm-v1.4.0-linux-amd64.tar.gz
tar -xvf yurtadm-v1.4.0-linux-amd64.tar.gz
sudo cp linux-amd64/yurtadm /usr/local/bin/yurtadm && sudo chmod +x /usr/local/bin/yurtadm

Edge node joining:

sudo yurtadm join \
${CONTROL_PANEL_ADDRESS}:6443 \
--token=${JOIN_TOKEN} --node-type=edge \
--cri-socket=unix:///run/containerd/containerd.sock \
--discovery-token-ca-cert-hash=${CA_HASH} --v=5

Gateway configuration:

kubectl label nodes adl-edge-node raven.openyurt.io/gateway=gw-edge; \
kubectl label nodes adl-cloud-node raven.openyurt.io/gateway=gw-cloud

cat <<EOF | kubectl apply -f -
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-edge
spec:
  endpoints:
    - nodeName: adl-edge-node
      underNAT: true
---
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-cloud
spec:
  endpoints:
    - nodeName: adl-cloud-node
      underNAT: false
EOF

git clone https://github.com/openyurtio/raven.git
cd raven && git checkout v0.4.0
make deploy

Result: still unable to run 'kubectl logs'.

Is anything still missing?

River-sh commented 4 months ago

As you said, your cloud nodes cannot be reached from the public network, so cross-network-domain VPNs cannot be established and kubectl logs/exec cannot be used.

River-sh commented 4 months ago

@qpanpony You can follow this document step by step: https://openyurt.io/zh/docs/user-manuals/network/raven

qpanpony commented 3 months ago

Just some feedback: I stopped using the raven-agent component, since cross-network-domain VPNs cannot be established in my network environment. I have deployed EdgeMesh instead, which provides cross-subnet communication based on a LibP2P tunnel.

stale[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.