kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] the `ovn.kubernetes.io/external-gw=true` node label is reset to false on restart! #4772

Open cybercoder opened 2 days ago

cybercoder commented 2 days ago

Kube-OVN Version

v1.12.28

Kubernetes Version

v1.30.6+k3s1

Operation-system/Kernel Version

not important

Description

These are my configs:

apiVersion: kubeovn.io/v1
kind: ProviderNetwork
metadata:
  name: external
spec:
  defaultInterface: ether1
---
apiVersion: kubeovn.io/v1
kind: Vlan
metadata:
  name: vlan0
spec:
  id: 0
  provider: external
---

apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: external
spec:
  protocol: IPv4
  cidrBlock: 192.168.128.0/17
  gateway: 192.168.128.2
  vlan: vlan0
  excludeIps:
  - 192.168.128.1..192.168.128.10

Steps To Reproduce

  1. kubectl label node <> ovn.kubernetes.io/external-gw=true --overwrite
  2. kubectl -n kube-system rollout restart deployment kube-ovn-controller # or simply delete the pod.

Current Behavior

The controller resets the node label to false, and all OVN EIPs, OVN FIPs, and pings go down. We must re-apply the label on all nodes and restart the controller again.

Expected Behavior

The controller should detect that the label was set intentionally and preserve it instead of resetting it to false.

Dirty workaround: run the following as a cronjob:

#!/bin/bash

# Find all Kubernetes nodes with the label `node-role.kubernetes.io/compute=true`
NODES=$(kubectl get nodes -l node-role.kubernetes.io/compute=true -o jsonpath='{.items[*].metadata.name}')

# Apply the label `ovn.kubernetes.io/external-gw=true` to each node
for NODE in $NODES; do
    kubectl label node "$NODE" ovn.kubernetes.io/external-gw=true --overwrite
done

# Restart the kube-ovn-controller by deleting its pod in the kube-system namespace
kubectl delete pod -n kube-system -l app=kube-ovn-controller

# Wait until the replacement controller pod is up, so the labels re-applied
# below are not reset again by a controller that starts afterwards
kubectl -n kube-system rollout status deployment kube-ovn-controller

# Re-apply the label, since the restarted controller resets it to false
for NODE in $NODES; do
    kubectl label node "$NODE" ovn.kubernetes.io/external-gw=true --overwrite
done
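
For completeness, a script like the one above could also be scheduled in-cluster as a Kubernetes CronJob instead of a host cronjob. The following manifest is only a sketch: the ServiceAccount name `node-labeler`, the image, and the schedule are assumptions, and the ServiceAccount would need RBAC permissions to list and label nodes:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: reapply-external-gw-label
  namespace: kube-system
spec:
  schedule: "*/5 * * * *"            # assumption: every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: node-labeler   # hypothetical SA with node list/patch RBAC
          restartPolicy: OnFailure
          containers:
          - name: relabel
            image: bitnami/kubectl:latest    # assumption: any image that ships kubectl
            command: ["/bin/sh", "-c"]
            args:
            - |
              for NODE in $(kubectl get nodes -l node-role.kubernetes.io/compute=true \
                  -o jsonpath='{.items[*].metadata.name}'); do
                kubectl label node "$NODE" ovn.kubernetes.io/external-gw=true --overwrite
              done
```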
dosubot[bot] commented 2 days ago

The issue you're experiencing, where the ovn.kubernetes.io/external-gw node label resets to false after a restart of the kube-ovn-controller, is not directly addressed in the available documentation or codebase information. However, this behavior suggests that the label might not be persisted correctly across restarts.

Here are a few steps you can consider to troubleshoot and potentially resolve this issue:

  1. Check Controller Logs: Review the logs of the kube-ovn-controller for any messages related to node label changes during startup. This might provide insights into why the label is being reset.

  2. Configuration Review: Ensure that there are no configurations or scripts that might be resetting the label on startup. This could include any automation tools or scripts that manage node configurations.

  3. Kubernetes Events: Check Kubernetes events for any actions related to node label changes. This might help identify if there is an external factor causing the label reset.

  4. Persistence Mechanism: If the label is critical for your setup, consider implementing a mechanism to reapply the label automatically after a restart. This could be done using a Kubernetes operator or a startup script.

  5. Version Check: Ensure that you are using the latest stable version of Kube-OVN, as there might be bug fixes or improvements related to label handling in newer versions.

If these steps do not resolve the issue, further investigation into the specific behavior of the kube-ovn-controller regarding node labels might be necessary.
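
Suggestion 4 (reapplying the label automatically) reduces to an idempotence check: only relabel when the current value differs from the desired one, so the reconcile loop does no work once the label is correct. A minimal sketch of that decision logic in shell, where `needs_relabel` is a hypothetical helper and, in a real script, the current value would come from `kubectl get node -o jsonpath=...`:

```shell
#!/bin/bash
# Decide whether a node needs relabeling, given the current label value.
# Returns 0 (true) when the label is missing or not "true".
needs_relabel() {
    local current="$1"
    [ "$current" != "true" ]
}

# Illustrative checks:
needs_relabel "false" && echo "relabel needed"        # label reset by the bug
needs_relabel ""      && echo "relabel needed"        # label never set
needs_relabel "true"  || echo "label already correct" # nothing to do
```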

To continue talking to Dosu, mention @dosu.

cybercoder commented 1 day ago

@dosu

After reviewing the code, I have used this workaround instead:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ovn-external-gw-config
  namespace: kube-system
data:
  type: "distributed"
  enable-external-gw: "true"

Is it ok?

oilbeater commented 1 day ago

@cybercoder Yes, I think you need this ConfigMap, with `enable-external-gw` set to `true`.