kubeedge / kubeedge

Kubernetes Native Edge Computing Framework (project under CNCF)
https://kubeedge.io
Apache License 2.0

`KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables injection not working on K8s 1.26 #5586

Closed IterableTrucks closed 1 month ago

IterableTrucks commented 1 month ago

What happened: After upgrading KubeEdge on both the cloud side and the edge side to v1.17.0, pods on edge nodes still do not have KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT configured.

What you expected to happen: The two environment variables are configured in pods.

How to reproduce it (as minimally and precisely as possible):

The cloudcore configmap is:

```yaml
apiVersion: cloudcore.config.kubeedge.io/v1alpha2
kind: CloudCore
featureGates:
  requireAuthorization: true
kubeAPIConfig:
  kubeConfig: ""
  master: ""
modules:
  cloudHub:
    advertiseAddress:
    - 192.168.3.45
    dnsNames:
    -
    nodeLimit: 1000
    tlsCAFile: /etc/kubeedge/ca/rootCA.crt
    tlsCertFile: /etc/kubeedge/certs/edge.crt
    tlsPrivateKeyFile: /etc/kubeedge/certs/edge.key
    unixsocket:
      address: unix:///var/lib/kubeedge/kubeedge.sock
      enable: true
    websocket:
      address: 0.0.0.0
      enable: true
      port: 10000
    quic:
      address: 0.0.0.0
      enable: false
      maxIncomingStreams: 10000
      port: 10001
    https:
      address: 0.0.0.0
      enable: true
      port: 10002
  cloudStream:
    enable: true
    streamPort: 10003
    tunnelPort: 10004
  dynamicController:
    enable: true
  router:
    enable: true
  iptablesManager:
    enable: true
    mode: internal
  taskManager:
    enable: true
```

The requireAuthorization feature gate and dynamicController are enabled as noted in the [changelog](https://github.com/kubeedge/kubeedge/blob/master/CHANGELOG/CHANGELOG-1.17.md), and '192.168.3.45' is the IP of the cloud-side node.

The edgecore config on the edge node:

```yaml
apiVersion: edgecore.config.kubeedge.io/v1alpha2
database:
  aliasName: default
  dataSource: /var/lib/kubeedge/edgecore.db
  driverName: sqlite3
kind: EdgeCore
modules:
  dbTest:
    enable: false
  deviceTwin:
    dmiSockPath: /etc/kubeedge/dmi.sock
    enable: true
  edgeHub:
    enable: true
    heartbeat: 15
    httpServer: https://192.168.3.45:10002
    messageBurst: 60
    messageQPS: 30
    projectID: e632aba927ea4ac2b575ec1603d56f10
    quic:
      enable: false
      handshakeTimeout: 30
      readDeadline: 15
      server: 192.168.3.45:10001
      writeDeadline: 15
    rotateCertificates: true
    tlsCaFile: /etc/kubeedge/ca/rootCA.crt
    tlsCertFile: /etc/kubeedge/certs/server.crt
    tlsPrivateKeyFile: /etc/kubeedge/certs/server.key
    token: ""
    websocket:
      enable: true
      handshakeTimeout: 30
      readDeadline: 15
      server: 192.168.3.45:10000
      writeDeadline: 15
  edgeStream:
    enable: true
    handshakeTimeout: 30
    readDeadline: 15
    server: 192.168.3.45:10004
    tlsTunnelCAFile: /etc/kubeedge/ca/rootCA.crt
    tlsTunnelCertFile: /etc/kubeedge/certs/server.crt
    tlsTunnelPrivateKeyFile: /etc/kubeedge/certs/server.key
    writeDeadline: 15
  edged:
    containerRuntime: remote
    enable: true
    hostnameOverride: edge01
    masterServiceNamespace: default
    maxContainerCount: -1
    maxPerPodContainerCount: 1
    minimumGCAge: 0s
    podSandboxImage: kubeedge/pause:3.6
    registerNode: true
    registerNodeNamespace: default
    registerSchedulable: true
    remoteImageEndpoint: unix:///run/containerd/containerd.sock
    remoteRuntimeEndpoint: unix:///run/containerd/containerd.sock
    rootDirectory: /var/lib/edged
    tailoredKubeletConfig:
      address: 127.0.0.1
      cgroupDriver: systemd
      cgroupsPerQOS: true
      clusterDNS:
      - 169.254.96.16
      clusterDomain: cluster.local
      configMapAndSecretChangeDetectionStrategy: Get
      containerLogMaxFiles: 5
      containerLogMaxSize: 10Mi
      containerRuntimeEndpoint: unix:///var/run/crio/crio.sock
      contentType: application/json
      cpuCFSQuota: true
      cpuCFSQuotaPeriod: 100ms
      cpuManagerPolicy: none
      cpuManagerReconcilePeriod: 10s
      enableControllerAttachDetach: true
      enableDebugFlagsHandler: true
      enableDebuggingHandlers: true
      enableProfilingHandler: true
      enableSystemLogHandler: true
      enforceNodeAllocatable:
      - pods
      eventBurst: 100
      eventRecordQPS: 50
      evictionHard:
        imagefs.available: 5%
        memory.available: 100Mi
        nodefs.available: 3%
        nodefs.inodesFree: 5%
      evictionPressureTransitionPeriod: 5m0s
      failSwapOn: false
      fileCheckFrequency: 20s
      hairpinMode: promiscuous-bridge
      imageGCHighThresholdPercent: 85
      imageGCLowThresholdPercent: 80
      imageMinimumGCAge: 2m0s
      imageServiceEndpoint: unix:///var/run/crio/crio.sock
      iptablesDropBit: 15
      iptablesMasqueradeBit: 14
      localStorageCapacityIsolation: true
      logging:
        flushFrequency: 5s
        format: text
        options:
          json:
            infoBufferSize: "0"
        verbosity: 0
      makeIPTablesUtilChains: true
      maxOpenFiles: 1000000
      maxPods: 110
      memoryManagerPolicy: None
      memorySwap: {}
      memoryThrottlingFactor: 0.9
      nodeLeaseDurationSeconds: 40
      nodeStatusMaxImages: 0
      nodeStatusReportFrequency: 5m0s
      nodeStatusUpdateFrequency: 10s
      oomScoreAdj: -999
      podPidsLimit: -1
      readOnlyPort: 10350
      registerNode: true
      registryBurst: 10
      registryPullQPS: 5
      resolvConf: /etc/resolv.conf
      runtimeRequestTimeout: 2m0s
      seccompDefault: false
      serializeImagePulls: true
      shutdownGracePeriod: 0s
      shutdownGracePeriodCriticalPods: 0s
      staticPodPath: /etc/kubeedge/manifests
      streamingConnectionIdleTimeout: 4h0m0s
      syncFrequency: 1m0s
      topologyManagerPolicy: none
      topologyManagerScope: container
      volumePluginDir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
      volumeStatsAggPeriod: 1m0s
  eventBus:
    enable: true
    eventBusTLS:
      enable: false
      tlsMqttCAFile: /etc/kubeedge/ca/rootCA.crt
      tlsMqttCertFile: /etc/kubeedge/certs/server.crt
      tlsMqttPrivateKeyFile: /etc/kubeedge/certs/server.key
    mqttMode: 2
    mqttPassword: ""
    mqttPubClientID: ""
    mqttQOS: 0
    mqttRetain: false
    mqttServerExternal: tcp://192.168.6.176:1883
    mqttServerInternal: tcp://127.0.0.1:1884
    mqttSessionQueueSize: 100
    mqttSubClientID: ""
    mqttUsername: ""
  metaManager:
    contextSendGroup: hub
    contextSendModule: websocket
    enable: true
    metaServer:
      apiAudiences: null
      dummyServer: 169.254.30.10:10550
      enable: true
      server: 127.0.0.1:10550
      serviceAccountIssuers:
      - https://kubernetes.default.svc.cluster.local
      serviceAccountKeyFiles: null
      tlsCaFile: /etc/kubeedge/ca/rootCA.crt
      tlsCertFile: /etc/kubeedge/certs/server.crt
      tlsPrivateKeyFile: /etc/kubeedge/certs/server.key
    remoteQueryTimeout: 60
  serviceBus:
    enable: false
    port: 9060
    server: 127.0.0.1
    timeout: 60
```

MetaServer is enabled and '192.168.6.176' is the IP of the edge node.
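
One way to confirm the symptom described above is to dump the environment of a pod running on the edge node and check for the two variables. The sketch below is only an illustration: the helper name `missing_k8s_env` is hypothetical, and it assumes the pod image ships an `env` binary.

```shell
# Hypothetical helper: reads `env` output on stdin and prints each of the two
# expected variables that is absent. Returns non-zero if anything is missing.
missing_k8s_env() {
  env_dump=$(cat)
  missing=0
  for name in KUBERNETES_SERVICE_HOST KUBERNETES_SERVICE_PORT; do
    # Match the exact variable name followed by "=", so that
    # KUBERNETES_SERVICE_PORT_HTTPS does not count as KUBERNETES_SERVICE_PORT.
    if ! printf '%s\n' "$env_dump" | grep -q "^${name}="; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage would look like `kubectl exec <pod-on-edge-node> -- env | missing_k8s_env`, where the pod name is a placeholder. No output and a zero exit status mean both variables were injected.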

Anything else we need to know?:

Environment:

WillardHu commented 1 month ago

cc @Shelley-BaoYue

Shelley-BaoYue commented 1 month ago

Have you enabled metaServer and dynamicController, and set the feature gate requireAuthorization=true in both the cloudcore and edgecore YAML? https://github.com/kubeedge/kubeedge/blob/master/CHANGELOG/CHANGELOG-1.17.md#important-steps-before-upgrading

IterableTrucks commented 1 month ago

dynamicController and requireAuthorization are enabled on cloudcore and metaServer is enabled on edgecore.

IterableTrucks commented 1 month ago

> Have you enabled metaServer and dynamicController, and set the feature gate requireAuthorization=true in both the cloudcore and edgecore YAML? https://github.com/kubeedge/kubeedge/blob/master/CHANGELOG/CHANGELOG-1.17.md#important-steps-before-upgrading

Do you mean that the metaServer option should be enabled on cloudcore, and that dynamicController and the requireAuthorization feature gate should be enabled on edgecore? I cannot find these options in the cloudcore Helm chart or in the edgecore.yaml of edgecore.

Shelley-BaoYue commented 1 month ago
  1. in edgecore config: enable metaServer; set requireAuthorization=true
    apiVersion: edgecore.config.kubeedge.io/v1alpha1
    kind: EdgeCore
    featureGates:
      requireAuthorization: true
    ...
  2. in cloudcore config: enable dynamicController; set requireAuthorization=true
    apiVersion: cloudcore.config.kubeedge.io/v1alpha1
    kind: CloudCore
    featureGates:
      requireAuthorization: true
    ...

The featureGates option is now supported in the cloudcore Helm chart via .Values.cloudCore.featureGates.requireAuthorization (https://github.com/kubeedge/kubeedge/blob/master/manifests/charts/cloudcore/templates/configmap_cloudcore.yaml#L13). There is no equivalent support for edgecore.yaml yet, so you should modify edgecore.yaml manually and restart edgecore.
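
As a rough sketch of the two steps above: the Helm value name comes from the comment, while the release name, namespace, chart path, and the location of edgecore.yaml are assumptions that depend on how KubeEdge was installed. The function names are hypothetical helpers, not part of KubeEdge.

```shell
# Cloud side: set the Helm value named above on an existing release.
# $1 = release name, $2 = namespace, $3 = chart path (all placeholders).
enable_cloud_feature_gate() {
  helm upgrade "$1" "$3" \
    --namespace "$2" \
    --reuse-values \
    --set cloudCore.featureGates.requireAuthorization=true
}

# Edge side: succeed only if the given edgecore.yaml has a top-level
# "featureGates:" key with an indented "requireAuthorization: true" under it.
has_require_authorization() {
  awk '
    /^featureGates:[[:space:]]*$/ { in_fg = 1; next }
    in_fg && /^[^[:space:]#]/ { in_fg = 0 }   # next top-level key ends the block
    in_fg && /^[[:space:]]+requireAuthorization:[[:space:]]*true[[:space:]]*$/ { found = 1 }
    END { exit !found }
  ' "$1"
}
```

After editing the file by hand, something like `has_require_authorization /etc/kubeedge/config/edgecore.yaml && sudo systemctl restart edgecore` would restart the service only when the gate is nested correctly (the config path here is an assumption). The check deliberately fails when `requireAuthorization` sits at column 0, since YAML would then treat it as a separate top-level key rather than a feature gate.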

IterableTrucks commented 1 month ago
>   1. in edgecore config: enable metaServer; set requireAuthorization=true
>     apiVersion: edgecore.config.kubeedge.io/v1alpha1
>     kind: EdgeCore
>     featureGates:
>       requireAuthorization: true
>     ...
>   2. in cloudcore config: enable dynamicController; set requireAuthorization=true
>     apiVersion: cloudcore.config.kubeedge.io/v1alpha1
>     kind: CloudCore
>     featureGates:
>       requireAuthorization: true
>     ...
>
> The featureGates option is now supported in the cloudcore Helm chart via .Values.cloudCore.featureGates.requireAuthorization (https://github.com/kubeedge/kubeedge/blob/master/manifests/charts/cloudcore/templates/configmap_cloudcore.yaml#L13). There is no equivalent support for edgecore.yaml yet, so you should modify edgecore.yaml manually and restart edgecore.

The edgecore failed to restart after I configured featureGates.requireAuthorization=true in edgecore.yaml

The log of the edgecore service:

> W0510 10:41:02.810555 36360 validation.go:71] NodeIP is empty , use default ip which can connect to cloud.
> I0510 10:41:02.810807 36360 server.go:102] Version: v1.17.0
> I0510 10:41:02.810905 36360 sql.go:21] Begin to register twin db model
> I0510 10:41:02.811499 36360 module.go:54] Module twin registered successfully
> I0510 10:41:02.817724 36360 module.go:54] Module edged registered successfully
> I0510 10:41:02.817818 36360 module.go:54] Module websocket registered successfully
> I0510 10:41:02.817842 36360 module.go:54] Module eventbus registered successfully
> I0510 10:41:02.817967 36360 metamanager.go:41] Begin to register metamanager db model
> I0510 10:41:02.818215 36360 module.go:54] Module metamanager registered successfully
> W0510 10:41:02.818245 36360 module.go:57] Module servicebus is disabled, do not register
> I0510 10:41:02.819570 36360 edgestream.go:55] Get node local IP address successfully: 192.168.6.176
> I0510 10:41:02.819626 36360 module.go:54] Module edgestream registered successfully
> W0510 10:41:02.819653 36360 module.go:57] Module testManager is disabled, do not register
> table `device` already exists, skip
> table `device_attr` already exists, skip
> table `device_twin` already exists, skip
> table `sub_topics` already exists, skip
> table `meta` already exists, skip
> table `meta_v2` already exists, skip
> table `target_urls` already exists, skip
> I0510 10:41:02.824329 36360 core.go:52] starting module edgestream
> I0510 10:41:02.824623 36360 core.go:52] starting module twin
> I0510 10:41:02.824768 36360 core.go:52] starting module edged
> I0510 10:41:02.824830 36360 process.go:119] Begin to sync sqlite
> I0510 10:41:02.824908 36360 core.go:52] starting module websocket
> I0510 10:41:02.824912 36360 edged.go:123] Starting edged...
> I0510 10:41:02.824980 36360 core.go:52] starting module eventbus
> I0510 10:41:02.825058 36360 core.go:52] starting module metamanager
> I0510 10:41:02.825646 36360 common.go:97] start connect to mqtt server with client id: hub-client-sub-1715308862
> I0510 10:41:02.825725 36360 common.go:99] client hub-client-sub-1715308862 isconnected: false
> I0510 10:41:02.825971 36360 dmiworker.go:68] dmi worker start
> I0510 10:41:02.826390 36360 certmanager.go:165] Certificate rotation is enabled.
> I0510 10:41:02.826465 36360 websocket.go:51] Websocket start to connect Access
> I0510 10:41:02.826667 36360 client.go:134] finish hub-client sub
> I0510 10:41:02.826759 36360 common.go:97] start connect to mqtt server with client id: hub-client-pub-1715308862
> I0510 10:41:02.826791 36360 common.go:99] client hub-client-pub-1715308862 isconnected: false
> I0510 10:41:02.827129 36360 server.go:410] "Kubelet version" kubeletVersion="v0.0.0-master+$Format:%H$"
> I0510 10:41:02.827221 36360 server.go:412] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
> I0510 10:41:02.827718 36360 client.go:153] finish hub-client pub
> I0510 10:41:02.827764 36360 eventbus.go:71] Init Sub And Pub Client for external mqtt broker tcp://192.168.6.176:1883 successfully
> I0510 10:41:02.827962 36360 client.go:89] edge-hub-cli subscribe topic to $hw/events/upload/#
> I0510 10:41:02.828566 36360 client.go:89] edge-hub-cli subscribe topic to $hw/events/device/+/+/state/update
> I0510 10:41:02.828980 36360 client.go:89] edge-hub-cli subscribe topic to $hw/events/device/+/+/twin/+
> I0510 10:41:02.829329 36360 dmiworker.go:219] success to init device model info from db
> I0510 10:41:02.829475 36360 client.go:89] edge-hub-cli subscribe topic to $hw/events/node/+/membership/get
> I0510 10:41:02.829771 36360 dmiworker.go:240] success to init device info from db
> I0510 10:41:02.829808 36360 client.go:89] edge-hub-cli subscribe topic to SYS/dis/upload_records
> I0510 10:41:02.830071 36360 dmiworker.go:260] success to init device mapper info from db
> I0510 10:41:02.830161 36360 server_others.go:13] init uds socket: /etc/kubeedge/dmi.sock
> I0510 10:41:02.830214 36360 client.go:89] edge-hub-cli subscribe topic to +/user/#
> I0510 10:41:02.830756 36360 client.go:97] list edge-hub-cli-topics status, no record, skip sync
> I0510 10:41:02.837791 36360 ws.go:46] dial wss://192.168.3.45:10000/e632aba927ea4ac2b575ec1603d56f10/nm176/events successfully
> I0510 10:41:02.837977 36360 websocket.go:93] Websocket connect to cloud access successful
> I0510 10:41:02.838111 36360 process.go:305] DeviceTwin receive msg
> W0510 10:41:02.838476 36360 eventbus.go:168] Action not found
> I0510 10:41:02.838488 36360 process.go:70] Send msg to the CommModule module in twin
> E0510 10:41:02.842297 36360 process.go:419] metamanager not supported operation: connect
> W0510 10:41:02.844351 36360 dummy_device_linux.go:43] No dummy device edge-dummy0, link it
> panic: setup dummy interface err: operation not supported
> goroutine 180 [running]:
> github.com/kubeedge/kubeedge/edge/pkg/metamanager/metaserver.(*MetaServer).Start(0x4000050e10, 0x40006966c0)
> /work/edge/pkg/metamanager/metaserver/server.go:158 +0x1b4
> created by github.com/kubeedge/kubeedge/edge/pkg/metamanager.(*metaManager).Start
> /work/edge/pkg/metamanager/metamanager.go:65 +0xa8
>
> edgecore.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
> edgecore.service: Failed with result 'exit-code'.

Shelley-BaoYue commented 1 month ago

Maybe there is something unusual about your kernel? Could you please try `ip link add test type dummy` and see whether you can create a dummy interface manually?
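
The manual check suggested above can be wrapped in a small helper that also removes the interface again. This is a minimal sketch: the function name `try_dummy_link` and the `IP_CMD` override (handy for dry runs) are hypothetical, and creating the interface needs root privileges.

```shell
# Try to create a dummy interface, then delete it again.
# $1 = interface name (defaults to "test", as in the command above).
# IP_CMD is a hypothetical override for the `ip` binary.
try_dummy_link() {
  name=${1:-test}
  ip_cmd=${IP_CMD:-ip}
  if "$ip_cmd" link add "$name" type dummy; then
    "$ip_cmd" link delete "$name"
    echo "dummy interfaces are supported"
  else
    echo "dummy interfaces are NOT supported" >&2
    return 1
  fi
}
```

Running `sudo sh -c '. ./check.sh; try_dummy_link'` (with the function saved in a hypothetical `check.sh`) on the edge node should fail with an "operation not supported" style error when the kernel lacks the dummy driver, which would match the panic in the edgecore log.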

IterableTrucks commented 1 month ago

> Maybe there is something unusual about your kernel? Could you please try `ip link add test type dummy` and see whether you can create a dummy interface manually?

You are right. Everything is OK after I recompiled the kernel with CONFIG_DUMMY=y and rebooted the edge node with the new kernel.
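
For anyone hitting the same panic, it may be quicker to inspect the running kernel's build configuration before recompiling. The sketch below assumes the configuration is readable as text; where it lives (for example `/boot/config-$(uname -r)` or `/proc/config.gz`) varies by distribution and kernel options. The helper name `dummy_config_state` is hypothetical.

```shell
# Reads kernel config text on stdin and reports how CONFIG_DUMMY is set.
dummy_config_state() {
  awk -F= '
    $1 == "CONFIG_DUMMY" { state = $2 }
    END {
      if (state == "y")      print "built-in"
      else if (state == "m") print "module"
      else                   print "not enabled"
    }'
}
```

For example, `dummy_config_state < "/boot/config-$(uname -r)"` or `zcat /proc/config.gz | dummy_config_state`. If the result is `module`, loading it with `modprobe dummy` may be enough; `not enabled` points to a kernel rebuild as described above.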