rancher / rke2

https://docs.rke2.io/
Apache License 2.0

Check calico or felix related env variables in windows #4580

Closed manuelbuil closed 1 year ago

manuelbuil commented 1 year ago

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Currently, calico-node and felix do not read OS-level environment variables when their processes are started, which makes it hard to change the default config.

Describe alternatives you've considered

Additional context

rbrtbnfgl commented 1 year ago

The Calico services on Windows don't inherit the OS env, so their env has to be configured manually in the code. We generate the env here: https://github.com/rancher/rke2/blob/master/pkg/windows/calico.go#L470 and any variable from the OS should be copied in manually using os.Getenv, as done here: https://github.com/rancher/rke2/blob/master/pkg/windows/calico.go#L398
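
As a rough illustration of the approach described above (forwarding selected OS variables into the generated env), here is a minimal Go sketch. The function name mergeOSEnv and the FELIX_/CALICO_ prefix list are assumptions made for this example and are not taken from calico.go; the actual RKE2 change may look different.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// overridePrefixes is a hypothetical list of variable prefixes to forward
// from the OS environment. It is an assumption for illustration only.
var overridePrefixes = []string{"FELIX_", "CALICO_"}

// mergeOSEnv returns the generated defaults followed by every entry of osEnv
// whose key starts with one of overridePrefixes. OS entries are appended
// last so that they come after the generated defaults.
func mergeOSEnv(generated, osEnv []string) []string {
	merged := append([]string{}, generated...)
	for _, kv := range osEnv {
		key, _, ok := strings.Cut(kv, "=")
		if !ok {
			continue // skip malformed entries without "="
		}
		// Windows variable names are case-insensitive, so compare upper-cased.
		upper := strings.ToUpper(key)
		for _, prefix := range overridePrefixes {
			if strings.HasPrefix(upper, prefix) {
				merged = append(merged, kv)
				break
			}
		}
	}
	return merged
}

func main() {
	// Example defaults, mirroring names that appear later in this thread.
	generated := []string{
		"FELIX_VXLANVNI=4096",
		"FELIX_DATASTORETYPE=kubernetes",
	}
	for _, kv := range mergeOSEnv(generated, os.Environ()) {
		fmt.Println(kv)
	}
}
```

Appending rather than replacing keeps the generated defaults visible in the logged env, at the cost of duplicate keys when a default is overridden.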

manuelbuil commented 1 year ago

Yes! I have a PR in the works ;)

manuelbuil commented 1 year ago

/backport v1.26.8+rke2r1

manuelbuil commented 1 year ago

/backport v1.25.13+rke2r1

est-suse commented 1 year ago

For Visibility.

The C:\var\log\felix directory is not being created. Checked with commit ID: https://github.com/rancher/rke2/commit/70c3aee2bb1f46b8a19540e13e857786ecd4606c

The result of Get-HNSNetwork is not showing Calico as a network:

ActivityId             : C6B0351B-A290-4F80-827F-EFAA302F624D
AdditionalParams       :
CurrentEndpointCount   : 0
Extensions             : {@{Id=E7C3B2F0-F3C5-48DF-AF2B-10FED6D72E7A; IsEnabled=False; Name=Microsoft Windows Filtering Platform}, @{Id=E9B59CFA-2BE1-4B21-828F-B6FBDBDDC017; IsEnabled=False; Name=Microsoft Azure VFP Switch Extension}, @{Id=EA24CD6C-D17A-4348-9190-09F0D5BE83DD; IsEnabled=True; Name=Microsoft NDIS Capture}}
Flags                  : 0
Health                 : @{AddressNotificationMissedCount=0; AddressNotificationSequenceNumber=0; InterfaceNotificationMissedCount=0; InterfaceNotificationSequenceNumber=0; LastErrorCode=0; LastUpdateTime=133370975609612046; RouteNotificationMissedCount=0; RouteNotificationSequenceNumber=0}
ID                     : 926EFCED-6D9E-4B1C-B592-00D0C7900CEF
IPv6                   : False
LayeredOn              : C8ED85A5-51B2-42A3-BD3E-1A55D540771C
MacPools               : {@{EndMacAddress=00-15-5D-FF-5F-FF; StartMacAddress=00-15-5D-FF-50-00}}
MaxConcurrentEndpoints : 0
Name                   : nat
NatName                : ICSEFD2DDE5-4FDD-47D7-9187-13C004F22EB8
Policies               : {}
Resources              : @{AdditionalParams=; AllocationOrder=2; Allocators=System.Object[]; Health=; ID=C6B0351B-A290-4F80-827F-EFAA302F624D; PortOperationTime=0; State=1; SwitchOperationTime=0; VfpOperationTime=0; parentId=E5562E80-6618-482A-B00E-CB56070E028B}
State                  : 1
Subnets                : {@{AdditionalParams=; AddressPrefix=172.30.128.0/20; GatewayAddress=172.30.128.1; Health=; ID=F404D3EB-68A2-4F04-B560-CB2F08D04623; Policies=System.Object[]; State=0}}
TotalEndpoints         : 0
Type                   : nat
Version                : 38654705669

ActivityId             : 371F0876-AE5C-44A9-93E4-D7FBD56747C2
AdditionalParams       :
CurrentEndpointCount   : 0
DNSServerCompartment   : 3
DrMacAddress           : 00-15-5D-D3-EB-1F
Extensions             : {@{Id=E7C3B2F0-F3C5-48DF-AF2B-10FED6D72E7A; IsEnabled=False; Name=Microsoft Windows Filtering Platform}, @{Id=E9B59CFA-2BE1-4B21-828F-B6FBDBDDC017; IsEnabled=True; Name=Microsoft Azure VFP Switch Extension}, @{Id=EA24CD6C-D17A-4348-9190-09F0D5BE83DD; IsEnabled=True; Name=Microsoft NDIS Capture}}
Flags                  : 0
Health                 : @{LastErrorCode=0; LastUpdateTime=133371053797007946}
ID                     : A2F52629-6E15-40BD-90C6-541DD593D1A8
IPv6                   : False
LayeredOn              : 0FE14CDD-6626-4D62-9F40-66EB11FCD67A
MacPools               : {@{EndMacAddress=00-15-5D-7E-5F-FF; StartMacAddress=00-15-5D-7E-50-00}}
ManagementIP           : 172.31.7.77
MaxConcurrentEndpoints : 0
Name                   : External
NetworkAdapterName     : Ethernet 2
Policies               : {}
Resources              : @{AdditionalParams=; AllocationOrder=1; Allocators=System.Object[]; Health=; ID=371F0876-AE5C-44A9-93E4-D7FBD56747C2; PortOperationTime=0; State=1; SwitchOperationTime=0; VfpOperationTime=0; parentId=D5A5C34A-D3C8-4D7D-915B-C7973991CEFD}
State                  : 1
Subnets                : {@{AdditionalParams=; AddressPrefix=192.168.255.0/30; GatewayAddress=192.168.255.1; Health=; ID=8496FCB5-2B3E-45C1-81F8-11F0F5188B77; ObjectType=5; Policies=System.Object[]; State=0}}
TotalEndpoints         : 0
Type                   : Overlay
Version                : 38654705669

Manuel is currently working on a solution.

est-suse commented 1 year ago

Validated on master branch with commit 3ab96dd0016a637f48672b011822de5a35c96045

Environment Details

Infrastructure

Cloud Hosted

Node(s) CPU architecture, OS, and Version:

Ubuntu 22.04 as Linux server and agent
Windows 2019 (1809) as Windows agent node

Cluster Configuration:

NAME                                         STATUS   ROLES                       AGE   VERSION
ip-172-31-1-182.us-east-2.compute.internal   Ready    control-plane,etcd,master   18h   v1.27.4+rke2r1
ip-172-31-3-150.us-east-2.compute.internal   Ready    control-plane,etcd,master   18h   v1.27.4+rke2r1
ip-172-31-3-228.us-east-2.compute.internal   Ready    <none>                      18h   v1.27.4+rke2r1
ip-172-31-6-253.us-east-2.compute.internal   Ready    control-plane,etcd,master   18h   v1.27.4+rke2r1
ip-ac1f055f                                  Ready    <none>                      58s   v1.27.4

Config.yaml:

write-kubeconfig-mode: "0644"
cni: calico

Testing Steps

Copy config.yaml
$ sudo mkdir -p /etc/rancher/rke2 && sudo cp config.yaml /etc/rancher/rke2
Install RKE2 on server node
Join agent node and Windows agent node
Add the system variable: [System.Environment]::SetEnvironmentVariable('FELIX_DATASTORETYPE','etcdv3', 'Machine')

Validate that the logs are showing the OS variable:


 Get-EventLog -LogName Application -Source 'rke2'  -Newest 200 | select-object -Property TimeWritten,ReplacementStrings | Format-Table -Wrap

8/23/2023 3:59:31 PM  {Felix Envs: [KUBE_NETWORK=Calico.* KUBECONFIG=c:\var\lib\rancher\rke2\agent\calico.kubeconfig NODENAME=ip-ac1f055f CALICO_K8S_NODE_REF=ip-ac1f055f IP=172.31.5.95 USE_POD_CIDR=false FELIX_FELIXHOSTNAME=ip-ac1f055f FELIX_VXLANVNI=4096 FELIX_DATASTORETYPE=kubernetes FELIX_DATASTORETYPE=etcdv3]}

from var/logs/felix:

2023-08-23 15:59:31.505 [INFO][5912] felix/config_params.go 612: Parsing value for DatastoreType: etcdv3 (from environment variable)
2023-08-23 15:59:31.505 [INFO][5912] felix/config_params.go 648: Parsed value for DatastoreType: etcdv3 (from environment variable)
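
The Felix Envs line above lists FELIX_DATASTORETYPE twice (kubernetes, then etcdv3), and the Felix log shows etcdv3 being parsed, which suggests the later entry took effect. Go's os/exec documents that when Cmd.Env contains duplicate keys, only the last value for each key is used. Whether RKE2 relies on that behavior here is an assumption. The sketch below makes the "last value wins" rule explicit; the helper name lastWins is hypothetical and not part of RKE2.

```go
package main

import (
	"fmt"
	"strings"
)

// lastWins collapses duplicate keys in an env slice. Each key keeps the
// position of its first occurrence but takes the value of its last one.
func lastWins(env []string) []string {
	index := make(map[string]int) // upper-cased key -> position in out
	var out []string
	for _, kv := range env {
		key, _, ok := strings.Cut(kv, "=")
		if !ok {
			continue // skip malformed entries without "="
		}
		// Windows variable names are case-insensitive.
		k := strings.ToUpper(key)
		if i, seen := index[k]; seen {
			out[i] = kv // a later entry overrides the earlier one
			continue
		}
		index[k] = len(out)
		out = append(out, kv)
	}
	return out
}

func main() {
	env := []string{
		"FELIX_VXLANVNI=4096",
		"FELIX_DATASTORETYPE=kubernetes",
		"FELIX_DATASTORETYPE=etcdv3",
	}
	for _, kv := range lastWins(env) {
		fmt.Println(kv)
	}
	// Prints:
	// FELIX_VXLANVNI=4096
	// FELIX_DATASTORETYPE=etcdv3
}
```

Deduplicating before logging would make the effective value obvious in the event log instead of leaving both entries side by side.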

est-suse commented 1 year ago

NAMESPACE         NAME                                                                  READY   STATUS      RESTARTS   AGE
calico-system     calico-kube-controllers-958554c49-gkx5r                               1/1     Running     0          18h
calico-system     calico-node-5hxtc                                                     1/1     Running     0          18h
calico-system     calico-node-ckwbh                                                     1/1     Running     0          18h
calico-system     calico-node-f6ff5                                                     1/1     Running     0          18h
calico-system     calico-node-tc7zh                                                     1/1     Running     0          18h
calico-system     calico-typha-6fdbdb7844-28k44                                         1/1     Running     0          18h
calico-system     calico-typha-6fdbdb7844-8dswt                                         1/1     Running     0          18h
calico-system     calico-typha-6fdbdb7844-hqp56                                         1/1     Running     0          37m
kube-system       cloud-controller-manager-ip-172-31-1-182.us-east-2.compute.internal   1/1     Running     0          18h
kube-system       cloud-controller-manager-ip-172-31-3-150.us-east-2.compute.internal   1/1     Running     0          18h
kube-system       cloud-controller-manager-ip-172-31-6-253.us-east-2.compute.internal   1/1     Running     0          18h
kube-system       etcd-ip-172-31-1-182.us-east-2.compute.internal                       1/1     Running     0          18h
kube-system       etcd-ip-172-31-3-150.us-east-2.compute.internal                       1/1     Running     0          18h
kube-system       etcd-ip-172-31-6-253.us-east-2.compute.internal                       1/1     Running     0          18h
kube-system       helm-install-rke2-calico-9rxwf                                        0/1     Completed   2          18h
kube-system       helm-install-rke2-calico-crd-ktmcb                                    0/1     Completed   0          18h
kube-system       helm-install-rke2-coredns-bz7cn                                       0/1     Completed   0          18h
kube-system       helm-install-rke2-ingress-nginx-z2qrv                                 0/1     Completed   0          18h
kube-system       helm-install-rke2-metrics-server-qqcxw                                0/1     Completed   0          18h
kube-system       helm-install-rke2-snapshot-controller-crd-qx25j                       0/1     Completed   0          18h
kube-system       helm-install-rke2-snapshot-controller-wfhr2                           0/1     Completed   0          18h
kube-system       helm-install-rke2-snapshot-validation-webhook-9kl2g                   0/1     Completed   0          18h
kube-system       kube-apiserver-ip-172-31-1-182.us-east-2.compute.internal             1/1     Running     0          18h
kube-system       kube-apiserver-ip-172-31-3-150.us-east-2.compute.internal             1/1     Running     0          18h
kube-system       kube-apiserver-ip-172-31-6-253.us-east-2.compute.internal             1/1     Running     0          18h
kube-system       kube-controller-manager-ip-172-31-1-182.us-east-2.compute.internal    1/1     Running     0          18h
kube-system       kube-controller-manager-ip-172-31-3-150.us-east-2.compute.internal    1/1     Running     0          18h
kube-system       kube-controller-manager-ip-172-31-6-253.us-east-2.compute.internal    1/1     Running     0          18h
kube-system       kube-proxy-ip-172-31-1-182.us-east-2.compute.internal                 1/1     Running     0          18h
kube-system       kube-proxy-ip-172-31-3-150.us-east-2.compute.internal                 1/1     Running     0          18h
kube-system       kube-proxy-ip-172-31-3-228.us-east-2.compute.internal                 1/1     Running     0          18h
kube-system       kube-proxy-ip-172-31-6-253.us-east-2.compute.internal                 1/1     Running     0          18h
kube-system       kube-scheduler-ip-172-31-1-182.us-east-2.compute.internal             1/1     Running     0          18h
kube-system       kube-scheduler-ip-172-31-3-150.us-east-2.compute.internal             1/1     Running     0          18h
kube-system       kube-scheduler-ip-172-31-6-253.us-east-2.compute.internal             1/1     Running     0          18h
kube-system       rke2-coredns-rke2-coredns-5f5d6b54c7-7dtqq                            1/1     Running     0          18h
kube-system       rke2-coredns-rke2-coredns-5f5d6b54c7-wdvbs                            1/1     Running     0          18h
kube-system       rke2-coredns-rke2-coredns-autoscaler-6bf8f59fd5-8vfmk                 1/1     Running     0          18h
kube-system       rke2-ingress-nginx-controller-9mgqk                                   1/1     Running     0          18h
kube-system       rke2-ingress-nginx-controller-crf4q                                   1/1     Running     0          18h
kube-system       rke2-ingress-nginx-controller-mjvp4                                   1/1     Running     0          18h
kube-system       rke2-ingress-nginx-controller-vt4ph                                   1/1     Running     0          18h
kube-system       rke2-metrics-server-6d79d977db-jv7th                                  1/1     Running     0          18h
kube-system       rke2-snapshot-controller-7d6476d7cb-c5xgq                             1/1     Running     0          18h
kube-system       rke2-snapshot-validation-webhook-5649fbd66c-kndmz                     1/1     Running     0          18h
tigera-operator   tigera-operator-569cff7b5b-94lsg                                      1/1     Running     0          18h

Get-HNSNetwork

MaxConcurrentEndpoints : 0
Name                   : Calico
Policies               : {@{Type=HostRoute}, @{DestinationPrefix=10.42.151.192/26; DistributedRouterMacAddress=66-46-5f-64-6b-c3; IsolationId=4096; ProviderAddress=172.31.1.182; Type=RemoteSubnetRoute}, @{DestinationPrefix=10.42.74.128/26; DistributedRouterMacAddress=66-a1-e8-ee-99-f3; IsolationId=4096; ProviderAddress=172.31.3.228; Type=RemoteSubnetRoute}, @{DestinationPrefix=10.42.169.128/26; DistributedRouterMacAddress=66-c8-f9-e9-c5-7c; IsolationId=4096;
                         ProviderAddress=172.31.6.253; Type=RemoteSubnetRoute}...}
Resources              : @{AdditionalParams=; AllocationOrder=1; Allocators=System.Object[]; Health=; ID=B77497F2-D1C5-4376-A61E-0CFAD42529E6; PortOperationTime=0; State=1; SwitchOperationTime=0; VfpOperationTime=0; parentId=0D5F1C9C-F946-40CC-AF8F-0D000CC754FC}
State                  : 1
Subnets                : {@{AdditionalParams=; AddressPrefix=10.42.123.128/26; GatewayAddress=10.42.123.129; Health=; ID=77056D11-6890-43DB-8213-12EA0826EB44; ObjectType=5; Policies=System.Object[]; State=0}}
TotalEndpoints         : 0
Type                   : Overlay
Version                : 38654705669

manuelbuil commented 1 year ago

/backport v1.24.17+rke2r1