Closed manuelbuil closed 1 year ago
The Calico services on Windows do not inherit the OS environment; their environment variables have to be configured manually in the code.
We are generating the env with https://github.com/rancher/rke2/blob/master/pkg/windows/calico.go#L470
The variables from the OS environment should be copied over explicitly using os.Getenv:
https://github.com/rancher/rke2/blob/master/pkg/windows/calico.go#L398
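A minimal sketch of the idea, not the actual rke2 code: the generated env list is extended with matching variables read from the OS. The helper name mergeOSEnv and the FELIX_/CALICO_ prefixes are assumptions made for illustration, and the sketch scans os.Environ() with a prefix filter instead of calling os.Getenv per variable.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// mergeOSEnv (hypothetical helper) returns a copy of generated with every
// "KEY=VALUE" entry from osEnv appended whose key starts with one of the
// prefixes. Keys are compared case-insensitively, as on Windows.
func mergeOSEnv(generated, osEnv, prefixes []string) []string {
	merged := append([]string{}, generated...)
	for _, kv := range osEnv {
		name, _, ok := strings.Cut(kv, "=")
		if !ok || name == "" {
			// Skip malformed entries and Windows pseudo-variables such as "=C:=C:\".
			continue
		}
		for _, p := range prefixes {
			if strings.HasPrefix(strings.ToUpper(name), p) {
				merged = append(merged, kv)
				break
			}
		}
	}
	return merged
}

func main() {
	// Defaults produced in code (values taken from the log output in this issue).
	generated := []string{"FELIX_VXLANVNI=4096", "FELIX_DATASTORETYPE=kubernetes"}
	// Assumed prefixes; the real set of variables to forward may differ.
	env := mergeOSEnv(generated, os.Environ(), []string{"FELIX_", "CALICO_"})
	fmt.Println(env)
}
```

Appending OS values after the generated defaults keeps the defaults visible in the list while letting an operator-supplied value come last.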
Yes! I have a PR in the works ;)
/backport v1.26.8+rke2r1
/backport v1.25.13+rke2r1
For Visibility.
The C:\var\log\felix directory is not being created. Checked with commit ID: https://github.com/rancher/rke2/commit/70c3aee2bb1f46b8a19540e13e857786ecd4606c
The result of Get-HNSNetwork is not showing Calico as a network:
ActivityId : C6B0351B-A290-4F80-827F-EFAA302F624D
AdditionalParams :
CurrentEndpointCount : 0
Extensions : {@{Id=E7C3B2F0-F3C5-48DF-AF2B-10FED6D72E7A; IsEnabled=False; Name=Microsoft Windows Filtering Platform}, @{Id=E9B59CFA-2BE1-4B21-828F-B6FBDBDDC017; IsEnabled=False; Name=Microsoft Azure VFP Switch Extension}, @{Id=EA24CD6C-D17A-4348-9190-09F0D5BE83DD; IsEnabled=True; Name=Microsoft NDIS Capture}}
Flags : 0
Health : @{AddressNotificationMissedCount=0; AddressNotificationSequenceNumber=0; InterfaceNotificationMissedCount=0; InterfaceNotificationSequenceNumber=0; LastErrorCode=0; LastUpdateTime=133370975609612046; RouteNotificationMissedCount=0; RouteNotificationSequenceNumber=0}
ID : 926EFCED-6D9E-4B1C-B592-00D0C7900CEF
IPv6 : False
LayeredOn : C8ED85A5-51B2-42A3-BD3E-1A55D540771C
MacPools : {@{EndMacAddress=00-15-5D-FF-5F-FF; StartMacAddress=00-15-5D-FF-50-00}}
MaxConcurrentEndpoints : 0
Name : nat
NatName : ICSEFD2DDE5-4FDD-47D7-9187-13C004F22EB8
Policies : {}
Resources : @{AdditionalParams=; AllocationOrder=2; Allocators=System.Object[]; Health=; ID=C6B0351B-A290-4F80-827F-EFAA302F624D; PortOperationTime=0; State=1; SwitchOperationTime=0; VfpOperationTime=0; parentId=E5562E80-6618-482A-B00E-CB56070E028B}
State : 1
Subnets : {@{AdditionalParams=; AddressPrefix=172.30.128.0/20; GatewayAddress=172.30.128.1; Health=; ID=F404D3EB-68A2-4F04-B560-CB2F08D04623; Policies=System.Object[]; State=0}}
TotalEndpoints : 0
Type : nat
Version : 38654705669

ActivityId : 371F0876-AE5C-44A9-93E4-D7FBD56747C2
AdditionalParams :
CurrentEndpointCount : 0
DNSServerCompartment : 3
DrMacAddress : 00-15-5D-D3-EB-1F
Extensions : {@{Id=E7C3B2F0-F3C5-48DF-AF2B-10FED6D72E7A; IsEnabled=False; Name=Microsoft Windows Filtering Platform}, @{Id=E9B59CFA-2BE1-4B21-828F-B6FBDBDDC017; IsEnabled=True; Name=Microsoft Azure VFP Switch Extension}, @{Id=EA24CD6C-D17A-4348-9190-09F0D5BE83DD; IsEnabled=True; Name=Microsoft NDIS Capture}}
Flags : 0
Health : @{LastErrorCode=0; LastUpdateTime=133371053797007946}
ID : A2F52629-6E15-40BD-90C6-541DD593D1A8
IPv6 : False
LayeredOn : 0FE14CDD-6626-4D62-9F40-66EB11FCD67A
MacPools : {@{EndMacAddress=00-15-5D-7E-5F-FF; StartMacAddress=00-15-5D-7E-50-00}}
ManagementIP : 172.31.7.77
MaxConcurrentEndpoints : 0
Name : External
NetworkAdapterName : Ethernet 2
Policies : {}
Resources : @{AdditionalParams=; AllocationOrder=1; Allocators=System.Object[]; Health=; ID=371F0876-AE5C-44A9-93E4-D7FBD56747C2; PortOperationTime=0; State=1; SwitchOperationTime=0; VfpOperationTime=0; parentId=D5A5C34A-D3C8-4D7D-915B-C7973991CEFD}
State : 1
Subnets : {@{AdditionalParams=; AddressPrefix=192.168.255.0/30; GatewayAddress=192.168.255.1; Health=; ID=8496FCB5-2B3E-45C1-81F8-11F0F5188B77; ObjectType=5; Policies=System.Object[]; State=0}}
TotalEndpoints : 0
Type : Overlay
Version : 38654705669

Manuel is currently working on a solution.
Validated on master branch with commit 3ab96dd0016a637f48672b011822de5a35c96045
Environment Details
Infrastructure: Cloud Hosted
Node(s) CPU architecture, OS, and Version:
Ubuntu 22.04 as Linux server and agent
Windows 2019 (1809) as Windows agent node
Cluster Configuration:
NAME STATUS ROLES AGE VERSION
ip-172-31-1-182.us-east-2.compute.internal Ready control-plane,etcd,master 18h v1.27.4+rke2r1
ip-172-31-3-150.us-east-2.compute.internal Ready control-plane,etcd,master 18h v1.27.4+rke2r1
ip-172-31-3-228.us-east-2.compute.internal Ready <none> 18h v1.27.4+rke2r1
ip-172-31-6-253.us-east-2.compute.internal Ready control-plane,etcd,master 18h v1.27.4+rke2r1
ip-ac1f055f Ready <none> 58s v1.27.4
Config.yaml:
write-kubeconfig-mode: "0644"
cni: calico
Testing Steps
Copy config.yaml
$ sudo mkdir -p /etc/rancher/rke2 && sudo cp config.yaml /etc/rancher/rke2
Install RKE2 on server node
Join agent node and Windows agent node
Add the system variable: [System.Environment]::SetEnvironmentVariable('FELIX_DATASTORETYPE','etcdv3', 'Machine')
Validate that the logs show the OS variable:
Get-EventLog -LogName Application -Source 'rke2' -Newest 200 | select-object -Property TimeWritten,ReplacementStrings | Format-Table -Wrap
8/23/2023 3:59:31 PM {Felix Envs: [KUBE_NETWORK=Calico.* KUBECONFIG=c:\var\lib\rancher\rke2\agent\calico.kubeconfig NODENAME=ip-ac1f055f CALICO_K8S_NODE_REF=ip-ac1f055f IP=172.31.5.95 USE_POD_CIDR=false FELIX_FELIXHOSTNAME=ip-ac1f055f FELIX_VXLANVNI=4096 FELIX_DATASTORETYPE=kubernetes FELIX_DATASTORETYPE=etcdv3]}
from var/logs/felix:
2023-08-23 15:59:31.505 [INFO][5912] felix/config_params.go 612: Parsing value for DatastoreType: etcdv3 (from environment variable)
2023-08-23 15:59:31.505 [INFO][5912] felix/config_params.go 648: Parsed value for DatastoreType: etcdv3 (from environment variable)
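The "Felix Envs" list above contains FELIX_DATASTORETYPE twice (the generated default, then the OS value), and Felix parsed etcdv3. That is consistent with last-value-wins handling of duplicate keys, which is what Go's os/exec documents for Cmd.Env. Whether rke2 launches Felix that way is an assumption here; the sketch below only illustrates the resolution rule, and effectiveEnv is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveEnv (hypothetical helper) resolves a "KEY=VALUE" list so that, when
// a key appears more than once, the last value in the slice wins. Keys are
// upper-cased so the comparison is case-insensitive, as on Windows.
func effectiveEnv(env []string) map[string]string {
	out := make(map[string]string)
	for _, kv := range env {
		k, v, ok := strings.Cut(kv, "=")
		if !ok {
			continue // ignore entries without a "="
		}
		out[strings.ToUpper(k)] = v // later entries overwrite earlier ones
	}
	return out
}

func main() {
	env := []string{
		"FELIX_VXLANVNI=4096",
		"FELIX_DATASTORETYPE=kubernetes", // generated default
		"FELIX_DATASTORETYPE=etcdv3",     // copied from the OS environment
	}
	fmt.Println(effectiveEnv(env)["FELIX_DATASTORETYPE"]) // prints "etcdv3"
}
```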
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-system calico-kube-controllers-958554c49-gkx5r 1/1 Running 0 18h
calico-system calico-node-5hxtc 1/1 Running 0 18h
calico-system calico-node-ckwbh 1/1 Running 0 18h
calico-system calico-node-f6ff5 1/1 Running 0 18h
calico-system calico-node-tc7zh 1/1 Running 0 18h
calico-system calico-typha-6fdbdb7844-28k44 1/1 Running 0 18h
calico-system calico-typha-6fdbdb7844-8dswt 1/1 Running 0 18h
calico-system calico-typha-6fdbdb7844-hqp56 1/1 Running 0 37m
kube-system cloud-controller-manager-ip-172-31-1-182.us-east-2.compute.internal 1/1 Running 0 18h
kube-system cloud-controller-manager-ip-172-31-3-150.us-east-2.compute.internal 1/1 Running 0 18h
kube-system cloud-controller-manager-ip-172-31-6-253.us-east-2.compute.internal 1/1 Running 0 18h
kube-system etcd-ip-172-31-1-182.us-east-2.compute.internal 1/1 Running 0 18h
kube-system etcd-ip-172-31-3-150.us-east-2.compute.internal 1/1 Running 0 18h
kube-system etcd-ip-172-31-6-253.us-east-2.compute.internal 1/1 Running 0 18h
kube-system helm-install-rke2-calico-9rxwf 0/1 Completed 2 18h
kube-system helm-install-rke2-calico-crd-ktmcb 0/1 Completed 0 18h
kube-system helm-install-rke2-coredns-bz7cn 0/1 Completed 0 18h
kube-system helm-install-rke2-ingress-nginx-z2qrv 0/1 Completed 0 18h
kube-system helm-install-rke2-metrics-server-qqcxw 0/1 Completed 0 18h
kube-system helm-install-rke2-snapshot-controller-crd-qx25j 0/1 Completed 0 18h
kube-system helm-install-rke2-snapshot-controller-wfhr2 0/1 Completed 0 18h
kube-system helm-install-rke2-snapshot-validation-webhook-9kl2g 0/1 Completed 0 18h
kube-system kube-apiserver-ip-172-31-1-182.us-east-2.compute.internal 1/1 Running 0 18h
kube-system kube-apiserver-ip-172-31-3-150.us-east-2.compute.internal 1/1 Running 0 18h
kube-system kube-apiserver-ip-172-31-6-253.us-east-2.compute.internal 1/1 Running 0 18h
kube-system kube-controller-manager-ip-172-31-1-182.us-east-2.compute.internal 1/1 Running 0 18h
kube-system kube-controller-manager-ip-172-31-3-150.us-east-2.compute.internal 1/1 Running 0 18h
kube-system kube-controller-manager-ip-172-31-6-253.us-east-2.compute.internal 1/1 Running 0 18h
kube-system kube-proxy-ip-172-31-1-182.us-east-2.compute.internal 1/1 Running 0 18h
kube-system kube-proxy-ip-172-31-3-150.us-east-2.compute.internal 1/1 Running 0 18h
kube-system kube-proxy-ip-172-31-3-228.us-east-2.compute.internal 1/1 Running 0 18h
kube-system kube-proxy-ip-172-31-6-253.us-east-2.compute.internal 1/1 Running 0 18h
kube-system kube-scheduler-ip-172-31-1-182.us-east-2.compute.internal 1/1 Running 0 18h
kube-system kube-scheduler-ip-172-31-3-150.us-east-2.compute.internal 1/1 Running 0 18h
kube-system kube-scheduler-ip-172-31-6-253.us-east-2.compute.internal 1/1 Running 0 18h
kube-system rke2-coredns-rke2-coredns-5f5d6b54c7-7dtqq 1/1 Running 0 18h
kube-system rke2-coredns-rke2-coredns-5f5d6b54c7-wdvbs 1/1 Running 0 18h
kube-system rke2-coredns-rke2-coredns-autoscaler-6bf8f59fd5-8vfmk 1/1 Running 0 18h
kube-system rke2-ingress-nginx-controller-9mgqk 1/1 Running 0 18h
kube-system rke2-ingress-nginx-controller-crf4q 1/1 Running 0 18h
kube-system rke2-ingress-nginx-controller-mjvp4 1/1 Running 0 18h
kube-system rke2-ingress-nginx-controller-vt4ph 1/1 Running 0 18h
kube-system rke2-metrics-server-6d79d977db-jv7th 1/1 Running 0 18h
kube-system rke2-snapshot-controller-7d6476d7cb-c5xgq 1/1 Running 0 18h
kube-system rke2-snapshot-validation-webhook-5649fbd66c-kndmz 1/1 Running 0 18h
tigera-operator tigera-operator-569cff7b5b-94lsg 1/1 Running 0 18h
Get-HNSNetwork
MaxConcurrentEndpoints : 0
Name : Calico
Policies : {@{Type=HostRoute}, @{DestinationPrefix=10.42.151.192/26; DistributedRouterMacAddress=66-46-5f-64-6b-c3; IsolationId=4096; ProviderAddress=172.31.1.182; Type=RemoteSubnetRoute}, @{DestinationPrefix=10.42.74.128/26; DistributedRouterMacAddress=66-a1-e8-ee-99-f3; IsolationId=4096; ProviderAddress=172.31.3.228; Type=RemoteSubnetRoute}, @{DestinationPrefix=10.42.169.128/26; DistributedRouterMacAddress=66-c8-f9-e9-c5-7c; IsolationId=4096;
ProviderAddress=172.31.6.253; Type=RemoteSubnetRoute}...}
Resources : @{AdditionalParams=; AllocationOrder=1; Allocators=System.Object[]; Health=; ID=B77497F2-D1C5-4376-A61E-0CFAD42529E6; PortOperationTime=0; State=1; SwitchOperationTime=0; VfpOperationTime=0; parentId=0D5F1C9C-F946-40CC-AF8F-0D000CC754FC}
State : 1
Subnets : {@{AdditionalParams=; AddressPrefix=10.42.123.128/26; GatewayAddress=10.42.123.129; Health=; ID=77056D11-6890-43DB-8213-12EA0826EB44; ObjectType=5; Policies=System.Object[]; State=0}}
TotalEndpoints : 0
Type : Overlay
Version : 38654705669
/backport v1.24.17+rke2r1
Describe the solution you'd like
Currently, neither calico-node nor felix picks up OS-level environment variables when its process is started, which makes it hard to change the default config.