Closed jcshare closed 3 days ago
Could an expert help fix this issue? I don't have the authority to do it myself. Many thanks.
Please attach the error log from the kube-ovn-controller pod covering the NAT gateway pod IP allocation.
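For reference, a sketch of how those lines can be pulled from the controller (deployment name and namespace assume a standard kube-ovn install; the grep pattern is just a convenience):

```shell
# Fetch kube-ovn-controller logs and keep only the IPAM / allocation lines.
controller_log() {
  kubectl -n kube-system logs deploy/kube-ovn-controller \
    | grep -E 'ipam|allocat|AddressOutOfRange'
}
```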
701 I0619 12:07:04.058569 6 ipam.go:60] allocate v4 192.168.1.10, v6 , mac for kube-system/vpc-nat-gw-gw1-vpc-1-0 from subnet ovn-vpc-external-network
702 I0619 12:07:04.071551 6 ipam.go:72] allocating static ip 10.0.1.254 from subnet net1-vpc-1
703 E0619 12:07:04.072121 6 pod.go:1762] failed to get static ip 10.0.1.254, mac , err: failed to get static ip 10.0.1.254, mac , subnet ovn-default, err AddressOutOfRange
719 E0619 12:07:04.097851 6 pod.go:620] AddressOutOfRange
Please attach the output of kubectl get subnet.
Details:
The root cause should be as identified above; the exception needs to be handled gracefully. I have rebuilt my setup, so I'm pasting the subnet and VPC definitions below:
kind: Vpc
apiVersion: kubeovn.io/v1
metadata:
name: vpc-1
spec:
staticRoutes:
- cidr: 0.0.0.0/0
nextHopIP: 10.0.1.254
policy: policyDst
namespaces:
- ns1
---
kind: Subnet
apiVersion: kubeovn.io/v1
metadata:
name: net1-vpc-1
spec:
vpc: vpc-1
cidrBlock: 10.0.1.0/24
protocol: IPv4
excludeIps:
- 10.0.1.254
namespaces:
- ns1
---
kind: VpcNatGateway
apiVersion: kubeovn.io/v1
metadata:
name: gw-vpc-1
spec:
vpc: vpc-1
subnet: net1-vpc-1
lanIp: 10.0.1.254
selector:
- "kubernetes.io/hostname: worker2"
- "kubernetes.io/os: linux"
externalSubnets:
- ovn-vpc-external-network
ubuntu@master:~/project/debug/1.12.7/test$ kubectl get subnet
NAME PROVIDER VPC PROTOCOL CIDR PRIVATE NAT DEFAULT GATEWAYTYPE V4USED V4AVAILABLE V6USED V6AVAILABLE EXCLUDEIPS U2OINTERCONNECTIONIP
join ovn ovn-cluster IPv4 100.64.0.0/16 false false false distributed 3 65530 0 0 ["100.64.0.1"]
ovn-default ovn ovn-cluster IPv4 10.16.0.0/16 false true true distributed 5 65528 0 0 ["10.16.0.1"]
ovn-vpc-external-network ovn-vpc-external-network.kube-system IPv4 192.168.1.0/24 false false false distributed 3 7 0 0 ["192.168.1.1..192.168.1.9","192.168.1.20..192.168.1.255"]
ubuntu@master:~/project/debug/1.12.7/test$
Per the log above, there appears to be another problem (as you mentioned): the controller shouldn't try to allocate 10.0.1.254 from the ovn-default subnet.
Where is your 10.0.1.0/24 subnet?
Could you take a deeper look at the problem? It should be easy to reproduce with my configuration above. My testbed was broken by this problem and I have since rebuilt it, so the subnet is no longer visible in my current setup.
many thanks
Anyway, I will reproduce it and upload all the log files later, thanks.
I have reproduced it with a new VPC named "vpc-3"; the related log files are attached. Could you take a look? Many thanks. 1.12.7-IP-Allocation-bug.zip
Where is your 10.0.1.0/24 subnet?
Sorry, I think when you run kubectl get subnet it shows all the subnets, but I do not see the 10.0.1.0/24 subnet among them.
Could you refer to my reply above: https://github.com/kubeovn/kube-ovn/issues/4210#issuecomment-2185711409
The problem looks obvious; could it be fixed if possible? Many thanks.
You do not have the VPC subnet 10.0.1.0/24. If you want to use 10.0.1.254, you should create that subnet first.
If you use the vpc-3 subnet, I think you should use 10.0.3.254.
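For reference, a minimal sketch of the subnet that would need to exist before a gateway references 10.0.3.254 as its lanIp (the subnet name, CIDR, and namespace are assumptions inferred from the thread, not taken from the actual setup):

```yaml
# Hypothetical subnet for vpc-3; adjust name/CIDR/namespaces to your setup.
kind: Subnet
apiVersion: kubeovn.io/v1
metadata:
  name: net1-vpc-3
spec:
  vpc: vpc-3
  cidrBlock: 10.0.3.0/24
  protocol: IPv4
  excludeIps:
    - 10.0.3.254   # reserved for the NAT gateway lanIp
```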
"if you use vpc3 subnet, I think you should use 10.0.3.254."
Yes, I'm using 10.0.3.254 for vpc-3. Please refer to the vpc-3 (rather than vpc-1) configuration and debug info in the tarball; the information you mentioned was the stale configuration of vpc-1 (which should be another issue that needs to be handled).
thanks
Hi @zhangzujian, it seems IPAM has a problem?
Kube-OVN Version
v1.12.7 and master
Kubernetes Version
v1.27
Operation-system/Kernel Version
"Ubuntu 20.04.6 LTS" / 5.4.0-186-generic
Description
There appears to be no handling for IP allocation failures when creating the VPC NAT gateway pod, and the whole external IP pool gets exhausted as a result.
After doing some research, the root cause looks like the following.
A vpc-nat-gw pod's info:
Steps To Reproduce
Create and delete the VPC NAT gateway multiple times.
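The reproduction step above can be sketched as a simple loop (gw.yaml is assumed to hold the VpcNatGateway manifest from this issue; the iteration count is arbitrary):

```shell
# Repeatedly create and delete the gateway; per the report, each cycle leaks
# an external IP until the external subnet's pool is exhausted.
repro() {
  for i in $(seq 1 5); do
    kubectl apply -f gw.yaml
    kubectl delete -f gw.yaml
  done
}
```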
Current Behavior
The external IP CIDR gets exhausted by this problem.
Expected Behavior
Graceful handling of IP allocation and release, so that failed allocations do not leak addresses.