crandles opened this issue 4 years ago
+1 re: the solution of configuring the pod as the owner of the IP allocation; that's an excellent suggestion.
Definitely up for this change / refactor. Thanks for the outline, looking forward to the CRD proposal.
Another consideration: upgrade path (I'll think on this one, too)
Another quick thought on the labels: I have an idea about a "sticky IP address using MAC address" -- if a workload comes back up with the same MAC address, it gets the same IP address.
This could be a separate store/CRD, but we could also label the IP address CR with a MAC address and query for it. We could release the ownership when this is used so that the IP CR sticks around, too.
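To make the label idea concrete, here's a minimal sketch of normalizing a MAC into a legal Kubernetes label value (label values can't contain colons); the `whereabouts.cni.io/mac` key mentioned below is hypothetical:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// macToLabel normalizes a MAC address into a form that is legal as a
// Kubernetes label value (colons are not allowed in label values).
func macToLabel(mac string) (string, error) {
	hw, err := net.ParseMAC(mac)
	if err != nil {
		return "", err
	}
	return strings.ReplaceAll(hw.String(), ":", "-"), nil
}

func main() {
	v, err := macToLabel("00:1A:2B:3C:4D:5E")
	if err != nil {
		panic(err)
	}
	// The IP CR could then be labeled and queried with a hypothetical key:
	//   kubectl get ip -l whereabouts.cni.io/mac=00-1a-2b-3c-4d-5e
	fmt.Println(v) // 00-1a-2b-3c-4d-5e
}
```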
Moving to IP as the "base" resource type solves for preventing duplicate IP allocations from occurring in overlapping ranges, but introduces a problem:
How do we easily query Kubernetes to determine the next available IP? The IPPool type is a useful bucket that we would be removing.
I am having second thoughts about this idea. (using a per-ip resource and labels to query for subnet allocations)
IPv6 subnets are very large, and there are many possible subnets; generating 128 labels to enable query lookups seems like a poor design. Additionally, I worry how well this scales.
Additionally, I believe Kubernetes resources are limited to ~1MB in size (based on the etcd limit); the current pool implementation does not factor this in. (Should it?)
I could imagine a CRD model involving IP blocks owned by an IP pool; each block contains n addresses, and blocks are allotted to the pool as needed. This could be a middle ground between sharing IPs between pools and avoiding a per-IP resource.
We would need to do some testing to find the right block sizes. The pool data type would hold metadata pointing to the full blocks and to the current block with free IPs.
IPv6 would need additional testing for the maximum number of sub-blocks, etc. It may be impossible to support many subnet sizes within 1MB?
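Some back-of-envelope arithmetic for block sizing, assuming ~100 bytes per allocation entry (an unmeasured guess, not a figure from the current implementation):

```go
package main

import "fmt"

func main() {
	const maxObjectBytes = 1 << 20 // the ~1MB etcd-derived limit discussed above
	const bytesPerEntry = 100      // assumed average size of one allocation entry

	// Roughly how many allocations fit in a single block resource.
	entriesPerBlock := maxObjectBytes / bytesPerEntry
	fmt.Println(entriesPerBlock) // 10485

	// A fully allocated IPv4 /16 would need several blocks.
	fullBlocks := 65536 / entriesPerBlock
	fmt.Println(fullBlocks) // 6 full blocks, plus a partial one

	// An IPv6 /64 has 2^64 addresses, so no flat allocation map can be
	// pre-enumerated; blocks would have to be carved out lazily.
}
```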
Is it important to support allocating the same IP in multiple ranges?
This still isn't clear. Should it be optimized for?
Under certain failure scenarios IP allocations may become orphaned. I think we can still leverage Pod owner references + garbage collection by introducing a reservation resource (e.g. PoolReservation / PodIPReservation) and leveraging an operator + finalizer to ensure the Pool/blocks are maintained.
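A rough sketch of what such a reservation resource might look like; all names here (group/version, kind, finalizer string, spec fields) are placeholders, not an agreed spec:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type OwnerRef struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Name       string `json:"name"`
	UID        string `json:"uid"`
}

type Metadata struct {
	Name            string     `json:"name"`
	Finalizers      []string   `json:"finalizers,omitempty"`
	OwnerReferences []OwnerRef `json:"ownerReferences,omitempty"`
}

type PoolReservationSpec struct {
	Pool string `json:"pool"`
	IP   string `json:"ip"`
}

type PoolReservation struct {
	APIVersion string              `json:"apiVersion"`
	Kind       string              `json:"kind"`
	Metadata   Metadata            `json:"metadata"`
	Spec       PoolReservationSpec `json:"spec"`
}

// newPoolReservation wires the pod in as owner (so Kubernetes GC deletes the
// reservation with the pod) and adds a finalizer so an operator can return
// the IP to the pool/block before the object disappears.
func newPoolReservation(podName, podUID, pool, ip string) PoolReservation {
	return PoolReservation{
		APIVersion: "whereabouts.cni.io/v1alpha1", // hypothetical group/version
		Kind:       "PoolReservation",
		Metadata: Metadata{
			Name:            podName + "-" + ip,
			Finalizers:      []string{"whereabouts.cni.io/ip-release"}, // hypothetical
			OwnerReferences: []OwnerRef{{APIVersion: "v1", Kind: "Pod", Name: podName, UID: podUID}},
		},
		Spec: PoolReservationSpec{Pool: pool, IP: ip},
	}
}

func main() {
	b, _ := json.MarshalIndent(newPoolReservation("pod-a", "1234-uid", "default-pool", "10.0.0.5"), "", "  ")
	fmt.Println(string(b))
}
```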
The current data model for the IPPool CRD stores allocations for a given range in a single Kubernetes resource.
Known Issues

Under certain failure scenarios IP allocations may become orphaned (e.g. a failure may cause the CNI DEL to be skipped).

Open Questions

Should 127.0.0.0/22 and 127.0.0.0/23 both be able to allocate 127.0.0.1?
Idea

TBD, come up with a draft yaml spec/examples. Including some rough ideas now:

Moving to IP as the "base" resource type solves for preventing duplicate IP allocations from occurring in overlapping ranges, but introduces a problem: how do we easily query Kubernetes to determine the next available IP? The IPPool type is a useful bucket that we are removing.

Can we solve querying IPs from a given range using well-crafted labels? e.g. kubectl get ip -l subnet_31=127.0.0.0 (?) Need to determine the labels and consider IPv6.

Given we have a single resource type that is associated with an allocation, we can leverage Kubernetes' built-in garbage collection capabilities to solve for the orphaned IP allocation problem.
By configuring the pod as the owner of the IP allocation resource, we can instruct Kubernetes to automatically delete the resource when the pod is deleted. We may still clean up our resources via the CNI DEL, but this would serve as a fall-through to prevent IP exhaustion (with no operator, cron, or other process necessary).

We should be able to create a Namespace-scoped and a Cluster-scoped client: I think either might make sense, but it would not make sense to use both concurrently. This should be configured in the whereabouts IPAM config.
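A sketch of how that scope choice might surface in the IPAM config; the `crd_scope` key is purely hypothetical and does not exist today:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical extension of the whereabouts IPAM config.
type IPAMConfig struct {
	Type     string `json:"type"`
	Range    string `json:"range"`
	CRDScope string `json:"crd_scope"` // hypothetical: "namespace" or "cluster"
}

// scopeValid enforces "either, but never both concurrently": exactly one
// of the two scopes must be chosen.
func scopeValid(s string) bool {
	return s == "namespace" || s == "cluster"
}

func main() {
	raw := `{"type":"whereabouts","range":"192.168.2.0/24","crd_scope":"namespace"}`
	var cfg IPAMConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	fmt.Println(cfg.CRDScope, scopeValid(cfg.CRDScope))
}
```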
Limitations