k8snetworkplumbingwg / whereabouts

A CNI IPAM plugin that assigns IP addresses cluster-wide
Apache License 2.0

CRDv2 data model #51

Open crandles opened 4 years ago

crandles commented 4 years ago

The current data model for the IPPool CRD stores allocations for a given range in a single Kubernetes resource.

Known Issues

Under certain failure scenarios, IP allocations may become orphaned.

Open Questions

Is it important to support allocating the same IP in multiple ranges?

Idea

TBD: come up with a draft YAML spec/examples. Some rough ideas for now:

Moving to IP as the "base" resource type solves the problem of preventing duplicate IP allocations from occurring in overlapping ranges, but introduces a new one: how do we easily query Kubernetes to determine the next available IP? The IPPool type is a useful bucket that we would be removing.

Can we solve querying IPs from a given range using well-crafted labels? e.g. `kubectl get ip -l subnet_31=127.0.0.0` (?) We still need to determine the label scheme and consider IPv6.
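To make the label idea concrete, here is a rough sketch (not existing whereabouts code) of how per-prefix labels could be derived for an allocation; the `subnet_N` key format simply follows the example query above:

```go
package main

import (
	"fmt"
	"net"
)

// subnetLabels returns one label per prefix length, e.g. "subnet_31" -> "127.0.0.0",
// so a query like `kubectl get ip -l subnet_31=127.0.0.0` can select every IP CR in
// that subnet. For IPv4 this is up to 32 labels; the same scheme would need up to 128
// labels for IPv6, and IPv6 values would also need encoding since ':' is not a legal
// character in a Kubernetes label value.
func subnetLabels(ip net.IP) map[string]string {
	labels := map[string]string{}
	bits := 128
	if v4 := ip.To4(); v4 != nil {
		ip = v4
		bits = 32
	}
	for prefix := 1; prefix <= bits; prefix++ {
		network := ip.Mask(net.CIDRMask(prefix, bits))
		labels[fmt.Sprintf("subnet_%d", prefix)] = network.String()
	}
	return labels
}

func main() {
	fmt.Println(subnetLabels(net.ParseIP("127.0.0.1"))["subnet_31"]) // prints 127.0.0.0
}
```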

Given we have a single resource type associated with an allocation, we can leverage Kubernetes' built-in garbage collection to address the orphaned IP allocation problem.

By configuring the pod as the owner of the IP allocation resource, we can instruct Kubernetes to automatically delete the resource when the pod is deleted. We may still clean up our resources via the CNI DEL, but this would serve as a fall-through to prevent IP exhaustion (with no operator, cron, or other process necessary).
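A minimal sketch of the owner-reference approach, assuming a hypothetical per-IP `IPAllocation` CRD (the group/version, kind, and naming scheme below are illustrative, not the project's agreed design); building the CR as unstructured keeps the sketch free of generated types:

```go
package allocation

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// newIPAllocationFor builds a per-IP allocation CR owned by the pod, so the
// Kubernetes garbage collector deletes the CR when the pod is deleted, even if
// the CNI DEL never runs.
func newIPAllocationFor(pod *corev1.Pod, ip string) *unstructured.Unstructured {
	u := &unstructured.Unstructured{}
	u.SetAPIVersion("whereabouts.cni.cncf.io/v1alpha2") // hypothetical group/version
	u.SetKind("IPAllocation")                           // hypothetical kind
	u.SetNamespace(pod.Namespace)
	u.SetName(ip) // e.g. "192.168.2.10"; the real naming scheme is TBD
	u.SetOwnerReferences([]metav1.OwnerReference{{
		APIVersion: "v1",
		Kind:       "Pod",
		Name:       pod.Name,
		UID:        pod.UID,
	}})
	return u
}
```

With that owner reference in place, deleting the pod removes the allocation CR via garbage collection, with no operator, cron, or other process necessary, which is the fall-through behaviour described above.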

We should be able to create either a namespace-scoped or a cluster-scoped client: I think either might make sense, but it would not make sense to use both concurrently. This should be configured in the whereabouts IPAM config.
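If it helps the discussion, the scope choice could be as small as one field in the IPAM config; a sketch with a hypothetical field name (not an existing whereabouts option):

```go
package config

// IPAMConfig is a trimmed, illustrative subset of the whereabouts IPAM config.
type IPAMConfig struct {
	Type  string `json:"type"`  // "whereabouts"
	Range string `json:"range"` // e.g. "192.168.2.0/24"

	// AllocationScope selects whether allocation CRs are namespace-scoped
	// (created alongside the pod) or cluster-scoped. Hypothetical field;
	// the exact name and values are TBD.
	AllocationScope string `json:"allocationScope,omitempty"`
}
```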

Limitations

  1. Moving to such a model would not make it easy to support overlapping IP ranges; we may have to drop that use case. We could potentially keep multiple CRD versions around (v1alpha1, v1beta1, etc.) if this were an important use case.
  2. This does not apply well to the etcd datastore, as the approach relies on Kubernetes to perform the garbage collection.
dougbtv commented 4 years ago

+1 re: configuring the pod as the owner of the IP allocation; that's quite excellent, great suggestion.

Definitely up for this change / refactor. Thanks for the outline, looking forward to the CRD proposal.

Another consideration: the upgrade path (I'll think on this one, too).

dougbtv commented 4 years ago

Another quick thought on the labels: I have this idea about a "sticky IP address using MAC address", so if a workload comes back up with the same MAC address, it gets the same IP address.

This could be a separate store/CRD, but... we could label the IP address CR with a MAC address to query for it. We could also release the ownership when this is used so that the IP CR sticks around, too.
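A rough sketch of the lookup side of that idea, assuming a hypothetical per-IP `ipallocations` resource and label key; note that a raw MAC cannot be used as a label value because ':' is not allowed, so it is dash-encoded here:

```go
package sticky

import (
	"context"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

const macLabel = "whereabouts.cni.cncf.io/mac" // hypothetical label key

// findStickyIP lists any existing allocation CRs labelled with this MAC so the
// workload can be handed the same address again. Releasing the pod owner
// reference when this mode is enabled (not shown) is what lets the CR outlive
// the pod.
func findStickyIP(ctx context.Context, c dynamic.Interface, namespace, mac string) (*unstructured.UnstructuredList, error) {
	gvr := schema.GroupVersionResource{
		Group:    "whereabouts.cni.cncf.io",
		Version:  "v1alpha1",
		Resource: "ipallocations", // hypothetical plural of the per-IP CRD
	}
	return c.Resource(gvr).Namespace(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: macLabel + "=" + strings.ReplaceAll(mac, ":", "-"),
	})
}
```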

crandles commented 3 years ago

> Moving to IP as the "base" resource type solves the problem of preventing duplicate IP allocations from occurring in overlapping ranges, but introduces a new one: how do we easily query Kubernetes to determine the next available IP? The IPPool type is a useful bucket that we would be removing.

I am having second thoughts about this idea (using a per-IP resource and labels to query for subnet allocations).

IPv6 subnets are very large, and there are many possible subnets; generating 128 labels to enable query lookups seems like a poor design. Additionally, I worry about how well this scales.

Alternatively: I believe Kubernetes resources are limited to roughly 1 MB in size (based on the etcd limit); the current pool implementation does not factor this in (should it?).
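For a rough sense of where that limit bites (the per-entry size here is purely an assumed figure, not a measurement of the current IPPool serialization):

```go
package main

import "fmt"

func main() {
	const bytesPerAllocation = 64   // assumed average size of one serialized allocation entry
	const objectSizeLimit = 1 << 20 // ~1 MB, per the etcd-based limit mentioned above
	// Roughly how many allocations fit in a single pool object under these assumptions.
	fmt.Println(objectSizeLimit / bytesPerAllocation) // 16384, roughly a /18 worth of IPv4 addresses
}
```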

I could imagine a CRD model that involves IP blocks owned by an IP pool; each block contains n addresses, and blocks are allotted to the pool as needed. This could be a sort of middle ground between sharing IPs between pools and avoiding a per-IP resource.

We would need to do some testing to find the right block sizes. The pool data type would hold metadata pointing to the full blocks and to the current block with free IPs.

IPv6 would need additional testing for the maximum number of sub-blocks, etc. It may be impossible to support many subnet sizes within 1 MB.
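To make the block idea easier to discuss, here is one possible shape expressed as Go types; every name, field, and the API version are illustrative only, not something the project has settled on:

```go
package v1alpha2 // hypothetical API version for the new data model

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// IPPool holds only metadata about its blocks rather than every allocation.
type IPPool struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              IPPoolSpec `json:"spec"`
}

type IPPoolSpec struct {
	Range        string   `json:"range"`                  // e.g. "10.0.0.0/16"
	BlockSize    int      `json:"blockSize"`              // allocations per block; needs testing to size well
	FullBlocks   []string `json:"fullBlocks,omitempty"`   // names of IPBlock CRs with no free IPs
	CurrentBlock string   `json:"currentBlock,omitempty"` // the IPBlock currently handing out IPs
}

// IPBlock holds up to BlockSize allocations and is owned by its IPPool, so
// deleting the pool cleans up its blocks.
type IPBlock struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Allocations       map[string]string `json:"allocations,omitempty"` // IP -> pod/container reference
}
```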

> Is it important to support allocating the same IP in multiple ranges?

This still isn't clear. Should it be optimized for?

> Under certain failure scenarios, IP allocations may become orphaned.

I think we can still leverage Pod owner references + garbage collection, combined with an operator + finalizer to ensure the Pool/blocks are maintained.
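For the operator + finalizer piece, a sketch of what the reconcile-side handling could look like, assuming a controller-runtime based operator; the finalizer name, helper, and overall flow are hypothetical:

```go
package controller

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const allocFinalizer = "whereabouts.cni.cncf.io/return-to-pool" // hypothetical finalizer

// reconcileAllocation would be called from the operator's reconcile loop with an
// already-fetched allocation CR (typed as client.Object to keep the sketch free
// of generated CRD types).
func reconcileAllocation(ctx context.Context, c client.Client, alloc client.Object) (ctrl.Result, error) {
	if alloc.GetDeletionTimestamp().IsZero() {
		// Not being deleted: make sure the finalizer is present so the
		// pool/block bookkeeping always runs before the CR disappears.
		if !controllerutil.ContainsFinalizer(alloc, allocFinalizer) {
			controllerutil.AddFinalizer(alloc, allocFinalizer)
			return ctrl.Result{}, c.Update(ctx, alloc)
		}
		return ctrl.Result{}, nil
	}
	// Being deleted, e.g. because the owning pod was garbage collected:
	// return the IP to its block/pool, then drop the finalizer.
	if controllerutil.ContainsFinalizer(alloc, allocFinalizer) {
		if err := returnIPToPool(ctx, c, alloc); err != nil {
			return ctrl.Result{}, err
		}
		controllerutil.RemoveFinalizer(alloc, allocFinalizer)
		return ctrl.Result{}, c.Update(ctx, alloc)
	}
	return ctrl.Result{}, nil
}

// returnIPToPool stands in for the real bookkeeping that would mark the address
// free again in its IPBlock/IPPool.
func returnIPToPool(ctx context.Context, c client.Client, alloc client.Object) error {
	return nil // placeholder
}
```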