kubernetes / cloud-provider-vsphere

Kubernetes Cloud Provider for vSphere https://cloud-provider-vsphere.sigs.k8s.io
Apache License 2.0

Investigate Node Affinity Scheduling #179

Open dvonthenen opened 5 years ago

dvonthenen commented 5 years ago

Is this a BUG REPORT or FEATURE REQUEST?: /kind feature

What happened: Based in part on the work done here: https://github.com/vmware/vsphere-affinity-scheduling-plugin

dvonthenen commented 5 years ago

/assign

dvonthenen commented 5 years ago

I have been researching this alongside my normal day-to-day work. I should have some ideas to discuss shortly.

dvonthenen commented 5 years ago

There are discussions in SIG Cloud Provider concerning affinity and anti-affinity scheduling as it pertains to the underlying infrastructure. Going to table this for now and see if we can provide input to that effort in order to tackle this feature/functionality.

frapposelli commented 5 years ago

/lifecycle frozen

dvonthenen commented 5 years ago

Going to start taking a look at this again. We are re-visiting this since the consensus from SIG Cloud Provider was to do it outside of core k8s.

Should still target a post-1.0 release.

sujeet-banerjee commented 5 years ago

I have been working on this proposal since the beginning of the year (attached). I am working on putting together a patch for the same. Spec_changes_for_AntiAffinity.docx Test_n_Demo.pdf

dvonthenen commented 5 years ago

@sujeet-banerjee I read through the doc and it looks like it is in relation to cluster api. Maybe I am missing something... There definitely needs to be an understanding of which VMs are on which physical hosts, but the issue is that the scheduler doesn't know about the backing infrastructure when it comes time to schedule pods on those worker nodes. The VMs themselves can move around within the cluster because of DRS, node failures, etc. It's also about more than compute; this also concerns fault domains on storage like VSAN.
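
To make that concrete, here is a minimal Go sketch (not part of cloud-provider-vsphere) of the kind of mechanism this issue is investigating: a small controller that surfaces vSphere placement information as node labels so the scheduler has something to key affinity rules on. The label key `vsphere.example.com/vsan-fault-domain` and the `lookupFaultDomain` helper are hypothetical placeholders for whatever the real design would use.

```go
// Sketch only: assumes a hypothetical label key and a hypothetical lookup
// into the vSphere API; it just illustrates surfacing placement info as node labels.
package main

import (
	"context"
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// Hypothetical label key; the real name would be decided by this proposal.
const faultDomainLabel = "vsphere.example.com/vsan-fault-domain"

// lookupFaultDomain stands in for a call into vSphere (e.g. via govmomi)
// that resolves the fault domain backing a given node's VM. Assumed here.
func lookupFaultDomain(nodeName string) (string, error) {
	return "fd-1", nil // placeholder
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		fd, err := lookupFaultDomain(node.Name)
		if err != nil {
			continue
		}
		// Patch the fault-domain label onto the node so affinity rules can key on it.
		patch, _ := json.Marshal(map[string]interface{}{
			"metadata": map[string]interface{}{
				"labels": map[string]string{faultDomainLabel: fd},
			},
		})
		if _, err := client.CoreV1().Nodes().Patch(context.TODO(), node.Name,
			types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
			fmt.Printf("failed to label node %s: %v\n", node.Name, err)
		}
	}
}
```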

As an example, let's assume you have the VMs distributed in an ideal configuration based on your doc. If you target a StatefulSet to be deployed to a certain region/zone, it's possible that all pods get placed on different (or even the same) VMs but land in the same VSAN fault domain. If that particular fault domain dies, you will end up losing all your data. This is one such problem this issue is planning to address.
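
If nodes did carry such a fault-domain label, a workload could guard against exactly this scenario with pod anti-affinity keyed on that label rather than on `kubernetes.io/hostname`. A minimal sketch, again assuming the hypothetical label from the previous example:

```go
// Sketch only: a StatefulSet whose replicas are required to land in distinct
// fault domains via pod anti-affinity on a hypothetical fault-domain label.
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const faultDomainLabel = "vsphere.example.com/vsan-fault-domain" // hypothetical

func exampleStatefulSet() *appsv1.StatefulSet {
	labels := map[string]string{"app": "demo-db"}
	replicas := int32(3)
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: "demo-db"},
		Spec: appsv1.StatefulSetSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Affinity: &corev1.Affinity{
						PodAntiAffinity: &corev1.PodAntiAffinity{
							// Require that no two replicas share a fault domain,
							// not merely that they land on different VMs.
							RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
								LabelSelector: &metav1.LabelSelector{MatchLabels: labels},
								TopologyKey:   faultDomainLabel,
							}},
						},
					},
					Containers: []corev1.Container{{
						Name:  "db",
						Image: "registry.example.com/demo-db:latest", // placeholder image
					}},
				},
			},
		},
	}
}

func main() { _ = exampleStatefulSet() }
```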

The doc has a very cluster api centric view of how to address the problem, looking from the infrastructure upwards, but this proposed component needs to look at workload placement from the pod view downward. Maybe this proposal can help ease pod scheduling by having VMs sit on hosts in an ideal fashion, but at the end of the day we are still talking about pod scheduling and workload placement within those VMs.

s0uky commented 11 months ago

Hi folks, is there any update here?