saltstack / salt-ext-modules-vmware

Salt Extension Modules for VMware
Apache License 2.0

[Architecture] ESXi [and probably other] modules don't follow salt/CM best practice #257

Open dfidler opened 2 years ago

dfidler commented 2 years ago

Salt's biggest market value lies in the fact that it can do declarative config management across a wide variety of IT asset types.

A fundamental requirement of declarative configuration management is that managed objects [henceforth, assets] MUST have an identity in the management system. However, when you look at how the esxi sddc module works, it deprives salt of the identity of the esxi hosts that you're managing.

The modules are designed to be executed on a minion that has the vmware_config pillar data attached to it. When you want to run a command (like set_advanced_config), you specify the "host_name[s]" in the method call to filter which hosts get the config parameter.

The invocation looks like this:

salt saltmaster vmware_esxi.set_advanced_config Power.PerfBias 18

This will set the config for all hosts. Optionally, you can supply a "host_name[s]" parameter to filter which hosts get the config. If the esxi state module had a "set_advanced_config option" (it doesn't yet), I would expect it to look like this:

# /srv/salt/esxi/power_perf_bias.sls
ensure power perf bias set:
  vmware_esxi.advanced_config_present:
    - name: Power.PerfBias
    - value: 18
    - hosts:
      - esxihost01
      - esxihost02
      - etc

What this is doing is forcing you to define your infrastructure inside of your state files. You could abstract this away using pillar and jinja loops, etc, but at the end of the day, the only "identity" that the vsphere environment has is behind a single salt-minion; like this:

salt saltmaster state.apply esxi.power_perf_bias

All changes would be aggregated into one big changes block.

The "salty" way to do this would be to have every host have a minion_id (esxihost01, esxihost02, etc) and then you'd have the following state file:

# /srv/salt/esxi/power_perf_bias.sls
ensure power perf bias set:
  vmware_esxi.advanced_config_present:
    - name: Power.PerfBias
    - value: 18

And you'd configure this in your top.sls

base:
  'esxihost*':
    - esxi.power_perf_bias

Now you can execute highstate against individual systems and easily see which objects had changes, etc.

Why is this important?

Primarily for SecOps. I keep getting requests from customers who want to "assure my esxi configs using my standard benchmarks - I want to ensure that all of my esxi servers in a given cluster have the same configuration and if one changes, I want to know about it".

Customers want the SecOps reports and events to show that data. I have received "standard operating environment" specifications from a couple of customers now and they include configurations at several levels - the host, vsphere, datacenter, cluster and vm levels.

Using the modules the way that they are currently architected means that any deviation from a standard config shows up as a deviation of the entire environment instead of just for a specific object within it. There's not enough granularity.
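To make the granularity point concrete, here is a minimal sketch of the difference between one aggregated changes block and per-host reports. The dictionary shapes and the function name split_changes_by_host are assumptions for illustration only; the actual return format of the vmware_esxi modules may differ.

```python
# Hypothetical data shapes: not the real return format of the vmware_esxi modules.
def split_changes_by_host(minion_id, aggregated_changes):
    """Regroup one minion's aggregated changes into one report per ESXi host."""
    reports = {}
    for host, changes in aggregated_changes.items():
        if changes:  # keep only hosts that actually deviated
            reports[host] = {"managed_via": minion_id, "changes": changes}
    return reports


# With a single minion, everything is reported under "saltmaster".
aggregated = {
    "esxihost01": {"Power.PerfBias": {"old": 17, "new": 18}},
    "esxihost02": {},
}
per_host = split_changes_by_host("saltmaster", aggregated)
```

If every host had its own minion_id, this regrouping would not be needed: each deviation would already be attributed to the object that drifted.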

I understand why the esxi module was written the way it was: it more closely mirrors the vsphere api, and it removes the necessity to stand up hundreds of salt-proxy processes (one for each host in your environment). This method is also loosely akin to managing infrastructure objects inside of aws (using the boto modules).

Older efforts to manage esxi were using a proxy minion (in lieu of a native agent) to give the different management objects identity inside of salt. We have several of them:

This architecture actually makes sense because if you want to configure your vsphere estate, you will do it at different levels: at the host, vcenter, cluster and DC levels. Each of those objects should have identity. There are many users out there that have standard configs that their systems must adhere to and they look to the secops comply module to implement them (because of its reporting and remediation capabilities).

IMHO, this is a "better" (though not ideal) way of managing vsphere. The traditional way would be to stand up these proxy minions (one for each host) and manage the config using highstate and rules in the top.sls.

Aside: Using the current modules, you can work around this by installing a separate salt-minion for each esxi host (each with a config in a different directory) and then assigning the vmware_config to each of them. Then, in the state file, reference the "id" grain to limit operations via the "host_name[s]" parameter. Although, IMHO, the ideal would be to have a multiplexing super-proxy minion (kind of like the delta proxy) that detects which hosts are in a vsphere environment, and then assumes the identity of all of them but appears, to the salt-master, to be each of them.

ggiesen commented 2 years ago

Honestly I think the best way to do this would be to take a balance between being forced to run a minion for each level of the hierarchy, and this extension of having a single minion to rule them all. I actually prefer the single (proxy) minion in most cases, as it's much easier to coordinate states and requisites across the vCenter > Datacenter (or Cluster) > Cluster (or Datacenter) > Host hierarchy, and you're not bound to running a proxy minion for each level. That being said I think making it optional would be best.

Basically, structure the modules in such a way that the parameters are specified by proxy type. For example, if you're running the execution module or state on an esxcluster proxy minion, then vcenter and datacenter are already populated (assuming datacenter is above cluster in your hierarchy), and all parameters below cluster in the hierarchy need to be specified (i.e. host). If you're running the state on an esxi proxy minion, then all parameters (vcenter, datacenter, cluster, host) are already populated.
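A rough sketch of what that parameter resolution could look like. The names HIERARCHY, PROXY_LEVEL and resolve_params are hypothetical, the hierarchy assumes datacenter sits above cluster, and none of this is taken from the existing modules.

```python
# Hypothetical sketch: levels at or above the proxy come from the proxy's own
# config, levels below it must be passed by the caller.
HIERARCHY = ["vcenter", "datacenter", "cluster", "host"]

# Assumed mapping from proxy type to the level it represents.
PROXY_LEVEL = {
    "esxdatacenter": "datacenter",
    "esxcluster": "cluster",
    "esxi": "host",
}


def resolve_params(proxy_type, proxy_config, call_kwargs):
    """Merge pre-populated proxy settings with caller-supplied parameters."""
    proxy_depth = HIERARCHY.index(PROXY_LEVEL[proxy_type])
    resolved = {}
    for depth, name in enumerate(HIERARCHY):
        if depth <= proxy_depth:
            # Already known from the proxy's identity; caller values are ignored.
            resolved[name] = proxy_config[name]
        elif name in call_kwargs:
            resolved[name] = call_kwargs[name]
        else:
            raise ValueError(
                f"'{name}' must be specified when running on a {proxy_type} proxy"
            )
    return resolved
```

On an esxcluster proxy the caller would only pass host; on an esxi proxy the caller would pass nothing at all.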

Operations on objects in the hierarchy above the proxy minion type would be locked out, and operations on objects below the proxy minion could optionally be locked out by a proxy pillar option (so for example an esxdatacenter proxy could operate on the datacenter, cluster, and host or optionally datacenter only).
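The lock-out rule could be expressed as a small check like the one below. This is only a sketch: is_allowed and the allow_below flag (standing in for the proxy pillar option) are made-up names, and the hierarchy again assumes datacenter above cluster.

```python
# Hypothetical sketch of the lock-out rule; not part of any existing module.
HIERARCHY = ["vcenter", "datacenter", "cluster", "host"]


def is_allowed(proxy_level, target_level, allow_below=True):
    """Decide whether a proxy at proxy_level may operate on target_level."""
    proxy_depth = HIERARCHY.index(proxy_level)
    target_depth = HIERARCHY.index(target_level)
    if target_depth < proxy_depth:
        return False  # objects above the proxy are always locked out
    if target_depth > proxy_depth:
        return allow_below  # objects below depend on the pillar option
    return True  # the proxy's own level is always allowed
```

For example, is_allowed("datacenter", "host") permits the operation by default, while is_allowed("datacenter", "host", allow_below=False) restricts the proxy to the datacenter itself.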

I think this gives the most flexibility as to how users can structure their minions.

dfidler commented 2 years ago

Honestly I think the best way to do this would be to take a balance between being forced to run a minion for each level of the hierarchy, and this extension of having a single minion to rule them all. I actually prefer the single (proxy) minion in most cases, as it's much easier to coordinate states and requisites across the vCenter > Datacenter (or Cluster) > Cluster (or Datacenter) > Host hierarchy, and you're not bound to running a proxy minion for each level. That being said I think making it optional would be best.

That's why my last sentence says this: Although, IMHO, the idea[l] would be to have a multiplexing super-proxy minion (kind of like the delta proxy) that detects which hosts are in a vsphere environment, and then assumes the identity of all of them but appears, to the salt-master, to be each of them.

It offers the best of both worlds - a "salty" way of managing systems that gives each "object" an identity in salt, but takes away the pain of managing proxy minion processes. If that super-proxy assumed the identity of all minions, that abstraction layer could translate salt targeting into the "hostname" parameter in the SDDC modules.

So you could craft something like:

salt -C 'G@os:esxi' state.apply somestatefile

The minion looks at its grains and determines which "virtual" minions satisfy that targeting string (that it owns) and it constructs the sddc module call appropriately.
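As a sketch of that translation step, assuming the super-proxy keeps a grains dictionary per virtual minion. Everything here is hypothetical: the function names, the host_names keyword, and the fact that only a single G@grain:value expression is handled (real compound targeting supports much more).

```python
# Hypothetical sketch of a multiplexing proxy turning a grain target into a
# host filter for an SDDC module call. Not an existing Salt API.
def match_virtual_minions(virtual_grains, grain, value):
    """Return the ids of owned virtual minions whose grain equals value."""
    return sorted(
        minion_id
        for minion_id, grains in virtual_grains.items()
        if str(grains.get(grain)) == value
    )


def build_call(virtual_grains, target, function, *args):
    """Translate a 'G@grain:value' target into a module call description."""
    if not target.startswith("G@"):
        raise ValueError("this sketch only handles a single G@grain:value target")
    grain, _, value = target[2:].partition(":")
    hosts = match_virtual_minions(virtual_grains, grain, value)
    # "host_names" is an assumed keyword for the host filter parameter.
    return {"fun": function, "args": list(args), "kwargs": {"host_names": hosts}}
```

Called with the target G@os:esxi, this would return a call description whose host filter lists only the virtual minions carrying that grain, so the master-side targeting stays the same while a single process does the work.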

The cost savings in such a solution would be significant. You wouldn't be paying extra infrastructure costs for running 1000 minion processes (memory & CPU costs) and you wouldn't incur the operational costs of managing salt-proxy processes and all of their many configurations - just give this multiplexing minion the vcenter credentials and it manages everything else.

Best of both worlds.