Allow default value lookup in Hiera/Data in Modules

SimonHoenscheid commented 4 years ago

Use Case

By now, puppet-strings is able to look up a default value of a variable in puppet code

Describe the Solution You Would Like

extend the used ruby code

Describe Alternatives You've Considered

none

Additional Context

context should be clear

ekohl commented 4 years ago

How do you imagine this should be represented? I'm not sure how this could be done reliably. You have to determine that there is no possibility of fact interpolation to get a default, or say "may be set via hiera" if there is some hiera entry. Consider Optional[String] $myvar and it is in data/os/RedHat/7.yaml, but nowhere else. What's "the default"?

binford2k commented 4 years ago

This would require using rspec-puppet-facts and iterating over on_supported_os and generating a table of default values. I'm not sure the UX would be any better than just looking at the code.

ekohl commented 4 years ago

And even that would not be guaranteed correct since there can be custom facts.

bastelfreak commented 4 years ago

What about displaying all possible values for one puppet parameter / hiera key in the REFERECNE.md?

ekohl commented 4 years ago

What about displaying all possible values for one puppet parameter / hiera key in the REFERECNE.md?

I've been thinking about that for a while. So one theory I have is that you can convert interpolation into a regex. In https://gist.github.com/ekohl/6b136d1dab4313edd5bd8eab5a7aacc0 I took a stab at this.

The general idea is that hiera.yaml can contain hierarchies and every level can contain one or more paths. In he first section the files are shown and which interpolations we've found. Since these are regexes, I don't know how reliable they are but it looks fairly OK.

Then it builds a list of overrides, showing which variables map to which value in each file. This is the short overview.

The long version combines both into a single overview. I think it does a good job of showing how many things can be involved, but it's based on my somewhat limited Hiera knowledge.

You can probably aggregate on interpolations to build a minimal table, but I'll defer to others what they prefer to see. Do you want to see it parameter centric? You can show default with a table that lists possible overrides and which file has them. Each file can be a link to a Hiera section which lists each file and the interpolations it found. Currently I discard the name of the level in the hierarchy, but that could also help.

SimonHoenscheid commented 4 years ago

How do you imagine this should be represented? I'm not sure how this could be done reliably. You have to determine that there is no possibility of fact interpolation to get a default, or say "may be set via hiera" if there is some hiera entry. Consider Optional[String] $myvar and it is in data/os/RedHat/7.yaml, but nowhere else. What's "the default"?

I did not think about details, but as I spoke with @bastelfreak the past week, we figured out, if we would migrate a module completely to Data in Modules, it would lack documentation. I would be fine with just analyzing common.yaml or first match.

SimonHoenscheid commented 4 years ago

What about displaying all possible values for one puppet parameter / hiera key in the REFERECNE.md?

I've been thinking about that for a while. So one theory I have is that you can convert interpolation into a regex. In https://gist.github.com/ekohl/6b136d1dab4313edd5bd8eab5a7aacc0 I took a stab at this.

The general idea is that hiera.yaml can contain hierarchies and every level can contain one or more paths. In he first section the files are shown and which interpolations we've found. Since these are regexes, I don't know how reliable they are but it looks fairly OK.

Then it builds a list of overrides, showing which variables map to which value in each file. This is the short overview.

The long version combines both into a single overview. I think it does a good job of showing how many things can be involved, but it's based on my somewhat limited Hiera knowledge.

You can probably aggregate on interpolations to build a minimal table, but I'll defer to others what they prefer to see. Do you want to see it parameter centric? You can show default with a table that lists possible overrides and which file has them. Each file can be a link to a Hiera section which lists each file and the interpolations it found. Currently I discard the name of the level in the hierarchy, but that could also help.

I like this approach, let me think about this for a while. Ideally this would also reflect merge behavior (if any)

bastelfreak commented 4 years ago

I like the idea from @ekohl . Maybe someone from Puppet (@scotje ? :)) can look into this.

SimonHoenscheid commented 4 years ago

I like the idea from @ekohl . Maybe someone from Puppet (@scotje ? :)) can look into this.

I think it's a good way to go

bastelfreak commented 4 years ago

hello maintainers, could one of you look into this and provide some feedback? The issue is open since a few months now (and probably discussed at various places for two years).

bastelfreak commented 3 years ago

hey people. Is it possible to get a statement here? @b4ldr made some comments about this in https://github.com/voxpupuli/puppet-unbound/pull/265. @binford2k are you able to take a look here?

binford2k commented 3 years ago

I'm going to close this as won't-do for now. The UX of exploding defaults from in-module hiera to populate defaults is super long lists, confusing UX, and inconsistent presentation. Here's a mockup of the really neat POC data that @ekohl generated as it might be rendered for that class. https://gist.github.com/binford2k/c4e9277f040bde1634086ac43c3bc2a9

First off, the list becomes much longer. It's 71% longer. Not quite as noticeable with 7 parameters, but imagine that on puppetlabs-apache which has 86 parameters.

Second, there are many inconsistencies within the list. For example, on FreeBSD desktop_package_name is set to open-vm-tools-nox11 and on every other platform it's.... nothing? That seems odd, so I have to look at the code and there I see that all other platforms default to open-vm-tools-desktop. But what are those other platforms? As a user, I can extrapolate relatively easily, but for the computer to do it programatically, it would have to iterate all the platforms listed in metadata.json (35 of them)

Note; that could be somewhat alleviated by listing a "default default" and then iterate the auto-discovered default value overrides. That's shown in override.md. But that seems more confusing than just looking at the data directory, since that will be more obvious which yaml file applies to your usage, and then from there you can see specifically which values have overridden default values.

Third, and related to the note ☝️, every parameter listed has a different list of the discovered overrides. That means that as a reader, I already have to mental post-processing of what I'm reading to figure out what it's actually saying. To me, that mental energy could be better spent with a skim through the data directory to see what overrides apply to the platforms you care about.

Finally, this is a pretty constrained example. There are a lot of ways that it could go wrong in a misleading way. For example, if the hierarchy had a typo in the OS Major Version layer, then all of those auto-discovered values would be misleading and wouldn't actually calculate out to the values listed when run on a real system. To be sure, that problem exists currently, but right now we aren't trying to tell the reader that we know the default because we've calculated it. To alleviate that, we'd need to print the original layer that resolved for each override too (as in the long overrides example), and that makes the information presented even worse! An example of that is in typo.md and it's 144 lines long all on it's own!

That said I am not opposed to someone taking this POC and turning it into a full PR. It would need to address the UX concerns here, especially the last one where data source paths that don't resolve properly lead to displaying misleading values in strings output.

ekohl commented 3 years ago

First off, the list becomes much longer. It's 71% longer. Not quite as noticeable with 7 parameters, but imagine that on puppetlabs-apache which has 86 parameters.

I agree with this but note that you can have expandable blocks in HTML. This means you can build this:

desktop_package_name

Name of the desktop package. Only set this if your platform is not supported or you know what you are doing.

Puppet default: 'open-vm-tools-desktop'

Hiera overrides

| Filename | Value | |----------|-------| | `data/Debian.Debian.7.yaml` | `true` | | `data/Debian.Debian.8.yaml` | `true` | | `data/Debian.Debian.9.yaml` | `true` | | `data/Debian.Ubuntu.14.yaml` | `true` | | `data/Debian.Ubuntu.16.yaml` | `true` | | `data/Debian.Ubuntu.18.yaml` | `true` | | `data/FreeBSD.FreeBSD.10.yaml` | `true` | | `data/FreeBSD.FreeBSD.11.yaml` | `true` | | `data/FreeBSD.FreeBSD.12.yaml` | `true` | | `data/RedHat.CentOS.6.yaml` | `true` | | `data/RedHat.CentOS.7.yaml` | `true` | | `data/RedHat.CentOS.8.yaml` | `true` | | `data/RedHat.Fedora.19.yaml` | `true` | | `data/RedHat.Fedora.20.yaml` | `true` | | `data/RedHat.Fedora.21.yaml` | `true` | | `data/RedHat.Fedora.22.yaml` | `true` | | `data/RedHat.Fedora.23.yaml` | `true` | | `data/RedHat.Fedora.24.yaml` | `true` | | `data/RedHat.Fedora.25.yaml` | `true` | | `data/RedHat.OracleLinux.6.yaml` | `true` | | `data/RedHat.OracleLinux.7.yaml` | `true` | | `data/RedHat.OracleLinux.8.yaml` | `true` | | `data/RedHat.RedHat.6.yaml` | `true` | | `data/RedHat.RedHat.7.yaml` | `true` | | `data/RedHat.RedHat.8.yaml` | `true` | | `data/Suse.OpenSUSE.11.yaml` | `true` | | `data/Suse.OpenSUSE.12.yaml` | `true` | | `data/Suse.OpenSUSE.13.yaml` | `true` | | `data/Suse.OpenSUSE.15.yaml` | `true` | | `data/Suse.OpenSUSE.42.yaml` | `true` | | `data/Suse.SLES.12.yaml` | `true` | | `data/Suse.SLES.13.yaml` | `true` | | `data/Suse.SLES.14.yaml` | `true` | | `data/Suse.SLES.15.yaml` | `true` |

(Note that I cheated a bit and got 2 non-matching ones just to show what can be done. Long lists can be hidden this way).

Second, there are many inconsistencies within the list. For example, on FreeBSD desktop_package_name is set to open-vm-tools-nox11 and on every other platform it's.... nothing? That seems odd, so I have to look at the code and there I see that all other platforms default to open-vm-tools-desktop. But what are those other platforms? As a user, I can extrapolate relatively easily, but for the computer to do it programatically, it would have to iterate all the platforms listed in metadata.json (35 of them)

Note that my code did this the other way around. It converts the Hiera hierarchy to regular expressions and then matches that to filenames. It can then translate that back to interpolation. No need for metadata.json. This may break down for very complex hierarchies but really, how complicated is it for most modules?

Third, and related to the note point_up, every parameter listed has a different list of the discovered overrides. That means that as a reader, I already have to mental post-processing of what I'm reading to figure out what it's actually saying. To me, that mental energy could be better spent with a skim through the data directory to see what overrides apply to the platforms you care about.

Just to show, I added an additional override for Debian 10:

openvmtools::desktop_package_name
  data/Debian.Debian.10.yaml => open-vm-tools2
    facts.os.family => Debian
    facts.os.name => Debian
    facts.os.release.major => 10
  data/Debian.yaml => open-vm-toolbox
    facts.os.family => Debian
  data/FreeBSD.yaml => open-vm-tools
    facts.os.family => FreeBSD

This means you can render it like this:

openvmtools::desktop_package_name

Hiera overrides in a detailed table

| Filename | Interpolations | Value | |----------|----------------|-------| | `data/Debian.Debian.10.yaml` | `facts.os.family` => `Debian`
`facts.os.name` => `Debian`
`facts.os.release.major` => `10` | `open-vm-tools2` | | `data/Debian.yaml` | `facts.os.family` => `Debian` | `open-vm-toolbox` | | `data/FreeBSD.yaml` | `facts.os.family` => `FreeBSD` | `open-vm-tools` |

Note that in the order it should respect the Hiera order.

So I certainly understand there are some UX concerns. It all depends on how clean you write your Hierarchy. For example, in openvmtools you could actually write some code like this (using load_module_metadata):

$metadata = load_module_metadata($module_name)
$supported = $metadata['operatingsystem_support'].any |$os| {
  $os['operatingsystem'] == $facts['operatingsystem'] and $facts['operatingsystemmajrelease'] in $os['operatingsystemrelease']
}

In fact, this would make a good function in itself. I've got distracted and submitted https://github.com/voxpupuli/puppet-openvmtools/pull/41. Perhaps the function deserves to be in stdlib, but gives a quicker turnaround time.

So in short: I do believe it is a challenge, but it's not too complicated. The forge probably needs to verify if it can indeed render details in a reference.

ekohl commented 3 years ago

I couldn't help myself and opened https://github.com/puppetlabs/puppet-strings/pull/273 which implements this.

b4ldr commented 3 years ago

seems to me we have three high level states which puppet-strings could report.

the defaults that are present in the manifest
the value one would get from hiera data with facts = {}
the default values for each supported OS

Currently puppet-strings does 1 the issue with this is it guides people to place defaults in the manifest which IMO (and one that seems to be shared by puppetlabs) is not the best place to put defaults. For me the main reason for this is that it reduces the full power of hiera and prevents one from doing things like:

common.yaml

unbound::service_name: unbound
unbound::restart_cmd: "systemctl restart %{lookup('unbound::service_name')}"

OpenBSD.yaml (i.e. we dont need to redefine service_name)

unbound::restart_cmd: "service restart %{lookup('unbound::service_name')}"

I think options 3 is the ideal state however as has been pointed out it comes with some UX issue (although ekohl work looks promising, assuming forge etc can support this). further i think there may be some edge case which will likely tickle a few bugs

however im not sure option 2 has been given much thought other then early on. This could be as simple as parsing the common.yaml file or some merge of hiera files which are not based on facts (this should be easy to ascertain from the hiera.yaml file). however i also wondered if we could just use the equivalent of loookup('openvmtools::desktop_package_name', {'default_value' => $DEFAULT_FROM_MANIFEST).

Assuming someone knows a simple way to call lookup from ruby* i think this looks like it would be a fairly easy thing to add to the current code and has the additional benefit that it:

encourages people to store puppet default in hiera
allows people to use the full force of heira
Lays the ground work to present per os defaults

The down side, as pointed out is that it doesn't necessarily present the true default for any one os and you still need to check the yaml files. However this is still no worse the the current situation (famous last words). As such i feel that we should not let the enemy be the good of perfect here (of course if ekohl PR gets merged this is all mute)

*ruby hiera lookup

It use to be fairly straight forward to hack around hiera lookups in ruby however im hitting a brick wall trying to do this using the hiera 5 puppet API's. below is what i have so far which dosn't work. may be going in the complete wrong direction. however i guess puppet-rspec hooks into the lookup function so i guess it should be achievable. anyway thought i would leave this here in case it sparks any more thoughts or feedback.

require 'puppet'
Puppet.initialize_settings
environment = Puppet::Node::Environment.create(:puppetstrings, ['modules'])
options = {classes: ['unbound'], facts: {}, environment: environment}
node = Puppet::Node.new('foo.example.org', options)
compiler = Puppet::Parser::Compiler.new(node)
hiera = Puppet::Pops::Lookup::ModuleDataProvider.new('unbound')
scope = Puppet::Parser::Scope.new(compiler)
invocation =  Puppet::Pops::Lookup::ScopeLookupCollectingInvocation.new(scope)
config = Puppet::Pops::Lookup::HieraConfig.create(invocation, Pathname.new('hiera.yaml'), hiera)
hiera.key_lookup_in_default('unbund::package_name', invocation, config)

And of course thanks for all the effort :)

binford2k commented 3 years ago

Yes, the Forge will display <details> tags. And if not, we'll fix it so it will. And all modern browsers will render it too.

The approach that @ekohl takes with #273 basically "reverse engineers" facts by correlating datasource paths and the interpolations from hiera.yaml. I think this approach is workable and we'll likely 🤞 accept the patch, though without reviewing it I can't promise that it won't need any work to get past the gate.

I'll add it to the schedule for our next Community Monday and get some eyeballs on it.

binford2k commented 3 years ago

@b4ldr if there's more work to be done, would you be interested in helping?

b4ldr commented 3 years ago

@b4ldr if there's more work to be done, would you be interested in helping?

sure i have taken a look at @ekohl branch and sent a commit which make some progress towards also parsing the common.yaml file and overriding the @register[:defaults] ill try to take another look later in the week

puppetlabs / puppet-strings