puppetlabs / ruby-pwsh

A ruby gem for interacting with PowerShell
MIT License
15 stars 21 forks source link

Array of Hashes comparison result in a change at each run #327

Open Clebam opened 3 weeks ago

Clebam commented 3 weeks ago

Describe the Bug

When setting a resource that have a parameter with an Array of Hash, it compares with every entry of the hash, even though they are optional. This results in a perma-change (each run sets the resource again and again)

Expected Behavior

The values in the Array of Hash should not be set at each run

Steps to Reproduce

Module : webadministration dsc Resource : dsc_website (assuming you have already set a pool to host the website)

  dsc_website { 'machin.fr':
      dsc_ensure          => 'Present',
      dsc_name            => 'machin.fr',
      dsc_physicalpath    => 'd:/www/sites/machin.fr',
      dsc_applicationpool => 'machin.fr',
      dsc_bindinginfo      => [{
            hostname           => 'machin.fr',
            protocol           => 'http',
            port               => 80,
      }],
    }

The part that is interesting is dsc_bindinginfo The run (each run in fact) results with this

Notice: /Stage[main]/Main/Node[default]/Dsc_website[machin.fr]/dsc_bindinginfo: dsc_bindinginfo changed [
  {
    'bindinginformation' => '*:80:machin.fr',
    'certificatestorename' => undef,
    'certificatesubject' => undef,
    'certificatethumbprint' => undef,
    'hostname' => 'machin.fr',
    'ipaddress' => '*',
    'port' => 80,
    'protocol' => 'http',
    'sslflags' => '0'
  }] to [
  {
    'hostname' => 'machin.fr',
    'protocol' => 'http',
    'port' => 80
  }] (corrective)
Notice: dsc_website[{:name=>"machin.fr", :dsc_name=>"machin.fr"}]: Updating: Finished in 1.32 seconds

Environment

Additional Context

  1. There is an easy fix, but I do not find it acceptable. (It is using validation_mode: resource). It does resolve the issue but the execution time skyrockets from 0.1 sec to 1.5 sec. per website. Moreover, I think it concerns each parameter in every resources that has mof_is_embedded: true So having each resource taking a full second, you end up easily with 5 min runs when you handle many resources. So I hope there is an alternative

  2. So here is the type definition of dsc_bindinginfo

    dsc_bindinginfo: {
      type: "Optional[Array[Struct[{
              sslflags => Optional[Enum['0', '1', '2', '3']],
              certificatestorename => Optional[Enum['My', 'my', 'WebHosting', 'webhosting']],
              certificatethumbprint => Optional[String],
              hostname => Optional[String],
              certificatesubject => Optional[String],
              bindinginformation => Optional[String],
              port => Optional[Integer[0, 65535]],
              ipaddress => Optional[String],
              protocol => Enum['http', 'https', 'msmq.formatname', 'net.msmq', 'net.pipe', 'net.tcp']
            }]]]",
      desc: "Website's binding information in the form of an array of embedded instances of the DSC_WebBindingInformation CIM class.",
    
      mandatory_for_get: false,
      mandatory_for_set: false,
      mof_type: 'DSC_WebBindingInformation[]',
      mof_is_embedded: true,
    },

The sub parameters are almost all optional. I did try to specifiy them all by the way, but I can't specify undef for certificatestorename :( (and this would be a poor workaround)

I this the handling of this type of resources is link to mof_is_embedded: true (with a dedicated handling for pscredential)

I tried to debug dsc_base_provider.rb to understand what is causing this issue, but could not figure it out yet. :/

Clebam commented 3 weeks ago

Hi,

I spent some time trying to find a way to handle this. I would suggest a piece of code that is best effort

Code location : https://github.com/puppetlabs/ruby-pwsh/blob/main/lib/puppet/provider/dsc_base_provider/dsc_base_provider.rb#L391

# Special handling for CIM Instances
if data[type_key].is_a?(Enumerable)
  downcase_hash_keys!(data[type_key])
  munge_cim_instances!(data[type_key])
  if data[type_key].respond_to?('each') and name_hash[type_key].respond_to?('each')
    if data[type_key].length() == name_hash[type_key].length()
      name_hash_merged=name_hash[type_key].zip(data[type_key]).map do |hash1, hash2|
        hash1 ||= {}
        hash2.merge(hash1 || {})
      end
      name_hash_merged_sorted = recursively_sort(name_hash_merged)
      data_sorted = recursively_sort(data[type_key])
      if name_hash_merged_sorted == data_sorted
        name_hash[type_key] = name_hash_merged_sorted
      end
    end
  end
end

Current behavior (without this piece of code)

If the INPUT array of hash is not strictly the same as the one retrieved (each key, each value) => There is a change This cause always a permanent change at each run.

Behavior with the piece of code

Case 1 : Ressources are different length

==> Run as current behavior

Case 2 : Ressources are same length but after merge, ressources are still different

==> Run as current behavior

Case 3 : Ressources are same length and are equal after merge

==> Replace current declaration (name_hash) with the merge. This ends up considering ressource are equal and no changes are done Note : This relies on the fact that gathered data are retrieved in the same order as they are written. Otherwise you fall down to Case 2

This implies that [name: john, surname: doe] will overwrite [surname: doe, name: john]. But if the system reads in the order it was written, it will not redo this on subsequent runs.

In my opinion it is not perfect but still better than always rewriting the resource. And if you fall in a case where you always fall on case 2, well, it is the current behavior anyway and you're doomed to use "validation_mode => resource"