puppetlabs / puppetlabs-stdlib

Puppet Labs Standard Library module
http://forge.puppetlabs.com/puppetlabs/stdlib
Apache License 2.0
348 stars 580 forks source link

Add Ruby function stdlib::cmp_hash #1376

Closed XMol closed 1 year ago

XMol commented 1 year ago

Summary

Add a new function for comparing two (Puppet) Hashes.

Additional Context

To the best of my knowledge, Puppet has no means to compare two Hashes to each other. That is, not unless additional steps are taken to transform them into something that Puppet actually can compare directly. The new function stdlib::cmp_hash should simplify that.

Checklist

CLAassistant commented 1 year ago

CLA assistant check
All committers have signed the CLA.

XMol commented 1 year ago

Hello reviewers,

I need some help with a specific spec test:

stdlib::cmp_hash raise exception in case lambda expects wrong number of arguments is expected to run stdlib::cmp_hash({}, {}) and raise an ArgumentError with the message matching /rejected:\ block\ expects\ 1\ argument,\ got\ 2/

Failure/Error:
  expect(subject).to run \
    .with_params({}, {}) \
    .with_lambda { |_, _| 1 } \
    .and_raise_error(ArgumentError, Regexp.new(Regexp.escape('rejected: block expects 1 argument, got 2')))

  expected stdlib::cmp_hash({}, {}) to have raised ArgumentError matching /rejected:\ block\ expects\ 1\ argument,\ got\ 2/ instead of returning 0

My intention is to make sure that an exception is thrown when the lambda supplied to cmp_hash would accept more than exactly one argument. The spec test with zero lambda arguments did raise an exception according to expectations. puppet apply and ruby both rate this as an error, too.

$ puppet --version
7.23.0

$ puppet apply -e 'notify { stdlib::cmp_hash({}, {}) |$_, $_| { 1 }:  }'
Error: Evaluation Error: Error while evaluating a Function Call, The function 'stdlib::cmp_hash' was called with arguments it does not accept. It expects one of:
  (Hash hash1, Hash hash2)
    rejected: does not expect a block
  (Hash hash1, Hash hash2, Callable[1, 1] block)
    rejected: block expects 1 argument, got 2 (line: 1, column: 10) on node localhost

$ ruby --version
ruby 2.7.6p219 (2022-04-12 revision c9c2245c0a) [x86_64-linux]

$ ruby -e 'l2 = ->(_, _) { 1 }; l2.call("spam")'
Traceback (most recent call last):
        1: from -e:1:in `<main>'
-e:1:in `block in <main>': wrong number of arguments (given 1, expected 2) (ArgumentError)

Does with_lambda() in the Puppet rspec behave differently?

Once this (hopefully final) issue is resolved, I'll squash-rebase the new commits for this pull request.

XMol commented 1 year ago

Hi all,

I won't be able to react to any updates till July 17th. Therefor I've disabled the troublesome spec test and squashed all my commits for this PR. The CI workflow I ran manually still fails in the Acceptance phase though, which I don't see related to my changes. Again, I need assistance from somebody else who is more experienced with the automated tests.

Regarding the pending signature of the CLA, my employer has no decisive opinion on that, so I'm still trying to get a final decision. In case that is a strong requirement before the merge, I'll continue that when I've returned.

smortex commented 1 year ago

To the best of my knowledge, Puppet has no means to compare two Hashes to each other.

Hum :thinking:

$ puppet apply -te 'warning({ a => { b => c }} == { a => { b => c }})' 
Warning: Scope(Class[main]): true
[...]
$ puppet apply -te 'warning({ a => { b => c }} == { a => { b => ce }})'
Warning: Scope(Class[main]): false
[...]

Do you have a specific use case in mind?

XMol commented 1 year ago

@smortex, sorry for the unclear phrasing. Puppet can tell whether two Hashes are identical or not, but it cannot sort them (lesser and greater relations are undefined for Hashes). Even the build-in compare() function cannot handle Hashes. So I implemented this comparison function to be able to sort an Array of Hashes.

$ puppet apply -te 'warning({ a => b } < { b => c})'
Error: Evaluation Error: Comparison of: Hash < Hash, is not possible. Caused by 'Only Strings, Numbers, Timespans, Timestamps, and Versions are comparable'. (line: 1, column: 28)
smortex commented 1 year ago

@XMol, oh! I understand!

But still it does not make sense to me, do you have a practical use cases for this? Because I can't imagine a situation where we may want to turn an array of hash into another like this :thinking: :

# from
[
  { a => b },
  { c => d },
  { e => f, a => c },
  { f => g, c => a},
]

# to
[
  { a => b },
  { e => f, a => c },
  { f => g, c => a },
  { c => d },
]

Ruby Hash does not implement Comparable and the <=> operator is the one form Object that returns 0 if the hashed are the same or equal, and nil otherwise. This is currently what Puppet does with its == operator. Ruby also provide the >=, >, < and <= operators to test if a hash is a subset or a superset of another, and that we do not have in Puppet. IMHO this would makes more sense than returning information about ordering of a hidden transformed representation of the hashes :thinking:.

XMol commented 1 year ago

Admittedly @smortex, our use-case was quite niche. If it wasn't, I'm sure Puppet or stdlib would provide some method that would accommodate it in some way.

Yes, using <=> in Ruby to compare Hashes falls back to Object's operator. But the point of cmp_hash() is to compensate for that! Without a block, cmp_hash() will transform the Hashes to Arrays and then use <=>, which is well-defined. With a block, hopefully something will be yielded that has a properly implemented <=>... 🤪

Ruby also provide the >=, >, < and <= operators to test if a hash is a subset or a superset of another, and that we do not have in Puppet. IMHO this would makes more sense than returning information about ordering of a hidden transformed representation of the hashes

Sure, Puppet then can implement these operators for Hashes. In my opinion, the proposed function is distinctly different from them. And if you wanted to know whether a Hash is a subset of another, the in operator would be much more intuitive for that (as of now, it only can do String in Hash).

I also don't see a reason why you would want to reorder your example like that. But I did find a way to accomplish it! 😄

$ puppet apply -te '
warning(
  sort(
    [
      { "a" => "b" },
      { "c" => "d" },
      { "e" => "f", "a" => "c" },
      { "f" => "g", "c" => "a" },
    ]
  ) |$h1, $h2| {
    stdlib::cmp_hash($h1, $h2) |$h| {
      $h["a"].lest || { "z${h["c"]}" }
    }
  }
)
'
Warning: Scope(Class[main]): [{a => b}, {e => f, a => c}, {f => g, c => a}, {c => d}]

Another example from my work environment would be data input from Puppet's ENC, which actually are Arrays of Hashes, like the designated configuration of the network interfaces of the node. Or the actual use-case we had, which is the configuration of multipath. For a different contrived example, maybe someone wanted to order the mounted devices according to their used space?

Hash(Array($::mountpoints).sort |$mnt1, $mnt2| { stdlib::cmp_hash($mnt1[1], $mnt2[1]) |$mnt| { $mnt["available_bytes"] } })

It is perfectly fine for me to dismiss the pull request with the reasoning that nobody can envision a high demand for it. I felt the same, which is why I had published this function only as a community snippet at first.

smortex commented 1 year ago
Non-important details about by weird example > I also don't see a reason why you would want to reorder your example like that. But I did find a way to accomplish it! smile Oh, I did not intended to sort like that, I was not understanding where this was going so I tried with some sample data and this was the result of my experiment. Since it did not make more sense, I was still confused… hence the request for real world examples :grinning:

Another example from my work environment would be data input from Puppet's ENC, which actually are Arrays of Hashes, like the designated configuration of the network interfaces of the node.

Hum, I still don't see how this help: sorting arrays of hashes is already straightforward:

$data = [
  { size => 3 },
  { size => 1 },
]
$data.sort |$a, $b|  { compare($a.dig('size'), $b.dig('size')) } #=> [{size => 1}, {size => 3}]

Your comparison function avoid the code duplication, let's put this on the side for now and I will write about it at the end :wink:


Or the actual use-case we had, which is the configuration of multipath.

Unfortunately I have no experience with this and can't imagine the issue you want to solve :cry:.


For a different contrived example, maybe someone wanted to order the mounted devices according to their used space?

Hash(Array($::mountpoints).sort |$mnt1, $mnt2| { stdlib::cmp_hash($mnt1[1], $mnt2[1]) |$mnt| { $mnt["available_bytes"] } })

Ah! This example makes sense to me!!! :grinning: Thank you!

Then adding support for Hash to puppet's sort() would be even more awesome I think:

# Proposal 1
$facts['mountpoints'].sort |$a, $b| { compare($a.dig(1, 'used_bytes'), $b.dig(1, 'used_bytes')) }

This also work without a block and sort hashes by key:

# Proposal 1
{
  a => 1,
  c => 2,
  b => 3,
}.sort      #=> {a => 1, b => 3, c => 2}

At this point, the main difference with this and the function you provide is that yours does not do the actual comparison, but yield to a value that is compared internally. This is equivalent to the Enumerable#sort_by function of Ruby for which Puppet has currently no support, but this would definitively make sense as stdlib::sort_by()!

# Proposal 2
$facts['mountpoints'].stdlib::sort_by |$m| { $m.dig(1, 'used_bytes') }

I think we are on the good way, did I still completely fail to understand the problem, or do the above proposals (add Hash support to sort and add a sort_by method) help with your problem in a more generic way? If I am still missing something, please add sample data to show the actual issue you have and what you want to get: I feel like as it is now, this PR helps for some use cases but some generalization is possible.

XMol commented 1 year ago

Hi @smortex, sometimes going through longer discussions can be quite enlightening - for both parties! 😃


sorting arrays of hashes is already straightforward

That's a great solution with current methods that Puppet offers! It actually does the very same as cmp_hash(): comparing two values derived somehow from two given Hashes. So this solution alone would be enough for me to retract the pull request.


I left out all the details regarding configuration of multipath, because that is completely homebrew. I.e. we could avoid the need for comparing Hashes by rewriting other bits of code. So that is a weak argument, but explains our motivation.


Enabling sort() to order the keys of a Hash would be nice to have, but in my honest opinion, would be useful only for esthetic reasons, like dumping Hashes to terminals or files. In which case, one can already do Hash(sort(Array($hash))).


Agreed, sort_by() would be the generalized form of cmp_hash(), that can sort iterables, instead of comparing just two Hashes. Your second proposal with sort_by() is the extension of build-in sort() with a block - perfect!

XMol commented 1 year ago

Closing this pull request, in favor of a new one that wants to add stdlib::sort_by.

Implementing that new function taught me quite a lot of things!

Now I'm struggling with the spec tests again, but that can be discussed in the upcoming new PR.