Token-substitution should apply to all attributes of a check

UnitedMarsupials-zz commented 8 years ago

Currently token-substitution is only applied to the command-attribute. This makes sense, because command is the only place, where arbitrary strings can be encountered.

The only place among standard attributes... However, there can be non-standard ones attached to a check. For example, I'm trying to add a Chart-attribute, that would include a link to Graphite-produced image:

        "NTP": {
            "command":  "check-ntp.rb  -w 2 -c 5 -s 4",
            "interval": 3677,
            "Chart":    "http://graphite.example.net/render/?width=512&height=360&from=-24hours&lineWidth=4&target=stats.:::name:::.ntpstats.offset&vtitle=NTP%20offset%20(milliseconds)&fname/NTP-offset-:::name:::.png",
            "subscribers":  [ "unix" ]
        }

It almost works -- the image is nicely embedded in Uchiwa's check-details:

Uchiwa chart sample

Unfortunately, the image contains no data, because the :::name::: tokens weren't processed by Sensu. I can, probably, write my own mutator to do it, but the right place for it seems to be in the lib/sensu/client/process.rb: loop through all of the check's fields and perform substitution on them before publish_check_results.

Possibly skipping the standard fields (interval, subscribers, handler). The actual coding seems simple enough -- would it be accepted, if I did it?

UnitedMarsupials-zz commented 8 years ago

Actually, perhaps, the right place for the token-substitution is not so obvious... Currently the client performs it every time the check runs. Would not it make more sense to perform the substitution once per check -- whenever one is either received from the server or loaded locally?

Are there tokens today, that change value from one check-execution to another?

Meanwhile, here is the diff I'm currently testing with -- it goes through all of the attributes of the check-hash modifying them in place. (Not sure yet, if it ought to work with a copy instead.) The function also uses some return-shortcuts -- making the main code less indented. One unncessary call to gettimeofday was removed.

--- sensu-0.23.2/lib/sensu/client/process.rb    2016-05-23 19:58:29.305266721 -0400
+++ sensu-0.23.2/lib/sensu/client/process.rb    2016-05-25 11:07:00.320186607 -0400
@@ -112,26 +112,28 @@
       def execute_check_command(check)
         @logger.debug("attempting to execute check command", :check => check)
-        unless @checks_in_progress.include?(check[:name])
-          @checks_in_progress << check[:name]
-          command, unmatched_tokens = substitute_tokens(check[:command], @settings[:client])
-          if unmatched_tokens.empty?
-            check[:executed] = Time.now.to_i
-            started = Time.now.to_f
-            Spawn.process(command, :timeout => check[:timeout]) do |output, status|
-              check[:duration] = ("%.3f" % (Time.now.to_f - started)).to_f
-              check[:output] = output
-              check[:status] = status
-              publish_check_result(check)
-              @checks_in_progress.delete(check[:name])
-            end
-          else
-            check[:output] = "Unmatched command tokens: " + unmatched_tokens.join(", ")
-            check[:status] = 3
-            check[:handle] = false
-            publish_check_result(check)
-            @checks_in_progress.delete(check[:name])
-          end
-        else
+        if @checks_in_progress.include?(check[:name])
           @logger.warn("previous check command execution in progress", :check => check)
+          return
+        end
+        @checks_in_progress << check[:name]
+        check.each do |key, value|
+          next unless value.is_a? String
+          check[key], unmatched_tokens = substitute_tokens(value, @settings[:client])
+          next if unmatched_tokens.empty?
+          check[:output] = "Unmatched " + key + " tokens: " + unmatched_tokens.join(", ")
+          check[:status] = 3
+          check[:handle] = false
+          publish_check_result(check)
+          @checks_in_progress.delete(check[:name])
+          return
+        end
+        started = Time.now.to_f
+        check[:executed] = started.to_i
+        Spawn.process(check[:command], :timeout => check[:timeout]) do |output, status|
+          check[:duration] = ("%.3f" % (Time.now.to_f - started)).to_f
+          check[:output] = output
+          check[:status] = status
+          publish_check_result(check)
+          @checks_in_progress.delete(check[:name])
         end
       end

portertech commented 8 years ago

@UnitedMarsupials I think this idea is great :+1: applying token substitution to any string check attribute value (top-level only) is appropriate, e.g. next unless value.is_a?(String). Your proposed changes look pretty good, however, the Sensu code base avoids the use of return when any async code is present, hence the conditional wrapping. We should probably use string interpolation instead of concatenation, e.g. "Unmatched '#{key}' tokens: #{unmatched_tokens.join(", ")}".

portertech commented 8 years ago

Just a note, we recently (0.23.0) added token substitution to filter eval attribute values https://github.com/sensu/sensu/blob/master/CHANGELOG.md#0230---2016-04-04 so we've already applied this logic elsewhere :+1:

portertech commented 8 years ago

@UnitedMarsupials are you able to submit a pull request with your changes this week?

UnitedMarsupials-zz commented 8 years ago

Thank you for the quick reaction, @portertech!

Sensu code base avoids the use of return

Huh? There must be some deep philosophical reasoning to this rule. To me such code (and I have seen this principle applied to various languages) has always been less readable and looked messier due to the increased indentation.

when any async code is present

Ah, is this to guard against people forgetting to "unlock" things? Not sure, if it is justified -- this function, for example, was already deleting the check from the list of checks in progress -- could as well have returned right there :-)

Your proposed changes look pretty good

Thanks for the compliments, but what about my question about the right place to perform these substitutions? If there are no tokens, that change from one check-execution to another, why not run the substitution once when the check is loaded or received from server? The current place is in a fairly tight loop -- whatever can be moved away from it, should be...

Unless, of course, there are tokens that can legitimately change from run to run -- for example, I'd be interested in seeing :::starttime::: and :::endtime::: added (in a format, that Graphite can understand).

We should probably use string interpolation instead of concatenation

A matter of style I suppose :) I don't have a preference here.

any string check attribute value (top-level only) is appropriate, e.g. next unless value.is_a?(String)

I'm not sure, it should be limited to strings, actually. I'm doing it now, because the current implementation of substitute_tokens can only handle strings. But some day a use-case may be encountered for numeric token-substitution, for example. We can deal with it then...

are you able to submit a pull request with your changes this week?

If that will make it into the 0.23.3, I shall!

portertech commented 8 years ago

re return

Many/most methods deal with blocks/callbacks and use wrapping conditionals with yield(), as many async non-blocking calls are made throughout the code base, I would prefer to keep the use of return to an absolute minimum.
re caching

It's possible for check request payloads to include changes to a definition. Any proper caching solution would probably end up being overly complicated to provide the necessary benefit.
re string interpolation

There's actually a very slight performance benefit :P And we started to change the style to interpolation last year :+1:
re strings

I think targeting only strings is fine :+1:
re release

Your changes would make it into 0.24, the next release (next week) \o/

UnitedMarsupials-zz commented 8 years ago

It's possible for check request payloads to include changes to a definition. Any proper caching solution would probably end up being overly complicated to provide the necessary benefit.

I first looked into this code today, so I'm happy to take your word for it. However... It would seem, there are only two places, where check-definition may change:

When loaded locally.
When received from remote.

It would seem, one familiar with the code could add a token-substituion call to both of these places removing it from execute_check_command. The benefit is substantial -- the proper substitution involves walking through a Hash and making changes to it. The speed-up may not be perceptible to humans (who can not distinguish a microsecond from a millisecond anyway), but for a large number of checks with many different fields...

Multiplied by 1000, for example, a microsecond becomes a still-imperceptible millisecond, but a millisecond turns into a very perceptible second...

Anyway, we can revisit this later.

Your changes would make it into 0.24, the next release

Woo-hoo!

calebhailey commented 8 years ago

I love this idea. I tried to convince @portertech to do this a long time ago, but I think it was too soon. 👍

UnitedMarsupials-zz commented 8 years ago

Ok, how about this patch instead? It goes through not just the top-level strings, but all fields of the check -- possibly diving into sub-hashes and lists (arrays):

--- lib/sensu/client/process.rb 2016-05-23 19:58:29.305266721 -0400
+++ lib/sensu/client/process.rb 2016-05-26 15:16:11.541535107 -0400
@@ -100,4 +100,35 @@
       end

+      # Perform token-substitution for a object. Strings
+      # are passed to substitute_tokens(), arrays and sub-hashes are
+      # processed recursively. Numbers (and other types?) are skipped
+      # at the moment, because substitute_tokens() can not handle them
+      # anyway.
+      #
+      # @param [Object]
+      # @return        [Object] updated object of the same type as the argument
+      #        plus accumulated list of unmatched tokens
+      def obj_substitute_tokens(obj)
+        case obj
+        when Hash
+          unmatched_tokens = []
+          obj.each do |key, value|
+            obj[key], unmatched = obj_substitute_tokens(value)
+            unmatched_tokens.push(*unmatched)
+          end
+        when Array
+          unmatched_tokens = []
+          obj.map! {|value|
+            value, unmatched = obj_substitute_tokens(value)
+            unmatched_tokens.push(*unmatched)
+            value
+          }
+        when String
+          obj, unmatched_tokens = substitute_tokens(obj, @settings[:client])
+        end
+        unmatched_tokens.uniq! if unmatched_tokens
+        [obj, unmatched_tokens]
+      end
+
       # Execute a check command, capturing its output (STDOUT/ERR),
       # exit status code, execution duration, timestamp, and publish
@@ -114,9 +144,9 @@
         unless @checks_in_progress.include?(check[:name])
           @checks_in_progress << check[:name]
-          command, unmatched_tokens = substitute_tokens(check[:command], @settings[:client])
+          check, unmatched_tokens = obj_substitute_tokens(check)
           if unmatched_tokens.empty?
-            check[:executed] = Time.now.to_i
             started = Time.now.to_f
-            Spawn.process(command, :timeout => check[:timeout]) do |output, status|
+            check[:executed] = started.to_i
+            Spawn.process(check[:command], :timeout => check[:timeout]) do |output, status|
               check[:duration] = ("%.3f" % (Time.now.to_f - started)).to_f
               check[:output] = output

Tested with the following check-definition her:

"SSLcerts": {
       "Marsupials": [ "wombat", ":::predator|tasmanian tiger:::", ":::name:::" ],
       "command":      "check-ssl-certs.rb /etc/pki/tls",
       "interval":     86423,
       "Description":  "Find all SSL-certificates -- individual files and bundles -- under /etc/pki/tls on :::name::: and check their expiration.",
       "subscribers":  [ "unix" ]
},

Note, how the substitution is done in a separate method entirely (should it live in utilities.rb, perhaps?). When somebody becomes ready to move the substitutions out of the check-execution loop, the method will be ready for him...

portertech commented 8 years ago

@UnitedMarsupials I was hoping to go with your original change proposal, as a first step. I would prefer to introduce top-level string only substitution, observe client impact and use cases.

UnitedMarsupials-zz commented 8 years ago

Is that a show-stopper? I'd say, it is just as inconsistent to limit token-substitution to top-level strings only, as it was to limit it to command-attribute before...

portertech commented 8 years ago

@UnitedMarsupials not a show-stopper, I am warming up to you. Made some comments on the PR, do not worry about squashing commits etc, I am all for verbose.

Gillingham commented 8 years ago

This would be very useful for things like the ability to use a variable to define the source attribute of checks, I have several checks that can run on pairs of servers that need to specify a source that represents the combined service of them. Right now I have to define a check for each "venue" instead of just being able to use "source": ":::venue:::" or similar on a single check.

UnitedMarsupials-zz commented 8 years ago

@Gillingham, sir, would you mind expanding on your idea? I need to monitor a bunch of systems via SNMP and am struggling to figure out the best way to do it...

I gather, they ought to be "proxy clients" with the actual checks running on the Sensu-server (for simplicity). How do I express it in Sensu config-file(s)? I'd rather not write (and forever maintain) my own daemon for it...

Gillingham commented 8 years ago

@UnitedMarsupials I just have a roundrobin subscription setup that has access to the service I need to monitor, then the check that runs against that will only run on a single host, and be tied to that proxy client with the source attribute of the check. So one actual client is chosen from the roundrobin pool and the check results ends up associated with the proxy client. The source attribute is all you really need.

UnitedMarsupials-zz commented 8 years ago

@Gillingham, thanks, but could you post a snippet of your config-file(s)? I'm fairly new to Sensu, believe it or not...

BTW, your idea will not work even if my change is merged in -- an attempt to use "source": ":::foo:::" gets rejected by some validator at start-up... The source-attribute must be alphanumeric...

@portertech, can I relax (or remove) the validation?

portertech commented 8 years ago

@UnitedMarsupials token substitution in required or strict attribute values can cause havoc, so we will not change validation. E.g. in progress checks (name), unmatched source token -> empty client name.

portertech commented 8 years ago

Let's run with check attribute value token substitution with the current validation constraints and see how it goes. We can certainly revisit things after some use.

portertech commented 8 years ago

@Gillingham we can revisit check source validation in regards to tokens for the following release (0.25). Special characters and empty strings are known to break several pieces of Sensu.

portertech commented 8 years ago

I merged @UnitedMarsupials's substitution implementation into the 0.24 release :+1:

scosist commented 8 years ago

@UnitedMarsupials I was hoping you could tell me how you populated all the client attributes like Architecture, CPU count, etc. (all the attributes before _id) as seen in the uchiwa screenshot in your first post.

Are they statically assigned or are the attribute values populated dynamically?

UnitedMarsupials-zz commented 8 years ago

@scosist -- all these attributes are Puppet's "facts" about the machines. We use Puppet to generate /etc/sensu/config.json on each machine from a template:

        "client": {
                "name": "<%= hostname %>",
                "address":      "<%= ipaddress %>",
                "Location":     "<%= location %>",
                "Owner":        "<%= owner_name %>",
                "Manufacturer": "<%= manufacturer %>",
                "Architecture": "<%= hardwaremodel %>",
                "MAC address":  "<%= macaddress %>",
                "RAM size":     "<%= memorysize %>",
                "Swap size":    "<%= swapsize %>",
                "CPU count":    "<%= processorcount %>",
                "OS":   "<%= operatingsystem %>-<%= operatingsystemrelease %>",
                "UUID": "<%= uuid %>",
                "Puppet":       "<%= puppetversion %>",
                "subscriptions": [
                        "unix"
                ]
        }

The beauty of Sensu is that whatever attributes it does not recognize itself are treated as simply "user-data" -- passed back and forth (to handlers, etc.). And Uchiwa will display them all -- and even arrange for images and iframes to be displayed nicely if you use those.

scosist commented 8 years ago

Ah, of course, I hadn't considered config mgmt handling those. @UnitedMarsupials Thank you for the detailed explanation. I'm aware of that article you linked to as I have unsuccessfully attempted to include an iframe in a client config. I left a comment hoping someone with the same issue might have arrived at a solution they could share.

Using "grafana_test_iframe": "iframe:http://example.url/provided/by/grafana/iframe/src" only results in grafana_test_iframe {} on the dashboard.

Apologies, I don't mean to cross post especially on a closed thread. I should probably open a new issue...

UnitedMarsupials-zz commented 8 years ago

Uhm, gentlemen? I don't think, this improvement is working... I got my clients updated to 0.25.4 and the JSON coming back from them contains unexpanded :::name:::.

I can not reopen this ticket, but it should be reopened...

calebhailey commented 8 years ago

@UnitedMarsupials I wonder if you're experiencing what's described in this issue, which is fixed in 0.25.5: https://github.com/sensu/sensu/issues/1360

UnitedMarsupials-zz commented 8 years ago

Yes, going up to 0.25.5 seems to have fixed things... Thanks.

calebhailey commented 8 years ago

@UnitedMarsupials glad to hear it! Sorry for the confusion. :-)

sensu / sensu

Token-substitution should apply to all attributes of a check #1281