trainline / consul-deployment-agent

Cross-platform deployment agent for Environment Manager

Allow Sensu health checks defined as PowerShell scripts #7

Open · Merlin-Taylor opened this issue 7 years ago

Merlin-Taylor commented 7 years ago

Given that you have defined the following in healthchecks.yml:

 some-service-is-running:
    name: some-service-is-running
    local_script: is-running.ps1
    interval: 10

The check that CDA generates from it looks like this:

{
  "checks": {
    "order-notification-service-is-running": {
      "command": "C:\\temp\\e3c9cfb0-f92c-11e6-be86-8389fd3ece3c\\archive\\healthchecks\\sensu\\is-running.ps1",
      "interval": 10
    }
  }
}

When this check runs, the Sensu client always times out and eventually, by the look of it, stops. This is because the Sensu client doesn't know how to run PowerShell scripts out of the box. Judging by other system checks (e.g. the consul-deployment-agent one), the check needs to be defined as follows to run properly:

{
  "checks": {
    "order-notification-service-is-running": {
      "command": "powershell.exe -NonInteractive -NoProfile -ExecutionPolicy Bypass -file \"C:\\temp\\e3c9cfb0-f92c-11e6-be86-8389fd3ece3c\\archive\\healthchecks\\sensu\\is-running.ps1\"",
      "interval": 10
    }
  }
}
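For reference, a small Python sketch that builds the working definition. powershell_check is a hypothetical helper, not part of CDA; it only shows that serialising the command with json.dumps yields the escaped backslashes and inner quotes seen in the JSON above.

```python
import json

# Prefix copied from the working check definition above.
POWERSHELL_PREFIX = "powershell.exe -NonInteractive -NoProfile -ExecutionPolicy Bypass -file"


def powershell_check(name, script_path, interval):
    """Return a Sensu check definition that runs script_path via powershell.exe.

    The path is wrapped in double quotes so that a path containing spaces
    is still passed to -file as a single argument.
    """
    command = '{0} "{1}"'.format(POWERSHELL_PREFIX, script_path)
    return {"checks": {name: {"command": command, "interval": interval}}}


if __name__ == "__main__":
    check = powershell_check(
        "order-notification-service-is-running",
        r"C:\temp\e3c9cfb0-f92c-11e6-be86-8389fd3ece3c\archive\healthchecks\sensu\is-running.ps1",
        10,
    )
    # json.dumps escapes each backslash and the inner double quotes,
    # which is where the escape sequences in the JSON above come from.
    print(json.dumps(check, indent=2))
```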

Now, because checks have to be defined using the healthchecks.yml specification, i.e. local_script, or server_script plus script_arguments, Windows users cannot include the powershell.exe -NonInteractive -NoProfile -ExecutionPolicy Bypass -file part of the command: it would not match a filename on disk, and CDA validates that the script exists before registering a check.
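To make the failure mode concrete, here is a minimal sketch of that kind of existence check. resolve_local_script is a hypothetical helper, not CDA's actual code, and it assumes local_script is resolved relative to the package's Sensu health check directory (as the healthchecks\sensu path above suggests).

```python
import os
import tempfile


def resolve_local_script(sensu_dir, local_script):
    """Hypothetical version of the existence check described above.

    local_script must name a file inside the Sensu health check directory;
    the resolved path is then used verbatim as the check command.
    """
    path = os.path.join(sensu_dir, local_script)
    if not os.path.isfile(path):
        raise ValueError("local_script not found on disk: {0}".format(path))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sensu_dir:
        # Simulate the unpacked archive containing is-running.ps1.
        open(os.path.join(sensu_dir, "is-running.ps1"), "w").close()

        # A bare script name passes validation...
        print(resolve_local_script(sensu_dir, "is-running.ps1"))

        # ...but a prefixed command does not, since no file has that name.
        try:
            resolve_local_script(
                sensu_dir,
                "powershell.exe -NonInteractive -NoProfile -ExecutionPolicy Bypass -file is-running.ps1",
            )
        except ValueError as err:
            print(err)
```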

jeanml commented 7 years ago

Documenting a few options discussed on Slack:

1) Not supporting PowerShell scripts at all, i.e. wrapping the execution in a Windows batch file, which Sensu supports natively. [This is the workaround I am currently using.]
2) Writing some kind of twisted logic in the deployment agent that checks whether a script is a PowerShell script and, if so, adds powershell.exe -NonInteractive -NoProfile -ExecutionPolicy Bypass -file (or something along those lines) to the command when creating the check definition.
3) Adding an extra property to healthchecks.yml, say script_type, whose values can be bash, powershell or windows-cmd, and using it to decide whether the check definition's command needs the prefix mentioned in 2).
4) By convention, supporting only PowerShell scripts on Windows, and using the platform property set during deployments (https://github.com/trainline/consul-deployment-agent/blob/master/agent/deployment.py#L25) to decide what to do: on Linux, do nothing; on Windows, add the prefix mentioned above. If a script isn't a PowerShell script, its check is likely to fail and create enough noise for development teams to fix the issue and comply with the convention.
5) Stopping messing about so much with Sensu check definitions in the deployment agent, and letting us specify the actual command in healthchecks.yml. This comes at the expense of not checking whether the script you are trying to run exists. As in 4), if the script to execute is not there, it should create enough noise for development teams to fix it.
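Option 3 could be sketched roughly as below. The script_type values come from the list above; COMMAND_PREFIXES and build_command are hypothetical names, and treating bash and windows-cmd scripts as needing no prefix is an assumption based on the current behaviour and on batch files being run natively.

```python
# Hypothetical mapping for option 3: each proposed script_type value maps
# to the prefix that must precede the script path in the Sensu command.
COMMAND_PREFIXES = {
    "bash": "",         # executed directly, as CDA does today
    "windows-cmd": "",  # batch files are run natively by Sensu
    "powershell": "powershell.exe -NonInteractive -NoProfile -ExecutionPolicy Bypass -file",
}


def build_command(script_path, script_type):
    """Return the Sensu check command for a script of the given script_type."""
    if script_type not in COMMAND_PREFIXES:
        raise ValueError("unsupported script_type: {0}".format(script_type))
    prefix = COMMAND_PREFIXES[script_type]
    if not prefix:
        return script_path
    # Quote the path so spaces in it do not split the -file argument.
    return '{0} "{1}"'.format(prefix, script_path)


if __name__ == "__main__":
    print(build_command("/opt/checks/is-running.sh", "bash"))
    print(build_command(r"C:\checks\is-running.ps1", "powershell"))
```

An explicit property trades a little extra configuration for not having to guess the script type from its file extension, as option 2 would.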

jeanml commented 7 years ago

My personal opinion is that we should go for option 4 or option 5.

Option 4:
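A minimal sketch of how option 4 could look, assuming the platform property holds strings such as "windows" or "linux" (the values actually set in deployment.py may be spelled differently). command_for_platform is a hypothetical helper, not existing CDA code.

```python
POWERSHELL_PREFIX = "powershell.exe -NonInteractive -NoProfile -ExecutionPolicy Bypass -file"


def command_for_platform(script_path, platform):
    """Option 4 sketch: choose the command prefix from the deployment platform.

    Assumes platform is a string such as "windows" or "linux".
    """
    if platform.lower() == "windows":
        # Convention: every check script on Windows is a PowerShell script.
        return '{0} "{1}"'.format(POWERSHELL_PREFIX, script_path)
    # Linux: leave the command untouched, as CDA does today.
    return script_path


if __name__ == "__main__":
    print(command_for_platform("/opt/checks/is-running.sh", "linux"))
    print(command_for_platform(r"C:\checks\is-running.ps1", "windows"))
```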

Option 5:
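Option 5 amounts to a pass-through. In this sketch the "command" key is a hypothetical healthchecks.yml property and check_from_definition a hypothetical helper; a parsed healthchecks.yml entry is represented as a plain dict to keep the example dependency-free.

```python
def check_from_definition(definition):
    """Option 5 sketch: copy the command from healthchecks.yml verbatim.

    Nothing verifies that the script referenced by the command exists, so a
    missing script only shows up as a failing Sensu check.
    """
    return {
        "checks": {
            definition["name"]: {
                "command": definition["command"],
                "interval": definition["interval"],
            }
        }
    }


if __name__ == "__main__":
    entry = {
        "name": "some-service-is-running",
        "command": 'powershell.exe -NonInteractive -NoProfile -ExecutionPolicy Bypass -file "is-running.ps1"',
        "interval": 10,
    }
    print(check_from_definition(entry))
```

The relative script path in the example is only illustrative; the sketch does not address how a hand-written command would locate the script inside the unpacked deployment archive.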

Also, I have a slight concern about supporting only PowerShell, as some of the Sensu Windows plugins are Ruby scripts (https://github.com/sensu-plugins/sensu-plugins-windows). That said, there seems to be a PowerShell equivalent available as well. My other concern is using other Sensu plugins written in Ruby, e.g. https://github.com/sensu-plugins/sensu-plugins-aws. However, it looks like all of them need to run on Linux machines, so there might not be any need to support Ruby plugins on Windows; if need be, we could port them to PowerShell.