rundeck / docs

Rundeck documentation
https://github.com/rundeck/rundeck

result.reason returning Unknown instead of NoMatchedNodes #318

Open remixtj opened 8 years ago

remixtj commented 8 years ago

I created a job that contains a job reference. On the job reference I overrode the node filter with one that matches no nodes. I added a simple error handler to catch the error and print the error reason. In the anvils demo, the filter I entered is tags: www+db, which returns an empty set. The error handler simply runs echo ${result.reason}.

Expected result:

echo ${result.reason} output is NoMatchedNodes

Obtained result:

echo ${result.reason} output is Unknown

Samples

No nodes matched for the filters: NodeSet{includes={tags=www+db, dominant=false, }}
Failed dispatching to node app1.anvils.com: com.dtolabs.rundeck.core.execution.workflow.steps.StepException: No nodes matched for the filters: NodeSet{includes={tags=www+db, dominant=false, }}
Unknown
Execution failed: 4: [Workflow result: , step failures: {1=Dispatch failed on 1 nodes: [app1.anvils.com: Unknown: com.dtolabs.rundeck.core.execution.workflow.steps.StepException: No nodes matched for the filters: NodeSet{includes={tags=www+db, dominant=false, }}]}, Node failures: {app1.anvils.com=[Unknown: com.dtolabs.rundeck.core.execution.workflow.steps.StepException: No nodes matched for the filters: NodeSet{includes={tags=www+db, dominant=false, }}]}, flow control: Continue, status: failed]

Countercheck

I also ran a test to check whether my usage of the ${result.reason} variable is correct. The error handler stayed the same, and I set a valid filter on the called job. The called job executes exit 1, so it always fails. In this case the value of ${result.reason} correctly becomes JobFailed.

Countercheck sample output:

Remote command failed with exit status 1
Failed: NonZeroResultCode: Remote command failed with exit status 1
Failed: JobFailed: Job [TEST/failing job] failed
JobFailed
Execution failed: 6: [Workflow result: , step failures: {1=Dispatch failed on 1 nodes: [app1.anvils.com: JobFailed: Job [TEST/failing job] failed]}, Node failures: {app1.anvils.com=[JobFailed: Job [TEST/failing job] failed]}, flow control: Continue, status: failed]

Sample jobs

These are the jobs I created on anvils-demo. You can import and run them immediately to reproduce the issue.

fc5429cb-5ec9-4f49-bd37-30a140be6a92.yaml.txt 896799cb-1990-42f2-b17c-fb762f4b1f0a.yaml.txt

puremourning commented 8 years ago

We had the same problem. If you set the step as a Workflow step, rather than a node step, this works.

FYI this is the "error handler" we use with continue on success set:

- name: NoMatchedNodes_Ok
  project: name
  loglevel: INFO
  options:
    reason:
      required: true
      value: main
      description: "The value of result.reason"
  sequence:
    keepgoing: false
    strategy: node-first
    commands:
    - script: |-
        echo "Failure code: @option.reason@ (ignoring: NoMatchedNodes)"
        test "@option.reason@" = "NoMatchedNodes"
      nodeStep: true
      description: Looking for failure reason to be NoMatchedNodes
  description: Node filter failed to match - Do not fail
  group: ErrorHandler

and called like this:

      errorhandler:
        jobref:
          group: ErrorHandler
          name: NoMatchedNodes_Ok
          args: -reason ${result.reason}
        keepgoingOnSuccess: true

It doesn't work if nodeStep: true is set for the job.
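The script step in the handler job above reduces to a single string comparison. A minimal sketch of that check as a standalone shell function follows; the function name `reason_is_ignorable` is hypothetical, and `$1` stands in for `@option.reason@`, which Rundeck substitutes before the script runs.

```shell
#!/bin/sh
# Hypothetical stand-in for the handler job's script step.
# $1 plays the role of @option.reason@, filled from the -reason argument.
reason_is_ignorable() {
    echo "Failure code: $1 (ignoring: NoMatchedNodes)"
    # The exit status of this comparison is the exit status of the step.
    test "$1" = "NoMatchedNodes"
}

# Try the reason values that appear in this thread.
for reason in NoMatchedNodes Unknown JobFailed; do
    if reason_is_ignorable "$reason" >/dev/null; then
        echo "$reason: handler succeeds"
    else
        echo "$reason: handler fails"
    fi
done
```

Only an exact NoMatchedNodes makes the handler succeed, so when the reason arrives as Unknown (the behaviour reported in this issue) the handler fails and keepgoingOnSuccess has no effect.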

remixtj commented 8 years ago

I implemented a simpler error handler: a local script called rundeck-errorhandler.sh, placed on the Rundeck server.

The script is called in this way:

/usr/local/bin/rundeck-errorhandler.sh ${result.message}

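#!/bin/bash
# Exit 0 (error handled) only if the message matches a handled pattern.
ERROR_MSG="$1"
ERRORS_HANDLED="(No nodes matched)"
# Quote both variables so spaces and glob characters are passed through intact.
printf '%s\n' "$ERROR_MSG" | grep -E "$ERRORS_HANDLED"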
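The matching done by rundeck-errorhandler.sh can be tried without a Rundeck server. The sketch below wraps the same grep in a hypothetical function, `handles_error`, and feeds it two messages copied from the log samples in this thread; whether `${result.message}` carries exactly these strings is an assumption.

```shell
#!/bin/sh
# Hypothetical function mirroring the body of rundeck-errorhandler.sh.
handles_error() {
    errors_handled="(No nodes matched)"
    # -q suppresses output; only the exit status matters here.
    printf '%s\n' "$1" | grep -qE "$errors_handled"
}

# Sample messages taken from the logs quoted earlier in the thread.
for msg in \
    'No nodes matched for the filters: NodeSet{includes={tags=www+db, dominant=false, }}' \
    'Job [TEST/failing job] failed'
do
    if handles_error "$msg"; then
        echo "handled: $msg"
    else
        echo "not handled: $msg"
    fi
done
```

Matching on the message text avoids depending on `${result.reason}`, at the cost of breaking if the wording of the message ever changes.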

An example job:

<joblist>
  <job>
    <description></description>
    <dispatch>
      <excludePrecedence>true</excludePrecedence>
      <keepgoing>true</keepgoing>
      <rankOrder>ascending</rankOrder>
      <threadcount>30</threadcount>
    </dispatch>
    <executionEnabled>true</executionEnabled>
    <group>TEST/patch_v2</group>
    <id>faaec3dc-049b-47cd-ab9b-1b916e2422a8</id>
    <loglevel>INFO</loglevel>
    <name>_TEMPLATE Patching TEST error handler</name>
    <nodefilters>
      <filter>.*</filter>
    </nodefilters>
    <nodesSelectedByDefault>false</nodesSelectedByDefault>
    <scheduleEnabled>true</scheduleEnabled>
    <sequence keepgoing='false' strategy='node-first'>
      <command>
        <errorhandler keepgoingOnSuccess='true'>
          <node-step-plugin type='localexec'>
            <configuration>
              <entry key='command' value='/usr/local/bin/rundeck-errorhandler.sh "${result.message}"' />
            </configuration>
          </node-step-plugin>
        </errorhandler>
        <jobref group='TEST/patch_v2' name='Before - Physical Machine' nodeStep='true'>
          <nodefilters>
            <filter>tags: is_virtual=false name: ${node.hostname}</filter>
          </nodefilters>
        </jobref>
      </command>
    </sequence>
    <uuid>faaec3dc-049b-47cd-ab9b-1b916e2422a8</uuid>
  </job>
</joblist>

With this job we want to execute the step "Before - Physical Machine" if and only if the given host is a physical machine (tagged is_virtual=false). If the host is a virtual machine (is_virtual=true), the job reference should fail with NoMatchedNodes because the filter matches nothing, and the workflow should skip to the next step because the error handler has keepgoingOnSuccess set to 'true'.