sensu / sensu-puppet-handler

Deregister Sensu entities without an associated Puppet node
MIT License
0 stars 5 forks source link

Please Change puppetNodeExists Function to Examine Payload for Deactivated field #4

Closed csoleimani closed 4 years ago

csoleimani commented 4 years ago

I think due to our puppet node ttl is set to the default value (14 days), I believe the handler is not working 100% as expected. After doing some troubleshooting, I've found that even if we have purged a node in puppet, it remains in the puppet database until the node ttl has expired. Here is what the payload looks like for a node that has not been purged when hitting the /pdb/query /v4/nodes endpoint:

{
"deactivated": null,
"latest_report_hash": "636634b6d72f3d2a8f1c61be8c66adb096d1b88e",
"facts_environment": "puphandler",
"cached_catalog_status": "not_used",
"report_environment": "puphandler",
"latest_report_corrective_change": true,
"catalog_environment": "puphandler",
"facts_timestamp": "2020-03-05T18:36:48.323Z",
"latest_report_noop": true,
"expired": null,
"latest_report_noop_pending": true,
"report_timestamp": "2020-03-05T18:08:18.528Z",
"certname": "sensu-master01.corp.test.internal",
"catalog_timestamp": "2020-03-05T18:07:04.762Z",
"latest_report_job_id": null,
"latest_report_status": "failed"
}

Here is what it looks like on a node we have recently purged:

{
  "deactivated": "2020-03-02T23:39:54.601Z",
  "latest_report_hash": "8326d678d0fe6d7a76724fc908e958513e9dccca",
  "facts_environment": "production",
  "cached_catalog_status": "not_used",
  "report_environment": "staging",
  "latest_report_corrective_change": false,
  "catalog_environment": "production",
  "facts_timestamp": "2020-03-02T23:26:15.667Z",
  "latest_report_noop": true,
  "expired": null,
  "latest_report_noop_pending": false,
  "report_timestamp": "2020-03-02T23:26:29.131Z",
  "certname": "puptest-node01.corp.test.internal",
  "catalog_timestamp": "2020-03-02T23:26:22.275Z",
  "latest_report_job_id": null,
  "latest_report_status": "failed"
}

The main difference is that the deactivated field has a date in it, rather than being null. If we could change the handler to not just check for an HTTP OK, but examine the payload and ensure that deactivated is not null, I think the handler would work more effectively. I'll take a stab at it soon, but here's the part I'm talking about with some added pseudocode logic

    // Determine if the node exists
    if resp.StatusCode == http.StatusOK {
        log.Printf("puppet node %q exists, checking if deactivated.", name)
                 if client.response["deactivated"] != nil {
        return true, nil
                } else {
                 return false, nil
                }
    } else if resp.StatusCode == http.StatusNotFound {
        log.Printf("puppet node %q does not exist", name)
        return false, nil
    }

    return false, fmt.Errorf("unexpected HTTP status %s while querying PuppetDB", http.StatusText(resp.StatusCode))
}

I've confirmed by querying the puppetdb log that even after purging the node, the handler gets a 200 even on a purged node, so it thinks the node exists and will not delete it. Here is what that log looks like:

172.31.201.59 - - [05/Mar/2020:18:51:14 +0000] "GET /pdb/query/v4/nodes/puptest-node01.corp.test.internal HTTP/1.1" 200 694 "-" "Go-http-client/1.1" 7 -

Does it make sense if I say that?

echlebek commented 4 years ago

Fixed by #5