newrelic / nri-winservices

Windows services Integration for New Relic Infrastructure
Apache License 2.0

Hosts custom attributes from Infrastructure yaml configuration file not appearing in labels section of the alert json payload #113

Closed OwainWin closed 2 years ago

OwainWin commented 2 years ago

We have noticed that incidents generated by the Windows Services Integration do not include the host's custom attributes, as set in the New Relic Infrastructure YAML configuration file, in the labels section of the alert JSON payload. We can see these custom attributes and their values in the windows_service_state table under Data explorer/Metrics, so we know they are being collected from the host. However, they are not added to the labels section of the alert payload as they are for other alerts. We use some of these custom attributes to create and assign Jira tickets, so currently this process cannot work for Windows Services Integration incidents.

I know that Windows Service Integration is in Beta so:

Thanks

Expected Behaviour

All the host custom attributes configured in the host's Infrastructure agent .yaml configuration file should be passed to the labels section of the alert JSON payload. See below for an example of an alert where this is happening, e.g. a CPU usage metric alert where the host's custom attributes are present:

`{
  "metadata": {
    "entity.type": "HOST",
    "entity.name": "blanked-out"
  },
  "closed_violations_count": {
    "critical": 0,
    "warning": 0
  },
  "incident_acknowledge_url": "https://alerts.eu.newrelic.com/accounts/2885393/incidents/23944669/acknowledge",
  "targets": [
    {
      "id": "6784985827110899467",
      "name": "blanked-out",
      "link": "https://infrastructure.eu.newrelic.com/accounts/blanked-out/alertLanding?violationId=59515229",
      "labels": {
        "account": "Account blanked-out",
        "application_tier": "N/A",
        "displayName": "blanked-out",
        "environment": "Test",
        "fullHostname": "blanked-out.blanked-out.blanked-out",
        "guid": "blanked-out",
        "hostStatus": "running",
        "hostname": "blanked-out",
        "instanceType": "Standard_B1ms",
        "patch_day": "Monday",
        "patch_time": "03:00-06:00",
        "regionName": "westeurope",
        "service": "Authentication",
        "support_contact": "me@mine.com",
        "support_team": "Systems Cloud Services Team",
        "support_url": "N/a",
        "windowsFamily": "blanked-out",
        "windowsPlatform": "Microsoft Windows Server 2016 Datacenter",
        "windowsVersion": "10.0.14393 Build 14393"
      },
      "product": "INFRASTRUCTURE",
      "type": "Host"
    }
  ],
  "duration": 162,
  "incident_id": 23944669,
  "event_type": "INCIDENT",
  "account_name": "Account blanked-out",
  "details": "CPU % > 1 for at least 1 minutes on 'blanked-out'",
  "condition_name": "CPU Test",
  "timestamp": 1634744104830,
  "owner": "",
  "severity": "CRITICAL",
  "policy_url": "https://alerts.eu.newrelic.com/accounts/2885393/policies/47196",
  "current_state": "open",
  "policy_name": "High CPU",
  "incident_url": "https://alerts.eu.newrelic.com/accounts/2885393/incidents/23944669",
  "condition_family_id": 361607,
  "version": "1.0",
  "condition_id": 1658378,
  "account_id": "blanked-out",
  "timestamp_utc_string": "2021-10-20, 15:35 UTC",
  "open_violations_count": {
    "critical": 1,
    "warning": 0
  },
  "condition_description": "This is the description:\n\nopwdc250\n\n\ntargetname: blanked-out",
  "violation_callback_url": "https://infrastructure.eu.newrelic.com/accounts/blanked-out/alertLanding?violationId=59515229"
}`

Actual Behaviour

The alert JSON payload for Windows Services Integration incidents does not include the host's custom attributes in the labels section, even though they can be seen in the windows_service_state table under Data explorer/Metrics. See below for an example of a Windows Services Integration alert where the host's custom attributes are not added. It comes from the same host as the example above in Expected Behaviour, and the custom attributes are missing:

{
  "metadata": {
    "entity.type": "WIN_SERVICE",
    "entity.name": "WIN_SERVICE:e18a2c32-246f-48f1-bc05-cf90f718abda:spooler",
    "evaluation_system_source": "Willamette"
  },
  "closed_violations_count": {
    "critical": 0,
    "warning": 0
  },
  "incident_acknowledge_url": "https://alerts.eu.newrelic.com/accounts/blanked-out/incidents/23945275/acknowledge",
  "targets": [
    {
      "id": "Metric",
      "name": "blanked-out_spooler_stopped",
      "link": "https://insights.eu.newrelic.com/accounts/blanked-out/query?query=SELECT%20count%28%2A%29%20FROM%20Metric%20WHERE%20metricName%20%3D%20%27windows_service_state%27%20AND%20state%20%21%3D%20%27running%27%20AND%20service_name%20%3D%20%27spooler%27%20FACET%20hostname%2C%20service_name%2C%20state%20TIMESERIES%201%20minute%20SINCE%20%272021-10-20%2009%3A50%3A54%27%20UNTIL%20%272021-10-20%2015%3A49%3A54%27",
      "labels": {
        "account": "Account blanked-out",
        "accountId": "blanked-out",
        "displayName": "Print Spooler",
        "display_name": "Print Spooler",
        "guid": "blanked-out",
        "hostname": "blanked-out",
        "process_id": "0",
        "run_as": "LocalSystem",
        "service_name": "spooler",
        "start_mode": "auto",
        "state": "stopped",
        "trustedAccountId": "blanked-out"
      },
      "product": "NRQL",
      "type": "Query"
    }
  ],
  "duration": 395,
  "incident_id": 23945275,
  "event_type": "INCIDENT",
  "account_name": "Account blanked-out",
  "details": "WIN_SERVICE:e18a2c32-246f-48f1-bc05-cf90f718abda:spooler query result is > 0.5 on 'MSiSCSI Service Integration - Service Stopped'",
  "condition_name": "MSiSCSI Service Integration - Service Stopped",
  "timestamp": 1634744994742,
  "owner": "",
  "severity": "CRITICAL",
  "policy_url": "https://alerts.eu.newrelic.com/accounts/blanked-out/policies/47231",
  "current_state": "open",
  "policy_name": "high cpu",
  "incident_url": "https://alerts.eu.newrelic.com/accounts/blanked-out/incidents/23945275",
  "condition_family_id": 377093,
  "version": "1.0",
  "condition_id": 1658347,
  "account_id": "blanked-out",
  "violation_chart_url": "https://gorgon.service.eu.newrelic.com/image/b2766767-7458-4dae-8ab2-d8c8e7c116cb?config.legend.enabled=false",
  "timestamp_utc_string": "2021-10-20, 15:49 UTC",
  "open_violations_count": {
    "critical": 1,
    "warning": 0
  },
  "condition_description": "Alert Details\n\nHostname: blanked-out\nService Name: \nThe service condition has changed to: \n",
  "violation_callback_url": "https://insights.eu.newrelic.com/accounts/blanked-out/query?query=SELECT%20count%28%2A%29%20FROM%20Metric%20WHERE%20metricName%20%3D%20%27windows_service_state%27%20AND%20state%20%21%3D%20%27running%27%20AND%20service_name%20%3D%20%27spooler%27%20FACET%20hostname%2C%20service_name%2C%20state%20TIMESERIES%201%20minute%20SINCE%20%272021-10-20%2009%3A50%3A54%27%20UNTIL%20%272021-10-20%2015%3A49%3A54%27"
}
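
The "link" and "violation_callback_url" fields above carry the NRQL condition query in URL-encoded form, which is hard to read. As a minimal sketch (the helper name `nrql_from_link` is hypothetical, and only the Python standard library is assumed), the query can be decoded like this:

```python
from urllib.parse import urlsplit, parse_qs

# The "link" value copied from the WIN_SERVICE payload above.
link = (
    "https://insights.eu.newrelic.com/accounts/blanked-out/query?query="
    "SELECT%20count%28%2A%29%20FROM%20Metric%20WHERE%20metricName%20%3D%20"
    "%27windows_service_state%27%20AND%20state%20%21%3D%20%27running%27%20AND%20"
    "service_name%20%3D%20%27spooler%27%20FACET%20hostname%2C%20service_name%2C%20"
    "state%20TIMESERIES%201%20minute%20SINCE%20%272021-10-20%2009%3A50%3A54%27%20"
    "UNTIL%20%272021-10-20%2015%3A49%3A54%27"
)


def nrql_from_link(url):
    """Return the percent-decoded value of the 'query' URL parameter."""
    params = parse_qs(urlsplit(url).query)
    return params["query"][0]


nrql = nrql_from_link(link)
print(nrql)
```

The decoded query selects from Metric where metricName is 'windows_service_state' and facets on hostname, service_name and state, which matches the table where the custom attributes are visible.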

gsanchezgavier commented 2 years ago

Hi @OwainWin,

Thanks for adding all the details to this issue.

What the alert shows under labels are the Entity Tags, but the metadata added by the custom attributes is not added as tags for the Win services entities. So this is expected behavior.

Please contact Support to submit a Feature Request on Alerts to add a metrics metadata section to the Alert incident.
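
For anyone comparing the two payloads in this issue, the sketch below lists the label keys that appear on the HOST alert but not on the WIN_SERVICE alert. It is illustrative only: the helper names `label_keys` and `missing_labels` are hypothetical, the label values are placeholders, and the result covers every absent label (agent-provided ones such as windowsVersion as well as the custom attributes), since the payload itself does not distinguish between them.

```python
def label_keys(payload):
    """Collect the label keys across all targets of an alert payload."""
    keys = set()
    for target in payload.get("targets", []):
        keys.update(target.get("labels", {}))
    return keys


def missing_labels(reference, candidate):
    """Label keys present in `reference` but absent from `candidate`."""
    return sorted(label_keys(reference) - label_keys(candidate))


# Label keys copied from the two payloads in this issue; values are placeholders.
host_alert = {"targets": [{"labels": dict.fromkeys([
    "account", "application_tier", "displayName", "environment",
    "fullHostname", "guid", "hostStatus", "hostname", "instanceType",
    "patch_day", "patch_time", "regionName", "service", "support_contact",
    "support_team", "support_url", "windowsFamily", "windowsPlatform",
    "windowsVersion",
], "blanked-out")}]}

win_service_alert = {"targets": [{"labels": dict.fromkeys([
    "account", "accountId", "displayName", "display_name", "guid",
    "hostname", "process_id", "run_as", "service_name", "start_mode",
    "state", "trustedAccountId",
], "blanked-out")}]}

missing = missing_labels(host_alert, win_service_alert)
for key in missing:
    print(key)
```

In practice the two dictionaries would be replaced by the parsed JSON bodies that the webhook delivers.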