netdata / netdata-cloud

The public repository of Netdata Cloud. Contribute with bug reports and feature requests.
GNU General Public License v3.0

[Feat]: Support Parent only Dashboard and Health Alert setups - email notification links should support Alert Pages on Parent #1025

Open dogsbody-josh opened 4 months ago

dogsbody-josh commented 4 months ago

Problem

Netdata email notifications for Health Alerts contain URL links to charts involved in the alert but these links do not work for parent/child setups where dashboards are disabled on the child nodes.

Our setup is as follows:

Our setup relies on Netdata's agents to capture metrics on child nodes, store the default amount of data local to the child (in case of connectivity issues), and stream all metrics to the parent for long-term storage.

We do not use 'per child' Dashboard UIs (they're disabled on the children) but instead use the Parent node as a 'single pane of glass'. This goes for Health Alerts too, where the Parent is responsible for all Health Entities and Email Notifications.

When an alert threshold is passed, we use the default scripts to send the default notification email to our ticketing system, so that we are notified of the alert and can deal with it.

These emails contain a link to the chart from which the alert is derived. In the HTML version of the email the link is labelled with a large "GO TO CHART" button, and in the plain-text version the "URL:" field contains the link.

These URLs are of the form:

https://registry.my-netdata.io/registry-alert-redirect.html?agent_machine_guid=AGENT_MACHINE_GUID&host_machine_guid=HOST_MACHINE_GUID&transition_id=TRANSITION_ID&host=HOSTNAME&chart=CHART_NAME&alarm=ALARM_NAME&alarm_unique_id=ALARM_UNIQUE_ID&alarm_id=ALARM_ID&alarm_event_id=ALARM_EVENT_ID&alarm_when=VALUE&alarm_status=CLEAR&alarm_chart=ALARM_CHART&alarm_value=ALARM_VALUE
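To make the structure of these links easier to see, here is a minimal Python sketch that splits such a URL into its host, path and query parameters. It only uses the placeholder values from the URL form above (shortened to a few parameters); the helper name `describe_link` is ours and is not part of Netdata.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the URL form shown above. The upper-case values are
# the same placeholders used in the issue text, not real data.
EXAMPLE_URL = (
    "https://registry.my-netdata.io/registry-alert-redirect.html"
    "?agent_machine_guid=AGENT_MACHINE_GUID"
    "&host_machine_guid=HOST_MACHINE_GUID"
    "&transition_id=TRANSITION_ID"
    "&host=HOSTNAME&chart=CHART_NAME&alarm=ALARM_NAME"
)


def describe_link(url):
    """Return (host, path, params) for a notification link.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # parse_qs returns a list per key; each key appears once here,
    # so keep only the first value.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.netloc, parts.path, params


host, path, params = describe_link(EXAMPLE_URL)
print(host)   # the registry host, not the parent or the child
print(path)
for key, value in params.items():
    print(f"{key} = {value}")
```

The point of the sketch is that the link targets the registry host and carries only identifiers (GUIDs, host, chart, alarm); it contains no address of the parent dashboard itself.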

As far as we understand the registry, this type of link will never work with our setup, as the redirect expects to send us to the discovered URL of the child node's dashboard. That's disabled in our setup and so will never be discovered or accessible.

If this is correct then under the current system there's no possibility of email notifications containing a working deep-link to the alerting chart.

We attempted to run a registry on the parent node ourselves to see if we could control the URL, but this highlighted the same issue - namely that the redirect didn't work.

[Screenshot: netdata-redirect-failure]

Description

In our setup the parent is responsible for the Health Entity and the Alert Notification - it should therefore be possible to construct a link to the Alert Page on the Parent that sent the notification.

We would then like the option to configure the Alert Notification emails such that instead of constructing the $gotourl variable with the Registry URL, we can instead say 'use the deep-link (constructed in the process above) to the Alert Page on the Parent node that sent this notification'.

To try and sum that up we think that because the Parent has the chart/metrics being alerted on, has the Health Entity, has the Alert Page and sends the Alert Notification, it should be possible to have the 'GO TO CHART' button and 'URL:' fields in the HTML/plain text emails be a deep-link to the Alert Page on the Parent node.

Put even more briefly - Alert Notifications should have the option to link directly to the corresponding Alert Pages on the nodes that sent the alert (i.e. the parent in our case).

Importance

must have

Value proposition

  1. In the current system, email notifications containing registry links to charts where the Child dashboards are disabled will never work
  2. Deep-linking to the alert page on the Parent provides the expected behaviour in Parent/Child setups where the Parent is responsible for all Health alerts

Proposed implementation

We haven't investigated the inner workings of how the current URL is constructed. To keep things simple, in broad terms we would like an option under the [health] section of netdata.conf that allows us to configure the URL used in email notifications.

One proposal might look like:

[health]
email_notification_links = alertpage|registry

In the above case, registry would preserve the current behaviour and alertpage would link to the alert page on the parent node that sent the alert.
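As a rough sketch of the selection logic we have in mind (written in Python purely for illustration, not as a statement of how Netdata builds links), the option could switch the link target like this. Everything here is an assumption: `email_notification_links` is only the option proposed above, `build_goto_url` is a made-up helper, and the `/alerts` path and its query parameters are hypothetical stand-ins for whatever the real Alert Page URL on the parent looks like.

```python
import configparser
from urllib.parse import urlencode


def build_goto_url(conf_text, registry_url, parent_base_url, alert):
    """Pick the link target for a notification email.

    Hypothetical helper: the option name, the '/alerts' path and the
    query parameter names are assumptions made for this sketch.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Fall back to 'registry' so the current behaviour stays the default.
    mode = parser.get("health", "email_notification_links", fallback="registry")

    if mode == "registry":
        return registry_url
    if mode == "alertpage":
        # Assumed deep-link format; the real Alert Page URL may differ.
        return f"{parent_base_url.rstrip('/')}/alerts?{urlencode(alert)}"
    raise ValueError(f"unknown email_notification_links value: {mode!r}")


conf = "[health]\nemail_notification_links = alertpage\n"
url = build_goto_url(
    conf,
    registry_url="https://registry.my-netdata.io/registry-alert-redirect.html",
    parent_base_url="https://parent.example:19999/",
    alert={"host": "web01", "alarm": "disk_full"},
)
print(url)
```

Defaulting to `registry` when the option is absent would mean existing installations see no change unless they opt in.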

This is a naive proposal but one aimed at starting the conversation :-)

shyamvalsan commented 3 months ago

Hey @dogsbody-josh sorry for the late response, this one flew under our radar, just to confirm you are saying that you are receiving the alert notifications (for the child) from the parent agent with a link to the local child dashboard?

But since you are a Netdata cloud user, Netdata cloud should be centralizing the alerts and you should be getting email notifications from Netdata cloud where the link to the chart is a link to Netdata cloud chart. Does this not happen in your case?

shyamvalsan commented 3 months ago

@stelfrag is there a way for the parent agent to link the parent agent dashboard (instead of child agent dashboard) for cases such as these?

dogsbody-josh commented 3 months ago

> just to confirm you are saying that you are receiving the alert notifications (for the child) from the parent agent with a link to the local child dashboard?

That is correct. The link in the notification we receive tries to take us to the child dashboard, which is disabled in our case. I caveat that by stating that is our interpretation to the best of our knowledge based on the style of the link being sent. The link is to the registry which we believe is used to discover and then access the child dashboard.

dogsbody-josh commented 3 months ago

> But since you are a Netdata cloud user, Netdata cloud should be centralizing the alerts and you should be getting email notifications from Netdata cloud where the link to the chart is a link to Netdata cloud chart. Does this not happen in your case?

I am not sure about this. Our setup is parent/child and the parent is the one that's sending emails for notification, not Netdata Cloud. So for us I believe we are using Agent Dispatched Notifications, and specifically the email variant of those.

As far as I understand it, these are much more flexible and have important functionality like Roles etc, and so are where we would wish to continue sending notifications from.

Perhaps our feature request needs an additional clarification to state the request also relates to when using Agent Dispatched Notifications (the parent being the Agent in our case).