keephq / keep

The open-source alert management and AIOps platform
https://platform.keephq.dev

[➕ Feature]: Incidents #1349

Closed Matvey-Kuk closed 1 month ago

Matvey-Kuk commented 1 month ago

Introduce the incident page and the Incident data model.

Data model: For now, incidents from some providers are pulled as "alerts"; they should be pulled as "incidents".

UI: Users should be able to manually CRUD Many:One relationships between Alerts and Incidents. Also note that users will most likely want to add multiple alerts to an incident in bulk.
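A minimal sketch of the Many:One relationship with bulk assignment, assuming an in-memory store. `Incident`, `IncidentStore`, `add_alerts`, and `remove_alerts` are hypothetical names for illustration, not Keep's actual model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Incident:
    # Hypothetical minimal incident: an id, a name, and the alerts it owns.
    incident_id: str
    name: str
    alert_ids: Set[str] = field(default_factory=set)


class IncidentStore:
    """In-memory stand-in for a real persistence layer (an assumption)."""

    def __init__(self) -> None:
        self.incidents: Dict[str, Incident] = {}
        # Many:One: each alert id maps to at most one incident id.
        self.alert_to_incident: Dict[str, str] = {}

    def create_incident(self, incident_id: str, name: str) -> Incident:
        incident = Incident(incident_id, name)
        self.incidents[incident_id] = incident
        return incident

    def add_alerts(self, incident_id: str, alert_ids: List[str]) -> None:
        """Bulk-assign alerts, moving any that already belong elsewhere."""
        target = self.incidents[incident_id]
        for alert_id in alert_ids:
            previous = self.alert_to_incident.get(alert_id)
            if previous is not None and previous != incident_id:
                self.incidents[previous].alert_ids.discard(alert_id)
            self.alert_to_incident[alert_id] = incident_id
            target.alert_ids.add(alert_id)

    def remove_alerts(self, incident_id: str, alert_ids: List[str]) -> None:
        """Bulk-detach alerts that currently belong to this incident."""
        for alert_id in alert_ids:
            if self.alert_to_incident.get(alert_id) == incident_id:
                del self.alert_to_incident[alert_id]
                self.incidents[incident_id].alert_ids.discard(alert_id)


store = IncidentStore()
store.create_incident("inc-1", "DB outage")
store.create_incident("inc-2", "API latency")
store.add_alerts("inc-1", ["a1", "a2", "a3"])
store.add_alerts("inc-2", ["a3"])  # a3 moves from inc-1 to inc-2
```

Because the mapping is keyed by alert id, re-assigning an alert moves it rather than duplicating it, which is what keeps the relationship Many:One.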

talboren commented 1 month ago

@Matvey-Kuk @VladimirFilonov can we maybe elaborate further on the actual data model for incidents (e.g., actual fields: incident id, name, description, started, ended, who participated, affected services, etc.)

@Matvey-Kuk is it true that alerts can be related to more than 1 incident? I.e., it's Many:One, but can the same alert be linked to more than 1 incident?
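For discussion, the fields listed above could be sketched roughly like this. The field names and types are assumptions taken from the list in this comment, not an agreed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Incident:
    # All names and types below are hypothetical, for illustration only.
    incident_id: str
    name: str
    description: str
    started: datetime
    ended: Optional[datetime] = None  # None while the incident is ongoing
    participants: List[str] = field(default_factory=list)
    affected_services: List[str] = field(default_factory=list)

    @property
    def is_open(self) -> bool:
        return self.ended is None


incident = Incident(
    incident_id="inc-1",
    name="Checkout errors",
    description="Elevated 5xx rate on checkout",
    started=datetime(2024, 7, 1, 12, 0, tzinfo=timezone.utc),
    affected_services=["checkout"],
)
```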

Matvey-Kuk commented 1 month ago

@talboren re data model, I propose to keep-it-simple for the beginning:

> is it true that alerts can be related to more than 1 incident

I don't know a real-life example where it could be true, but it may complicate UI perception a lot, so I'm a bit hesitant about this idea.

talboren commented 1 month ago

> I don't know a real-life example where it could be true, but it may complicate UI perception a lot, so I'm a bit hesitant about this idea.

In what sense? I can't share a real-life example either, but I can see why it makes sense that 2 non-independent incidents (in different parts of the system, though) might share the same alerts as context for RCA. I'm just worried about narrowing down our options here (and I can't see the complications).

Matvey-Kuk commented 1 month ago

Good example. Let's imagine we have an alert_a that is part of both incident_b and incident_c.

We have options:

  1. Connect this alert to both incidents.
  2. Wrap alert_a in an incident_d and link it to incident_b and incident_c as a "root cause".

I'm leaning towards 2 because:

  1. We'll need to introduce root cause graph for dependent incidents anyway.
  2. We may want to bring per-incident bulk actions for alerts. If alerts belong to multiple incidents, it may confuse users.

Not a strong opinion anyway. Also, I think we can migrate the data schema from simple to complicated at any moment.

CC @GlebBerjoskin
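Option 2 above could be sketched as follows, assuming each alert keeps a single owning incident and incidents are linked by "root cause" edges. `RootCauseGraph` and its methods are hypothetical names, and only direct root causes are followed here.

```python
from collections import defaultdict


class RootCauseGraph:
    def __init__(self):
        # Many:One: each alert belongs to exactly one incident.
        self.alert_to_incident = {}
        # effect incident id -> set of incident ids that are its root causes
        self.root_causes = defaultdict(set)

    def assign(self, alert_id, incident_id):
        self.alert_to_incident[alert_id] = incident_id

    def link_root_cause(self, cause_id, effect_id):
        self.root_causes[effect_id].add(cause_id)

    def context_alerts(self, incident_id):
        """Alerts owned by the incident plus those of its direct root causes."""
        relevant = {incident_id} | self.root_causes.get(incident_id, set())
        return {a for a, inc in self.alert_to_incident.items() if inc in relevant}


graph = RootCauseGraph()
graph.assign("alert_a", "incident_d")  # alert_a is wrapped in incident_d
graph.assign("alert_x", "incident_b")  # an alert owned directly by incident_b
graph.link_root_cause("incident_d", "incident_b")
graph.link_root_cause("incident_d", "incident_c")
```

With this shape, alert_a shows up as context for both incident_b and incident_c while still being owned by one incident, so per-incident bulk actions stay unambiguous.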

talboren commented 1 month ago

> Good example. Let's imagine we have an alert_a that is part of both incident_b and incident_c.
>
> We have options:
>
>   1. Connect this alert to both incidents.
>   2. Wrap alert_a in an incident_d and link it to incident_b and incident_c as a "root cause".
>
> I'm leaning towards 2 because:
>
>   1. We'll need to introduce root cause graph for dependent incidents anyway.
>   2. We may want to bring per-incident bulk actions for alerts. If alerts belong to multiple incidents, it may confuse users.
>
> Not a strong opinion anyway. Also, I think we can migrate the data schema from simple to complicated at any moment.
>
> CC @GlebBerjoskin

No strong opinion either, but it was a little hard for me to understand on a quick read, so I lean towards over-engineering in those cases :P I feel like a reference table of AlertId to IncidentId is good enough to begin with, and adjusting it later might be "easier".
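A rough sketch of such a reference table, using the standard-library `sqlite3` module with an in-memory database. The table name, column names, and helper functions are assumptions for illustration, not Keep's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE alert_to_incident (
        alert_id    TEXT NOT NULL,
        incident_id TEXT NOT NULL,
        -- The composite key allows one alert to appear in several incidents.
        PRIMARY KEY (alert_id, incident_id)
    );
    """
)


def link_alerts(conn, incident_id, alert_ids):
    """Bulk-link alerts to an incident; existing links are left untouched."""
    conn.executemany(
        "INSERT OR IGNORE INTO alert_to_incident (alert_id, incident_id) "
        "VALUES (?, ?)",
        [(alert_id, incident_id) for alert_id in alert_ids],
    )
    conn.commit()


def incidents_for_alert(conn, alert_id):
    rows = conn.execute(
        "SELECT incident_id FROM alert_to_incident "
        "WHERE alert_id = ? ORDER BY incident_id",
        (alert_id,),
    )
    return [row[0] for row in rows]


link_alerts(conn, "incident_b", ["alert_a", "alert_x"])
link_alerts(conn, "incident_c", ["alert_a"])
link_alerts(conn, "incident_c", ["alert_a"])  # duplicate link is ignored
```

If the relationship later needs to be strictly Many:One, the same table could be tightened with a uniqueness constraint on `alert_id` alone, which is one way the "adjusting" step might look.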