sensu / sensu-go

Simple. Scalable. Multi-cloud monitoring.
https://sensu.io
MIT License

New event ID that stays the same across a failure situation #4080

Open mcatngena opened 4 years ago

mcatngena commented 4 years ago

Feature Suggestion

With feature https://github.com/sensu/sensu-go/issues/3125 there is a unique ID per event. This ID changes with every command execution, so each result/event has its own unique ID.

Sensu Core had a very nice id field: the ID stayed the same while a non-zero state persisted and was reset only on the transitions 0->!0 or !0->0. This ID was part of the Sensu Core design, where Sensu provided both events and results endpoints. Sensu Go has only an Events endpoint.

It would be great to have another ID that behaves the same way, even though Sensu Go does not distinguish between events and results.

Possible Implementation

N/A

Context

With Sensu Core, the event ID (in Sensu Core terminology) made it possible to track the failure of a particular device or component: all results with a non-zero state in a continuous sequence had the same identifier.

portertech commented 4 years ago

Sensu Classic (Core) events had an ID; however, it acted like an "incident ID", persisting until the event resolved. The intent of the event ID in Sensu Go is to let users follow an event's path through the system. Unlike Classic, Sensu Go events persist across state/status changes. What is your use case for this other ID?

mcatngena commented 4 years ago

I see your point now. With Sensu Go, when you track events with a handler, you get details of every execution, including the status code, so when you filter on entity name and check name you can see all historical states and transitions.

The other ID I meant is exactly that "incident ID". External ticketing systems often use an aggregation mechanism that relies on such an incident ID, which captures an ongoing situation end to end. Even when the entity and check name are the same, the ticketing system can automatically open and close a separate ticket for each such situation based on the incident ID; in short, tickets are aggregated by this ID. I don't think there is anything similar in Sensu Go.

portertech commented 4 years ago

@mcbsd You could combine the event entity name, check name, and check last_ok to create what could be considered an incident ID. Thoughts?
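A minimal sketch of that suggestion, assuming a trimmed-down, hypothetical Event type that mirrors only the three fields involved (in the Sensu Go event JSON they correspond to the entity's and check's metadata names and check.last_ok, a Unix timestamp). The "/" separator and the key format are arbitrary illustrative choices, not anything Sensu Go defines.

```go
package main

import "fmt"

// Event is a hypothetical, trimmed-down stand-in for a Sensu Go event;
// only the fields needed to derive an incident key are included.
type Event struct {
	EntityName string
	CheckName  string
	LastOK     int64 // Unix timestamp of the last execution with status 0
}

// IncidentKey joins entity name, check name and last_ok. While a check
// keeps failing, last_ok does not move, so every failing event of the
// same incident yields the same key.
func IncidentKey(e Event) string {
	return fmt.Sprintf("%s/%s/%d", e.EntityName, e.CheckName, e.LastOK)
}

func main() {
	first := Event{EntityName: "web-01", CheckName: "check-http", LastOK: 1600000000}
	second := Event{EntityName: "web-01", CheckName: "check-http", LastOK: 1600000000}
	fmt.Println(IncidentKey(first))
	fmt.Println(IncidentKey(first) == IncidentKey(second))
}
```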

mcatngena commented 4 years ago

@portertech Clever idea indeed. I see only one imperfection (I don't recall whether Classic behaved the same way; maybe you can confirm): the final transition from !0 -> 0 also resets the last_ok field. So with the combination you suggest, the very last update, the one saying the issue has been cleared, will not carry the same ID. (The very first transition from 0 -> !0 is captured correctly.)
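The gap can be reproduced with a small simulation, assuming (as described above) that last_ok is set to the execution time whenever a run returns status 0. The Run type and the simulate helper are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// Run is a hypothetical record of one check execution.
type Run struct {
	Executed int64  // Unix timestamp of the execution
	Status   uint32 // check exit status
}

// simulate returns, for each run, the entity/check/last_ok key the
// resulting event would carry, assuming last_ok is updated to the
// execution time on every status-0 run.
func simulate(entity, check string, runs []Run) []string {
	var lastOK int64
	keys := make([]string, 0, len(runs))
	for _, r := range runs {
		if r.Status == 0 {
			lastOK = r.Executed
		}
		keys = append(keys, fmt.Sprintf("%s/%s/%d", entity, check, lastOK))
	}
	return keys
}

func main() {
	runs := []Run{
		{Executed: 100, Status: 0}, // healthy
		{Executed: 160, Status: 2}, // 0 -> !0: incident starts
		{Executed: 220, Status: 2}, // still failing
		{Executed: 280, Status: 0}, // !0 -> 0: resolution
	}
	for i, k := range simulate("web-01", "check-http", runs) {
		fmt.Println(runs[i].Status, k)
	}
}
```

Both failing runs share the key web-01/check-http/100, but the resolving run gets web-01/check-http/280, so a consumer keyed on this value cannot match the resolution to the incident it opened. The failing runs also share their key with the last healthy run, so the status still has to be checked to tell them apart.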

portertech commented 3 years ago

Ah, you are right! Shoot. This suggests that a unique identifier spanning from the first !0 through the next 0 (including the resolution) is necessary.
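One way to obtain such an identifier outside Sensu Go is to keep a little state per entity/check pair, for example in a handler or an intermediate service. The sketch below is hypothetical (the Tracker type, its methods and the ID format are illustrative, not part of Sensu Go): an ID is minted on the first non-zero status, reused while the check keeps failing, attached to the resolving status-0 result, and then discarded.

```go
package main

import "fmt"

// Tracker assigns Classic-style incident IDs: one ID spans the run of
// non-zero results for an entity/check pair plus the resolving 0 result.
type Tracker struct {
	open map[string]string // entity/check -> ID of the open incident
}

func NewTracker() *Tracker {
	return &Tracker{open: make(map[string]string)}
}

// Observe records one result and returns its incident ID, or "" when the
// result is healthy and no incident is open.
func (t *Tracker) Observe(entity, check string, status uint32, executed int64) string {
	pair := entity + "/" + check
	id, ok := t.open[pair]
	if status != 0 {
		if !ok {
			// 0 -> !0 (or the first result seen is failing): start an incident.
			id = fmt.Sprintf("%s/%d", pair, executed)
			t.open[pair] = id
		}
		return id
	}
	if ok {
		// !0 -> 0: the resolution carries the incident ID, then it closes.
		delete(t.open, pair)
		return id
	}
	return ""
}

func main() {
	tr := NewTracker()
	results := []struct {
		status   uint32
		executed int64
	}{{0, 100}, {2, 160}, {2, 220}, {0, 280}, {0, 340}}
	for _, r := range results {
		id := tr.Observe("web-01", "check-http", r.status, r.executed)
		fmt.Printf("status=%d incident=%q\n", r.status, id)
	}
}
```

Keeping the same ID across changes between non-zero statuses (for example 1 -> 2) mirrors the Classic behaviour described at the top of the issue, where the ID was reset only on 0->!0 and !0->0. The state here lives in memory only, so it is lost on restart and not shared between processes; anything beyond a sketch would need durable storage.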

portertech commented 3 years ago

@mcbsd What ticket systems are you pairing Sensu Go with?

mcatngena commented 3 years ago

@portertech we use Moogsoft

mcatngena commented 3 years ago

@portertech Hi, any update on this? Could it be added to the roadmap? Thanks