numaproj / numaplane

Control Plane for Numaproj
Apache License 2.0

Splunk alert on errors #423

Closed. juliev0 closed this 11 hours ago

juliev0 commented 1 week ago

Summary

We need an alert that fires when errors appear in the logs, but it should exclude errors from resource version conflicts, which are transient.

The only approach I can think of is a Splunk Alert that matches "error"-level log entries but filters out the resource version conflict errors. (In the log I'm looking at now, the conflict shows up as this text: "the object has been modified; please apply your changes to the latest version and try again". I am not sure whether this is always the exact text.)
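To make the intended filter concrete, here is a minimal Python sketch of the same logic, not the Splunk alert itself. It assumes the logs are JSON lines with a `level` field, which is not confirmed in this issue; the function names `is_alertable` and `count_alertable` are hypothetical. The conflict string is copied from the log text quoted above, and it may not always be the exact wording.

```python
import json

# Text copied from the log quoted above. Whether this is always the exact
# wording is unconfirmed, so treat this constant as an assumption.
CONFLICT_TEXT = (
    "the object has been modified; please apply your changes "
    "to the latest version and try again"
)


def is_alertable(line: str) -> bool:
    """Return True for error-level lines that are not resource version conflicts."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False  # non-JSON lines are ignored in this sketch
    if not isinstance(record, dict):
        return False
    # "level" is an assumed field name for the log severity.
    if str(record.get("level", "")).lower() != "error":
        return False
    # Search the raw line so the conflict text is caught in any field.
    return CONFLICT_TEXT not in line


def count_alertable(lines) -> int:
    """Count the lines that should contribute to the alert."""
    return sum(1 for line in lines if is_alertable(line))
```

An alert built on this logic would fire when `count_alertable` exceeds some threshold over a time window. The equivalent Splunk search would need the same two conditions: match the error level, then exclude events containing the conflict text.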

According to @calvinx408, Splunk has the concept of a "federated alert". We can ask the Splunk team (@avaz in Slack) to create the alert across all of Splunk for our Numaflow AddOn (index "iks").


Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

juliev0 commented 11 hours ago

Moved this to the internal repo: https://github.intuit.com/oss-analytics/numa-manifest-generator/issues/128