A design for exposing errors is coming up on the Code Insights roadmap. However, the insights backend currently has no proper mechanism for handling errors returned by downstream APIs.
Right now the insights backend collects slices of `Skipped`, `Alerts`, and `Errors` events received from the search streaming API (and from the compute API, except for skipped events). Our current setup treats every error event as terminal, retries alerts, and only logs “skipped reasons”.
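To make the limitation concrete, here is a minimal Go sketch of the current policy as described above. The `EventKind` and `Action` types and the `currentAction` function are hypothetical stand-ins, not the actual insights backend code. The point it illustrates is that the decision is keyed only on the event kind, so a retryable skipped reason and a non-retryable alert cannot be told apart.

```go
package main

import "fmt"

// EventKind is a hypothetical enum for the non-success events the
// insights backend collects from the streaming APIs.
type EventKind int

const (
	Skipped EventKind = iota
	Alert
	Error
)

func (k EventKind) String() string {
	return [...]string{"skipped", "alert", "error"}[k]
}

// Action is what the backend does with a job after seeing an event.
type Action int

const (
	LogOnly Action = iota
	Retry
	Terminate
)

func (a Action) String() string {
	return [...]string{"log", "retry", "terminate"}[a]
}

// currentAction mirrors the current setup: the action depends only on the
// event kind and ignores the reason the event carries.
func currentAction(kind EventKind) Action {
	switch kind {
	case Error:
		return Terminate
	case Alert:
		return Retry
	default: // Skipped
		return LogOnly
	}
}

func main() {
	for _, k := range []EventKind{Skipped, Alert, Error} {
		fmt.Printf("%v -> %v\n", k, currentAction(k))
	}
	// Output:
	// skipped -> log
	// alert -> retry
	// error -> terminate
}
```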
Two examples of where this isn't working:
- The `shard-timeout` skipped event was only logged, so we never retried on it.
- A structural search that doesn't specify `patterntype` returns an alert, but it can only succeed once the user changes their query, so we waste processing power retrying it until the retries are exhausted.
The goal of this issue is to introduce a better structure for tabulating non-success events received from the streaming APIs, so that each event can be logged, retried, or treated as terminal as appropriate.
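One possible shape for such a structure, sketched in Go under stated assumptions: a policy table keyed by event kind and reason, with a per-kind default that preserves today's behavior for reasons without an explicit rule. `Policy`, `ruleKey`, `Decide`, and `examplePolicy` are hypothetical names. The sketch assumes the skipped reason is reported as the string `shard-timeout`, and the structural-search alert text is a placeholder, not the real message. Reducing several events to one action by taking the most severe one (terminate over retry over log) is also an assumption.

```go
package main

import "fmt"

// EventKind and Action are hypothetical types, not the real backend's.
type EventKind string

const (
	Skipped EventKind = "skipped"
	Alert   EventKind = "alert"
	Error   EventKind = "error"
)

type Action string

const (
	LogOnly   Action = "log"
	Retry     Action = "retry"
	Terminate Action = "terminate"
)

// Event is a simplified non-success event: its kind plus a reason string
// (for example a skipped reason or an alert title).
type Event struct {
	Kind   EventKind
	Reason string
}

type ruleKey struct {
	Kind   EventKind
	Reason string
}

// Policy tabulates actions for specific (kind, reason) pairs and falls back
// to a per-kind default when no explicit rule matches.
type Policy struct {
	Rules    map[ruleKey]Action
	Defaults map[EventKind]Action
}

// ActionFor looks up the explicit rule first, then the per-kind default.
// Unknown kinds terminate so that they cannot cause endless retries.
func (p Policy) ActionFor(e Event) Action {
	if a, ok := p.Rules[ruleKey{e.Kind, e.Reason}]; ok {
		return a
	}
	if a, ok := p.Defaults[e.Kind]; ok {
		return a
	}
	return Terminate
}

// Decide reduces all events seen for one job to a single action:
// any Terminate wins, otherwise any Retry, otherwise LogOnly.
func (p Policy) Decide(events []Event) Action {
	result := LogOnly
	for _, e := range events {
		switch p.ActionFor(e) {
		case Terminate:
			return Terminate
		case Retry:
			result = Retry
		}
	}
	return result
}

// structuralNoPatternType is a placeholder reason, not the real alert text.
const structuralNoPatternType = "structural search without patterntype"

// examplePolicy keeps today's per-kind defaults but overrides the two
// problem cases described in this issue.
func examplePolicy() Policy {
	return Policy{
		Rules: map[ruleKey]Action{
			{Skipped, "shard-timeout"}:       Retry,
			{Alert, structuralNoPatternType}: Terminate,
		},
		Defaults: map[EventKind]Action{
			Skipped: LogOnly,
			Alert:   Retry,
			Error:   Terminate,
		},
	}
}

func main() {
	p := examplePolicy()
	fmt.Println(p.Decide([]Event{{Skipped, "shard-timeout"}}))       // retry
	fmt.Println(p.Decide([]Event{{Alert, structuralNoPatternType}})) // terminate
	fmt.Println(p.Decide([]Event{{Skipped, "some other reason"}}))   // log
}
```

With a table like this, handling a newly discovered reason becomes a one-line rule rather than a change to the control flow, and any reason nobody has classified yet keeps the existing per-kind behavior.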