influxdata / flux

Flux is a lightweight scripting language for querying databases (like InfluxDB) and working with data. It's part of InfluxDB 1.7 and 2.0, but can be run independently of those.
https://influxdata.com
MIT License
769 stars 153 forks source link

feat: make `endpoint` report the upstream status code #5510

Closed btasker closed 1 day ago

btasker commented 6 days ago

endpoint() populates the column _sent based on whether upstream returned a 200, however it doesn't report the status code to the caller.

endpoint is used (amongst other things) in 2.x's Notification Rules - not having the status code available can make it quite difficult to troubleshoot why a notification wasn't sent.

With this change, endpoint will include an additional column: status_code

In the context of a notification rule, this will result in entries in _monitoring/notifications carrying the tag status_code, making it possible to graph out notifications by the upstream status code:

from(bucket: "_monitoring")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "notifications")
  |> filter(fn: (r) => exists r["status_code"])
  |> group(columns: ["check_id", "status_code"])
  |> aggregateWindow(every: v.windowPeriod, fn: count)

Note: the addition of this column could cause issues where the result is written into a bucket with an explicit schema.

However, explicit schemas are only available on Cloud 2 - the query log has been checked for any queries which could result in issues, with none identified.

Checklist

Dear Author :wave:, the following checks should be completed (or explicitly dismissed) before merging.

Dear Reviewer(s) :wave:, you are responsible (among others) for ensuring the completeness and quality of the above before approval.

sanderson commented 6 days ago

To @mhilton's point, we might want to add this to all of the *.endpoint() functions in the stdlib so that they're consistent.

btasker commented 6 days ago

Yeah that's a fair point - I did think it'd be worth adding to at least the Slack and PD ones, but hadn't done the same legwork on looking to see whether there are any unusual uses of them.

I don't love changing the return of something quite so core, but this is causing some pain.

The alternatives seem to be

The latter's doable, but feels a little OTT. On the other hand, it would also be safer.

I'll try have a play around tomorrow

btasker commented 2 days ago

we might want to add this to all of the *.endpoint() functions in the stdlib so that they're consistent.

Starting with this - my first run at adding a different function ran into some headaches, which'll only be bigger when moving onto some of the other functions.

Pre-scan Notes

As pagerduty already has _status, rather than changing that I'll update this PR to also use that and then add it to the others.

btasker commented 2 days ago

Looks like _status was added to PD for much the same reasons - https://github.com/influxdata/flux/pull/4796 - so, it makes sense to keep things consistent

btasker commented 2 days ago

Note: If the windows build commit above works I'll rebase it out and raise a fresh PR for that, that way we don't lose it if this change ends up having to be reverted for some reason

btasker commented 1 day ago

Alright, rebasing that back out and I'll try find time to prod at it separately.

mhilton commented 1 day ago

This can be merged even if the windows build isn't working. It is unrelated to this PR.