mozilla / redash

This is a Mozilla fork of the re:dash project (https://redash.io/), where we do work to be contributed back to the upstream project and for our own custom needs.
BSD 2-Clause "Simplified" License
20 stars 21 forks source link

Scheduled query failures should be easier to find and understand #309

Open fbertsch opened 6 years ago

fbertsch commented 6 years ago

It's hard to track scheduled queries failing, because not only do they not alert, but the errors never show!

I tested this with a failing query here: https://sql.telemetry.mozilla.org/queries/50316/source

To fix this, we should offer:

Bug 1432317 was a duplicate of this.

jezdez commented 6 years ago

@fbertsch This is a non-trivial set of features that we should not add to our fork but see if upstream is interested before we start implementing.

@alison985 Please file the issues upstream and get feedback wether this is in-scope of upstream or whether we should start looking to implement it as an extension in redash-stmo.

alison985 commented 6 years ago

Started an issue at https://github.com/getredash/redash/issues/2593

alison985 commented 6 years ago

Arik is for this but wants to discuss implementation first.

@jezdez, for review before I post in the upstream issue, I would propose: Priority 1:

Priority 2:

Priority 3:

Priority 4:

jezdez commented 6 years ago

@alison985 Thank you for creating the list of priorities for this new feature, and I agree this is a good idea to solve in a discussion. A bit of feedback:

Prio 1: I'm concerned that the proposed list of scheduled query runs is way too technical for the most common usage pattern and creates more noise than signal for the users, especially if it's offered right next to a high level link like the profile link.

Since users don't have a personalized "homepage" or "dashboard" page it's hard to imagine where the page would be linked from instead though. Instead I'd like to propose to simply move the list of query runs to the individual query detail page (e.g. under /queries//runs) and link to that from the (Prio 2) emails. That should cut down the number of items in the table dramatically and remove it from attention unless a user looks for the specific runs.

To close the attention gap in case a user's query runs fail, I like Prio 3 to be shown on any page, basically introducing a way to track important notifications like failure states with a dropdown or similar (not a full list to simplify query time). I'd be interested to hear what upstream thinks about adding such a section or if it'd be overkill.

Prio 3: Good point on the query time for alerts, we can think about async ways to load that list of alerts though, and optimize on the backend by using Redis' Sorted Set data type for caching (for example).

Prio 4: Nice, if the query runs move next to the view source view of a query, then it's even easier to link from one to the other to indicate that something may be wrong with the query.

Thanks again, this is nicely done, let's see what upstream thinks!

jezdez commented 6 years ago

Oh I forgot two things:

Prio 2: Let's implement "every run" only for now, since I'm not sure if the other patterns of sending error mails would also skew the noise to signal ratio towards noise. The worst error mails are the ones that get ignored by their recipient in my experience: 😬

Prio 3: I'm not sure about limiting the alerts to 1 month either. Let's raise that when you propose the details upstream, ok?

alison985 commented 6 years ago

@jezdez questions: A) When you say "the list of query runs to the individual query detail page (e.g. under /queries//runs)" there is not currently a /queries/#/runs page. Are you saying to create it and use that instead of the /scheduled-query-list page I propose above? Or are you talking about the created and updated timestamp section of the query page itself? Or something else? B) When you say "introducing a way to track important notifications like failure states with a dropdown or similar (not a full list to simplify query time)." I don't know if I follow what you mean by "dropdown". Are you trying to say that we should show that you have notifications but only make them show up in a dropdown of issues when the dropdown is selected? Or something else?

jezdez commented 6 years ago

@alison985

A) yep, a new page for the runs

B) "dropdown" is probably the wrong term, I mean a notification section that is capable to inform users of current events for their user accounts, similar to how GitHub for example shows a small bell icon in the top header area. I guess having some of the notifications show up as a dismissable pop-up depending on the severity of the event would make sense additionally, too.

alison985 commented 5 years ago

Final Plan with all Feedback/edits merged

Priority 1 (THIS ticket):

Priority/Ticket 2:

Priority/Ticket 3:

Priority/Ticket 4:

alison985 commented 5 years ago

First PR upstream: https://github.com/getredash/redash/pull/3065