delan opened this issue 1 week ago
To fix this, we need to avoid sending the whole database to the client, even if there are no filters. This is tricky, because the dashboard frontend currently relies on being aware of all data matching the given filters, in order to do the “all results have the same” analysis correctly.
Why is this tricky? Being able to see all historical data from the web UI is nice to have, but it wasn't a design goal. The dashboard was, however, designed to support all of the following at the same time:
These requirements may not be set in stone though, and that could affect the solution. For example, maybe we don’t always need live updates, or maybe we don’t need them at all. Either way, I think we need to move the “all results have the same” analysis to the server, and limit the number of results ever returned to the client. If you want to see really old data, dig into the database yourself.
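One way to move the "all results have the same" analysis to the server is to do it as a SQL aggregation, so the client only ever receives a bounded number of per-test summary rows rather than raw attempts. A minimal sketch, assuming a hypothetical `attempt` table with `test` and `result` columns (not the dashboard's actual schema):

```python
import sqlite3

MAX_ROWS = 100  # hypothetical cap on rows ever sent to the client

def summarise_attempts(conn, max_rows=MAX_ROWS):
    # COUNT(DISTINCT result) = 1 means every attempt for that test had the
    # same result, so the per-test flag is computed without shipping rows.
    return conn.execute(
        """
        SELECT test,
               COUNT(*) AS attempts,
               COUNT(DISTINCT result) = 1 AS all_same
        FROM attempt
        GROUP BY test
        ORDER BY attempts DESC
        LIMIT ?
        """,
        (max_rows,),
    ).fetchall()

# Toy data to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attempt (test TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO attempt VALUES (?, ?)",
    [("a", "PASS"), ("a", "FAIL"), ("b", "PASS"), ("b", "PASS")],
)
rows = summarise_attempts(conn)  # e.g. ("a", 2, 0) and ("b", 2, 1)
```

The point of the sketch is that the response size is now proportional to the number of tests shown, not the number of attempts stored.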
I think that the default view should show what either `most-flaky` or `least-flaky` shows, but clamping the number of results shown. These views are quite useful:

- `most-flaky` is useful to know if there is a test that is flaking a lot -- to detect which flaky tests are the highest value to fix.
- `least-flaky` is useful to know if a test has stopped flaking, so we can close the intermittent bug for it.

Regarding live updates, I think we almost never need them. The flakiness results are incredibly noisy, so a delta of data from a single run isn't very useful.
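Both views could be served from the same clamped query, differing only in sort direction. A rough sketch, again assuming a hypothetical `attempt(test, result)` schema, where a test counts as flaky only if it has mixed results in the window:

```python
import sqlite3

def flaky_tests(conn, most_flaky=True, limit=50):
    # most-flaky sorts descending by flake rate, least-flaky ascending;
    # LIMIT clamps the result count either way.
    order = "DESC" if most_flaky else "ASC"
    return conn.execute(
        f"""
        SELECT test,
               SUM(result != 'PASS') * 1.0 / COUNT(*) AS flake_rate
        FROM attempt
        GROUP BY test
        HAVING COUNT(DISTINCT result) > 1  -- only tests with mixed results
        ORDER BY flake_rate {order}
        LIMIT ?
        """,
        (limit,),
    ).fetchall()

# Toy data: "a" flakes 2/3 of the time, "b" half the time, "c" never.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attempt (test TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO attempt VALUES (?, ?)",
    [("a", "PASS"), ("a", "FAIL"), ("a", "FAIL"),
     ("b", "PASS"), ("b", "FAIL"),
     ("c", "PASS"), ("c", "PASS")],
)
```

Here `flaky_tests(conn)` ranks "a" above "b" and excludes "c" entirely, since a test that always passes (or always fails) isn't intermittent.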
With ~71K rows in `attempt` (~29 days' worth), the response with no filters or `since` is over 40 MB, which is very unwieldy, even though the dashboard only makes this request once per page load. The endpoint also takes 380 ms to start sending a response, >210 ms of which is in `DashboardDB.select_attempts`:

These response times, and the server's memory usage, also increase greatly under contention: