Monitoring issue - Githubissues

achamayou commented 4 years ago

I was wondering how we could replicate the history page we have historically done with jupyter, and I had an idea.

Since the API to post to issues and to comments is really the same, how about configuring a given issue as the “performance monitoring” issue for the target branch (master), now that we have a tend view?

When a build happens there, we still plot (without a branch value, but that’s just skipping a bit of logic), and push to the “performance monitoring” issue!

It’s a bit clunky, sure, but it’s also really easy to do!

It’s also a nice place to push links to suspicious looking builds once we have anomaly detection? Although come to think of it, it would be better for that to simply create tickets, so they can be commented against etc.

@jumaffre, @letmaik, what do you think?

letmaik commented 4 years ago

I think it's a good idea, as it avoids having to play games with gh-pages. I assume this would always edit the first post in this special issue instead of adding new comments.

achamayou commented 4 years ago

Yeah that's what we already do in PRs since https://github.com/jumaffre/cimetrics/pull/71

jumaffre commented 4 years ago

Definitely the most minimal way to keep track of metrics on master - I really like it! A few implementation considerations:

I believe the issue will have to be created first and then passed to cimetrics somehow (metrics.yml?) as it would otherwise be difficult to discover the issue number from subsequent builds if it's first created by cimetrics.
Will we want to display all the points we have or a reasonable long and fixed span?
We should display the results in one colon only as space won't be as expensive as for the PRs.

achamayou commented 4 years ago

It looks like we can quite easily query by label for example, so it could be that the config specifies a label (defaulting to cimetrics_status or something), and then we use that to find the issue. If there are multiple ones, we can always update them all.
All the points is a lot of points, and once it's more points than pixels, it's going to get mushy. So I think fixed a amount, configurable, and defaulting to a small multiple of the default span perhaps.
Yeah we could easily do that, we can also just fork all the config for status, with different defaults, and let users decide how many they want:

pr:
  span: 20
status:
  label: cimetric_status
  span: 100

Something like that?

cimetrics commented 4 years ago

achamayou-patch-9@588 aka 20200524.1 vs master ewma over 48 builds from 2 to 586 images

achamayou commented 4 years ago

There is an early draft for this here https://github.com/jumaffre/cimetrics/pull/77, which is what produced the comment above.

There's a few things to sort out:

[x] configure the span
[ ] configure the branch to be monitored explicitly (unless someone has a better idea for this)
[x] switch to one column plot
[x] all plots should have x ticks

Thoughts welcome etc.

achamayou commented 4 years ago

The reason we have so little master history is because of the recent introduction of the __complete flag, in case anyone's wondering.

achamayou commented 4 years ago

Some thoughts now that this is in CCF as https://github.com/microsoft/CCF/issues/1063:

It's not just the x-axis that's too sparse, y-axis too. I think we want at least the start value, and probably some steps we've managed to detect too. We probably want those steps in x-axis as well, rather than random build points.
Because of the change in aspect ratio, the fonts, points etc end up being too large. Not sure what the best way to correct for that is.
Have another stab at making some elements linkable (SVG?), not being able to click on build/events is super annoying for this view.

eddyashton commented 4 years ago

Really good idea. Some thoughts after seeing this on https://github.com/microsoft/CCF/issues/1063, with an eye to how we actually discuss the results this shows:

Needs more x-axis ticks, and would be nice to see dates rather than build numbers since we have so much space ("The increase around the Nth of Foontember was because...")
Each graph could be much taller - space is much cheaper in the dedicated issue than on a PR comment
Editing the same comment means that explanatory comments will start to reference things that are no longer visible ("This is why it got slower" from 3 months ago, but the graph only shows the steady increases of the last 2 weeks). This might be fine, especially if the explanatory comments are rare and precise, but it may be worth rolling onto multiple comments/multiple graphs to keep a longer history?

achamayou commented 4 years ago

Yes, absolutely, there's a dire lack of x-ticks already mentioned above. I'm trying to do something that comes up with relevant ticks, but failing that I'll put some evenly spaced ones instead.
This is currently done as a single large picture (because it's a convenient reuse of code from the PR code path), so there are some size limits, but I agree we can get more vertical space. I suspect shrinking the titles will actually help quite a bit.
Yes, but it's not obvious how that might work. One thing that could happen is that we capture the chart manually from time to time. Another thing may be that updates happen to a chart for a time period, before a cutoff and a new one gets issued. The problem then becomes that comparing across a cutoff is a bit difficult. I don't have a good idea yet, ideas welcome!

achamayou commented 4 years ago

There's some excellent stuff here https://github.com/rob-med/awesome-TS-anomaly-detection, in particular https://arundo-adtk.readthedocs-hosted.com/en/stable/notebooks/demo.html#LevelShiftAD

achamayou commented 4 years ago

This may be useful too: https://klabum.github.io/rrcf/

achamayou commented 4 years ago

The detection could probably do with a bit of tweaking before we use it to generate the ticks.

achamayou commented 4 years ago

So aligning the window size with the one we use for ewma helps a lot, and so does lowering the confidence a bit:

achamayou commented 4 years ago

I think the auto-detection is good enough, so the next items on are I think:

[x] Split x ticks per series, and set x and y ticks according to detected anomalies.
[x] Try to convert build ids to build numbers if possible (I seem to remember there was a reason this was difficult, but I have forgotten what it is)

Specifying the detection branch doesn't seem all that useful, it's easy enough to modify the config per-branch, like we've done with pbft, so I'm going to call that solved.

There's more to do for long term history, as @eddyashton suggested, but I think that's distinct from having the monitoring issue cover an arbitrarily large window usefully, which I think we'll basically have once those two things are done.

achamayou commented 4 years ago

With the 0.2.20 release, I'm going to call monitoring complete. This is what it looks like in CCF: https://github.com/microsoft/CCF/issues/1063

jumaffre / cimetrics

Monitoring issue #73