m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

fix file size bins #1036

Closed: gfr10598 closed 2 years ago

gfr10598 commented 2 years ago

When using heatmaps in Grafana, non-uniform bins cause visual artifacts that can be quite misleading, or that hide features of actual significance. What matters is that the bin boundaries follow a smooth progression, e.g. linear, logarithmic, or polynomial.

We have a few hand-crafted bin sets in our metrics that are not very uniform (neither linear nor logarithmic). This PR fixes one of those, which was confusing gfr during mlab-sandbox development.

metrics.FileSizeHistogram will now have a smooth bin distribution, which makes it easier to interpret. It uses 7 bins per decade, to approximate the density of the previous bin set.
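As a rough illustration of what "7 bins per decade" means, here is a minimal sketch that generates logarithmically spaced bin bounds, where each bound is 10^(1/7) (about 1.39) times the previous one. The helper name `LogBuckets` and the 1 kB to 1 GB range are assumptions for illustration only; the actual bounds used by metrics.FileSizeHistogram are not shown in this PR description.

```go
package main

import (
	"fmt"
	"math"
)

// LogBuckets returns bin bounds spaced evenly on a log10 scale:
// perDecade bounds for each factor of 10, starting at start and
// spanning the given number of decades (both endpoints included).
// Hypothetical helper for illustration; not the code from this PR.
func LogBuckets(start float64, perDecade, decades int) []float64 {
	n := perDecade*decades + 1
	buckets := make([]float64, 0, n)
	for i := 0; i < n; i++ {
		// Compute each bound directly from its index, so rounding
		// error does not accumulate as it would with repeated
		// multiplication by a fixed factor.
		exp := float64(i) / float64(perDecade)
		buckets = append(buckets, start*math.Pow(10, exp))
	}
	return buckets
}

func main() {
	// Assumed range: 1 kB to 1 GB (6 decades), 7 bins per decade.
	for _, b := range LogBuckets(1000, 7, 6) {
		fmt.Printf("%.0f\n", b)
	}
}
```

Because consecutive bounds share a constant ratio, the bins have equal width on a logarithmic axis, which is the kind of smooth progression that avoids the heatmap artifacts described above.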

NOTE: When a Grafana heatmap covers a period that spans multiple bin sets, the display will be almost meaningless. Changing the time window to an interval with a single bin set resolves the problem.



coveralls commented 2 years ago

Pull Request Test Coverage Report for Build 7011


Files with Coverage Reduction:
active/active.go: 4 new missed lines, 88.54% covered

Totals:
Change from base Build 6945: -0.07%
Covered Lines: 3840
Relevant Lines: 5986

💛 - Coveralls

gfr10598 commented 2 years ago

Before

[Screenshot: Screen Shot 2021-12-13 at 7 44 38 AM]

After

[Screenshot: Screen Shot 2021-12-13 at 7 43 42 AM]