mikemccand / luceneutil

Various utility scripts for running Lucene performance tests
Apache License 2.0

Export nightly chart data to JSON each night #205

Open mikemccand opened 2 years ago

mikemccand commented 2 years ago

One of the awesome suggestions that came out of the ApacheCon NA 2022 talk ("Learning from 11+ years of Apache Lucene benchmarks") was to export at least the values for all nightly charts to a simple format (JSON), each night.

We could then build on that to more automatically spot anomalies, regressions, set up alarms, etc.

mikemccand commented 2 years ago

OK, I pushed a rough first cut at this! It produces this file each night. Maybe we can build on this to make alarms/anomaly detection/pre-release checks of some kind.

There are still some TODOs:

  # TODO: also include all known annotations!
  # TODO: when errorBars is true, the variance(s) is/are extra columns in the data, so the headers
  #       look incorrect now
  # TODO: get all other benches into this -- geo, sparse, github PRs, etc.

msokolov commented 2 years ago

Ooh, awesome! That file is labeled "for_all_time.json", so should we publish a dated file? Oh, never mind, I see that it has all the previous days in it. So it looks like the "schema" is:

{
  MetricName -> [
    Title,
    [
      [ Date, header1, ..., headerN ],
      [ "YYYY-MM-DD HH:MM:SS", data1, ..., dataN ],
      ...
    ]
  ], ...
}
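
For anyone wanting to consume the file, here is a minimal Python sketch of walking that shape. It assumes the layout is exactly as guessed above (each metric maps to a two-element list of title and table, with the header as the table's first row). The helper name `iter_series` and the sample metric and values are made up for illustration; they are not taken from the real file.

```python
import json


def iter_series(doc):
    """Yield (metric_name, title, header, rows) for each chart in the parsed JSON.

    Assumes each value is [title, table] and table[0] is the header row.
    """
    for metric_name, (title, table) in doc.items():
        header, rows = table[0], table[1:]
        yield metric_name, title, header, rows


# Tiny made-up document in the assumed shape (values are illustrative only).
sample = json.loads("""
{
  "ExampleMetric": [
    "Example chart title",
    [
      ["Date", "QPS"],
      ["2022-10-01 00:00:00", "12.5"],
      ["2022-10-02 00:00:00", "12.7"]
    ]
  ]
}
""")

for name, title, header, rows in iter_series(sample):
    print(name, title, header, len(rows))
```

In practice `sample` would come from `json.load()` on the exported file instead of an inline string.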

I noticed a few squidgy things though:

  1. sometimes the number of headers differs from the number of data columns (e.g. in NRT)
  2. numbers are represented as strings

Suggestions:

  1. could we add some header column explaining the additional value(s)? Are they variances?
  2. Can we use JSON numbers?
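
Until the export changes, a consumer could work around both issues on its side. The sketch below is only an illustration, assuming each table is a list whose first row is the header and whose first column is the timestamp; `coerce` and `check_table` are hypothetical helper names, not part of luceneutil.

```python
def coerce(value):
    """Turn a numeric string into an int or float; leave anything else unchanged."""
    if not isinstance(value, str):
        return value
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def check_table(table):
    """Return (clean_rows, mismatched) for one [header, row, row, ...] table.

    clean_rows keeps the first column (assumed to be the timestamp) as a string
    and converts the rest to numbers where possible. mismatched lists the indexes
    of data rows whose length differs from the header's.
    """
    header, rows = table[0], table[1:]
    clean, mismatched = [], []
    for i, row in enumerate(rows):
        if len(row) != len(header):
            mismatched.append(i)
        clean.append(row[:1] + [coerce(v) for v in row[1:]])
    return clean, mismatched


# Made-up table: the first data row carries one extra, unlabeled column.
table = [
    ["Date", "A"],
    ["2022-10-01 00:00:00", "1.5", "0.2"],
    ["2022-10-02 00:00:00", "2"],
]
clean, mismatched = check_table(table)
print(clean)
print(mismatched)
```

Rows flagged in `mismatched` are the ones where an extra header (for example a variance column) would be needed to make sense of the data.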

mikemccand commented 2 years ago
  1. could we add some header column explaining the additional value(s)? Are they variances?

Yeah, +1, I'll try to add that (they are variances, I think!).

  2. Can we use JSON numbers?

Ahh good catch -- I'll try to fix that too.