nasa / opera-sds-ops

Apache License 2.0

Add script to generate 90-day metrics reports #26

Open sjlewis-jpl opened 1 year ago

sjlewis-jpl commented 1 year ago

Purpose

Proposed Changes

Issues

Testing

This script was tested using about 3 weeks of detailed Production Time and Retrieval Time reports. It can ingest and combine the detailed reports and generate a summary report over the whole period. The summary report matches those produced in OPS, including the histograms of the times. The reports and histograms can be found on GDrive, in the "Testing Summary Report Generation" folder.

riverma commented 1 year ago

@sjlewis-jpl - question about the input for your script: can we feed in only the daily detailed reports (i.e. the daily CSV files) as opposed to the daily summary reports (i.e. the daily .zip, which includes the daily summary csv + the histogram PNGs)?

sjlewis-jpl commented 1 year ago

Correct, it requires the detailed CSV reports. The detailed reports can be daily, weekly, some other cadence, or a mix of them. The only caveat is that the detailed reports can't overlap, since the script has no logic to detect and handle duplicate entries.
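To illustrate the overlap caveat, here is a minimal sketch of what a duplicate guard could look like when combining reports of mixed cadence. This is not the script's code (the script has no such logic). The function name `load_rows`, the `product_id` and `retrieval_time` columns, and the sample rows are all hypothetical.

```python
import csv
import io


def load_rows(csv_texts, key_field="product_id"):
    """Concatenate rows from several detailed reports, rejecting repeated keys.

    `key_field` is a hypothetical column that uniquely identifies an entry.
    """
    seen = set()
    rows = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = row[key_field]
            if key in seen:
                # Overlapping reports would otherwise be counted twice.
                raise ValueError(f"duplicate entry across reports: {key}")
            seen.add(key)
            rows.append(row)
    return rows


# Made-up sample reports; the column names are assumptions.
daily = "product_id,retrieval_time\nA,1.5\nB,2.0\n"
weekly = "product_id,retrieval_time\nC,3.0\nD,4.5\n"
overlapping = "product_id,retrieval_time\nB,2.0\nE,5.0\n"

rows = load_rows([daily, weekly])  # non-overlapping reports combine cleanly
```

Passing `daily` and `overlapping` together raises `ValueError`, because entry `B` appears in both.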

With the summary reports, we could compute the product counts, and the min, max, and mean times. However, we would not be able to compute the median or 90th percentile, nor generate new histograms. Not being able to compute the 90th percentile is the real deal breaker, since our retrieval time requirement explicitly uses that statistic.
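A small sketch of why this is the case, using made-up numbers: counts, min, max, and mean can be merged from per-report summaries alone, but a percentile needs every individual time. The function names and the nearest-rank percentile definition are assumptions for illustration, not necessarily what the script or the OPS reports use.

```python
import math


def summarize(values):
    """Per-report summary holding only count, min, max, and mean."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


def combine_summaries(summaries):
    """Merge summaries without access to the underlying values."""
    total = sum(s["count"] for s in summaries)
    return {
        "count": total,
        "min": min(s["min"] for s in summaries),
        "max": max(s["max"] for s in summaries),
        # Count-weighted mean of the per-report means.
        "mean": sum(s["mean"] * s["count"] for s in summaries) / total,
    }


def percentile(values, pct):
    """Nearest-rank percentile (an assumed definition); needs all values."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up times for two reports.
day1 = [1.0, 2.0, 3.0, 4.0]
day2 = [10.0, 20.0]

combined = combine_summaries([summarize(day1), summarize(day2)])
p90 = percentile(day1 + day2, 90)  # only possible with the detailed values
```

The per-report summaries carry no information about how the times are distributed between min and max, so there is no way to recover the median or 90th percentile of the combined period from them.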

sjlewis-jpl commented 1 year ago

Also, while the script can work on either Production Time reports or Retrieval Time reports, it can only process one type at a time. It will throw an error if it is given both types in the same run.
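For illustration only, a check of this kind could be sketched as follows. The header column names `production_time` and `retrieval_time` and both function names are hypothetical. The script's actual detection logic may differ.

```python
def report_type(header):
    """Classify a detailed report by a (hypothetical) time column in its header."""
    if "production_time" in header:
        return "production"
    if "retrieval_time" in header:
        return "retrieval"
    raise ValueError(f"unrecognized report header: {header}")


def check_single_type(headers):
    """Return the one report type shared by all inputs, or raise if mixed."""
    if not headers:
        raise ValueError("no reports given")
    types = {report_type(h) for h in headers}
    if len(types) > 1:
        raise ValueError(f"mixed report types: {sorted(types)}")
    return types.pop()


# Made-up headers for two retrieval reports and one production report.
retrieval_a = ["product_id", "retrieval_time"]
retrieval_b = ["product_id", "retrieval_time"]
production = ["product_id", "production_time"]

kind = check_single_type([retrieval_a, retrieval_b])
```

Calling `check_single_type([retrieval_a, production])` raises `ValueError`, mirroring the error described above.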