Add checks in sizing for disparate datasets

mozilla / mozanalysis

A library for Mozilla experiments analysis

https://mozilla.github.io/mozanalysis/

Mozilla Public License 2.0


Add checks in sizing for disparate datasets #173

Closed m-d-bowerman closed 4 months ago

m-d-bowerman commented 1 year ago

Currently, no warning is raised when Metrics and Segments use data sources that rely on different applications' datasets. A check for mismatched data sources should be added in sizing.py.

jaredsnyder commented 4 months ago

Can you think of a straightforward way to identify data sources? If we assume all metrics and data sources are created from metric-hub, we could just add the application name as an attribute to the Metric and Segment classes. But how do we handle the case where from_expr is used to generate a Metric or Segment?
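A minimal sketch of the attribute-based idea, for illustration only. TaggedSource and check_app_mismatch are hypothetical names, not part of mozanalysis, and the app_name field is an assumed attribute. Objects built with from_expr are modeled here as having app_name set to None and are reported separately rather than treated as a mismatch, which is one possible policy among several. The application names in the usage lines are illustrative.

```python
import warnings
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class TaggedSource:
    """Hypothetical stand-in for a Metric or Segment carrying an app name."""

    name: str
    # None models an object created via from_expr, where the app is unknown.
    app_name: Optional[str] = None


def check_app_mismatch(sources: Iterable[TaggedSource]) -> Set[str]:
    """Warn if the sources span more than one known application.

    Returns the set of known application names.
    """
    sources = list(sources)
    known = {s.app_name for s in sources if s.app_name is not None}
    unknown = [s.name for s in sources if s.app_name is None]

    if len(known) > 1:
        warnings.warn(
            f"Metrics and Segments rely on different applications: {sorted(known)}"
        )
    if unknown:
        # Assumption: unknown sources are flagged, not counted as mismatches.
        warnings.warn(f"Application could not be determined for: {unknown}")
    return known


# Usage: two sources from different apps trigger the mismatch warning.
apps = check_app_mismatch(
    [
        TaggedSource("some_metric", "firefox_desktop"),
        TaggedSource("some_segment", "fenix"),
    ]
)
```

This leaves the from_expr question open: the sketch only makes the unknown case visible instead of resolving it.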

m-d-bowerman commented 4 months ago

I'll give this some more thought. One thing that came to mind was checking the dataset returned by get_single_window_data to see whether any of its columns contain only 0 values, which would happen if the apps were mismatched.
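A rough sketch of that all-zero column check, under stated assumptions. find_all_zero_columns is a hypothetical helper, not an existing mozanalysis function. It accepts any mapping of column name to values, so a plain dict works, and a pandas DataFrame should work too since its items() yields (column name, column) pairs. Empty columns are skipped, and a column that is legitimately all zeros (for example a rarely triggered metric) would also be flagged, so this is a heuristic rather than proof of a mismatch.

```python
import warnings
from typing import Iterable, List, Mapping


def find_all_zero_columns(data: Mapping[str, Iterable[float]]) -> List[str]:
    """Return the names of non-empty columns whose values are all 0."""
    zero_columns = []
    for name, values in data.items():
        values = list(values)
        # Skip empty columns: all() on an empty sequence would be True.
        if values and all(value == 0 for value in values):
            zero_columns.append(name)
    return zero_columns


# Usage with made-up data standing in for a query result.
example = {
    "metric_a": [0, 0, 0],
    "metric_b": [0, 3, 1],
}
suspicious = find_all_zero_columns(example)
if suspicious:
    warnings.warn(
        f"Columns with only 0 values (possible app mismatch): {suspicious}"
    )
```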