Closed fbertsch closed 3 months ago
This file seems to reference that data source as well: https://github.com/mozilla/metric-hub/blob/a5da1569fa8f5df56c4b6c8e52e173232aedf1f5/looker/definitions/firefox_desktop.toml#L88
Looks reasonable to me so I'll approve.
One thing I was planning to tweak in the near future is probably removing the second paragraph of a lot of those descriptions (i.e., the ones starting "For more information..." and then linking to confluence and then listing email addresses) since we already link to Confluence in the first part of every description edited here (I think) and if you use the
owner
field then those display nicely in the right hand sidebar of DataHub. Not sure if it's easier/cleaner to remove them now or I can do it later in a separate PR with some other tweaks.For my own knowledge, where do the
looker/definitions
show up? Are they displayed or viewable in Looker somewhere? Or are they just used to reference metric definitions from Looker?
It's a good idea to make these descriptions more succinct. Since it sounds like these are mostly proposed changes to documentation and less likely to introduce breaking changes, let's follow-up on them afterwards.
@bochocki with Frank out for a couple weeks do you think we're good to merge this? Or should we wait for @scholtzan to respond to Frank's comment from a few days ago?
I think maybe it could help to wait for @scholtzan.
I'm going to go ahead and merge; I wrote a short script for a recent PR that aims to validate specific Metric Hub definitions. I wrote a similar script for this case (below). I verified that it runs before the change, and will also check it after the merge.
The script doesn't test the looker and opmon directories, but those are less critical paths and the changes are all similar. We can also check the OpMon dashboards tomorrow to make sure they're still running as expected. I think those dashboards are:
!pip install mozanalysis -qq
import pandas as pd
from google.cloud import bigquery
from google.colab import auth
from mozanalysis.config import ConfigLoader
from textwrap import dedent
auth.authenticate_user()
def fetch_metric_hub_definition(start_date, end_date, metric_slug, app_name, bq_project):
start_date = pd.to_datetime(start_date).date()
end_date = pd.to_datetime(end_date).date()
# Set useful attributes based on the Metric Hub definition
metric = ConfigLoader.get_metric(metric_slug=metric_slug, app_name=app_name)
submission_date_column = metric.data_source.submission_date_column
# Modify the metric source table string so that it formats nicely in the query.
from_expression = metric.data_source._from_expr.replace("\n", "\n" + " " * 15)
query = dedent(
f"""
SELECT {submission_date_column},
{metric.select_expr} AS value
FROM {from_expression}
WHERE {submission_date_column} BETWEEN '{start_date}' AND '{end_date}'
GROUP BY {submission_date_column}
"""
)
print(
"\n",
"\n" + "-" * 110,
"\n" + f" Querying for '{app_name}.{metric_slug}' ".center(110, "-"),
f"\n> data_source: {metric.data_source.name}"
f"\n{query}"
)
df = bigquery.Client(project=bq_project).query(query).to_dataframe()
# ensure submission_date has type 'date'
df[submission_date_column] = pd.to_datetime(df[submission_date_column]).dt.date
return df
definitions = {
"firefox_desktop": ["daily_active_users_v2", "desktop_dau_kpi_v2"],
"multi_product": ["mobile_daily_active_users_v1", "mobile_dau_kpi_v1"],
"fenix": ["daily_active_users_v2"],
"firefox_ios": ["daily_active_users_v2"],
"focus_android": ["daily_active_users_v2"],
"focus_ios": ["daily_active_users_v2"],
}
for app_name, metric_slugs in definitions.items():
for metric_slug in metric_slugs:
df = fetch_metric_hub_definition(
start_date="2024-03-14",
end_date="2024-03-21",
metric_slug=metric_slug,
app_name=app_name,
bq_project="moz-fx-data-bq-data-science"
)
display(df.set_index(df.columns[0]).transpose())
I'm going to go ahead and merge; I wrote https://github.com/mozilla/metric-hub/pull/523#issuecomment-2237524578 that aims to validate specific Metric Hub definitions. I wrote a similar script for this case (below). I verified that it runs before the change, and will also check it after the merge.
This appears to work as expected; all of the definitions can still be queried and are using the updated data sources.
I also triggered the Airflow KPI forecasting task, which is probably the highest-profile asset impacted by these changes. The task completed as expected.
The OpMon dashboards are running the day after this merge. Seems like everything works as expected.
We were using the source name
active_users_aggregates_v1
, which actually pointed to a view. This now clarifies that we're pointing to the view, and filtering to just desktop.Also updated the link in the metric definition to point to the generated source query.