mozilla / lookml-generator

LookML Generator for Glean and Mozilla Data
https://mozilla.github.io/lookml-generator/generator.html
Mozilla Public License 2.0
15 stars 16 forks source link

clientinfo.first_run_date should be a date, not a string for baseline tables #255

Open irrationalagent opened 2 years ago

irrationalagent commented 2 years ago

this field is a string in the underlying table, it has format e.g. 2021-09-30+02:00. usually when im dealing with this field directly i do something like DATE(LEFT(clientinfo.first_run_date,10)).

its unclear to me where the best place to change this might be. naively i think we would want to make the change to bigquery-etl but that would require a schema update etc. on the other logical extreme we could extend the views in spoke-default to include the date version. a glance at the code in this repo makes me think handling it here would create a special case that we might not want.

jklukas commented 2 years ago

We have existing logic to parse this date as part of the baseline_clients_daily query: https://github.com/mozilla/bigquery-etl/blob/0e524168a9bbf63b653064776525c6c0223a7231/bigquery_etl/glean_usage/templates/baseline_clients_daily_v1.query.sql#L22

We have handled parsing for other metadata by adding parsed_ variants at the view level. See mozfun.norm.metadata which is called in the automatically-generated views on top of historical ping tables: https://github.com/mozilla/bigquery-etl/blob/main/sql/mozfun/norm/metadata/udf.sql

So my inclination would be to add client_info.parsed_first_run_date by modifying the logic in https://github.com/mozilla/bigquery-etl/blob/main/sql/mozfun/norm/glean_baseline_client_info/udf.sql

The intention of adding parsed_ variants was that we'd still have the underlying values available. There has been a discoverability problem, though, as many folks don't know to look for parsed fields.