mozilla / bigquery-etl

Bigquery ETL
https://mozilla.github.io/bigquery-etl
Mozilla Public License 2.0

Consolidate the country/region names to match the Legal recommended data source #5841

Open data-sync-user opened 3 days ago

data-sync-user commented 3 days ago

In alignment with the guidance stated in this Wiki, please consolidate the field names and values for the country/region names we are using.

Requirements to consider

Context

When we rolled out the new version of the KPI dashboard, which now has regional breakdowns of DAU, a Mozillian requested changing ‘Taiwan, province of China’ to ‘Taiwan’. [~accountid:6345617db391eab61f71c0a2] consulted Legal, who pointed the team to the Wiki page and suggested we implement the changes outlined above.

┆Issue is synchronized with this Jira Story

data-sync-user commented 3 days ago

➤ Misun Mizener commented:

Assigning to Alexander Nicholson as I understand you helped with the current country mapping dataset. Tagging George Kaberere for context (as this came up during the Beyond DAU sprint discussions).

data-sync-user commented 1 day ago

➤ Sean Rose commented:

I have thoughts/concerns:

data-sync-user commented 19 hours ago

➤ Alexander Nicholson commented:

{quote}Certain 'country or region' names in the Legal-recommended GENC list (see the Wiki link above; https://nsgreg.nga.mil/registries/browse/results.jsp?registryType=genc&registerField=GEC&itemTypeField=fgp&entryTypeField=all) are different from the country names currently in the data source, such as:{quote}

I’ll do this in a PR shortly

{quote}* The field name ‘Region’ should be switched to ‘Continent’ (to capture values like Asia, North America, Antarctica, etc.){quote}

Looking into this now. Since https://github.com/mozilla/bigquery-etl/pull/5562, all the non-continent regions appear to have been removed, so ‘Continent’ would also be the more accurate name. This will require some coordinated changes.

{quote}* The field name ‘Country’ should be switched to ‘Country or Region’{quote}

This one is more complicated. Echoing some of Sean’s comments below, a few quick questions for Misun Mizener:

  1. For users of the core country utils, the field shouldn’t appear as “Country” by default. The core BigQuery table has no country field name; the field is country_codes_v1.name. In the Looker View it is also called name, though it could appear as Countries.name because the Looker view is called Countries. We could rename this view to something like Country Mapping, or to Country Codes to match the table. Would this be sufficient? As it stands, by default it should simply show up for end users as name/Name.
  2. In terms of field names in tables, as Sean mentioned, we could do a one-time search specifically for country_name/country in a few repositories, but this won't necessarily prevent people from renaming them in future downstream tables or one-off queries. Field names in tables don't necessarily appear as-is in user-facing analysis anyway, as shown by the labels applied in this dashboard: https://mozilla.cloud.looker.com/dashboards/1784. Could this be a guidance/documentation issue?
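
For reference, the one-time search mentioned in point 2 could look roughly like the sketch below. It is only an illustration under assumptions: the function name `find_country_fields`, the file suffixes, and the regular expression are hypothetical and not taken from any existing repository tooling. It matches `country` and `country_name` as whole identifiers, so something like `country_codes_v1` is not reported.

```python
import re
from pathlib import Path

# Assumed pattern: `country` or `country_name` as a whole identifier.
# Underscores count as word characters, so `country_codes_v1` does not match.
FIELD_PATTERN = re.compile(r"\b(country_name|country)\b")


def find_country_fields(root, suffixes=(".sql", ".yaml", ".lkml")):
    """Yield (path, line_number, stripped_line) for each matching line.

    `root` is the checkout to scan; `suffixes` is an assumed set of file
    types worth checking (queries, schemas, LookML).
    """
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if FIELD_PATTERN.search(line):
                yield path, line_number, line.strip()
```

Calling `list(find_country_fields("path/to/checkout"))` would give a list of hits to review by hand. As noted above, this only catches existing names and does nothing to stop new downstream tables or one-off queries from using them.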