owid / etl

A compute graph for loading and transforming OWID's data
https://docs.owid.io/projects/etl
MIT License

New ETL channel (or namespace) to track dependencies of external repositories #2508

Closed: pabloarosado closed this issue 4 months ago

pabloarosado commented 5 months ago

Context & motivation

We would love to get an error early when a step that an external repository depends on is deleted or archived, rather than finding out after something has broken.

Proposal

Inspired by this thread, I thought we could have a new ETL channel (or, as an easier alternative, a new namespace), e.g. repository, that hosts the latest versions of steps loaded by external repositories, such as poverty-data, covid-19-data, co2-data, energy-data, and owid-grapher.

With this, if anyone accidentally deletes or archives a dependency of these repository steps, VersionTracker will raise an error. We could also show these dependencies in the ETL Dashboard (and treat them as always active, even if they don't have charts).

Specifically, Sophia said it would be useful for her to have the latest versions of grapher/worldbank_wdi/2023-05-29/wdi/wdi#ny_gdp_pcap_pp_kd and grapher/demography/2023-03-31/population/population#population_historical. They could easily be created in the new channel or namespace.
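To make the intended check concrete, here is a minimal sketch of the idea, not the actual VersionTracker implementation. The function names, the `external/owid_grapher/latest/dependencies` step name, and the dict-of-sets representation of the dag are all assumptions made for illustration; only the two grapher dataset paths come from this issue.

```python
# Hypothetical sketch: NOT the real VersionTracker API.
from typing import Dict, List, Set


def find_broken_external_steps(
    active_steps: Set[str], external_dag: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Map each external step to the dependencies that are no longer active."""
    broken: Dict[str, List[str]] = {}
    for step, dependencies in external_dag.items():
        missing = sorted(dependencies - active_steps)
        if missing:
            broken[step] = missing
    return broken


def check_external_steps(
    active_steps: Set[str], external_dag: Dict[str, Set[str]]
) -> None:
    """Raise early if any external step depends on a deleted or archived step."""
    broken = find_broken_external_steps(active_steps, external_dag)
    if broken:
        details = "; ".join(
            f"{step} is missing {', '.join(missing)}"
            for step, missing in sorted(broken.items())
        )
        raise ValueError(f"Broken external dependencies: {details}")


# Example data. The external step name is made up; the dataset paths are the
# ones mentioned in this issue (without the table and indicator suffix).
external_dag = {
    "external/owid_grapher/latest/dependencies": {
        "grapher/worldbank_wdi/2023-05-29/wdi",
        "grapher/demography/2023-03-31/population",
    }
}
all_active = {
    "grapher/worldbank_wdi/2023-05-29/wdi",
    "grapher/demography/2023-03-31/population",
}
check_external_steps(all_active, external_dag)  # passes silently
```

If the population step were archived (removed from `all_active`), `check_external_steps` would raise and name both the external step and the missing dependency.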

Technical notes

Scope

danyx23 commented 5 months ago

Another option would be to add an alias system that keeps what are probably our MiMs in an easily accessible namespace.

That is, the aliases would not be an ETL channel or anything similar, but a new kind of URI scheme that maps to full ETL paths. Maybe something to discuss in our offsite.
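As a rough illustration of what such a mapping could look like, here is a minimal sketch. The `alias://` scheme, the alias names, and the `resolve` function are all hypothetical; only the two full paths are taken from this issue.

```python
# Hypothetical sketch of an alias scheme; none of these names exist in ETL.
ALIAS_SCHEME = "alias://"

# Alias name -> full ETL path (paths as quoted in this issue).
ALIASES = {
    "gdp_per_capita": "grapher/worldbank_wdi/2023-05-29/wdi/wdi#ny_gdp_pcap_pp_kd",
    "population": "grapher/demography/2023-03-31/population/population#population_historical",
}


def resolve(uri: str) -> str:
    """Return the full ETL path for an alias URI; pass other URIs through."""
    if not uri.startswith(ALIAS_SCHEME):
        return uri
    name = uri[len(ALIAS_SCHEME):]
    if name not in ALIASES:
        raise KeyError(f"Unknown alias: {name!r}")
    return ALIASES[name]
```

With this design, an external repository would only reference `alias://population`, and updating the dataset version would mean changing a single entry in the mapping.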

sophiamersmann commented 5 months ago

Grapher has the following indicator IDs currently hardcoded:

In the future, the entity selector will have these two variables hardcoded (not yet merged, but implemented in https://github.com/owid/owid-grapher/pull/3466):

pabloarosado commented 4 months ago

Now that the external channel is created, I've created https://github.com/owid/etl/issues/2609 to add all existing dependencies (including the ones listed by Sophia above).