owid / etl

A compute graph for loading and transforming OWID's data
https://docs.owid.io/projects/etl
MIT License

Version Tracker complains because `grapher://grapher` steps are not found in the dag #3267

Open pabloarosado opened 2 months ago

pabloarosado commented 2 months ago

Problem

Currently, running `etl d version-tracker` raises an error such as:

* Missing step
    grapher://grapher/energy/2024-06-20/primary_energy_consumption
  is a dependency of the following active steps:
    export://multidim/energy/latest/energy

Expected behaviour

We would expect the version tracker to automatically include the auto-generated `grapher://grapher/...` steps when sanity-checking the dependency graph, so no error should be raised here.

Why this is happening

This happens because the `grapher://grapher/` dependency is not in the dag. The corresponding `data://grapher/` step is in the dag, however, so the error should not be raised. This started happening recently, when we began to have export steps that depend on `grapher://grapher` steps.
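To illustrate the failure, here is a minimal sketch of a "missing step" check on a toy DAG. It is not the actual version tracker code: the dict-of-sets representation, the `find_missing_steps` helper, and the `data://garden` step are assumptions made only for this example.

```python
# Toy DAG: each key is a step, each value is the set of its direct dependencies.
# The data://garden step is a hypothetical placeholder, not taken from the real dag.
dag = {
    "data://garden/energy/2024-06-20/primary_energy_consumption": set(),
    "data://grapher/energy/2024-06-20/primary_energy_consumption": {
        "data://garden/energy/2024-06-20/primary_energy_consumption",
    },
    "export://multidim/energy/latest/energy": {
        "grapher://grapher/energy/2024-06-20/primary_energy_consumption",
    },
}


def find_missing_steps(dag):
    """Map each dependency that is not defined as a step to the steps that use it."""
    missing = {}
    for step, dependencies in dag.items():
        for dependency in dependencies:
            if dependency not in dag:
                missing.setdefault(dependency, []).append(step)
    return missing


# The grapher://grapher step is reported as missing, even though the
# matching data://grapher step is defined in the dag.
for dependency, used_by in find_missing_steps(dag).items():
    print(f"Missing step {dependency} is a dependency of {used_by}")
```

A naive check like this only looks at the keys of the dag, so it cannot know that a `grapher://grapher/...` step is generated automatically from its `data://grapher/...` counterpart.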

Technical notes

pabloarosado commented 2 weeks ago

This issue also causes StepUpdater to fail, e.g. when archiving steps. I can imagine that it may also fail occasionally when updating steps, and that the information shown in the dashboard (regarding the status of a step) may not be correct. The main reason is that `grapher://grapher` steps are not defined in the dag.

The easiest fix would be to replace them with `data://grapher`, but I think that may break some of the new logic on export steps.

An alternative would be to add some logic when reading the dag, so that additional (hidden) steps are added, namely `"grapher://grapher/[STEP]": "data://grapher/[STEP]"`. This doesn't need to happen for all `grapher://grapher` steps, only for those that explicitly appear in the dag.
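A rough sketch of that alternative, assuming the dag has already been loaded into a plain dict that maps each step to a set of dependencies. The `add_hidden_grapher_steps` helper and the prefix constants are hypothetical names for illustration, not existing functions in the etl codebase.

```python
GRAPHER_PREFIX = "grapher://grapher/"
DATA_PREFIX = "data://grapher/"


def add_hidden_grapher_steps(dag):
    """Return a copy of the dag with a hidden step added for every
    grapher://grapher dependency that explicitly appears in the dag but is
    not defined as a step. Each hidden step depends on its data://grapher
    counterpart."""
    expanded = {step: set(dependencies) for step, dependencies in dag.items()}
    for dependencies in dag.values():
        for dependency in dependencies:
            if dependency.startswith(GRAPHER_PREFIX) and dependency not in expanded:
                step_path = dependency[len(GRAPHER_PREFIX):]
                expanded[dependency] = {DATA_PREFIX + step_path}
    return expanded


# Toy dag (assumed structure): an export step that depends on a
# grapher://grapher step that is not itself defined in the dag.
dag = {
    "data://grapher/energy/2024-06-20/primary_energy_consumption": set(),
    "export://multidim/energy/latest/energy": {
        "grapher://grapher/energy/2024-06-20/primary_energy_consumption",
    },
}

expanded_dag = add_hidden_grapher_steps(dag)
```

In this sketch the original dag is left untouched and only the explicitly referenced `grapher://grapher` steps are added, so any sanity check run on `expanded_dag` would find every dependency defined. Whether this interacts well with the export step logic is something the sketch does not address.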

Marigold commented 2 weeks ago

There's a function `construct_dag` with a couple of arguments for adding various steps. If that doesn't help, we should start refactoring how the DAG is constructed (it's pretty messy right now).