owid / etl

A compute graph for loading and transforming OWID's data
https://docs.owid.io/projects/etl
MIT License

🐛 fast-track failure #2825

Open lucasrodes opened 1 week ago

lucasrodes commented 1 week ago

There was a bug coming from some fast-track datasets that prevented ETL from building.

While investigating, I noticed a couple of problems in the Fast-Track workflow that I don't fully understand. For the time being, we have commented out these steps (so they are not built) so that ETL can build properly (see https://github.com/owid/etl/pull/2821).

Brief summary

The steps that were failing were:

All of them failed because of a duplicate index. One can reproduce the failure by uncommenting these steps in dag/fasttrack.yml and running etl run <step_name>.

I've tried to trace these files back to their Google Sheets to see if anything is wrong.
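For anyone debugging this, a duplicate index can be spotted in the raw data before running the step. The snippet below is only a minimal sketch using the Python standard library: the function name `find_duplicate_keys`, the sample data, and the assumption that the index is made of `country` and `year` columns are all hypothetical and not taken from the ETL codebase.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(csv_text, key_columns=("country", "year")):
    """Return {key tuple: count} for index keys that occur more than once.

    key_columns is an assumption: adjust it to whatever columns the
    dataset is actually indexed by.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(tuple(row[col] for col in key_columns) for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up sample data with one repeated (country, year) pair.
sample = """country,year,gini
France,1980,0.30
France,1980,0.31
Spain,1980,0.35
"""

duplicates = find_duplicate_keys(sample)
print(duplicates)
```

Any key reported here appears on more than one row, which is the kind of input that would produce a duplicate-index error once the columns are set as the index.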

Example: draft_joe_gini_diff_1980_2018

I'm unsure what the matter is here, but what is being read in ETL does not correspond to what I see on Google Drive.

This could be because either (i) I'm editing a different file or (ii) there is some error in the snapshot links to Google Sheets.

Further comments

Adding data via Fast-Track can be dangerous: no CI/CD feedback is shown to the user, and it seems that one can add data that breaks our ETL deploy jobs.

This particular example seems to be for some experimental work. Can we use Fast-Track on staging servers? If so, we should do so for experimental work. If not, we should probably think about it.

larsyencken commented 3 days ago

If we can fix it quickly, we fix it; otherwise we close it.