owid / etl

A compute graph for loading and transforming OWID's data
https://docs.owid.io/projects/etl
MIT License

SDG grapher step generates a non-deterministic short-name #726

Closed larsyencken closed 1 year ago

larsyencken commented 1 year ago

Problem

While comparing the ETL's output under Pandas 1.5.1 and 1.5.2, I found that many tables in the data/grapher/un_sdg/2022-07-07/un_sdg dataset get minor renames like the one below.

Before: data/grapher/un_sdg/2022-07-07/un_sdg/_10_3_1__vc_vov_gdsd__no_breakdown_by_disability__age__all_areas__both_sexes.feather

After: data/grapher/un_sdg/2022-07-07/un_sdg/_10_3_1__vc_vov_gdsd__all_areas__no_breakdown_by_disability__age__both_sexes.feather

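Both names contain the same components; only their order differs (all_areas moves from after age to before no_breakdown_by_disability). My interpretation is that the table names are generated by exploding dimensions, but the explosion is not done in a deterministic order.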
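To illustrate the kind of fix this suggests, here is a minimal sketch of building a short name with the dimensions in a fixed order. It is not the actual step code: `underscore`, `build_short_name` and the dimension keys (`disability`, `location`, `sex`) are hypothetical names chosen for the example, and sorting by dimension name is just one possible way to pin the order.

```python
import re
from typing import Mapping


def underscore(text: str) -> str:
    """Lowercase and collapse runs of non-alphanumeric characters to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", text.lower()).strip("_")


def build_short_name(indicator: str, series_code: str, dimensions: Mapping[str, str]) -> str:
    """Hypothetical helper: join indicator, series code and dimension values.

    Dimension values are emitted in sorted order of their dimension names,
    so the result does not depend on the order the mapping was built in.
    """
    ordered_values = [dimensions[key] for key in sorted(dimensions)]
    parts = [indicator, series_code, *ordered_values]
    return "_" + "__".join(underscore(part) for part in parts)


# The same dimensions supplied in two different orders (assumed example data).
dims_a = {
    "disability": "No breakdown by disability",
    "location": "All areas",
    "sex": "Both sexes",
}
dims_b = dict(reversed(list(dims_a.items())))

name_a = build_short_name("10.3.1", "VC_VOV_GDSD", dims_a)
name_b = build_short_name("10.3.1", "VC_VOV_GDSD", dims_b)
```

With the order fixed this way, `name_a` and `name_b` are identical regardless of how the dimensions were collected upstream.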

Impact

It's not a big deal for our usage in Grapher, but it makes QA on these tables more difficult, and it means the short names aren't stable handles for these variables at the moment.

larsyencken commented 1 year ago

@Marigold I'm pretty sure the tiny PR I made fixes this, but please double check it.