owid / etl

A compute graph for loading and transforming OWID's data
https://docs.owid.io/projects/etl
MIT License
58 stars 18 forks source link

✨ chart-diff: add metadata-diff view, bug fixes #2791

Closed lucasrodes closed 3 weeks ago

lucasrodes commented 3 weeks ago

BUG: Charts without no change type tag are not being shown.

Bug explained! These charts are: new charts and charts whose config has changed for special fields. I fixed the bug for 'new charts'. For charts with only changes in fields specified by [configs_are_equal](https://github.com/owid/etl/blob/data/plastic-surgery/apps/wizard/app_pages/chart_diff/chart_diff.py#L240-L246) are not considered as different by this function. Hence, these are not tagged with the label `change_type='config'`. Consequently these charts are not shown unless we enable 'show all'. This, however, is not consistent with what [modified_charts_by_admin](https://github.com/owid/etl/blob/data/plastic-surgery/apps/chart_sync/cli.py#L492) from chart_sync is doing. It gets charts that have suffered any change in the config. And this is important, because the charts obtained by `modified_charts_by_admin` are those listed by owidbot. Consequently, in this PR we observe that there are **3 charts** listed by @owidbot, yet only two charts are shown by default in chart-diff. We should align these behaviours. Some ideas: 1. Further filter the charts obtained by `modified_charts_by_admin` by checking their configs and ignoring them based on the same criteria as `configs_are_equal`. 2. Change the logic from `configs_are_equal` and just tag a chart with _any_ change with `change_type='CONFIG'`. 3. Leave as-is, but make @owidbot ignore charts with changes in fields mentioned in `configs_are_equal`. We would have charts with such minor config changes in chart-diff, but wouldn't be shown. My personal preference is either 1 or 2. I find 3 confusing.

BUG: There are leaks and charts from other workflows are shown

likely fixed by https://github.com/owid/etl/pull/2797

owidbot commented 3 weeks ago
Quick links (staging server): Site Admin Wizard

Login: ssh owid@staging-site-data-plastic-surgery

chart-diff: ✅
  • 2 new charts (2 reviewed)
  • 1 modified charts (1 reviewed)
data-diff: ✅ No differences found ```diff + Dataset garden/health/2024-06-11/isaps_plastic_surgery + + Table isaps_plastic_surgery + + Column num_procedures Legend: +New ~Modified -Removed =Identical Details Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet ``` Automatically updated datasets matching _weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk_ are not included

Edited: 2024-06-12 20:20:23 UTC Execution time: 3.84 seconds

lucasrodes commented 3 weeks ago

@Marigold FYI, I found couple of minor bugs. Tried to explain them in the PR description.

I estimate that you will start working like 2 hours before me, so I thought I'd ping you in case you want to look into this before I start working. Otherwise I'll work on it tomorrow.

Marigold commented 3 weeks ago

Thank you @lucasrodes. Just a heads up that I hotfixed two of those bugs on master & this branch, but there are other things to fix (or at least to discuss them). I'll write it up as soon as I finish firefighting.

lucasrodes commented 3 weeks ago

@Marigold I'd say you can play around with charts that originate from 'metadata' changes. If you run this locally, you will get gpt summaries (bc there are no credentials on staging to access openai gpt).

on this matter, do you think we could have credentials added to staging servers? or is it risky?

lucasrodes commented 3 weeks ago

Also, not really the need to review the new dataset pipeline.