:bar_chart: Speed up GHE & GBD

owid / etl

A compute graph for loading and transforming OWID's data


What started as performance optimizations of GHE ended up as multiple performance optimizations across ETL.

- `pr.concat` and `pr.merge` handle categoricals gracefully and no longer convert them to `object` dtype
- Add `owid.datautils` as a dependency of `owid.catalog`
- Silence noisy `DeprecationWarning`s when upserting to MySQL
- More efficient `ds.save()`, especially for large datasets
- Optional repack when saving a dataset
- Let `etl d profile` handle nested functions
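As an illustration of the warning-silencing item, here is a minimal sketch using only the standard library. `upsert_rows` is a hypothetical stand-in for a database call that emits a `DeprecationWarning`; it is not the actual ETL or MySQL code, and the PR may silence the warnings differently.

```python
import warnings


def upsert_rows(rows):
    # Hypothetical stand-in for a database call that emits a DeprecationWarning
    warnings.warn("legacy API", DeprecationWarning)
    return len(rows)


def quiet_upsert(rows):
    # Ignore DeprecationWarning only inside this block;
    # the previous warning filters are restored on exit
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return upsert_rows(rows)
```

Scoping the filter with `warnings.catch_warnings()` keeps deprecation warnings visible everywhere else in the process.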

This speeds up GHE & GBD by about 40% (mostly because of categorical variables). I considered whether this could cause issues for other datasets and whether I should increment ETL_EPOCH to test them all, but since it only affects categorical variables, I think it should be fine.
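To show why categoricals matter here, the sketch below uses plain pandas rather than the `pr.concat` wrapper. `concat_keep_categoricals` and the sample frames are hypothetical and only illustrate one possible approach (unifying categories before concatenating); the actual implementation in this PR may differ.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def concat_keep_categoricals(frames):
    """Concatenate frames, keeping columns that are categorical in every frame."""
    frames = [f.copy() for f in frames]
    shared = set.intersection(*(set(f.columns) for f in frames))
    for col in shared:
        if all(isinstance(f[col].dtype, pd.CategoricalDtype) for f in frames):
            # Build one dtype that covers the categories of all inputs
            union = union_categoricals([f[col] for f in frames]).categories
            dtype = pd.CategoricalDtype(categories=union)
            for f in frames:
                f[col] = f[col].astype(dtype)
    return pd.concat(frames, ignore_index=True)


# Made-up sample data, for illustration only
a = pd.DataFrame({"cause": pd.Categorical(["malaria", "measles"]), "deaths": [10, 20]})
b = pd.DataFrame({"cause": pd.Categorical(["tetanus"]), "deaths": [5]})

plain = pd.concat([a, b], ignore_index=True)  # categories differ, so the dtype is lost
kept = concat_keep_categoricals([a, b])       # "cause" stays categorical
```

When the categories differ between frames, plain `pd.concat` falls back to a non-categorical dtype, which costs more memory and slows later operations on large tables.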

:bar_chart: Speed up GHE & GBD #2872