What started as performance optimizations of GHE ended up as multiple performance optimizations across ETL:
- `pr.concat` and `pr.merge` handle categoricals gracefully and no longer convert them to `object` dtype
- Add `owid.datautils` as a dependency to `owid.catalog`
- Silence annoying `DeprecationWarning`s when upserting to MySQL
- More efficient `ds.save()`, especially for large datasets
- Optional `repack` when saving a dataset
- Let `etl d profile` handle nested functions
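To give a rough idea of what a repack step before saving can do, the sketch below shrinks a DataFrame by downcasting integer columns and turning low-cardinality string columns into categoricals. The helper name `repack_frame` and the 50% cardinality threshold are hypothetical choices for illustration; this is not the actual `owid.catalog` repack logic.

```python
import pandas as pd


def repack_frame(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with more compact dtypes.

    Hypothetical helper for illustration only; it is not owid.catalog's repack.
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if isinstance(s.dtype, pd.CategoricalDtype):
            continue  # already compact, leave as-is
        if pd.api.types.is_integer_dtype(s):
            # Smallest signed integer type that holds every value (e.g. int64 -> int16)
            out[col] = pd.to_numeric(s, downcast="integer")
        elif (s.dtype == object or isinstance(s.dtype, pd.StringDtype)) and len(s) > 0:
            # Low-cardinality strings: store each distinct value once plus small integer codes
            if s.nunique(dropna=True) / len(s) <= max_unique_ratio:
                out[col] = s.astype("category")
    return out


df = pd.DataFrame(
    {
        "country": ["France", "France", "Spain", "Spain"],
        "year": [2000, 2001, 2000, 2001],
        "value": [1.5, 2.5, 3.5, 4.5],
    }
)
packed = repack_frame(df)
print(packed.dtypes)
```

Float columns are deliberately left untouched in this sketch: `pd.to_numeric(..., downcast="float")` may round values to `float32`, which would be lossy for published data.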
This PR speeds up GHE & GBD by about 40% (mostly because of categorical variables). I considered whether this could cause issues for other datasets and whether I should increment `ETL_EPOCH` to test them all, but since it only affects categorical variables, I think it should be fine.
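Why categoricals are easy to lose: plain `pd.concat` drops the `category` dtype when the categorical columns being combined have different categories, falling back to the dtype of the categories themselves (`object` in pandas 2). One way to avoid that, sketched below with a hypothetical helper `concat_keep_categoricals`, is to union the categories first with `pandas.api.types.union_categoricals` and give every frame the same categorical dtype. This illustrates the general technique under that assumption; it is not the actual `pr.concat` code.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def concat_keep_categoricals(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate frames so categorical columns stay categorical.

    Hypothetical helper for illustration only; it is not the pr.concat implementation.
    """
    frames = [f.copy() for f in frames]
    for col in frames[0].columns:
        if all(col in f.columns and isinstance(f[col].dtype, pd.CategoricalDtype) for f in frames):
            # Union of every frame's categories, in order of first appearance
            categories = union_categoricals([f[col] for f in frames]).categories
            for f in frames:
                # Identical categories everywhere, so pd.concat keeps the category dtype
                f[col] = f[col].cat.set_categories(categories)
    return pd.concat(frames, ignore_index=True)


a = pd.DataFrame({"country": pd.Categorical(["France", "Spain"]), "value": [1, 2]})
b = pd.DataFrame({"country": pd.Categorical(["Chile", "France"]), "value": [3, 4]})

# Plain concat: categories differ, so the result is no longer categorical
print(pd.concat([a, b], ignore_index=True)["country"].dtype)
# Aligned categories first: the result stays categorical
print(concat_keep_categoricals([a, b])["country"].dtype)  # category
```

Keeping the column categorical means each country name is stored once, with compact integer codes per row, rather than one Python string object per row.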
chart-diff: ✅
No charts for review.
data-diff: ✅ No differences found
```diff
Legend: +New ~Modified -Removed =Identical
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet
```
Automatically updated datasets matching _weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk_ are not included