Adding GBD datasets to the DAG highlighted the importance of speed in our development workflow. A slow step not only hinders progress at that point but also extends overall build times. Ensuring that our steps run as quickly as possible should be our top priority.
Here are a few things we could improve:
[ ] Implement a CLI tool for profiling steps, based on line_profiler and memprof
[ ] Silence this warning: 2024-06-12 13:20:43 [warning ] Passing a SingleBlockManager to Variable is deprecated and will raise in a future version. Use public APIs instead. category=DeprecationWarning filename=/home/owid/etl/lib/catalog/owid/catalog/variables.py lineno=95
[ ] Implement a pr.concat function that works well with categorical variables (and get rid of hacks like this)
[ ] Speed up the add_regional_aggregates function (for instance, in step garden/ihme_gbd/2024-05-20/gbd_cause or data-private://grapher/ihme_gbd/2024-05-20/gbd_prevalence)
[ ] Speed up grapher upserts, e.g. grapher://grapher/ihme_gbd/2024-05-20/impairments (a couple of optimizations have already been made; can we do better?)
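
For the deprecation warning item, one stopgap could look like the sketch below. It uses only the standard-library `warnings` module and filters on the message text copied from the log line above. The function names are hypothetical, and this assumes the warning is raised through Python's `warnings` machinery before being logged. Replacing the deprecated call in variables.py would be the real fix; this only hides the noise in the meantime.

```python
import re
import warnings

# Message prefix copied from the log line quoted in the task list.
DEPRECATION_MESSAGE = "Passing a SingleBlockManager to Variable is deprecated"


def silence_block_manager_warning() -> None:
    """Ignore only this one deprecation; all other warnings stay visible."""
    warnings.filterwarnings(
        "ignore",
        # `message` is a regex matched against the start of the warning text.
        message=re.escape(DEPRECATION_MESSAGE),
        category=DeprecationWarning,
    )


def count_warnings_after_filter() -> int:
    """Self-check: emit two warnings and return how many survive the filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        silence_block_manager_warning()
        # This one matches the filter and should be dropped.
        warnings.warn(
            DEPRECATION_MESSAGE + " and will raise in a future version.",
            DeprecationWarning,
        )
        # This unrelated deprecation should still be recorded.
        warnings.warn("some other deprecation", DeprecationWarning)
    return len(caught)
```

The filter has to be installed before the code that emits the warning runs, so it would need to be called early, for example at import time of whichever module triggers the message.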