sacundim / covid-19-puerto-rico

COVID-19 data and graphs for Puerto Rico

Kill off SQLAlchemy (massive speedup) #59

Closed sacundim closed 1 year ago

sacundim commented 1 year ago

We chose SQLAlchemy for this around May 2020. It made more sense back then because we were on PostgreSQL, but:

  1. We've long been Athena-only, and PyAthena's speed has improved by leaps and bounds since then
  2. We will likely want to use some subset of Arrow, Polars and DuckDB in the future, and SQLAlchemy doesn't really help us in that world

Apart from killing off all the SQLAlchemy here, we switch to PyAthena's PandasCursor, which downloads result CSVs directly off S3. That turns out to be a 2x performance boost.
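
For readers unfamiliar with PandasCursor, the switch could look roughly like the sketch below. This is not the code from this change. The helper name `fetch_dataframe` and its parameters are hypothetical, and the staging directory, region and query are whatever the caller supplies. It only assumes PyAthena's documented `connect(..., cursor_class=PandasCursor)` and `as_pandas()` calls.

```python
def fetch_dataframe(query, s3_staging_dir, region_name):
    """Run an Athena query and return the result as a pandas DataFrame.

    Hypothetical helper for illustration; not taken from this repository.
    """
    # Imported lazily so the module still loads where PyAthena is not installed.
    from pyathena import connect
    from pyathena.pandas.cursor import PandasCursor

    # No SQLAlchemy engine: connect to Athena directly and ask for PandasCursor,
    # which reads the query's result CSV from the S3 staging location.
    connection = connect(
        s3_staging_dir=s3_staging_dir,
        region_name=region_name,
        cursor_class=PandasCursor,
    )
    try:
        cursor = connection.cursor()
        return cursor.execute(query).as_pandas()
    finally:
        connection.close()


# Example call (placeholder bucket, region and table; requires AWS credentials):
# df = fetch_dataframe(
#     "SELECT * FROM some_table LIMIT 10",
#     s3_staging_dir="s3://example-bucket/athena-results/",
#     region_name="us-west-2",
# )
```

Keeping the connection setup in one small function like this would also make it easier to swap in an Arrow-based cursor later, if that direction is pursued.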