We chose SQLAlchemy for this around May 2020, and back then it made more sense because we were on PostgreSQL.
But we've long been Athena-only, and PyAthena has improved by leaps and bounds since then, particularly in speed.
We will likely want to use some subset of Arrow, Polars and DuckDB in the future, and SQLAlchemy doesn't really help us in that world.
Apart from removing all the SQLAlchemy code here, we switch to PyAthena's PandasCursor, which downloads result CSVs directly from S3; that turns out to be a 2x performance boost.
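
For reviewers unfamiliar with PandasCursor, here is a minimal sketch of the pattern, not the exact code in this change. The helper names `make_pandas_cursor` and `query_to_frame`, and the staging directory and region in the usage note, are hypothetical placeholders; `connect`, `PandasCursor`, `execute` and `as_pandas` are PyAthena's own API.

```python
def make_pandas_cursor(s3_staging_dir, region_name):
    """Open an Athena connection whose cursors return pandas DataFrames.

    Hypothetical helper. PyAthena is imported lazily so that this module
    can be imported (and query_to_frame tested) without AWS dependencies.
    """
    from pyathena import connect
    from pyathena.pandas.cursor import PandasCursor

    return connect(
        s3_staging_dir=s3_staging_dir,  # where Athena writes its result CSVs
        region_name=region_name,
        cursor_class=PandasCursor,
    ).cursor()


def query_to_frame(cursor, sql, parameters=None):
    """Run a query and return the result as a DataFrame.

    With a PandasCursor, execute() returns the cursor itself and
    as_pandas() hands back the DataFrame built from the result file
    that Athena wrote to S3, with no SQLAlchemy engine in between.
    """
    return cursor.execute(sql, parameters=parameters).as_pandas()
```

Usage would look something like `query_to_frame(make_pandas_cursor("s3://example-bucket/athena-results/", "us-east-1"), "SELECT 1")`, with the bucket and region replaced by real values.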