mansueto-institute / dejure-defacto-taxrates

Code to process First American data to generate property tax rates for the entire US.
MIT License

Make a polars best practices guide #1

Open nmarchio opened 9 months ago

nmarchio commented 9 months ago

In working with Polars I've observed that the performance of certain functions depends heavily on how you construct queries. For instance, sink_parquet works well if the preceding steps involve only joins, filters, and selects, but it hits memory errors if you create new variables and simply won't allow window functions. By contrast, collect(streaming=True) is more performant when the query creates new variables.

Another question is how to write Parquet datasets with Polars and perform incremental file builds that write in the correct format with a proper schema.