In working with Polars, I've observed that the performance of certain functions is greatly impacted by how you construct queries. For instance, sink_parquet works well if the preceding operations only involve joins, filters, and selects, but it hits memory errors if you create new variables, and it simply won't allow window functions. By contrast, collect(streaming=True) is more performant when creating variables.
Another question is how to write Parquet datasets with Polars and perform incremental file builds that write in the correct format with a proper schema.