googleapis / python-bigquery-dataframes

BigQuery DataFrames
https://cloud.google.com/python/docs/reference/bigframes/latest
Apache License 2.0

Polars Support #735

Open firmai opened 4 months ago

firmai commented 4 months ago

It would be great to offer Polars support. It is currently about half as popular as pandas and generally works better for large datasets. Polars is bound to replace most data scientists' day-to-day operations within the next five years.

Thanks for developing bigframes, it is very useful.

TrevorBergeron commented 4 months ago

What kind of polars support would you find useful? Would you want BigQuery DataFrames to have a polars-like DataFrame API (as an alternative to the current pandas-like one), or simply to interop with polars objects more easily?

lmmx commented 3 months ago

I would like automatic schema supply. This is currently the limiting step in automatically uploading Polars DataFrames: write_ndjson seems to be the only way I can upload list dtypes (Parquet does not seem to be viable, see this issue), but NDJSON requires the schema to be passed. I'm really looking for something that will just let me put my Polars DataFrame in a BQ table without fiddling with schemas; there should be enough information already in the DataFrame to do that for me.
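
To illustrate the kind of schema inference being asked for, here is a minimal sketch in plain Python. It is not part of bigframes or Polars. The helper `infer_bq_schema` and the mapping table `_POLARS_TO_BQ` are hypothetical names, the input is assumed to be a dict of column names to Polars dtype names as strings, and only a few common dtypes are covered. A list dtype is mapped to a `REPEATED` field, which is how BigQuery represents arrays.

```python
# Hypothetical mapping from Polars dtype names to BigQuery column types.
# Only a handful of common dtypes are covered; this is an assumption,
# not an exhaustive or official table.
_POLARS_TO_BQ = {
    "Int32": "INT64",
    "Int64": "INT64",
    "Float32": "FLOAT64",
    "Float64": "FLOAT64",
    "Boolean": "BOOL",
    "String": "STRING",
    "Utf8": "STRING",
    "Date": "DATE",
}


def infer_bq_schema(dtype_names):
    """Build BigQuery-style schema field dicts from {column: dtype name}."""
    fields = []
    for name, dtype in dtype_names.items():
        mode = "NULLABLE"
        # A list column corresponds to a REPEATED BigQuery field.
        if dtype.startswith("List(") and dtype.endswith(")"):
            dtype = dtype[len("List("):-1]
            mode = "REPEATED"
        # Anything unmapped (including nested lists) is rejected explicitly.
        if dtype not in _POLARS_TO_BQ:
            raise ValueError(f"no BigQuery type known for {name!r}: {dtype}")
        fields.append({"name": name, "type": _POLARS_TO_BQ[dtype], "mode": mode})
    return fields


schema = infer_bq_schema({"id": "Int64", "tags": "List(String)"})
```

With a real Polars DataFrame `df`, the input dict could plausibly be built as `{name: str(dtype) for name, dtype in df.schema.items()}`, assuming the string form of each dtype matches the names in the table above. Parameterized dtypes such as datetimes would need extra handling.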

tswast commented 3 months ago

For going from BigQuery DataFrames to polars, I'm adding a to_arrow method in https://github.com/googleapis/python-bigquery-dataframes/pull/807, as well as an example of how to create a polars DataFrame from the results.
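
A rough sketch of what that conversion could look like, assuming the `to_arrow` method returns a `pyarrow.Table` and using Polars' `from_arrow` function. The wrapper name `to_polars` is hypothetical and is not taken from the pull request.

```python
def to_polars(bf_df):
    """Convert an object exposing to_arrow() into a Polars DataFrame.

    `bf_df` is assumed to be a BigQuery DataFrames DataFrame whose
    to_arrow() method returns a pyarrow.Table.
    """
    # Imported lazily so this sketch can be loaded without Polars installed.
    import polars as pl

    arrow_table = bf_df.to_arrow()
    return pl.from_arrow(arrow_table)
```

Going through Arrow avoids materializing an intermediate pandas DataFrame. Whether index columns are included in the Arrow table depends on how `to_arrow` is implemented.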