Open nagomiso opened 9 months ago
I agree that this is a great feature. We should add this natively on the Rust side.
I'm not very experienced with Polars, and I understand this may already be known and that you'd like a more straightforward interaction (apologies for the noise if so). Just in case it's useful to anyone: it is possible to write Parquet files directly to GCS (as well as to other storage providers) via gcsfs:
import polars as pl
import gcsfs
# df = pl.read_parquet('file_path')
# Assuming `df` is your Polars DataFrame, and that the GOOGLE_APPLICATION_CREDENTIALS env variable is correctly set
fs = gcsfs.GCSFileSystem()
# Define your GCS bucket and file path
destination = "gs://bucket/folder/file.parquet"
# Write the DataFrame to a Parquet file directly in GCS
with fs.open(destination, mode='wb') as f:
    df.write_parquet(f)
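If GOOGLE_APPLICATION_CREDENTIALS is not set, gcsfs can also take credentials explicitly. A minimal sketch, assuming a service-account key file at a hypothetical path:

import gcsfs
# `token` also accepts a path to a service-account JSON key file
fs = gcsfs.GCSFileSystem(token="path/to/service-account-key.json")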
It is also possible to write a Parquet file to GCS by passing use_pyarrow:
df.write_parquet("gs://bucket/folder/file.parquet", use_pyarrow=True)
Description
I am using Polars with Python. When I attempted to save a DataFrame to Google Cloud Storage by specifying a Google Cloud Storage URI and executing df.write_parquet(), a FileNotFoundError occurred and the write operation failed. pl.read_parquet() can directly load files from Google Cloud Storage, so similarly, I would like df.write_parquet() to be able to save files to Google Cloud Storage directly.
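For concreteness, a minimal sketch of the asymmetry described above (the bucket and object names are hypothetical):

import polars as pl

# Reading from a GCS URI works directly:
df = pl.read_parquet("gs://bucket/folder/file.parquet")

# Writing to the same kind of URI fails, because the URI
# is treated as a local path:
df.write_parquet("gs://bucket/folder/out.parquet")  # raises FileNotFoundError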
Environment
In the execution environment, the versions of the dependencies that seem to be relevant are as follows: