pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

scan_parquet from io.BytesIO() #10413

Closed: s-b90 closed this issue 3 weeks ago

s-b90 commented 1 year ago

Problem description

Add the ability to accept io.BytesIO() as the source parameter for scan_parquet. As of now, it accepts only a path to one or more files. This feature would be useful when a program receives Parquet data through a REST API or a socket, directly into memory.

ritchie46 commented 1 year ago

I am pretty sure that this is a duplicate. :thinking:

s-b90 commented 1 year ago

True, I'm sorry. I've found some related issues: #4950 and #9511. They are all about scan_csv, but you can definitely close this as a duplicate. Just don't forget about Parquet as well :)

Object905 commented 1 year ago

It would also be great if the scan* and read* functions had a unified input type for files, bytes, etc. It would also be nice if they accepted a list of BytesIO or path-like objects, so that they could be processed in parallel, as with a glob pattern.

adamgreg commented 1 year ago

My application has Parquet embedded as BLOBs in SQL tables, and processes and combines them lazily. I would love to see support for this - at the moment I have to use read_parquet() and miss out on pushdown optimisations.

aberres commented 10 months ago

A similar use case here. We have a bunch of Parquet files in memory I want to work with, without having all of them in memory at the same time.

shoz commented 9 months ago

I would be very happy with this improvement. I have about a million Parquet files stored as binaries in Redis, and I want to read them as LazyFrames to save memory.

HWiese1980 commented 2 months ago

This is still open. Has there been any progress? I need this functionality too.

cmdlineluser commented 2 months ago

@HWiese1980 Coincidentally, it was added on main a few hours ago https://github.com/pola-rs/polars/pull/18532

HWiese1980 commented 2 months ago

Hah! That's quite the timing! :-D Thanks!

s-b90 commented 3 weeks ago

Can confirm that everything is working. Thanks @coastalwhite!