s-b90 closed this issue 3 weeks ago
I am pretty sure that this is a duplicate. :thinking:
True, sorry. I've found some related issues, #4950 and #9511. They are all about scan_csv, but you can definitely close this as a duplicate. Just don't forget about Parquet as well :)
It would also be great if the scan* and read* functions had a unified input "type" for files/bytes/etc. It would also be nice if they accepted a list of BytesIO objects or path-likes, so that they could be processed in parallel, as with a glob pattern.
My application has Parquet embedded as BLOBs in SQL tables, and processes and combines them lazily. I would love to see support for this: at the moment I have to use read_parquet() and miss out on pushdown optimisations.
Similar use case here. We have a bunch of Parquet files in memory that I want to work with, without having all of them in memory at the same time.
I would be very happy with this improvement. I have about a million Parquet files stored as binaries in Redis, and I want to read them as LazyFrames to save memory.
This is still open. Has there been any progress? I need this functionality too.
@HWiese1980 Coincidentally, it was added on main a few hours ago https://github.com/pola-rs/polars/pull/18532
Hah! That's quite the timing! :-D Thanks!
Can confirm that everything is working. Thanks @coastalwhite!
Problem description
Add the ability to accept io.BytesIO() as the source parameter for scan_parquet. Currently, it accepts only a path to one or more files. This feature would be useful when your program receives Parquet data directly into memory, for example through a REST API or a socket.