r-lib / nanoparquet

R package to read and write Parquet files

https://nanoparquet.r-lib.org/

[feature request] read parquet from URL (or from raw vector?) #71

Open tanho63 opened 3 months ago

tanho63 commented 3 months ago

Hi! Excited by the looks of this package. A frequent use case I have is reading a Parquet file from a URL, e.g.

arrow::read_parquet("https://github.com/nflverse/nflverse-data/releases/download/pbp/play_by_play_2023.parquet")

Is this something that would be in-scope for nanoparquet?

gaborcsardi commented 3 months ago

Yes, we could definitely do one or both of those. The challenge with HTTP support is keeping the package lean, but reading from a raw vector is pretty straightforward. write_parquet() already supports writing to a raw vector.

By the way, we could also support reading from an R connection. Then you could do

read_parquet(url("https://...."))

tanho63 commented 3 months ago

Either of these would be great!

mrcaseb commented 3 months ago

Reading from a connection would be great, as that's how we read RDS files from a URL!

gaborcsardi commented 3 weeks ago

To clarify: for a Parquet file, reading from a connection means that we would need to read the whole file first, save it to a temporary file, and then read it from there.

You can also do this yourself relatively easily as a workaround.
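A minimal sketch of that workaround, assuming only base R's download.file() and nanoparquet's read_parquet(). The helper name read_parquet_url() is hypothetical and not part of nanoparquet:

```r
# Hypothetical helper, not part of nanoparquet:
# download the whole file to a temporary path, then read it from disk.
read_parquet_url <- function(url) {
  tmp <- tempfile(fileext = ".parquet")
  # Remove the temporary file when the function exits, even on error.
  on.exit(unlink(tmp), add = TRUE)
  # mode = "wb" keeps the binary file intact on Windows.
  utils::download.file(url, tmp, mode = "wb", quiet = TRUE)
  nanoparquet::read_parquet(tmp)
}
```

With this in place, something like read_parquet_url("https://github.com/nflverse/nflverse-data/releases/download/pbp/play_by_play_2023.parquet") should behave much like the arrow call in the original request, at the cost of a full download before any data is parsed.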

gaborcsardi commented 3 weeks ago

The dev version can now read from a connection.