ropensci / piggyback

:package: for using large(r) data files on GitHub
https://docs.ropensci.org/piggyback
GNU General Public License v3.0

Add `pb_read` and `pb_write` functions #115

Closed tanho63 closed 8 months ago

tanho63 commented 8 months ago

Closes #97.

I thought briefly about making this a wrapper around `pb_download_url` plus a read function that accepts URLs, but that approach didn't have the flexibility I wanted, and I ran into issues downloading from private repositories that I later learned stemmed from not being able to pass an auth token along with the URL.
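
For illustration, a minimal sketch of that rejected approach (the repo, tag, and file name here are hypothetical):

```r
library(piggyback)

# Rejected approach: hand the asset URL to a reader that accepts URLs.
# This works for public repos, but there is no way to attach a GitHub
# auth token to the request, so it fails for private repositories.
url <- pb_download_url("mtcars.csv", repo = "owner/repo", tag = "v1")
df  <- read.csv(url)  # no Authorization header can be supplied here
```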

I think this is the most flexible approach to the problem, but I would love to hear any thoughts.
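
For concreteness, a rough sketch of how the new functions might be used; the repo and tag are hypothetical, and the argument names mirror `pb_download()`/`pb_upload()`, so they are my assumption about the final signatures rather than the merged API:

```r
library(piggyback)

# Read a release asset directly into memory, no intermediate file to manage:
df <- pb_read("mtcars.csv", repo = "owner/repo", tag = "v1")

# Serialize an in-memory object straight to a release asset:
pb_write(mtcars, "mtcars.parquet", repo = "owner/repo", tag = "v1")
```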

tanho63 commented 8 months ago

> too much going on in `read_` methods to abstract away (what about other data serializations, like spatial formats?)

I believe this will fail on spatial formats, since they would not be one of csv, tsv, rds, or parquet. (I'm unfamiliar with how geoparquet works and whether `arrow::read_parquet` will process geoparquet, but I assume it does?)
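
A sketch of the escape hatch this implies, assuming a user-supplied `read_function` override as discussed in this PR (the asset name and repo are hypothetical):

```r
library(piggyback)
library(sf)

# Formats outside csv/tsv/rds/parquet could still be read by passing a
# custom reader that is applied to the downloaded file, e.g. a GeoPackage:
nc <- pb_read(
  "boundaries.gpkg",              # hypothetical spatial asset
  repo          = "owner/repo",   # hypothetical repository
  tag           = "v1",
  read_function = sf::st_read     # override the default format guesser
)
```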

> what about lazy reads / remote reads etc?

Yep, this will read eagerly by design/default, and maybe that's a bad thing for folks who should be thinking about optimizing. However, uninformed users would currently reach for `pb_download` anyway, so it's not necessarily much different from that?
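
To make the contrast concrete, a minimal sketch of the lazy route that remains available via the existing functions (repo, tag, and file name are hypothetical):

```r
library(piggyback)
library(arrow)

# Download the asset once, then open it lazily: arrow only materializes
# the rows/columns a downstream query actually touches.
pb_download("big.parquet", dest = tempdir(),
            repo = "owner/repo", tag = "v1")
ds <- arrow::open_dataset(file.path(tempdir(), "big.parquet"))
```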

I agree with improving the docs, e.g.

tanho63 commented 8 months ago

Flow state hit me like a bus; many apologies for this PR running away from me. Since your last review (diff), I: