Closed sellorm closed 2 years ago
It would be useful to add an option to allow storing as Parquet as well.
For what it's worth, the IANA extension for Arrow data is .arrow, so that's not invalid, though .feather is also commonly used. See also https://arrow.apache.org/faq/.
Parquet is a different format, though the arrow libraries can read and write Parquet.
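Since the extension alone does not settle which format a file is in, one way to check is to look at the leading magic bytes. The following is a minimal sketch, assuming only the documented magic strings (ARROW1 for the Arrow IPC file format, i.e. Feather V2; PAR1 for Parquet; FEA1 for the legacy Feather V1). The helper name sniff_format is hypothetical and is not part of pins or the arrow libraries.

```python
# Leading magic bytes of the on-disk formats discussed in this thread.
MAGIC = {
    b"ARROW1": "arrow-ipc-file (feather v2)",
    b"PAR1": "parquet",
    b"FEA1": "feather-v1",
}


def sniff_format(path):
    """Guess a file's format from its first bytes, ignoring its extension.

    Hypothetical helper for illustration only. The Arrow IPC *streaming*
    format carries no magic string, so such files come back as "unknown".
    """
    with open(path, "rb") as fh:
        head = fh.read(6)  # the longest magic string above is 6 bytes
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return "unknown"
```

Under these assumptions, a file written as Feather V2 should be reported as an Arrow IPC file whether it was saved with a .arrow or a .feather extension, while a Parquet file is reported as Parquet.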
Thanks for the clarification @nealrichardson. The FAQ you linked recommends the .arrow extension without mentioning .feather. Would your recommendation therefore be that pins stays as it is?
Perhaps we could just tweak the docs a little to make things clearer for users.
I'm not sure of the details, or whether the type parameter directly maps onto file extension. But saving files with a .arrow extension is not a problem.
Thanks for all the discussion / resources. If I'm understanding correctly, naming the pin type arrow is in line with how apache-arrow sees things (Feather V2 is the Arrow IPC file format), and so is using the .arrow extension.
It seems like--based on the docs--if we had to choose between "arrow" and "feather", that "arrow" is where people are being steered toward (e.g. it recommends .arrow as the extension).
It seems like this might be the move for pins...
I've switched pins-python to support type="arrow" (https://github.com/rstudio/pins-python/releases/tag/v0.5.0). Thanks y'all for ironing this out!
When people not using the pins package see a pin with an arrow data set, it is not immediately obvious what format that actually is.

For reference, the format used when type = "arrow" is actually 'feather'.

From the pin_write() help:

This is confusing for arrow users since there is no formal on-disk format called '.arrow'. Arrow users generally use either '.parquet' or '.feather'.

pins should either:

pins calls "arrow" is really "feather" under the hood.

If pins chooses to adopt the second of these approaches then it would be nice to highlight that it's feather in both the package help and somewhere in the metadata.

This would hopefully reduce the support burden and help increase cross-language adoption.