ropensci / arkdb

Archive and unarchive databases as flat text files
https://docs.ropensci.org/arkdb

Adding parquet functionality and filter injection for ark. #40

Closed 1beb closed 2 years ago

1beb commented 2 years ago

TODOs:

Questions:

Outline:

  1. Added a function, streamable_parquet, which imports the arrow functions in the same way the other external packages are handled.
  2. The existing functions appear to be designed for text-based formats, which makes a connection object the de facto output target. Parquet, however, requires a named sink, so I added conditionals around the con object in ark for this purpose.
  3. Parquet files would never be written without header information, so I adjusted keep_open to accommodate this.
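
Points 2 and 3 above can be sketched roughly as follows. This is a minimal Python illustration using only the standard library, not arkdb's R code. The function `export_table`, its `where` argument, the `fmt` values, and the JSON part files (standing in for parquet, which would need an external library) are all hypothetical names chosen for the example.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path


def export_table(con, table, out_dir, fmt="csv", chunk_size=2, where=None):
    """Stream `table` out of `con` in chunks and return the files written."""
    # Hypothetical filter hook: `where` is pasted into the query verbatim,
    # so it must come from a trusted caller, never from untrusted input.
    query = f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")
    cur = con.execute(query)
    header = [col[0] for col in cur.description]
    out_dir = Path(out_dir)
    written = []

    if fmt == "csv":
        # Text format: one handle stays open, the header is written once,
        # and every later chunk is appended below it.
        path = out_dir / f"{table}.csv"
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            while chunk := cur.fetchmany(chunk_size):
                writer.writerows(chunk)
        written.append(path)
    else:
        # Stand-in for a path-based format: each chunk goes to its own
        # named file and always carries its column names with it.
        part = 0
        while chunk := cur.fetchmany(chunk_size):
            part += 1
            path = out_dir / f"{table}-part-{part:05d}.json"
            path.write_text(json.dumps({"columns": header, "rows": chunk}))
            written.append(path)
    return written


# Small in-memory table to exercise both branches.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights (year INTEGER, carrier TEXT)")
con.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [(2012, "AA"), (2013, "UA"), (2013, "DL"), (2013, "AA")],
)
out_dir = tempfile.mkdtemp()
csv_files = export_table(con, "flights", out_dir, fmt="csv")
part_files = export_table(con, "flights", out_dir, fmt="parts", where="year = 2013")
```

The text branch can append because a header-less chunk is still valid CSV. The path-based branch cannot, which is why it writes one self-describing file per chunk instead of holding a connection open.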

1beb commented 2 years ago

@cboettig this is ready for another round of review. Note my question about unark. I'm not sure how meaningful (or even desirable) it would be.

cboettig commented 2 years ago

:eyes: nice, looking promising here! just ping me when you're ready for a review!

1beb commented 2 years ago

@cboettig

I accidentally included some of the filter injection on this one. Quick question: do you agree with filter injection (as a concept), and if so, can I combine these two into a single PR (filter + parquet)?

cboettig commented 2 years ago

yup, I noticed the injection filter was here too. I agree that in principle it's something we should be handling, at least as an option, so I'm happy to have it wrapped into the same PR.

1beb commented 2 years ago

@cboettig ok, we're ready for a review.

cboettig commented 2 years ago

:rocket: nice work, all looks good here.