Closed 1beb closed 2 years ago
@cboettig This is ready for review.
@cboettig ready for round 2
Looks good! My only other thought is maybe to mention in the README as well that ark is automatically selecting streamable_parquet(); otherwise the reader might assume from the syntax that it is still writing out with the usual default.
I actually went the other way. Instead of making the decision for someone, I chose to stop() if they used window-parallel without streamable_parquet. I'm not sure what the best decision is here. I don't think I'd want to get parquet if I was expecting the default. I can't guarantee people will pay attention to that detail in a README, but they will attempt to rectify a stop. Maybe?
That makes sense to me.
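For readers following the thread, the fail-fast choice discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: `check_window_parallel()` and its `file_type` argument are hypothetical names, not arkdb internals, and the real check in the PR may look different.

```r
# Hypothetical validator (not arkdb's actual code): refuse to continue when
# the parallel windowed method is requested with a non-parquet output format,
# rather than silently switching the output format for the user.
check_window_parallel <- function(method, file_type) {
  if (identical(method, "window-parallel") && !identical(file_type, "parquet")) {
    stop(
      "method = 'window-parallel' requires streamable_parquet()",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Passes silently: the combination is allowed.
check_window_parallel("window-parallel", "parquet")

# Capture the error message instead of aborting the script.
msg <- tryCatch(
  check_window_parallel("window-parallel", "tsv"),
  error = function(e) conditionMessage(e)
)
```

The trade-off is the one raised in the comment: an error forces the caller to make the format choice explicitly, whereas a silent switch could hand back parquet files to someone expecting the default.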
The only place I think may still be confusing, at least to me, is in "Strategy 1": it looks like this is not using streamable_parquet()? https://github.com/ropensci/arkdb/pull/48/files#diff-72778b58969c8ca8268402860b0e003e3d213a26c812bc9f9b928395c284c99fR139
excellent, this looks good to me!
This is a candidate example for writing windows of a database table out to files in parallel.
Backgrounder:
Currently, the only way to run in parallel is to run multiple tables at a time. But what if you have an exceptionally large table? This pull request includes a new function, window_parallel, that allows you to run a large table in parallel.

Key points:
future.apply, however, any parallel function could be substituted (furrr, etc).

TODO:
arkdb
devtools::test_file()
usage

Refs #21 because it supports parallelization at a different level.
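The general idea of exporting one large table in windows can be sketched in base R as below. This is a minimal illustration, not the PR's implementation: `make_windows()` and `export_windows()` are hypothetical helpers, an in-memory data frame stands in for a database table, and CSV output is used only to avoid extra dependencies (the PR itself targets parquet). The `apply_fun` argument defaults to `lapply`; under the assumption that a parallel backend is configured, a function such as `future.apply::future_lapply` could be passed in its place.

```r
# Hypothetical helper: split n_rows into consecutive windows of at most
# `lines` rows each, returning one row per window.
make_windows <- function(n_rows, lines) {
  stopifnot(n_rows > 0, lines > 0)
  starts <- seq(1L, n_rows, by = lines)
  ends <- pmin(starts + lines - 1L, n_rows)
  data.frame(id = seq_along(starts), start = starts, end = ends)
}

# Hypothetical helper: write each window of `tbl` to its own file.
# `apply_fun` is any lapply-compatible function, so a parallel variant
# can be swapped in without changing the rest of the code.
export_windows <- function(tbl, dir, lines = 50000L, apply_fun = lapply) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  windows <- make_windows(nrow(tbl), lines)
  paths <- apply_fun(seq_len(nrow(windows)), function(i) {
    w <- windows[i, ]
    path <- file.path(dir, sprintf("part-%05d.csv", w$id))
    write.csv(tbl[w$start:w$end, , drop = FALSE], path, row.names = FALSE)
    path
  })
  unlist(paths)
}

# Usage: export the built-in mtcars data in windows of 10 rows.
out_dir <- file.path(tempdir(), "window_sketch")
paths <- export_windows(mtcars, out_dir, lines = 10L)
```

Because every window is written to a separate file, the workers never contend for the same output, which is what makes this level of parallelism independent of the per-table parallelism that already exists.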