Open danielibsen opened 1 year ago
@omarsilverman do you have a good format for working with large datasets?
DuckDB! It's SQL-based and very fast when working with large data. We use it in the register databases. Connecting through dplyr is super easy: https://duckdb.org/docs/api/r.html
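For anyone who wants to try the dplyr route, here is a minimal sketch. It assumes the `DBI`, `duckdb`, `dplyr`, and `dbplyr` packages are installed, and it uses the built-in `mtcars` data as a stand-in for a real register table (the table and variable names are only for illustration).

```r
library(DBI)
library(dplyr)
library(duckdb)

# Open an in-memory DuckDB database.
# For a persistent database, pass a file path via `dbdir` instead.
con <- dbConnect(duckdb::duckdb())

# mtcars stands in for a large table (assumption for illustration only)
dbWriteTable(con, "mtcars", mtcars)

# dplyr verbs are translated to SQL and executed inside DuckDB;
# collect() brings the (small) result back into R as a tibble
mpg_by_cyl <- tbl(con, "mtcars") |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) |>
  collect()

# Close the connection and shut down the database instance
dbDisconnect(con, shutdown = TRUE)
```

The point of this pattern is that the heavy work happens in DuckDB, and only the summarised result is loaded into R memory.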
Converting from Parquet (Arrow) to DuckDB is as easy as arrow::to_duckdb(). :grin:
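A rough sketch of that conversion, assuming the `arrow`, `duckdb`, `dplyr`, and `dbplyr` packages are installed. The Parquet file here is a temporary one written from `mtcars` purely for illustration; in practice it would be an existing Parquet file or dataset directory.

```r
library(arrow)
library(dplyr)

# Write a small example Parquet file (stand-in for a real large dataset)
path <- tempfile(fileext = ".parquet")
write_parquet(mtcars, path)

# Open the file lazily with Arrow, hand it over to DuckDB,
# then continue with dplyr verbs that DuckDB executes
cars_by_cyl <- open_dataset(path) |>
  to_duckdb() |>
  group_by(cyl) |>
  summarise(n = n()) |>
  collect()
```

`to_duckdb()` registers the Arrow data as a virtual table in DuckDB rather than copying it, so the hand-off itself should be cheap even for large data.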
Different file formats to work with very large datasets: