steno-aarhus / coding-cafe


Storing and working with very large data files #4

Open danielibsen opened 1 year ago

danielibsen commented 1 year ago

Different file formats for working with very large datasets, for example:
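Parquet is one columnar format that comes up below. A minimal sketch of writing and reading it with the arrow package (the file name is hypothetical, just for illustration):

```r
library(arrow)

# Write a data frame to a Parquet file
# ("large-data.parquet" is a hypothetical name).
write_parquet(mtcars, "large-data.parquet")

# read_parquet() loads the whole file into memory, while open_dataset()
# scans it lazily, which is the better fit when the file is bigger than RAM.
in_memory <- read_parquet("large-data.parquet")
lazy <- open_dataset("large-data.parquet")
```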

snhansen commented 1 year ago
danielibsen commented 1 year ago

@omarsilverman did you have a good format to work with large datasets?

lwjohnst86 commented 1 year ago

DuckDB! It's an SQL-based engine that is very fast when working with large data. We use it in the register databases. Connecting through dplyr is super easy: https://duckdb.org/docs/api/r.html
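Something like this is all it takes (a rough sketch with an in-memory database and a made-up table name, so adjust to your setup):

```r
library(duckdb)  # also loads DBI
library(dplyr)

# In-memory database; pass dbdir = "my-db.duckdb" to persist it on disk.
con <- dbConnect(duckdb::duckdb())

# Expose an existing data frame to DuckDB as a virtual table (no copy),
# then query it lazily with ordinary dplyr verbs.
duckdb_register(con, "cars_tbl", mtcars)

result <- tbl(con, "cars_tbl") |>
  group_by(cyl) |>
  summarise(mean_hp = mean(hp)) |>
  collect()  # the query runs inside DuckDB; collect() pulls the result into R

dbDisconnect(con, shutdown = TRUE)
```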

Converting from Parquet (Arrow) to DuckDB is as easy as arrow::to_duckdb(). :grin:
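For example (hypothetical file path and column names, just to show the shape of it):

```r
library(arrow)
library(duckdb)
library(dplyr)

# Open the Parquet file lazily as an Arrow dataset
# ("large-data.parquet" is a hypothetical path).
ds <- open_dataset("large-data.parquet")

# Hand the data to DuckDB without copying it, then keep using dplyr;
# the query itself runs in DuckDB until collect().
ds |>
  to_duckdb() |>
  filter(hp > 100) |>
  count(cyl) |>
  collect()
```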