Closed: matreyes closed this 6 months ago
Thank you @matreyes! I have some questions to understand your use case better:
Answering your questions
Why not use Explorer/DataFrames for the data manipulation? I currently work for a big corp with lots of tools and languages; the only common language between engineers, scientists, and analysts is SQL. Also, persistence (views or tables) could be useful, and I can reuse DuckDB transformations in our DW (BigQuery) later on with a few changes.
Do you actually have a DuckDB on disk? I use both (memory and disk). On disk is great for having a kind of local "datamart"; in memory resembles a typical dataframe workflow more closely. Also, "Running on a persistent database allows spilling to disk, thus facilitating larger-than-memory workloads (i.e., out-of-core-processing)."
How are you currently importing data into DuckDB to work with it? I usually import CSV or Parquet files directly into DuckDB.
Thank you! This looks good to me. If you are happy with it, we can merge it.
To be clear, let me know if it is ready by saying yes or no :)
Thanks @jonatanklosko , It's OK from my side.
:green_heart: :blue_heart: :purple_heart: :yellow_heart: :heart:
In my role as a Data Engineer I've been moving away from dataframes (i.e., Spark), since I'm building everything with dbt and BigQuery, and locally with DuckDB (CLI). Using just SQL to build the different stages (a kind of delta lake) has been a great way to facilitate the work of data scientists and to support data governance.
I can see how DuckDB could work as the ingestion and transformation layer, exposing simplified and possibly materialized "data marts" to Explorer or VegaLite. This is so powerful!
Amazing work with ADBC !