Open armetiz opened 11 months ago
On a MacBook Pro, using DuckDB 0.9.0 with an in-memory database, I tried to fetch a large dataset.
Here is the DuckDB error:
Error: near line 1: Out of Memory Error: could not allocate block of size 262KB (27.4GB/27.4GB used)
Database is launched in in-memory mode and no temporary directory is specified.
Unused blocks cannot be offloaded to disk.
Launch the database with a persistent storage back-end
Or set PRAGMA temp_directory='/path/to/tmp.tmp'
IMHO, using an in-memory database with temp_directory set is well suited to a stateless task.
This should be the default.
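To illustrate the stateless case, here is a minimal sketch using the DuckDB CLI. The spill directory path and the `build_sql` helper are hypothetical names chosen for this example; the `PRAGMA temp_directory` statement is the one suggested by the error message above.

```shell
# Hypothetical spill directory; any writable path works.
SPILL_DIR="${TMPDIR:-/tmp}/duckdb_spill"
mkdir -p "$SPILL_DIR"

# Build the SQL for one stateless task: the setting first, then the query.
# build_sql is an illustrative helper, not part of DuckDB or Kestra.
build_sql() {
  printf "PRAGMA temp_directory='%s';\n" "$1"
  printf "COPY (SELECT 42 AS i, 84 AS j) TO 'output.parquet' (FORMAT PARQUET);\n"
}

# No database file is passed, so the CLI runs in-memory;
# blocks that do not fit in memory can be offloaded to SPILL_DIR.
if command -v duckdb >/dev/null 2>&1; then
  build_sql "$SPILL_DIR" | duckdb
else
  echo "duckdb CLI not found; SQL that would run:" >&2
  build_sql "$SPILL_DIR"
fi
```

Nothing persists between runs except the exported file, which is what a stateless task needs.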
Using DuckDB with a persistent storage back-end, on the other hand, is only useful if the database file can be re-used between tasks. This should be a Kestra option.
I mean something like this. Tasks:
echo "CREATE TABLE t1 AS SELECT 42 AS i, 84 AS j;" | duckdb database.file
echo "COPY t1 TO 'output.parquet' (FORMAT PARQUET)" | duckdb database.file
It could be useful because SQL operations could be split across dedicated tasks, to improve debugging, maintenance, and readability.
Feature description
By default, DuckDB starts with an in-memory database.
To avoid out-of-memory errors, it could be useful to connect DuckDB to a database file.
From the Java documentation: