kestra-io / kestra

:zap: Workflow Automation Platform. Orchestrate & Schedule code in any language, run anywhere, 500+ plugins. Alternative to Zapier, Rundeck, Camunda, Airflow...
https://kestra.io
Apache License 2.0
13.68k stars 1.18k forks source link

Make it easier to use data stored in ION format in DuckDB #4038

Open lburja opened 5 months ago

lburja commented 5 months ago

Feature description

When working with Oracle, if I want to copy data from one table to another, I can use a combination of io.kestra.plugin.jdbc.oracle.Query and io.kestra.plugin.jdbc.oracle.Batch, as shown in the documentation.

I have a situation, where I would like to copy data from Oracle into DuckDB.

However, DuckDB does not have a task io.kestra.plugin.jdbc.duckdb.Batch. DuckDB can also not import ION files, which Kestra produces.

The solution for now is to query Oracle and store the result as a ION file, then transform to a CSV file, then load the CSV file into DuckDB. With a task io.kestra.plugin.jdbc.duckdb.Batch, this could be solved simpler, faster, with less CPU and less storage involved.

In the image below, I have selected from Oracle (duration: 15s) and transformed to CSV (duration: 31s). The CSV transform would not be necessary if I could load the results of io.kestra.plugin.jdbc.oracle.Query directly into DuckDB.

image
anna-geller commented 5 months ago

Hey, we were planning to contribute an ION extension to DuckDB. It turned out to be more difficult than anticipated. We'll keep investigating and update here once we know more.