kestra-io / kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
https://kestra.io
Apache License 2.0
7.07k stars 414 forks source link

Make it easier to use data stored in ION format in DuckDB #4038

Open lburja opened 3 weeks ago

lburja commented 3 weeks ago

Feature description

When working with Oracle, if I want to copy data from one table to another, I can use a combination of io.kestra.plugin.jdbc.oracle.Query and io.kestra.plugin.jdbc.oracle.Batch, as shown in the documentation.

I have a situation, where I would like to copy data from Oracle into DuckDB.

However, DuckDB does not have a task io.kestra.plugin.jdbc.duckdb.Batch. DuckDB can also not import ION files, which Kestra produces.

The solution for now is to query Oracle and store the result as a ION file, then transform to a CSV file, then load the CSV file into DuckDB. With a task io.kestra.plugin.jdbc.duckdb.Batch, this could be solved simpler, faster, with less CPU and less storage involved.

In the image below, I have selected from Oracle (duration: 15s) and transformed to CSV (duration: 31s). The CSV transform would not be necessary if I could load the results of io.kestra.plugin.jdbc.oracle.Query directly into DuckDB.

image
anna-geller commented 3 weeks ago

Hey, we were planning to contribute an ION extension to DuckDB. It turned out to be more difficult than anticipated. We'll keep investigating and update here once we know more.