We adapted our existing Flyte DuckDB plugin to also work with MotherDuck.
We wrote an example for a blog post.
It is meant to showcase the following:
Efficiency of DuckDB for analytical queries.
The ability to query in-memory data and remote data in MotherDuck together in a single query.
To showcase these features, I chose an ecommerce dataset that lends itself to analytical queries. Given some "new" incoming ecommerce data, say in a dataframe, we can query it together with the remote data in MotherDuck to get sales and customer trends and statistics.
The current state of the example:
We have a Union workflow that takes in a pandas DataFrame containing recent ecommerce data.
We run various queries that simultaneously target the local (DataFrame) and remote (MotherDuck) data to answer questions like "how are products selling compared to average?"
We display some plots in the Union UI showing the results of those queries.
The workflow also has an optional input, a prompt. This can be something simple like "how many customers bought product x?" or something more complex. The prompt is routed to OpenAI function calling, which prepares a DuckDB query that can be run on the MotherDuck data and/or the local data to answer the original user prompt. This serves as a natural-language-to-DuckDB interface and follows the best practices outlined by OpenAI.
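A rough sketch of what the function-calling side could look like, under stated assumptions. The tool name run_duckdb_query, its single query parameter, and the read-only check are all hypothetical and not taken from the example. The dictionary follows the Chat Completions "tools" format, where the model returns the call arguments as a JSON string. No API call is made here; the model output is simulated.

```python
import json

# Hypothetical tool definition (name and parameters are assumptions).
RUN_QUERY_TOOL = {
    "type": "function",
    "function": {
        "name": "run_duckdb_query",
        "description": (
            "Run a read-only DuckDB SQL query over the local DataFrame "
            "and the MotherDuck tables."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "A single DuckDB SELECT statement.",
                }
            },
            "required": ["query"],
        },
    },
}


def extract_query(arguments_json: str) -> str:
    """Parse tool-call arguments and return the SQL text.

    Rejects anything that does not start with SELECT or WITH. This guard
    is an illustrative precaution added for the sketch; it is crude and
    would also reject queries with a semicolon inside a string literal.
    """
    args = json.loads(arguments_json)
    query = args["query"].strip().rstrip(";").strip()
    first_word = query.split(None, 1)[0].upper() if query else ""
    if first_word not in {"SELECT", "WITH"}:
        raise ValueError(f"refusing to run non-SELECT statement: {first_word!r}")
    if ";" in query:
        raise ValueError("expected a single statement")
    return query


# Simulated arguments string, shaped like what a tool call would carry.
simulated_arguments = json.dumps(
    {"query": "SELECT COUNT(DISTINCT customer_id) FROM recent_orders;"}
)
sql = extract_query(simulated_arguments)
print(sql)
```

In a real run, RUN_QUERY_TOOL would be passed in the tools list of the chat request, and the string handed to extract_query would come from the returned tool call before the query is executed with DuckDB.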
Copy of message sent to MotherDuck:
Here are some updates:
The current state of the example:
Example Run:
union -c ~/.uctl/config-demo.yaml run --remote -p daniel ecommerce_wf.py wf --prompt="How many customers are there in the historical data compared to the recent data?"
https://demo.hosted.unionai.cloud/console/projects/daniel/domains/development/executions/fc7ba109f1c754508ad4/nodeId/n4/nodes
Blog post: https://docs.google.com/document/d/1Xq3vwGyiAYUh6TlgwpgHdLGCDHGb7PdpmdF2HZo9RvE/edit