Provide VDK users with the ability to easily connect to a database of their choice (e.g. MySQL, PostgreSQL). The integration should be readily available and easy to configure.
Then, it should be possible to ingest data into that database without writing any ingestion code (insert queries) by default, unless the user needs something more advanced.
The insert query is managed by VDK, which can then provide extra benefits: auto recovery, lineage, and whatever else VDK comes up with in the future.
Users should be able to list the supported DB types when configuring the desired DB.
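To make the "no ingestion code" goal concrete, here is a minimal sketch of how an insert query could be generated from plain dict payloads over any DB-API 2.0 connection. The function name `ingest_rows` and its `placeholder` parameter are hypothetical and are not part of VDK. `sqlite3` is used only because it ships with Python. Other drivers use a different parameter style (for example `%s` for psycopg2), which is why the placeholder is configurable.

```python
import sqlite3
from typing import Any, Iterable, Mapping


def ingest_rows(
    connection: Any,
    table: str,
    rows: Iterable[Mapping[str, Any]],
    placeholder: str = "?",  # assumption: qmark paramstyle, as used by sqlite3
) -> int:
    """Insert dict payloads into `table` via a DB-API 2.0 connection.

    Hypothetical helper: the column list is taken from the first row.
    Returns the number of rows sent.
    """
    rows = list(rows)
    if not rows:
        return 0
    columns = list(rows[0].keys())
    # Identifiers cannot be bound as query parameters, so validate them
    # before interpolating them into the SQL text.
    for name in [table, *columns]:
        if not name.isidentifier():
            raise ValueError(f"Unsafe identifier: {name!r}")
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table,
        ", ".join(columns),
        ", ".join([placeholder] * len(columns)),
    )
    cursor = connection.cursor()
    try:
        cursor.executemany(sql, [tuple(row[c] for c in columns) for row in rows])
        connection.commit()
    finally:
        cursor.close()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    ingest_rows(conn, "users", [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}])
    print(conn.execute("SELECT id, name FROM users ORDER BY id").fetchall())
```

In this sketch the user only supplies payloads and a destination table. Because the query is built in one place, that place is also where a plugin could attach retries or lineage collection.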
This includes:
vdk-database plugin (provides support for any DB-API-compatible database)
Automatic detection (and possibly installation) of the Python driver library for the chosen DB
Connection validation
Base Ingestion support
Easy configuration
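The items above (listing supported DB types, driver detection, and connection validation) could fit together roughly as follows. This is a sketch under stated assumptions, not the plugin's design: the `SUPPORTED_DRIVERS` registry and all function names are hypothetical, and the driver modules listed are just commonly used DB-API packages. `SELECT 1` is assumed to be a valid probe query, which holds for many databases but not all of them.

```python
import importlib
import importlib.util
from typing import Any, Dict, List

# Hypothetical registry: DB type name -> Python DB-API driver module name.
SUPPORTED_DRIVERS: Dict[str, str] = {
    "sqlite": "sqlite3",
    "postgres": "psycopg2",
    "mysql": "pymysql",
}


def list_supported_types() -> List[str]:
    """Return the DB types a user can choose from, sorted by name."""
    return sorted(SUPPORTED_DRIVERS)


def load_driver(db_type: str) -> Any:
    """Detect and import the driver module for `db_type`."""
    if db_type not in SUPPORTED_DRIVERS:
        raise ValueError(
            f"Unsupported DB type {db_type!r}; choose one of {list_supported_types()}"
        )
    module_name = SUPPORTED_DRIVERS[db_type]
    # find_spec returns None when the top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"Driver module {module_name!r} for {db_type!r} is not installed"
        )
    return importlib.import_module(module_name)


def validate_connection(connection: Any) -> bool:
    """Return True if a trivial query succeeds on the connection."""
    try:
        cursor = connection.cursor()
        try:
            cursor.execute("SELECT 1")  # assumption: supported by the target DB
            return cursor.fetchone() is not None
        finally:
            cursor.close()
    except Exception:
        return False


if __name__ == "__main__":
    print(list_supported_types())
    driver = load_driver("sqlite")
    conn = driver.connect(":memory:")
    print(validate_connection(conn))
```

Automatic installation is deliberately left out of the sketch. Raising a clear error that names the missing driver keeps the behaviour predictable, and an install step could be layered on top later.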
Potentially it could also include
DB-agnostic modelling templates (SCD2, SCD1).
DB-agnostic detailed lineage (already partially provided by vdk-lineage)
Metrics:
Number of databases used by VDK users.
Time to configure, connect to, and validate my own DB (target: < 1.5 minutes)
Time to ingest data into a completely new DB
Business case:
See research so far in https://github.com/vmware/versatile-data-kit/tree/main/specs/vep-2421-universal-database-plugin
See implementation details in https://github.com/vmware/versatile-data-kit/issues/1444
Size: M (3 person-sprints)
Included Milestones: