vmware / versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.
Apache License 2.0

Python DB API databases support #2421

Status: Open. sabadzhiev opened this issue 1 year ago.

sabadzhiev commented 1 year ago

Business case:

Provide VDK users with the ability to easily connect to a database of their choice (e.g., MySQL, Postgres). The integration should be readily available and easy to configure. It should then be possible to ingest data into the database without writing any ingestion code (INSERT queries) by default, unless the user needs something advanced. Because VDK manages the queries, it can provide extra benefits such as auto-recovery, lineage, and whatever else VDK adds in the future.
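To illustrate what "ingesting without writing INSERT queries" could look like over the Python DB API (PEP 249), here is a minimal sketch. It assumes the payload is a list of dicts sharing the same keys and that the target table already exists. The helper name `ingest_rows` is hypothetical and is not part of VDK; `sqlite3` stands in for any DB-API 2.0 driver.

```python
import sqlite3
from typing import Any, Mapping, Sequence


def ingest_rows(connection, table: str, rows: Sequence[Mapping[str, Any]],
                paramstyle: str = "qmark") -> int:
    """Hypothetical helper (not a VDK API): build and run the INSERT for the caller.

    Assumes every row has the same keys as the first row, and that `table`
    and the column names are trusted identifiers (they are not escaped).
    """
    if not rows:
        return 0
    columns = list(rows[0].keys())
    # Pick the placeholder syntax that the driver's DB-API paramstyle expects.
    if paramstyle == "qmark":
        placeholders = ", ".join("?" for _ in columns)
    elif paramstyle in ("format", "pyformat"):
        placeholders = ", ".join("%s" for _ in columns)
    else:
        raise ValueError(f"unsupported paramstyle: {paramstyle}")
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    cursor = connection.cursor()
    try:
        # Values are always passed as parameters, never interpolated into SQL.
        cursor.executemany(sql, [tuple(row[c] for c in columns) for row in rows])
    finally:
        cursor.close()
    connection.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
    count = ingest_rows(conn, "events",
                        [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
                        paramstyle=sqlite3.paramstyle)
    print(count)
```

Reading the placeholder style from the driver module's `paramstyle` attribute is what lets one code path serve several DB-API drivers; anything beyond plain inserts would still be left to user-written SQL, as the issue describes.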

Users should be able to list the possible (supported) DB types when configuring the desired DB.
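One way to support listing DB types is a registry that maps user-facing names to DB-API driver modules. The sketch below is an assumption for illustration only: the names `SUPPORTED_DB_TYPES` and `list_supported_db_types` are hypothetical, and the driver choices are examples rather than the set VDK actually supports.

```python
import importlib.util

# Hypothetical registry: user-facing DB type -> DB-API 2.0 driver module name.
# The entries are illustrative assumptions, not VDK's supported list.
SUPPORTED_DB_TYPES = {
    "sqlite": "sqlite3",
    "postgres": "psycopg2",
    "mysql": "pymysql",
}


def list_supported_db_types(installed_only: bool = False) -> list[str]:
    """Return the known DB type names, optionally only those whose driver is importable."""
    names = sorted(SUPPORTED_DB_TYPES)
    if installed_only:
        # find_spec returns None when the driver module cannot be found.
        names = [n for n in names
                 if importlib.util.find_spec(SUPPORTED_DB_TYPES[n]) is not None]
    return names
```

Checking importability lets a configuration step distinguish "supported in principle" from "usable in this environment" without actually importing every driver.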

This includes:

Potentially, it could also include:

See research so far in https://github.com/vmware/versatile-data-kit/tree/main/specs/vep-2421-universal-database-plugin

See implementation details in https://github.com/vmware/versatile-data-kit/issues/1444

Metrics:
- Number of databases used by VDK users.
- Time to configure, connect to, and validate my own DB (target: under 1.5 minutes).
- Time to ingest data into a completely new DB.

Size: M (3 person-sprints)

Included Milestones:

antoniivanov commented 1 year ago

Work on the VEP was started in https://github.com/vmware/versatile-data-kit/pull/2616.