Python DB API databases support

Business case:

Provide VDK users with the ability to connect easily to a database of their choice (e.g. MySQL, Postgres). The integration should be readily available and easy to configure. It should then be possible to ingest data into that database without writing any ingestion code (insert queries) by default, unless the user needs something advanced. The query is managed by VDK, which can then provide extra benefits: auto recovery, lineage, and whatever else VDK adds in the future.
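To illustrate what "ingest without writing insert queries" could look like, here is a minimal sketch that builds a parameterized INSERT from a dict payload and runs it through any DB API (PEP 249) connection. It is not the VDK implementation: `build_insert`, `ingest`, and the `visits` table are hypothetical names, only the two positional paramstyles are handled, and SQLite is used purely because it ships with Python.

```python
import sqlite3

# Placeholder marker for each positional DB API "paramstyle" (PEP 249).
# Named styles (named, pyformat) are left out of this sketch.
_PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def build_insert(table, columns, paramstyle="qmark"):
    """Build a parameterized INSERT statement for the given columns (hypothetical helper)."""
    # Assumption: table and column names are trusted. A real implementation
    # would have to validate or quote identifiers.
    mark = _PLACEHOLDERS[paramstyle]
    column_list = ", ".join(columns)
    marks = ", ".join(mark for _ in columns)
    return f"INSERT INTO {table} ({column_list}) VALUES ({marks})"


def ingest(connection, table, rows, paramstyle="qmark"):
    """Insert dict payloads (all sharing the same keys) and return the row count."""
    if not rows:
        return 0
    columns = list(rows[0])
    query = build_insert(table, columns, paramstyle)
    cursor = connection.cursor()
    try:
        cursor.executemany(query, [tuple(row[c] for c in columns) for row in rows])
        connection.commit()
    finally:
        cursor.close()
    return len(rows)


# Usage: the user supplies payloads only; the query is generated for them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER, page TEXT)")
payloads = [{"user_id": 1, "page": "/home"}, {"user_id": 2, "page": "/docs"}]
ingested = ingest(conn, "visits", payloads, sqlite3.paramstyle)
```

Because the statement is generated from the payload keys and the driver's declared `paramstyle`, the same call could in principle target another DB API driver without the user touching SQL.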

Users should be able to list the supported DB types when configuring the desired DB.

This includes:

vdk-database plugin (provides support for any DB API compatible database)
Automatic detection (and possibly installation) of the Python library for the chosen DB
Connection validation
Base Ingestion support
Easy configuration
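As a rough sketch of how listing supported DB types, driver detection, and connection validation might fit together, consider the following. Everything here is an assumption made for illustration: the `SUPPORTED_DRIVERS` registry, the function names, and the choice of `SELECT 1` as a probe query are not taken from the plugin.

```python
import importlib
import importlib.util

# Hypothetical registry: DB type name -> importable DB API driver module.
SUPPORTED_DRIVERS = {
    "sqlite": "sqlite3",
    "postgres": "psycopg2",
    "mysql": "pymysql",
}


def list_supported_types():
    """Return the DB types a user could pick when configuring a connection."""
    return sorted(SUPPORTED_DRIVERS)


def driver_installed(db_type):
    """Check whether the driver module for db_type can be imported, without importing it."""
    return importlib.util.find_spec(SUPPORTED_DRIVERS[db_type]) is not None


def validate_connection(db_type, **connect_kwargs):
    """Open a connection, run a trivial query, and report whether it succeeded."""
    module = importlib.import_module(SUPPORTED_DRIVERS[db_type])
    connection = module.connect(**connect_kwargs)
    try:
        cursor = connection.cursor()
        # Assumption: "SELECT 1" is accepted by the target database;
        # some databases need a different probe query.
        cursor.execute("SELECT 1")
        return cursor.fetchone()[0] == 1
    finally:
        connection.close()


# Usage with SQLite, which ships with Python.
supported = list_supported_types()
sqlite_ready = driver_installed("sqlite")
sqlite_valid = validate_connection("sqlite", database=":memory:")
```

A missing driver shows up as `driver_installed(...)` returning `False`, which is the point where a plugin could prompt the user to install it or attempt the installation itself.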

Potentially, it could also include:

DB agnostic modelling templates (SCD2, SCD1)
DB agnostic detailed lineage (already partially provided by vdk-lineage)

See research so far in https://github.com/vmware/versatile-data-kit/tree/main/specs/vep-2421-universal-database-plugin

See implementation details in https://github.com/vmware/versatile-data-kit/issues/1444

Metrics: Number of databases used by VDK users. Time to configure, connect to, and validate a user's own DB (target <1.5 minutes). Time to ingest data into a completely new DB.

Size: M (3 person-sprints)

Included Milestones:

Python DB API databases support in VDK Discovery and POC

Python DB API databases support #2421