intuit / superglue

Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs and reports.
Apache License 2.0

Provide ability to configure SQL dialects and platform for SQL Input Paths. #23

Closed sambekar15 closed 3 years ago

sambekar15 commented 3 years ago

Is your feature request related to a problem? Please describe.

Currently, the default MySQL dialect is used to parse every query, whether it is written for Hive, Vertica, Redshift, or SparkSQL. Please provide the ability to configure the SQL dialect (e.g. SparkSQL, Vertica, Redshift, Hive) for SQL input paths.

A dialect could be specified for each SQL input file path.

For example, given the current config:

com.intuit.superglue {
  pipeline {
    outputs.database.enabled = true
    inputs.files = [
      {
        base = "/Users/sambekar/GIT/care_analytics/care_analytics"
        kind = "sql"
        includes = ["glob:/*.sql"]
      },
      {
        base = "/Users/sambekar/GIT/care_analytics/care_analytics"
        kind = "hql"
        includes = ["glob:*/.hql"]
      },
      {
        base = "/Users/sambekar/GIT/sbg_stable_analyst_scripts/sbg_stable_analyst_scripts"
        kind = "sql"
        includes = ["glob:/*.sql"]
      }
    ]
  }
  dao {
    backend = "relational"
    relational.db {
      url = "jdbc:mysql://localhost:3314/superglue"
      user = "root"
      password = "superglue_development"
    }
  }
}

Proposed config:

com.intuit.superglue {
  pipeline {
    outputs.database.enabled = true
    inputs.files = [
      {
        base = "/Users/sambekar/GIT/care_analytics/care_analytics"
        kind = "sql"
        includes = ["glob:/*.sql"]
        dialect = "VERTICA"
      },
      {
        base = "/Users/sambekar/GIT/care_analytics/care_analytics"
        kind = "hql"
        includes = ["glob:*/.hql"]
        dialect = "SPARKSQL"
      },
      {
        base = "/Users/sambekar/GIT/sbg_stable_analyst_scripts/sbg_stable_analyst_scripts"
        kind = "sql"
        includes = ["glob:/*.sql"]
        dialect = "REDSHIFT"
      }
    ]
  }
  dao {
    backend = "relational"
    relational.db {
      url = "jdbc:mysql://localhost:3314/superglue"
      user = "root"
      password = "superglue_development"
    }
  }
}
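To make the intended behavior concrete, here is a rough sketch of how the per-input `dialect` key might be resolved, with MYSQL kept as the fallback so existing configs keep working. This is illustrative Python only, not Superglue's actual code: `Dialect`, `FileInput`, `resolve_dialect`, and `parse_file_input` are hypothetical names, and the set of dialects is simply the one mentioned in this issue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Dialect(Enum):
    """Dialects named in this issue (hypothetical enum, not a Superglue type)."""
    MYSQL = "MYSQL"
    HIVE = "HIVE"
    VERTICA = "VERTICA"
    REDSHIFT = "REDSHIFT"
    SPARKSQL = "SPARKSQL"


# Assumption: inputs without a dialect keep today's behavior (MySQL parsing).
DEFAULT_DIALECT = Dialect.MYSQL


@dataclass(frozen=True)
class FileInput:
    """One entry of inputs.files, extended with the proposed dialect field."""
    base: str
    kind: str
    includes: Tuple[str, ...]
    dialect: Dialect = DEFAULT_DIALECT


def resolve_dialect(name: Optional[str]) -> Dialect:
    """Map a config string to a Dialect, falling back to the default when absent."""
    if name is None or not name.strip():
        return DEFAULT_DIALECT
    try:
        # Case-insensitive lookup by enum member name.
        return Dialect[name.strip().upper()]
    except KeyError:
        supported = ", ".join(d.name for d in Dialect)
        raise ValueError(f"Unknown dialect {name!r}; supported: {supported}") from None


def parse_file_input(entry: dict) -> FileInput:
    """Build a FileInput from one already-parsed config entry (a plain dict here)."""
    return FileInput(
        base=entry["base"],
        kind=entry["kind"],
        includes=tuple(entry.get("includes", ())),
        dialect=resolve_dialect(entry.get("dialect")),
    )
```

With this shape, an entry that omits `dialect` still resolves to MYSQL, while an unsupported value fails fast with a message listing the supported names instead of silently parsing with the wrong dialect.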

lingyv-li commented 3 years ago

Hi @sambekar15, I'd like to work on this issue.