intuit / superglue

Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs and reports.
Apache License 2.0
153 stars, 37 forks

Provide a configurable way to plug in any parser #22

Closed: sambekar15 closed this issue 2 years ago

sambekar15 commented 3 years ago

Is your feature request related to a problem? Please describe.
Currently we have implemented the Calcite parser. We have defined interfaces so that it's easy to plug in any parser (Calcite, gSQLParser), but the choice of parser is hard-coded. Make it configurable.
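The plug-in idea can be illustrated with a minimal sketch, written in Python for brevity rather than in the project's own JVM code: an abstract parser interface plus a registry keyed by engine name, so that the pipeline asks for a parser by name instead of hard-coding one class. All names here (`LineageParser`, `register`, `create_parser`, `CalciteStub`, `GSqlParserStub`) are hypothetical and do not come from Superglue. The stubs only split a script on `;` and do not call Calcite or gSQLParser.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class LineageParser(ABC):
    """Hypothetical plug-in interface: one implementation per SQL parser engine."""

    name: str = ""

    @abstractmethod
    def parse(self, script: str) -> List[str]:
        """Return the statements found in a script (a real parser would return lineage)."""


_REGISTRY: Dict[str, Type[LineageParser]] = {}


def register(name: str) -> Callable[[Type[LineageParser]], Type[LineageParser]]:
    """Class decorator that makes an implementation selectable by its engine name."""
    def decorator(cls: Type[LineageParser]) -> Type[LineageParser]:
        cls.name = name
        _REGISTRY[name.lower()] = cls
        return cls
    return decorator


def _split_statements(script: str) -> List[str]:
    # Naive placeholder logic shared by both stubs: split on ';' and drop blanks.
    return [s.strip() for s in script.split(";") if s.strip()]


@register("calcite")
class CalciteStub(LineageParser):
    # Placeholder: a real adapter would hand the script to Apache Calcite.
    def parse(self, script: str) -> List[str]:
        return _split_statements(script)


@register("gsqlparser")
class GSqlParserStub(LineageParser):
    # Placeholder for a gSQLParser adapter.
    def parse(self, script: str) -> List[str]:
        return _split_statements(script)


def create_parser(engine: str) -> LineageParser:
    """Look up the engine named in configuration instead of hard-coding one class."""
    try:
        return _REGISTRY[engine.lower()]()
    except KeyError:
        raise ValueError(
            f"unknown parser engine {engine!r}; registered: {sorted(_REGISTRY)}"
        ) from None


parser = create_parser("Calcite")
print(parser.name, parser.parse("SELECT 1; SELECT 2;"))  # calcite ['SELECT 1', 'SELECT 2']
```

With a registry like this, adding a parser means registering one more class; the pipeline code only needs the engine name, which is the kind of string a `parserEngine` setting would hold.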

The current implementation picks the parser based on the kind of file, and the parser type is hard-coded. For example, this config:

    com.intuit.superglue {
      pipeline {
        outputs.database.enabled = true
        inputs.files = [
          {
            base = "/Users/sambekar/GIT/care_analytics/care_analytics"
            kind = "sql"
            includes = ["glob:/*.sql"]
          },
          {
            base = "/Users/sambekar/GIT/care_analytics/care_analytics"
            kind = "hql"
            includes = ["glob:*/.hql"]
          },
          {
            base = "/Users/sambekar/GIT/sbg_stable_analyst_scripts/sbg_stable_analyst_scripts"
            kind = "sql"
            includes = ["glob:/*.sql"]
          }
        ]
      }
    }

picks up Calcite only for sql files, because the parser is hard-coded in the ParsingPipeline class, so only sql files are parsed. Calcite is able to parse hql files as well, but this config filters them out. Instead, make the parser type configurable (it could be calcite, gsqlparser, etc.) and don't filter on the kind of file.

Proposed config:

    com.intuit.superglue {
      pipeline {
        outputs.database.enabled = true
        parserEngine = "calcite"
        inputs.files = [
          {
            base = "/Users/sambekar/GIT/care_analytics/care_analytics"
            kind = "sql"
            includes = ["glob:/*.sql"]
          },
          {
            base = "/Users/sambekar/GIT/care_analytics/care_analytics"
            kind = "hql"
            includes = ["glob:*/.hql"]
          },
          {
            base = "/Users/sambekar/GIT/sbg_stable_analyst_scripts/sbg_stable_analyst_scripts"
            kind = "sql"
            includes = ["glob:/*.sql"]
          }
        ]
      }
    }
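To make the difference between the current and proposed configs concrete, here is a minimal Python sketch. It treats the config as the nested mapping a HOCON loader would produce, reads the proposed `parserEngine` key with an assumed default of `calcite`, and compares today's kind-based selection with the proposed single-engine selection. `SUPPORTED_ENGINES`, `DEFAULT_ENGINE`, `CURRENT_KIND_TO_PARSER`, and the helper functions are hypothetical, not Superglue's API.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical names throughout; only the "parserEngine" key comes from the proposal.
SUPPORTED_ENGINES = {"calcite", "gsqlparser"}
DEFAULT_ENGINE = "calcite"  # assumption: keep today's parser when the key is absent
CURRENT_KIND_TO_PARSER = {"sql": "calcite"}  # simplified model of the hard-coded choice
FILES_PATH = "com.intuit.superglue.pipeline.inputs.files"


def get_path(cfg: Dict[str, Any], dotted: str, default: Any = None) -> Any:
    """Walk nested dicts with a HOCON-style dotted path such as 'a.b.c'."""
    node: Any = cfg
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def resolve_engine(cfg: Dict[str, Any]) -> str:
    """Read pipeline.parserEngine (default: calcite) and reject unknown engine names."""
    raw = get_path(cfg, "com.intuit.superglue.pipeline.parserEngine", DEFAULT_ENGINE)
    name = str(raw).lower()
    if name not in SUPPORTED_ENGINES:
        raise ValueError(
            f"unsupported parserEngine {raw!r}; expected one of {sorted(SUPPORTED_ENGINES)}"
        )
    return name


def plan_current(cfg: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Current behaviour: the parser follows the file kind; unmapped kinds (hql) are skipped."""
    files = get_path(cfg, FILES_PATH, [])
    return [(f["kind"], CURRENT_KIND_TO_PARSER[f["kind"]])
            for f in files if f["kind"] in CURRENT_KIND_TO_PARSER]


def plan_proposed(cfg: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Proposed behaviour: every input goes to the one configured engine, whatever its kind."""
    engine = resolve_engine(cfg)
    return [(f["kind"], engine) for f in get_path(cfg, FILES_PATH, [])]


# The proposed config as nested dicts (base and includes omitted for brevity).
cfg = {"com": {"intuit": {"superglue": {"pipeline": {
    "parserEngine": "calcite",
    "inputs": {"files": [{"kind": "sql"}, {"kind": "hql"}, {"kind": "sql"}]},
}}}}}

print(plan_current(cfg))   # [('sql', 'calcite'), ('sql', 'calcite')]
print(plan_proposed(cfg))  # [('sql', 'calcite'), ('hql', 'calcite'), ('sql', 'calcite')]
```

Defaulting to `calcite` when the key is absent would keep existing configs working, and failing fast on an unknown engine name surfaces typos at startup instead of leaving files silently unparsed.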

sambekar15 commented 2 years ago

Not required