zazuko / xrm

A friendly language for mappings to RDF
MIT License

[rdb] DB metadata extractor #19

Open mchlrch opened 5 years ago

mchlrch commented 5 years ago

The idea is to inspect DB metadata so that the user has less manual work defining SourceGroup/LogicalSource.

Approach: Read out DB metadata, build AST model and serialize to the DSL.

Goal: A standalone CLI tool that reads out DB metadata, generates the appropriate EMF object model for a SourceGroup and then serializes into DSL text.

xrm-cli -extract-sources -rdb db.properties > foobar-sources.xrm

For db.properties, we re-use Stardog's properties for mappings (https://www.stardog.com/docs/#_available_properties):

jdbc.*
sql.schemas
default.mapping.include.tables
default.mapping.exclude.tables
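
To make the extraction step more concrete, here is a minimal sketch of reading table and column names through the standard JDBC DatabaseMetaData API while honoring the properties listed above. The class and method names are hypothetical and not part of xrm. It assumes that sql.schemas and the include/exclude properties hold comma-separated lists, that names are matched exactly, and that a non-empty include list takes precedence over the exclude list; none of that is confirmed by this issue.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical helper, not part of xrm.
public class MetadataExtractorSketch {

    /** Splits a comma-separated property value into trimmed, non-empty entries. */
    public static List<String> splitList(String value) {
        if (value == null) {
            return new ArrayList<>();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(ArrayList::new));
    }

    /** Assumption: a non-empty include list wins; otherwise the exclude list applies. */
    public static boolean isSelected(String table, Properties props) {
        List<String> include = splitList(props.getProperty("default.mapping.include.tables"));
        List<String> exclude = splitList(props.getProperty("default.mapping.exclude.tables"));
        if (!include.isEmpty()) {
            return include.contains(table);
        }
        return !exclude.contains(table);
    }

    /** Returns table name -> column names, in the order the driver reports them. */
    public static Map<String, List<String>> readColumns(Connection conn, Properties props)
            throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        List<String> schemas = splitList(props.getProperty("sql.schemas"));
        if (schemas.isEmpty()) {
            schemas.add(null); // null means: do not narrow the search by schema
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String schema : schemas) {
            try (ResultSet tables = meta.getTables(null, schema, "%", new String[] {"TABLE"})) {
                while (tables.next()) {
                    String table = tables.getString("TABLE_NAME");
                    if (!isSelected(table, props)) {
                        continue;
                    }
                    List<String> columns = new ArrayList<>();
                    // Note: the table name is treated as a pattern by getColumns.
                    try (ResultSet cols = meta.getColumns(null, schema, table, "%")) {
                        while (cols.next()) {
                            columns.add(cols.getString("COLUMN_NAME"));
                        }
                    }
                    // Sketch limitation: equal table names in two schemas overwrite each other.
                    result.put(table, columns);
                }
            }
        }
        return result;
    }
}
```

The Connection would come from java.sql.DriverManager, using whatever the jdbc.* entries provide. The resulting map could then feed either the EMF object model or a direct text generator.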

For serializing to the DSL, it has to be considered that the naming rules for identifiers are strict. See Handling invalid identifiers for how this is handled.

mchlrch commented 4 years ago

We also want to use the inspection to store a history of the source DB schema, in order to track and detect schema evolution (e.g. new table columns that got added), simply by keeping a textual representation of the inspected DB schema in git. For this, a textual representation (independent of the DSL) should be dumped as well.
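
A minimal sketch of such a DSL-independent dump, assuming the inspected schema is available as a map from table name to (column name to SQL type name). The class name and the line-based output format are made up for illustration. Tables and columns are sorted by name so that two runs against an unchanged schema produce identical text and git only shows real changes; the price is that the original column order is not recorded.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of xrm; the output format is an assumption.
public class SchemaDumpSketch {

    /** schema: table name -> (column name -> SQL type name) */
    public static String dump(Map<String, Map<String, String>> schema) {
        StringBuilder sb = new StringBuilder();
        // TreeMap copies sort by name, which keeps the dump stable across runs.
        for (Map.Entry<String, Map<String, String>> table : new TreeMap<>(schema).entrySet()) {
            sb.append("table ").append(table.getKey()).append('\n');
            for (Map.Entry<String, String> col : new TreeMap<>(table.getValue()).entrySet()) {
                sb.append("  column ").append(col.getKey())
                  .append(' ').append(col.getValue()).append('\n');
            }
        }
        return sb.toString();
    }
}
```

With one item per line, a newly added column shows up as a single added line in git diff.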

mchlrch commented 4 years ago

On branch feature-19 I added a new module com.zazuko.rdfmapping.dsl.sourceinspection with SerializationSample that shows how to use the Xtext serializer to turn a programmatically built-up AST (the EMF Object Model) into textual DSL output.

One not so practical way of running the sample outside of Eclipse is currently:

mira@blinky:~/git/rdf-mapping-dsl/com.zazuko.rdfmapping.dsl.parent/com.zazuko.rdfmapping.dsl.sourceinspection$ mvn -q exec:exec -Dexec.executable="java" -Dexec.classpathScope="compile" -Dexec.args="-classpath %classpath com.zazuko.rdfmapping.dsl.sourceinspection.SerializationSample" > foobar-sources.xrm

This is particularly slow because the launch is using Maven and is looking for updated dependencies etc.

Running the SerializationSample inside Eclipse is easy and fast.

The content of the generated DSL output file foobar-sources.xrm for this sample looks like this:

source-types { csv referenceFormulation "ql:CSV" } logical-source airport { source "http://www.example.com/Airport.csv" referenceables foo "föö" }

Right now the output is all on one line, because we haven't implemented formatting rules yet (#10).

The output is missing type csv in the logical-source. I haven't figured out yet what is causing this. There's a TODO in the source related to this.

mchlrch commented 4 years ago

I'm shelving this to the backlog for now due to inactivity.

For a customer project, I used a more pragmatic ad-hoc approach in the meantime: instead of using the serializer of the DSL, I explicitly generate output in DSL syntax. One advantage of this approach is flexibility, for example to easily serialize additional metadata from the DB, like column datatypes and comments, as // comments in the DSL output.
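
Roughly, the explicit generation can be sketched as plain string building. This is an illustration under assumptions, not the code from the customer project: the class and its Column type are hypothetical, the keyword layout is modeled on the serializer sample output earlier in this thread, and the type name and source value are passed in by the caller because this thread does not show what they look like for a relational source. Column names are assumed to be valid DSL identifiers already.

```java
import java.util.List;

// Hypothetical helper, not part of xrm.
public class DslTextSketch {

    /** Minimal column description; all fields are illustrative. */
    public static final class Column {
        final String name;
        final String sqlType;
        final String comment; // may be null

        public Column(String name, String sqlType, String comment) {
            this.name = name;
            this.sqlType = sqlType;
            this.comment = comment;
        }
    }

    /** Writes one logical-source block with one referenceable per line. */
    public static String logicalSource(String name, String typeName, String source,
                                       List<Column> columns) {
        StringBuilder sb = new StringBuilder();
        sb.append("logical-source ").append(name).append(" {\n");
        sb.append("\ttype ").append(typeName).append("\n");
        sb.append("\tsource \"").append(source).append("\"\n");
        sb.append("\n\treferenceables\n");
        for (Column c : columns) {
            sb.append("\t\t").append(c.name).append(" // ").append(c.sqlType);
            if (c.comment != null && !c.comment.isEmpty()) {
                // Collapse whitespace so a multi-line DB comment stays inside the // comment.
                sb.append(": ").append(c.comment.replaceAll("\\s+", " "));
            }
            sb.append("\n");
        }
        sb.append("}\n");
        return sb.toString();
    }
}
```

Because the text is written directly, layout and comments are fully under the generator's control, which is exactly the flexibility the serializer-based approach does not give for free.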

mchlrch commented 4 years ago

https://github.com/nnamtug/spring-boot-maven-plugin-example