neo4j-contrib / sql2cypher

Experimental SQL to Cypher Transpiler using jooq and cypher-dsl
Apache License 2.0
26 stars 1 forks source link

Start Implementing Joins being Mapped to Relationship-Patterns #18

Closed jexp closed 7 months ago

jexp commented 1 year ago

There are a few options and extensions:

Foreign Key Field

SELECT p.name, m.title FROM `Person` as p JOIN `Movie` as m ON (p.directed = m.id)
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, m.title

Join Table

SELECT p.name, m.title, r.role 
FROM `Person` AS p 
JOIN ACTED_IN AS r ON (p.id = r.person_id) 
JOIN`Movie` AS m ON (r.movie_id = m.id)
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, m.title, r.role

For the 2nd we can possibly use the table-mappings to make it exactly correct.

But it would be also good to have something that is lenient/smart about it, e.g. 3-way joins that have a all-caps table name in the middle would be a rel-type, or fk-joins where the field-name is all caps the same.

jexp commented 1 year ago

I might need some help from @lukaseder to start on the jOOQ parser side to get started quicker without trial and error but then can continue myself.

I think that's the most crucial feature missing for the initial usefulness of the library.

lukaseder commented 1 year ago

What do you need help with? Where to start? Probably in Translator.node(Table<?>), and then pattern match against all QOM.JoinTable subtypes

lukaseder commented 1 year ago

Of course, SQL can have "implicit joins", such as:

SELECT *
FROM
  Person AS p,
  ACTED_IN AS r,
  Movie AS m
WHERE p.id = r.person_id
AND r.movie_id = m.id

The commercial jOOQ editions can transform that to JOIN trees for you: https://www.jooq.org/doc/latest/manual/sql-building/queryparts/sql-transformation/table-lists-to-ansi-join

With the Open Source edition, this transformation would have to be done manually, if you think this should be done also in Cypher. Although, perhaps it would be better to transform such things directly on the Cypher side, in Cypher DSL?

jexp commented 1 year ago

Right but I would ignore implicit joins for relationships for the time being. Or we need to find some way to clearly mark tables as relationships.

Because in principle Cypher also allows this (which can be used for creating relationships / or for "rare" lookups).

MATCH (p:Person)
MATCH (m:Movie)
WHERE p.movie_id = m.id
lukaseder commented 1 year ago

Because in principle Cypher also allows this

Ah I see, I didn't know this. So then, this translator shouldn't be too opinionated about join styles.

jexp commented 1 year ago

@lukaseder I tried to add some minimal validation on the table name, but what I get as table from the join, just has the aliases? not the original table name?

        if (t instanceof QOM.Join<?> ta && ta.$on() instanceof QOM.Eq<?> eq) {
            var relType = eq.$arg1().getQualifiedName().last();
            Table<?> t1 = ta.$table1();
            Table<?> t2 = ta.$table2();

            System.out.println("t1 = " + Arrays.toString(t1.$name().getName()));
            System.out.println("t2 = " + Arrays.toString(t2.$name().getName()));
            if (mightBeNodeLabel(t1.$name().first()) && resolveTableOrJoin(t1) instanceof Node from

t1 = [p]
t2 = [m]

I'll leave it off for the time being, just curious on how to get the actual table name

lukaseder commented 1 year ago

but what I get as table from the join, just has the aliases? not the original table name?

The Table type is the top type of an entire type hierarchy, among which a TableImpl (doesn't yet have a QOM type) or a TableAlias, and many more. In Translator.node(Table), you can see a simple disambiguation between the two cases.

michael-simons commented 1 year ago

I improved this now quite a bit.

Multiple joins will result in a chain of relationships:

[source,sql,id=t2_1,name=join_multiple_table_join,table_mappings=people:Person;movies:Movie;movie_actors:ACTED_IN;movie_directors:DIRECTED]
----
SELECT p.name AS actor, d.name AS director, m.title
FROM people p
 JOIN movie_actors r ON r.person_id = p.id
 JOIN movies m ON m.id = r.person_id
 JOIN movie_directors r2 ON r2.movie_id = m.id
 JOIN people d ON r2.person_id = d.id
----

as demonstrated by

[source,cypher,id=t2_1_expected]
----
MATCH (p:`Person`)-[r:`ACTED_IN`]->(m:`Movie`)<-[r2:`DIRECTED`]-(d:`Person`)
RETURN p.name AS actor, d.name AS director, m.title
----

Notice how the direction of the `DIRECTED` relationship is defined by the order of the join columns.
michael-simons commented 7 months ago

There has been some improvements in the JDBC driver already on that topic, but I would like to restart the discussion there on concrete user feedback.