neo4jrb / neo4j-core

A simple unified API that can access both the server and embedded Neo4j database. Used by the neo4j gem
MIT License
99 stars 80 forks source link

feature: .union() & .union_all() query clause methods #309

Open jorroll opened 6 years ago

jorroll commented 6 years ago

Fixes #308

Currently, both .union() and .union_all() are working for my (limited) testing.

I wanted to run my implementation by you before writing any tests / docs. You were correct, the query clause method implementation is rather confusing.

You can pass either a query object or a string to union / union_all.

Some added benefits come if you use a query object for the union clause's argument:

Pings: @cheerfulstoic @subvertallchris

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-8.8%) to 65.738% when pulling a0bbf243f7f0b4c7f3a2df9941541d6a8a63d1f1 on thefliik:master into 00190f40810b8fbb559ede1592273501d1230349 on neo4jrb:master.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-8.8%) to 65.738% when pulling a0bbf243f7f0b4c7f3a2df9941541d6a8a63d1f1 on thefliik:master into 00190f40810b8fbb559ede1592273501d1230349 on neo4jrb:master.

jorroll commented 6 years ago

Also: this should probably be updated so that if .proxy_as(Class, :symbol) is called on the query, the returns get overridden with the :symbol (like with .pluck())

cheerfulstoic commented 6 years ago

What happens if you make a new Query object with the union methods and then try to call a clause method? Should a unioned Query object be frozen?

I'm a bit worried about the added complexity (to an already really complex pair of classes). UNION feels like it's a fundamentally different sort of thing than what most of Query already takes care of. Looking briefly at the code, the overwriting of to_cypher is what gives me the thought that "one of these things is not like the other" and that it should perhaps be modeled differently. Being able to build strings or execute a query based on the UNION of a set of Query objects could be one simple way to take care of it, but there could alternately perhaps be a UnionQuery class which does things like

jorroll commented 6 years ago

This PR follows what I proposed over in #308. Namely:

Calling .union() on a query always adds a new union clause to the query. Once added, that union clause cannot be modified. Calling query_as(:n).match(n: {color: 'purple'}).union(query_as(:n).match(n: {color: 'blue'})) would be the same as query_as(:n).union(query_as(:n).match(n: {color: 'blue'})).match(n: {color: 'purple'}). I.e. additional query clause methods modify the original query, ignoring any union clauses.

It looks like Neo4j expects all union returns to have the same columns (both name and number). So it would make sense to validate that and also: if you plucked the root query directly (.pluck()) to simply overwrite the return of the root and all unions so that they returned whatever was in the pluck.

This implementation doesn't attempt to validate the columns. Instead, the recommended usage is to pass in a query object as the argument to the union clause and then pluck() the result (which simply overwrites all the returns so that the columns are identical--which is what Neo4j expects). You can pass in a cypher string though, in which, a pluck() will ignore the string but still overwrite any other query objects.

The modified to_cypher is simply to allow pretty formatting in the console.

but there could alternately perhaps be a UnionQuery class which does things like

  1. Take in a set of Query objects
  2. Validate the columns
  3. Allow enumerating on results

I'm not sure how this would be different from adding a UnionClause class? In the UnionClause you pass in query objects, the columns are ensured to be valid if you pluck the return, and you can enumerate the results. I kinda see what you mean about union feeling different from the other clauses (seeing as you pass in a whole new query to a union clause). Except the result from a union query is the same as the result from any other query, so I'm not sure adding a new interface for union queries makes sense.

tjad commented 5 years ago

Just my 2 sense.

I've read from #308 until here, and I perceive unnecessary discrepancies for "union_all". There are use cases for this, and it is most beneficial when wanting the database engine to do work before retrieving results.

The syntax is not confusing, and in fact, is much easier to use with the neo4jrb framework. Create 2 queries , q1 and q2

q1.union_all(q2) Both queries have to contain a return clause - they are both executed by the neo4j database, and the results for both queries are unioned and then returned to the client as if it was a single query.

What am I missing ?

UPDATE: As in, it is up to the query writer to make sure that the result of both queries return similarly structured nodes. The single resultset that the engine returns should be proxied just as per a single query (as there is 1 resultset)

https://neo4j.com/docs/developer-manual/current/cypher/clauses/union/

tjad commented 5 years ago

Would be great if I could at the very least do q1 = MyObj.as(:m).some_rel.where(prop1: 'a value') q2 = MyObj.as(:m).where_not('(m)-[:SOME_REL]-()') q1.union(q2) #query returning MyObj with some_rel relationship entity's prop1='a value' OR MyObj with no some_rel relationships yet

And a QueryProxy of MyObj were returned. (or an array - I guess why you mentioned to do pluck_union) Perhaps at this point, if I wanted to continue chaining, the next part of the chain would be fed the result nodes as starting points. But disregard the complex possibilities for now... get the simple use cases working.

UPDATE: Yes, I realise that https://github.com/neo4jrb/neo4j would be responsible for the QueryProxy part of it

jorroll commented 5 years ago

@tjad I stopped pushing forward with this because @cheerfulstoic was busy at the time and because I was able to refactor my code to remove the need for a union clause.

At the moment, if you want to perform a union query, you'll need to either use the existing union_cypher or just execute a raw query string.