neo4jrb / neo4j-core

A simple unified API that can access both the server and embedded Neo4j database. Used by the neo4j gem
MIT License
99 stars 80 forks source link

Unexpected behavior: #union_cypher returns String object rather than Query object #308

Open jorroll opened 6 years ago

jorroll commented 6 years ago

While attempting to create a query involving the union of three sub-queries, I discovered that union_cypher expects a query object but returns a string rather than another query object. This prevents me from performing a union query on a union query. Given that all the other query methods return a query object, is there a reason for this behavior? I guess it kinda explains the name union_cypher

https://github.com/neo4jrb/neo4j-core/blob/00190f40810b8fbb559ede1592273501d1230349/lib/neo4j-core/query.rb#L376-L386

How would you feel about a PR to either create a new query method union that returns another query object for further chaining?

Alternatively (and easier to implement), union_cypher could be updated so that it tests to see if its argument is already a string and, if so, simply inserts it without first calling to_cypher on the argument.

Thoughts?

cheerfulstoic commented 6 years ago

I'd be fine with a union method that returns a Query object, though I have a lot of questions (which is part of why I didn't make it in the first place):

Generally, I think the main reason I didn't implement a UNION is because the underlying structure of Query doesn't feel like it jives with a Query representing a UNION

I think I'd be fine with union_cypher taking a string, but what about the idea of union_cypher taking multiple Query arguments (perhaps via a def union_cypher(*others, options = {}) definition)?

jorroll commented 6 years ago

what about the idea of union_cypher taking multiple Query arguments (perhaps via a def union_cypher(*others, options = {}) definition)?

I like that đź‘Ť .

Regarding a union query method, I see what you mean about confusing to implement. My main annoyance with union_cypher is that I need to put it into Neo4j::ActiveNode.query(). I can't simply call .exec or pluck() on it.

Maybe it would be best to leave a union() method out, but the implementation that comes to mind would be:

  1. Calling .union() on a query always adds a new union clause to the query. Once added, that union clause cannot be modified. Calling query_as(:n).match(n: {color: 'purple'}).union(query_as(:n).match(n: {color: 'blue'})) would be the same as query_as(:n).union(query_as(:n).match(n: {color: 'blue'})).match(n: {color: 'purple'}). I.e. additional query clause methods modify the original query, ignoring any union clauses.

  2. It looks like Neo4j expects all union returns to have the same columns (both name and number). So it would make sense to validate that and also: if you plucked the root query directly (.pluck()) to simply overwrite the return of the root and all unions so that they returned whatever was in the pluck.

Does this make sense? Thoughts?

cheerfulstoic commented 6 years ago

Yeah, I would be fine with what you’re proposing, though I would warn that the clause logic can get a bit complicated if you were thinking about implementing this.

Another thought: union_pluck and union_return which do the same as union_cypher but pluck / return for you.

jorroll commented 6 years ago

I see what you mean about the logic being confusing. I've got an implementation of .union and .union_all() query clause methods working in #309. Tests still need to be written and the code cleaned up, but I wanted to run it by you first.

I also tried to implement an improved union_cypher, before realizing that def union_cypher(*others, options = {}) wouldn't work, as options will never get a value. Not 100% sure if ruby would even let me put a splat before a regular argument like that. I decided not to pursue it further since I think .union & .union_all will provide better replacements.

jorroll commented 6 years ago

PS: I've seen you mention a few times how you have mixed feelings about neo4j-core's automatic query clause ordering (occasionally necessitating the use of.break). I'll admit, upon first usage I was surprised by the feature and it took me a bit to figure out what was going on. This being said, once I figured out what was going on, I found the api to be easy to work with, and far more powerful than an api without query ordering. I've definitely made use of the feature in a few places, and it would be very tedious / annoying to replicate on my own (but it's very easy to work around with .break, on the occasions I don't want it). My two cents.

cheerfulstoic commented 6 years ago

@thefliik That's fair, though I think there might be room for a hybrid implementation. Like certain clauses should combine, but not others. Like if you have multiple where method calls in order those should probably combine. But if there's a match in between, they shouldn't because WHERE is actually a modifier of MATCH (most people don't realize this and it causes strange issues because if you try do do MATCH ... WHERE ... MATCH ... WHERE ... and the second WHERE refers to a variable which was created in the first MATCH, you won't actually get filtering.

cheerfulstoic commented 6 years ago

Something that I just thought of: It's all fine to be able to understand a tool once you've figured it out, but in order to increase adoption I think it's important to make sure that things are as easy as possible (without compromising the design, of course)

jorroll commented 6 years ago

That's a great point. So as long as we're talking about "wouldn't it be great if." It'd probably be best if match clauses, by default, never re-arranged themselves, but there was an option to turn on re-arranging of some kind (something that more advanced users could discover).

cheerfulstoic commented 6 years ago

Yeah, probably MATCH clauses shouldn't rearrange themselves. There's a trickier question, perhaps, in if sequential match calls should be combined into one MATCH clause, since there is actually a difference between specifying two MATCH clauses and putting the same clauses into a single MATCH and separating them with commas. See this SO question and it's answers for details:

https://stackoverflow.com/questions/34089692/what-does-a-comma-in-a-cypher-query-do