mindplay-dk / sql

Database framework and query builder
Other
17 stars 6 forks source link

Improve expr::all() and expr::any() #28

Closed mindplay-dk closed 4 months ago

mindplay-dk commented 7 years ago

Improve expr::all() and expr::any(), so that there is no risk of unexpected precedence issues when combining nested AND/OR expressions.

For example, the following might lead to an unexpected net condition:

$db->select($table)
    ->where("one OR two")
    ->where("three OR four");

The result here would be (one OR two AND three OR four), which will evaluate as one OR (two AND three) OR four due to AND taking precedence over OR, which was clearly not the intent in this case.

Try to avoid adding excessive parens - it doesn't have to be perfect, but maybe do a preg_match() to see if a given expression might need extra parens.

Use the improved expr::all() inside ProjectionQuery, where this is currently hardcoded - and anywhere else, if this is hardcoded in other places.

boan-jfm commented 7 years ago

Please mention the expr-feature within the documentation.

mindplay-dk commented 7 years ago

@boan2010 yeah, the section on building queries is very slim, since this was quite experimental for a while and changed shape several times along the way - it's now pretty stable, so I've created issue #30 to improve the documentation.

mindplay-dk commented 4 months ago

@boan-jfm did you solved this problem on your end?

I'm thinking of using the following regex in the where method to check if an expression is fully parenthesized:

https://regex101.com/r/VVOmLF/1

if the expression is not already fully parenthesized, the where method would add parens around the expression when it gets added.

the test suite will include a test case based on the following:

$db->select($user)
    ->where("true OR true")
    ->where("false OR false")

This illustrates the precedence problem, in which:

Alternatively, the simpler but valid solution would be to just always add parens in the where method, but this doesn't exactly do much for readability - as you know, the expr helpers will also add parens, so it might be nice if where only added parens to expressions that aren't already fully parenthesized, as some of the expr expressions will be.

mindplay-dk commented 4 months ago

The idea with the existing pattern in the expr helpers is pretty simple: grouping same-operator expressions (all AND, or all OR) avoids problems with nesting - and compared with the alternative solution, where you add parens around every operand ((a) AND (b) etc.) and this results in fewer parens overall.

However, this approach only works barring you build every expression with the expression helper functions and don't use any raw SQL - to illustrate the problem, consider expr::all(["a OR b", "c OR d"]), which incorrectly yields (a OR b AND c OR d).

The conclusion is that, if you want to combine unknown expressions, you have to make sure those expressions are grouped with enclosing parens.

However, if we assume that expressions may not be grouped (the safe assumption) then it also follows that the expr helpers do not need to fully group the combined expression (as they currently do) since, if those resulting expression somehow get grouped (by another expr call, or indirectly by adding them to where or having) those expression would get grouped anyway.

I implemented the change here - however, the benchmark difference was completely unacceptable at 20x slower. This was due to the regular expression, which is extremely slow. I tried inlining the group method body, but this made no real difference - the regular expression was just too slow, and the group method was removed in a subsequent commit here.

So we get some extra parens, at least in theory, but the good news is it appears to make no real difference in practice - as you can see from the test changes in that second commit, this barely affected the tests, even for the complex nested queries. Conditional grouping saved a total of one set of extra parens in the most complex query in the test-suite.

In practice, it looks like my reservations about this approach weren't even valid. 😄

Case closed.