nette / nette

👪 METAPACKAGE for Nette Framework components
https://nette.org
Other
1.53k stars 234 forks source link

Database\Table\Selection::where and NOT IN construct #667

Closed tomaswindsor closed 11 years ago

tomaswindsor commented 12 years ago

Currently, when using $db->table('book')->where('NOT id', array()) or $db->table('book')->where('id NOT', array()) it will result in query SELECT * FROM book WHERE NOT ID IN (NULL) or SELECT * FROM book WHERE ID NOT IN (NULL) which will both return no rows because of special behavior of NULL. Same problem occurs for MySQL subqueries in $db->table('book')->where('NOT id', $subQuery), because Nette\Database performs optimization and replaces the actual subquery with its result, and if this result is empty, it will use (NULL).

Solution could be replacing the (NULL) with some kind of query that returns an empty result (no idea about that one). Alternatively instead of empty list, we could use list containing special unique value, e.g. WHERE NOT ID IN ('~~NDB~NO~VALUES~~'). Solution with simply removing whole condition in case of NOT is insufficient because NULL can appear also on the left side of the IN operator, and removing whole condition would then cause different behavior - imagine $db->table('book')->where('NOT NULL', array()) - I know this example doesn't look to be of much use, but keep in mind that NULL can be result of some sql function used in the condition as well...

temistokles commented 12 years ago

Is there any workaround?

hrach commented 11 years ago

Well, it's mysql thing. If you dont want to restrict the seleciton, don't call where condition.

$selection = $db->table('book');
if ($bookIds)
     $selection->where('NOT IN', $bookIds);
tomaswindsor commented 11 years ago

By saying it's mysql thing, you mean in other databases this works well? Either way the behavior of this for mysql has big WTF factor, especially when using subqueries (which are optimized internally by NDB for MySql - and therefore won't work, while they would have worked if left unoptimized). Is the solution using special constant really so unacceptable? I think it would solve most problems for now, and perhaps later mysql comes with a possibility how to resolve this in cleaner way...

dg commented 11 years ago

Tohle je celkem známý problém, viz http://php.vrana.cz/operator-in-s-prazdnou-mnozinou.php#d-8096. Teoreticky workaround by existovat mohl.

hrach commented 11 years ago

Srr, you are right, it's not only mysql problem, it's also in postgre. Only workaround which I know is to pass an magic value, which can't be present. Such ~~NDB~NO~VALUES~~, etc...

tomaswindsor commented 11 years ago

Alternatively, it is also possible to use subquery such as SELECT 0 FROM $table WHERE 0, where $table is a string containing name of any existing table (e.g. the currently accessed one). Unfortunately it is not possible to use the dummy table dual here, as it seems to ignore the WHERE 0 part when used within the NOT IN construct (at least in MySql).

milo commented 11 years ago

The trick is that NULL in comparison to anything else is always NULL, not BOOL. So, with NOT IN you can use:

SELECT * FROM table WHERE COALESCE(id NOT IN (NULL), TRUE);

I'm sure on PostgreSQL only...

hrach commented 11 years ago

Wow! Works in MySQL too. I'm not sure if nette should add COALESCE automatically.

milo commented 11 years ago

This is a better solution than subquery, because you do not have to care about subquery return type.

But when I take a look at SqlBuilder::addWhere(), you must detect condition negation NOT. After that, you must detect emptiness of parameters (already done). And when you know these two things, you can simply remove whole condition or replace it by TRUE/FALSE. So COALESCE or IN (NULL) is not longer needed.

Maybe I'm missing something, I'm a Nette\Database amateur.

milo commented 11 years ago

@hrach What do you think https://github.com/milo/nette/commit/65f05ee514b48a76c5e88c5fe8e7a681dfcfeca4?

tomaswindsor commented 11 years ago

The COALESCE is good idea, but in presented form it can be used only when the left side of the IN construct is not NULL. You can e.g. have WHERE NULL NOT IN (NULL) (or you can replace the first NULL with something that can be NULL) and then you expect it to filter out those rows where it is NULL, but COALESCE would change this behavior. So maybe something like this would work better:

SELECT * FROM table WHERE (NOT ISNULL(expression) AND COALESCE (expression NOT IN (NULL), TRUE))

... where expression contains whatever is on the left side of the IN, like in:

$db->table('table')->where("$expression NOT IN", $array_or_subquery); or $db->table('table')->where("NOT $expression IN", $array_or_subquery);

However all this relates to ANSI NULLS being enabled, but some RDBMS allow to disable it...

hrach commented 11 years ago

Well, I would avoid stripping condition and parameter analysis. There is no need to do it. SELECT * FROM users WHERE COALESCE(userId IN (1), TRUE); works as well as SELECT * FROM users WHERE COALESCE(NOT userId IN (NULL), TRUE); It leads (at least) to better variable caching key.

$db->table('table')->where("$expression NOT IN", $array_or_subquery); is definetelly NOT valid use of Nette\Database, not escaped $expression, IN keyword is duplicit, NOT only at the begining...

tomaswindsor commented 11 years ago

@hrach You didn't address the issue I presented at all. In your example, you work with primary key (userId) which you expect to be always non-NULL. If you consider cases with column that can contain NULL (or more generally an expression that evaluates to NULL) , you will realize COALESCE changes the behavior (if ANSI NULLS are enabled).

My PHP example was merely meant to illustrate what i mean by expression. It was by no means meant to show how Nette\Database should be used. But your argument that it was not escaped is speculative, as I didn't specify whether the $expression was or wasn't pre-escaped (escaping was irrelevant for what I wanted to show). Also I fail to see any duplicate IN in my illustration. Both notations are valid btw: expression NOT IN (...) and NOT expression IN (...).

hrach commented 11 years ago

Duplicit IN: see https://github.com/nette/nette/blob/master/Nette/Database/Table/SqlBuilder.php#L181

Oh, I get it now, maybe you should try to express you thoughts better :D Just write where('nullableColumn IN (NULL)'). So @milo solution is right. Let's do it.

tomaswindsor commented 11 years ago

Ok, you were right about the duplicate IN, my bad.

As for the actual problem we are discussing, I am glad you got it now, but I am afraid solution of @milo is also not respecting ANSI NULLS behavior for operand left of the IN operator. I am afraid replacing the condition with boolean is not the way because of this, as only the database knows what the left operand will evaluate to (whether it will be NULL or something else) and therefore PHP can't resolve it this way.

What we need is to find a condition that could be used for empty list of values in combination with NOT IN and would work preferably for both ANSI NULLS enabled or disabled. Solution with COALESCE (when extended by check for NULL) might work for ANSI NULLS enabled, but what if they are disabled? So far I am inclined to solution with the special constant, but if there was another solution without described negative side effects, I would prefer that one.

hrach commented 11 years ago

From my point of view I would not deal with empty array as "NULL" in "IN". If you want to filter by NULL, you would use "array(NULL)".

tomaswindsor commented 11 years ago

@hrach I am not sure if you wanted to react on me in your last comment. In case you did, you probably missed the point again, in which case I suggest we discuss the matter in czech. I agree that adding more NULLs only complicates things here. Hence my inclination to solution using special constant or subquery that returns empty result set (whichever is more efficient).

tomaswindsor commented 11 years ago

Maybe I found the solution. For empty arrays, following replacement would occur:

WHERE expression NOT IN () would be replaced with: WHERE expression IS NOT NULL and WHERE expression IN () would be replaced with: WHERE FALSE

Or did I miss something?

milo commented 11 years ago

Souhlas s češtinou :-)

Vše je o situacích:

1. expr IN ()
2. expr NOT IN ()

3. expr IN (NULL)     // db's decision
4. expr NOT IN (NULL) // db's decision

5. expr IN (1, 2, 3)     // db's decision
6. expr NOT IN (1, 2, 3) // db's decision

A celá naše diskuze je o tom, že se snažíme opravit SQL chování (syntax error) pro "prázdnou množinu", tedy případ 1 a 2.

Myslím, že 1. by mělo být vždy FALSE, protože množina je prázdná a není v ní ani nějaká neznámá hodnota. A 2. situace vždy TRUE ze stejného důvodu. Když si to uživatel bude chtít změnit, může si přidat další podmínku na přidání/odebrání NULL sloupců.

hrach commented 11 years ago

@tomaswindsor: @milo solution is right, because it no longer solve 1st / 2nd as 3rd / 4th expression - which is the cutting edge.

tomaswindsor commented 11 years ago

Máš pravdu. Z nějakého důvodu jsem měl za to, že když je expr NULL, tak to v případech 1 a 2 vrací NULL, ale teď jsem si ověřil, že to tak není. Jediné, co mě ještě napadá, že by mohl být problém, je situace, kdy expr by obsahovalo něco s vedlejším efektem, např. (@i:=@i+1). To by šlo vyřešit tak, že v případě 1 by se použilo (expr) AND FALSE a v případě 2 (expr) OR TRUE.

milo commented 11 years ago

I made a pull request https://github.com/nette/nette/pull/869. If you agree, we can continue discuss there.