Closed IvoMoor closed 1 month ago
INCLUDE can be implemented as a SEMI JOIN against a list of values, using
SEMI JOIN (values(..),(..),(..))
col.val IS NOT DISTINCT FROM list.col
EXCLUDE can be implemented as an ANTI JOIN:
ANTI JOIN (values(..),(..),(..))
col.val IS NOT DISTINCT FROM list.col
The advantage of this approach is that NULL is handled exactly like any other value, regarless of whether the NULL is from the dataset, or the value list.
This is a pretty big overhaul, we need to remove the WHERE clauses
Problem is that exclude filter works by applying:
column != value
if the column contains NULL values, then this evaluation will return NULL, and thus, the row with the NULL value column will be exlcuded.
This is not intended - the exclude filter should only exclude those rows where the value matches the exclude set exactly.
With an Include filter, this problem is not felt, as a NULL value is not eqial to the include set.