rpbouman / huey

A UI for DuckDB
MIT License
182 stars 14 forks source link

Exclude filter value, should include NULL values #90

Closed IvoMoor closed 1 month ago

rpbouman commented 1 month ago

Problem is that exclude filter works by applying:

column != value

if the column contains NULL values, then this evaluation will return NULL, and thus, the row with the NULL value column will be exlcuded.

This is not intended - the exclude filter should only exclude those rows where the value matches the exclude set exactly.

With an Include filter, this problem is not felt, as a NULL value is not eqial to the include set.

rpbouman commented 1 month ago

INCLUDE can be implemented as a SEMI JOIN against a list of values, using

SEMI JOIN (values(..),(..),(..))
col.val IS NOT DISTINCT FROM list.col  

EXCLUDE can be implemented as an ANTI JOIN:

ANTI JOIN (values(..),(..),(..))
col.val IS NOT DISTINCT FROM list.col  

The advantage of this approach is that NULL is handled exactly like any other value, regarless of whether the NULL is from the dataset, or the value list.

This is a pretty big overhaul, we need to remove the WHERE clauses