yesodweb / yesodweb.com-content

Content for the www.yesodweb.com site
http://www.yesodweb.com/
Other
67 stars 112 forks source link

NULL=NULL is NULL in Postgres #199

Open deviant-logic opened 6 years ago

deviant-logic commented 6 years ago

The section on persistent says:

The reason for this is that the SQL standard is ambiguous on how uniqueness should be applied to NULL (e.g., is NULL=NULL true or false?). Besides that ambiguity, most SQL engines in fact implement rules which would be contrary to what the Haskell datatypes anticipate (e.g., PostgreSQL says that NULL=NULL is false, whereas Haskell says Nothing == Nothing is True).

My understanding had alway been that NULL=NULL gave NULL, and indeed in the Postgres on my machine at the moment (9.6.5) as well as the sqlite, select (null = null) is null evaluates to true. The article on SQL Nulls believes the distinction is worth calling out in its own section.

MaxGabriel commented 6 years ago

Maybe this should be reworded to specifically talk about uniqueness constraints? Postgres, MySQL, and SQLite will all allow duplicate NULL values in a column with a unique index, whereas in Haskell, Data.List.nub [Nothing, Nothing, Just 'a'] returns [Nothing, Just 'a']

deviant-logic commented 6 years ago

I think it might just be that it's misleading (at least from the SQL perspective) to suggest that nullable columns in SQL are just like Maybes in Haskell.

MaxGabriel commented 6 years ago

Can you elaborate?

deviant-logic commented 6 years ago

Apologies for the long delay getting back to this:

I was not trying to address the relationship of null semantics to uniqueness constraints, but the specific claims about null at all. For instance, in the quoted passage:

(e.g., is NULL=NULL true or false?)

To the best of my knowledge (I don't have the standard open at the moment) the answer to that question should not contribute to the putative ambiguity of uniqueness constraints: null=null is neither true nor false, it is null.

I think it's misleading to suggest nullable columns are just like Maybes in Haskell because I believe someone familiar with Maybe but unfamiliar with SQL would be surprised by the latter's behavior. To elaborate:

A Haskell programmer expects the equality function to be boolean valued (i.e., (==) :: Eq a => a -> a -> Bool), but in SQL, it can (and, I believe must) give null when either of its arguments are null. It's tempting to suggest that what's really going on is that everything in the database is actually in Maybe, and SQL = should be liftM2 (==), but that would imply that non-nullable columns are all actually Maybes that just happen to always be Just. Not only does that severely weaken the metaphor ("an int column is just like Int, a nullable int column is just like Maybe Int.... but we need = to make sense, so actually a non-nullable int column is more like having implicit pures everywhere..."), it doesn't generalize to other SQL behavior. For example, suppose we have:

-- I'm using the same postgres 9.6.5 from above
create table foo (bar int unique);
insert into foo values (1);
insert into foo values (2);
insert into foo values (3);
insert into foo values (null);
insert into foo values (null);
insert into foo values (4);
insert into foo values (null);

In Haskell (under the null-as-Maybe metaphor) this is:

foo :: [Maybe Int]
foo = [Just 1, Just 2, Just 3, Nothing, Nothing, Just 4, Nothing]

So we try select count(*) from foo, and get back 7. Great, length foo also gives us 7. But then we try select count(bar) from foo and get back 4. Hmm. Haskell-programming-me doesn't really see a difference here, but maybe count(*) is just special (turns out it is), and I can believe that count is sql means something more like fmap getSum . foldMap (fmap (const (Sum 1))). That would certainly make it somewhat closer to reasonable-seeming that select count(*) from foo where not (bar < 3) gives back 2, as does select count(*) from foo where bar >= 3. We get trichotomy on non-null values, but we're working under the Maybe monoid, so of course things get somewhat weird around the Nothings. That also makes select count(distinct(*)) from foo giving us 4 somewhat okay---in Haskell, length . nub $ foo gives us 5, but the weirder monoidal stuff with getSum will throw out the stray unique Nothing.

Let's try select 1 + bar from foo. My psql shows me 7 rows with the same results as fmap (fmap succ) foo would give. It appears as if + is behaving as liftM2 (+) like = does. So I probably should have written fmap (Just succ <*>) foo. Perhaps the rule is "nullable columns are like Maybes, and binary operators get pushed through <*> when applied to them (oh, and when the column's not nullable, we pure it to get things to line up right)". This doesn't leave a good story for binary operators like is distinct from, which is boolean valued like (==), or unary operators like is null which can similarly "see through the monad". But it works for addition.

So now we try select sum(bar) from foo. I (both as the character in this narrative and its author) honestly can't tell whether the resulting 10 is counterintuitive at this point. On the one hand, I want to believe that sum is using +, as if I'd written foldr (liftM2 (+)) (Just 0) foo, but the Foldable law says that sum = getSum . foldMap Sum. Since there's no Num instance for Maybe Int, presumably the right interpretation would be that SQL sum corresponds to Haskell fmap getSum . foldMap (fmap Sum)? Pedagogically, I feel like I'm making more progress understanding SQL null by attempting to come up with an explanation for how the metaphor could work despite my expectations rather than due to them. That's a pretty useful exercise, but one that at minimum deserves some warning.

Then there's outer joins, coalesce, in, etc. Writing all this, it occurs to me that I'm probably actually just channeling https://web.archive.org/web/20090813030219/http://thoughts.j-davis.com/2009/08/02/what-is-the-deal-with-nulls/, which, once I realized I had to go through the wayback machine, I was able to find again.

MaxGabriel commented 6 years ago

Are you asking for a change in Persistent behavior, or just a change to the docs? If just a change to the docs, can you suggest a revision of how you think they could be made more accurate?

deviant-logic commented 6 years ago

I am not addressing the behavior persistent; this is purely a comment about the docs. And it was intended as a very modest comment at that. Per the title/description, the quoted excerpt says that postgres believes that NULL=NULL is false, which does not appear to be the case. The smallest possible fix would change the word "false" to "NULL":

(e.g., PostgreSQL says that NULL=NULL is NULL, whereas Haskell says Nothing == Nothing is True).

I think it would be nice to add some verbiage addressing the fidelity-or-lack-thereof of Maybe as a representation of SQL-nullable data---presumably that information would be helpful to the same audience that would benefit from the explanation of how null interacts with unique constraints---but I doubt I'd get around to filing a PR for such language any time soon, presuming it were considered appropriate for this document in the first place.