Open deviant-logic opened 7 years ago
Maybe this should be reworded to specifically talk about uniqueness constraints? Postgres, MySQL, and SQLite will all allow duplicate NULL values in a column with a unique index, whereas in Haskell, Data.List.nub [Nothing, Nothing, Just 'a']
returns [Nothing, Just 'a']
I think it might just be that it's misleading (at least from the SQL perspective) to suggest that nullable columns in SQL are just like Maybe
s in Haskell.
Can you elaborate?
Apologies for the long delay getting back to this:
I was not trying to address the relationship of null
semantics to uniqueness constraints, but the specific claims about null
at all. For instance, in the quoted passage:
(e.g., is NULL=NULL true or false?)
To the best of my knowledge (I don't have the standard open at the moment) the answer to that question should not contribute to the putative ambiguity of uniqueness constraints: null=null
is neither true nor false, it is null
.
I think it's misleading to suggest nullable columns are just like Maybe
s in Haskell because I believe someone familiar with Maybe
but unfamiliar with SQL would be surprised by the latter's behavior. To elaborate:
A Haskell programmer expects the equality function to be boolean valued (i.e., (==) :: Eq a => a -> a -> Bool
), but in SQL, it can (and, I believe must) give null
when either of its arguments are null
. It's tempting to suggest that what's really going on is that everything in the database is actually in Maybe
, and SQL =
should be liftM2 (==)
, but that would imply that non-nullable columns are all actually Maybe
s that just happen to always be Just
. Not only does that severely weaken the metaphor ("an int
column is just like Int
, a nullable int
column is just like Maybe Int
.... but we need =
to make sense, so actually a non-nullable int
column is more like having implicit pure
s everywhere..."), it doesn't generalize to other SQL behavior. For example, suppose we have:
-- I'm using the same postgres 9.6.5 from above
create table foo (bar int unique);
insert into foo values (1);
insert into foo values (2);
insert into foo values (3);
insert into foo values (null);
insert into foo values (null);
insert into foo values (4);
insert into foo values (null);
In Haskell (under the null
-as-Maybe
metaphor) this is:
foo :: [Maybe Int]
foo = [Just 1, Just 2, Just 3, Nothing, Nothing, Just 4, Nothing]
So we try select count(*) from foo
, and get back 7. Great, length foo
also gives us 7. But then we try select count(bar) from foo
and get back 4. Hmm. Haskell-programming-me doesn't really see a difference here, but maybe count(*)
is just special (turns out it is), and I can believe that count
is sql means something more like fmap getSum . foldMap (fmap (const (Sum 1)))
. That would certainly make it somewhat closer to reasonable-seeming that select count(*) from foo where not (bar < 3)
gives back 2
, as does select count(*) from foo where bar >= 3
. We get trichotomy on non-null values, but we're working under the Maybe
monoid, so of course things get somewhat weird around the Nothing
s. That also makes select count(distinct(*)) from foo
giving us 4
somewhat okay---in Haskell, length . nub $ foo
gives us 5, but the weirder monoidal stuff with getSum
will throw out the stray unique Nothing
.
Let's try select 1 + bar from foo
. My psql
shows me 7 rows with the same results as fmap (fmap succ) foo
would give. It appears as if +
is behaving as liftM2 (+)
like =
does. So I probably should have written fmap (Just succ <*>) foo
. Perhaps the rule is "nullable columns are like Maybe
s, and binary operators get pushed through <*>
when applied to them (oh, and when the column's not nullable, we pure
it to get things to line up right)". This doesn't leave a good story for binary operators like is distinct from
, which is boolean valued like (==)
, or unary operators like is null
which can similarly "see through the monad". But it works for addition.
So now we try select sum(bar) from foo
. I (both as the character in this narrative and its author) honestly can't tell whether the resulting 10
is counterintuitive at this point. On the one hand, I want to believe that sum
is using +
, as if I'd written foldr (liftM2 (+)) (Just 0) foo
, but the Foldable
law says that sum = getSum . foldMap Sum
. Since there's no Num
instance for Maybe Int
, presumably the right interpretation would be that SQL sum
corresponds to Haskell fmap getSum . foldMap (fmap Sum)
? Pedagogically, I feel like I'm making more progress understanding SQL null
by attempting to come up with an explanation for how the metaphor could work despite my expectations rather than due to them. That's a pretty useful exercise, but one that at minimum deserves some warning.
Then there's outer joins, coalesce
, in
, etc. Writing all this, it occurs to me that I'm probably actually just channeling https://web.archive.org/web/20090813030219/http://thoughts.j-davis.com/2009/08/02/what-is-the-deal-with-nulls/, which, once I realized I had to go through the wayback machine, I was able to find again.
Are you asking for a change in Persistent behavior, or just a change to the docs? If just a change to the docs, can you suggest a revision of how you think they could be made more accurate?
I am not addressing the behavior persistent
; this is purely a comment about the docs. And it was intended as a very modest comment at that. Per the title/description, the quoted excerpt says that postgres believes that NULL=NULL
is false
, which does not appear to be the case. The smallest possible fix would change the word "false" to "NULL":
(e.g., PostgreSQL says that NULL=NULL is NULL, whereas Haskell says Nothing == Nothing is True).
I think it would be nice to add some verbiage addressing the fidelity-or-lack-thereof of Maybe
as a representation of SQL-null
able data---presumably that information would be helpful to the same audience that would benefit from the explanation of how null
interacts with unique constraints---but I doubt I'd get around to filing a PR for such language any time soon, presuming it were considered appropriate for this document in the first place.
The section on persistent says:
My understanding had alway been that
NULL=NULL
gaveNULL
, and indeed in the Postgres on my machine at the moment (9.6.5) as well as the sqlite,select (null = null) is null
evaluates totrue
. The article on SQL Nulls believes the distinction is worth calling out in its own section.