m4dc4p / haskelldb

A library for building re-usable and composable SQL queries.
BSD 3-Clause "New" or "Revised" License

Semantics of aggregation is unclear #22

Open karamaan opened 10 years ago

karamaan commented 10 years ago

The semantics of HaskellDB's aggregation operators is very unclear. Let me propose the following example to demonstrate my confusion. It concerns a table whose rows represent people. A person has a family and an age.

import Database.HaskellDB.PrimQuery
import Database.HaskellDB.Query
import Database.HaskellDB.HDBRec
import Database.HaskellDB.DBLayout

data Person = Person
instance FieldTag Person where
  fieldName _ = "person"

data Family = Family
instance FieldTag Family where
  fieldName _ = "family"

data Age = Age
instance FieldTag Age where
  fieldName _ = "age"

family :: Attr Family String
family = mkAttr Family

person :: Attr Person String
person = mkAttr Person

age :: Attr Age Int
age = mkAttr Age

personTable :: Table (RecCons Person String
                  (RecCons Family String
                   (RecCons Age Int RecNil)))
personTable = Table "mytable" [ ("person", AttrExpr "personcol")
                              , ("family", AttrExpr "familycol")
                              , ("age", AttrExpr "agecol") ]

I might want to calculate the total age of everyone in a family. This I do with agesOfFamilies. It returns a query whose rows (ostensibly) pair the family with the total age of everyone in that family.

agesOfFamilies :: Query (Rel
                         (RecCons Family (Expr String)
                          (RecCons Age (Expr Int) RecNil)))
agesOfFamilies = do
  my <- table personTable
  project (family << my!family # age << _sum (my!age))

I can test it with showSql thus:

*Main> putStrLn $ showSql agesOfFamilies 
SELECT familycol as family,
       SUM(agecol) as age
FROM mytable as T1
GROUP BY familycol

which is exactly what I wanted. What happens if I want to project just the age column from this query?

justAgesOfFamilies :: Query (Rel (RecCons Age (Expr Int) RecNil))
justAgesOfFamilies = do
  agesOfFamilies <- agesOfFamilies
  project (age << agesOfFamilies!age)

It seems that justAgesOfFamilies should return a single-column query with one row for each family containing their total age, i.e. the result of the query agesOfFamilies without the family column. However, what I get is completely different:

*Main> putStrLn $ showSql justAgesOfFamilies 
SELECT SUM(agecol) as age
FROM mytable as T1

This kind of behaviour seems to be an enormous impediment to composability of queries in HaskellDB.

tomjaguarpaw commented 9 years ago

Just to be clear, the reason that this is undesirable is that it is a violation of referential transparency. ("Referential transparency" here is with respect to the database, not with respect to Haskell, of course!) An expression's value should be unchanged when you replace a subexpression with its value. For example, agesOfFamilies might evaluate to

Family  Age
Smith   75
Jones   85

Replacing agesOfFamilies in the definition justAgesOfFamilies with its value (i.e. this table) would lead to a result of

Age
75
85

Since HaskellDB gives us

Age
160

this is a violation of referential transparency.
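
To make the two readings concrete, here is a minimal sketch in plain Haskell that models the table as a list of rows. It does not use HaskellDB at all. The names people, perFamilyTotals and grandTotal are hypothetical, and the individual ages are invented so that the family totals match the example above (Smith 75, Jones 85).

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical sample rows (family, age). The individual ages are made up;
-- only the per-family totals (Smith 75, Jones 85) come from the example.
people :: [(String, Int)]
people =
  [ ("Smith", 40), ("Smith", 35)
  , ("Jones", 50), ("Jones", 35)
  ]

-- The composable reading: group by family, sum the ages within each group,
-- then drop the family column. Rows come out ordered by family name.
perFamilyTotals :: [(String, Int)] -> [Int]
perFamilyTotals =
  map (sum . map snd) . groupBy ((==) `on` fst) . sortOn fst

-- What the generated SQL computes once the family column is projected away:
-- a single sum over every row, with no grouping at all.
grandTotal :: [(String, Int)] -> Int
grandTotal = sum . map snd

main :: IO ()
main = do
  print (perFamilyTotals people)
  print (grandTotal people)
```

Under these assumptions perFamilyTotals yields one value per family (85 for Jones and 75 for Smith), whereas grandTotal collapses everything to 160, which is the difference between the expected result and the one HaskellDB produces.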