parsonsmatt opened 4 years ago
I looked through the issues you linked -- would this also simplify implementing created_at and updated_at type timestamp columns?
My current design would have a syntax like:
```
User
    Id Serial
    name Text
    createdAt UTCTime default=current_time()
    updatedAt UTCTime default=current_time()
```
This would result in the following SQL table:
```sql
create table "user" (
    id serial primary key,
    name varchar not null,
    created_at timestamp default current_time(),
    updated_at timestamp default current_time()
);
```
The data type would look like:
```haskell
data User = User
    { userId :: Serial
    , userName :: Text
    , userCreatedAt :: UTCTime
    , userUpdatedAt :: UTCTime
    }
```
The class instance would look like:

```haskell
data NewUser = NewUser { newUserName :: Text }

instance PersistEntity User where
    type New User = NewUser
```
So you'd write:
```haskell
userKey <- insert NewUser { newUserName = "Bob" }
Just user <- get userKey
print user
-- User { userId = 0, userName = "Bob", userCreatedAt = ..., userUpdatedAt = ... }
```
Nice -- this seems like a huge improvement to me! Thanks for your work on this
More details
`persistent` is an awesome database library, and with `esqueleto` for more SQL features, we have a compelling database story. However, in my time spent using and evolving codebases with it, I've discovered a problem, and fixing it will require a relatively big change to how `persistent` works. Notably, we have to refactor `Entity`.

The Yesod book talks about why `Entity` was created, and it's a great idea that works well for the most common cases. Many applications use datatypes where the `Entity` pattern fits perfectly.
Unfortunately, it's not perfect. Not all models have an `Id` column with an easy default. Currently, `persistent` fails with a runtime error if you try to `insert` such a model. You need to use `insertKey` instead, which accepts the key to insert rather than relying on an assumed default.
Additionally, the `id` column isn't the only one you might want the database to generate. It's really common to want a `createdAt` and/or `updatedAt` field that the database generates for you. This is awkward to handle in `persistent` -- we must define our models with `Maybe UTCTime` for those fields, even though we only need `Nothing` for writes, and all reads should return a `Just` timestamp. It's also awkward in the case of composite primary keys, where an `Entity record` does not contain any more information than a mere `record`.
So, I propose this solution: a `PersistEntity record` instance implies that the type `record` completely describes the table as it exists in the database. To solve the problems related to the multiple different uses of the type, we create separate datatypes to represent those uses. All of the code here is a proposal; nothing is concrete or committed.
To solve "The Insert Problem", we create a type family:
```haskell
type family New record = insert | insert -> record
```

This type family uses the `TypeFamilyDependencies` language extension, which allows us to infer the type of `record` if we know the type of `insert`. This is important to preserve type inference. And the type of `insert` becomes (modulo `SqlPersistT` stuff):

```haskell
insert :: New record -> IO record
```

Without the type family dependency on `New`, we'd need to manually specify the type of `record`, which would be annoying!
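As a standalone illustration of how the injectivity annotation drives inference (simplified, pure stand-ins rather than the real `persistent` API -- `Insertable` and `insertSketch` are invented names):

```haskell
{-# LANGUAGE TypeFamilyDependencies #-}

-- The injectivity annotation "insert -> record" tells GHC that knowing
-- the argument type determines the result type, so no annotation is
-- needed at the call site.
type family New record = insert | insert -> record

class Insertable record where
    insertSketch :: New record -> IO record

data User = User { userName :: String } deriving (Eq, Show)
data NewUser = NewUser { newUserName :: String } deriving (Eq, Show)

type instance New User = NewUser

instance Insertable User where
    -- A pure stand-in for a real database insert.
    insertSketch (NewUser name) = pure (User name)
```

With this in place, `insertSketch (NewUser "Bob")` infers `User` as the result type with no annotation, which is exactly what the dependency buys us.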
Let's consider the easiest thing here: a record that has no default columns.
```
NoDefault
    Id Text
    name Text
    age Int
```
This model expects that a unique textual ID will be generated by the application.
This will now generate a Haskell record and an instance for the `New` type family:

```haskell
data NoDefault = NoDefault
    { noDefaultId :: Text
    , noDefaultName :: Text
    , noDefaultAge :: Int
    }

type instance New NoDefault = NoDefault
```

Now, we can call `insert NoDefault {..}` and it Just Works -- no errors, provided we supply all of the fields.
Let's get a little more complex.
This is a `User` datatype that has a database-generated ID:

```
User
    Id UUID default=uuid_generate_v4()
    name Text
    age Int
```

The `default=` expression tells the database what to use for generated columns. Because we have a `default=` clause, we don't need to provide this field when `insert`ing a value.
So we generate a datatype for the table and a datatype for inserting:
```haskell
data User = User
    { userId :: UUID
    , userName :: Text
    , userAge :: Int
    }

data NewUser = NewUser
    { newUserName :: Text
    , newUserAge :: Int
    }

type instance New User = NewUser

insert :: NewUser -> IO User
```

We don't need `insertKey` anymore! The API for `insert` is now totally safe and cannot fail at runtime.
Let's get even fancier: timestamps!
```
User
    Id UUID default=uuid_generate_v4()
    name Text
    age Int
    createdAt UTCTime default=CURRENT_TIMESTAMP()
```

Because `createdAt` has a `default=` expression, it is not included on `NewUser`, so the `NewUser` type does not change with this addition. However, we do get a new field on `User`:
```haskell
data User = User
    { userId :: UUID
    , userName :: Text
    , userAge :: Int
    , userCreatedAt :: UTCTime
    }

data NewUser = NewUser
    { newUserName :: Text
    , newUserAge :: Int
    }

type instance New User = NewUser

insert :: NewUser -> IO User
```
The value is supplied by the database. All is well.
But what if we want to supply these things? Then we should have a function:
```haskell
insertRecord :: record -> IO record
```
that allows us to provide all of the values.
But, what if we want to supply some of those things, but not all? Then we'll have a new type for that:
```haskell
type family Partial record = partial | partial -> record

data PartialUser = PartialUser
    { partialUserId :: Maybe UUID
    , partialUserName :: Text
    , partialUserAge :: Int
    , partialUserCreatedAt :: Maybe UTCTime
    }

type instance Partial User = PartialUser

insertPartial :: Partial record -> IO record
```

This allows you to specify whatever subset of fields you'd like, and leave the `Nothing`s to be generated automatically by the database backend.
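A hypothetical sketch of what a backend could do with such a partial record -- render only the supplied columns, so the database fills in the rest. Everything here (the field types, the `users` table, the helper names) is illustrative, not the real generated code:

```haskell
import Data.List (intercalate)

-- Simplified stand-in for the generated partial record.
data PartialUser = PartialUser
    { partialUserId        :: Maybe Int
    , partialUserName      :: String
    , partialUserAge       :: Int
    , partialUserCreatedAt :: Maybe String
    } deriving Show

-- Defaultable columns are included only when a value was supplied.
insertColumns :: PartialUser -> [String]
insertColumns p = concat
    [ [ "id"         | Just _ <- [partialUserId p] ]
    , [ "name", "age" ]  -- required columns are always present
    , [ "created_at" | Just _ <- [partialUserCreatedAt p] ]
    ]

-- Render the column list of the INSERT; omitted columns fall back to
-- their database-side defaults.
renderInsert :: PartialUser -> String
renderInsert p =
    "INSERT INTO users (" ++ intercalate ", " cols
        ++ ") VALUES (" ++ intercalate ", " (map (const "?") cols) ++ ")"
  where
    cols = insertColumns p
```

For example, `renderInsert (PartialUser Nothing "Bob" 30 Nothing)` yields an `INSERT` mentioning only `name` and `age`.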
You may note that there is a function from `New record` to `Partial record` where the `Maybe` fields are given `Nothing`. Likewise, there is a function from `record` to `Partial record` where the default-able fields are wrapped in `Just`. This suggests that we can accept any of a `New record`, a `record`, or a `Partial record` for `insert`:
```haskell
class ToPartial record part | part -> record where
    toPartial :: part -> Partial record

instance ToPartial User User where
    toPartial User {..} = PartialUser
        { partialUserName = userName
        , partialUserId = Just userId
        , partialUserCreatedAt = Just userCreatedAt
        , partialUserAge = userAge
        }

instance ToPartial User (New User) where
    toPartial NewUser {..} = PartialUser
        { partialUserName = newUserName
        , partialUserAge = newUserAge
        , partialUserId = Nothing
        , partialUserCreatedAt = Nothing
        }

instance ToPartial User (Partial User) where
    toPartial = id
```
Now, we can generalize `insert`:

```haskell
class PersistEntity record where
    insert
        :: (ToPartial record partial)
        => partial
        -> IO record
    insert = insertPartial . toPartial
```
So that you can call:
```haskell
insertUsers = do
    insert User
        { userName, userAge, userId, userCreatedAt }
    insert NewUser
        { newUserName, newUserAge }
    insert PartialUser
        { partialUserName, partialUserAge
        , partialUserCreatedAt, partialUserId
        }
```
without needing a whole family of functions.
The datatype `Entity` will become:

```haskell
newtype Entity record = Entity { entityVal :: record }

entityKey
    :: PersistEntity record
    => Entity record
    -> Key record
entityKey = keyFromRecord . entityVal
```

Much like `Single` or `Value`, it'll be primarily useful as a `newtype` wrapper for providing various instances for `PersistEntity`-related functionality.
Let's look at `get`:

```haskell
get :: Key record -> SqlPersistT m (Maybe record)
```

We need a `Key`. Currently, `Key` is defined as a data family on the `PersistEntity` class. There's no reason we can't continue to do that; it works out quite well.
However, we have a large number of variations of these functions that operate on non-primary keys. The function `getBy` takes a `Unique record`. But a `Key record` *is* a `Unique record`. Wouldn't it be nice to have a single function `get` that takes a uniqueness constraint of any sort (be it a `Key` or a `Unique`)? Does it really make sense for a `Key` to be separate from a `Unique` in any sense? Perhaps what we really want is a function:

```haskell
get :: Unique record -> SqlPersistT m (Maybe record)
```

And one of the `Unique` constructors will promote a `Key record` to a `Unique record`.
To simplify usage, we can write:

```haskell
get :: (ToUnique record uniq) => uniq -> SqlPersistT m (Maybe record)

class ToUnique record uniq | uniq -> record where
    toUnique :: uniq -> Unique record
```
Almost all entities have a primary key, but some don't.
So `get` doesn't make sense for these records, and you would need to use `selectList` for them. An entity with a primary key defined would have a single `Unique` constructor:

```haskell
instance ToUnique User (Key User) where
    toUnique = UniqueUserKey
```
But a `User` may also have another uniqueness constraint:

```
User
    Id UUID default=uuid_generate_v4()
    name Text
    age Int
    email Text
    createdAt UTCTime default=now()
    UniqueEmail email
```

This creates a constructor `UniqueEmail :: Text -> Unique User`.
But it feels a bit weird to me that we have this single type `Key User` that gets all these nice instances (`Ord`, `Eq`, `Show`, etc.). We can create a `Map (Key User) User` that indexes users by their ID column, but we can't write `Map UserUniqueEmail User`, even though we totally should be able to. Perhaps all unique keys should be their own type. Then `UniqueEmail email` creates:

```haskell
newtype UserUniqueEmail = UserUniqueEmail
    { userUniqueEmail :: Text
    }

instance ToUnique User UserUniqueEmail where
    toUnique = UniqueEmail . userUniqueEmail
```

And then we can write `get (UserUniqueEmail "foobar@example.com")` just as easily as `get (UserKey "asdfasdfasdfasdf")`.
Basically, I want to make it as easy to use any `Unique`ness constraint as it currently is to use the `Key record`, without sacrificing the utility of `Key`.
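As a self-contained sketch of this idea -- a pure, in-memory lookup standing in for the database, with `UserKey`/`UserEmail` as invented stand-ins for the generated key and unique types:

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
import Data.List (find)

data User = User { userId :: Int, userEmail :: String }
    deriving (Eq, Show)

-- A closed set of uniqueness constraints for User (illustrative only).
data UniqueUser
    = ByKey Int
    | ByEmail String

newtype UserKey = UserKey Int
newtype UserEmail = UserEmail String

-- Anything convertible to a uniqueness constraint can be handed to get.
class ToUnique record uniq | uniq -> record where
    toUnique :: uniq -> UniqueUser

instance ToUnique User UserKey where
    toUnique (UserKey k) = ByKey k

instance ToUnique User UserEmail where
    toUnique (UserEmail e) = ByEmail e

matches :: UniqueUser -> User -> Bool
matches (ByKey k)   u = userId u == k
matches (ByEmail e) u = userEmail u == e

-- Pure stand-in for get: look the row up in an in-memory table.
getSketch :: ToUnique User uniq => [User] -> uniq -> Maybe User
getSketch table uniq = find (matches (toUnique uniq)) table
```

The same `getSketch` then accepts either `UserKey 2` or `UserEmail "a@example.com"`, which is the ergonomics the proposal is after.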
OK, so `upsert` used to be unsafe because it required that you have exactly one unique key on a record. We accomplished this with a type class `OnlyOneUniqueKey`, defining a regular instance for appropriate records and a `TypeError` instance for records with either zero or multiple unique keys. Why? Well, let's say you don't have any unique constraints with `persistent-2`. Then `upsert` is only ever going to `insert`! That's probably not what you want. Now, what if you have two unique constraints? Suppose you do `upsert` with a value containing a clash on two different keys. You're definitely not doing an `insert`, so you're going to update one of the records. Which record? Who knows?! So we just say that `upsert` only works with a single uniqueness constraint, and if you have more, you need to use `upsertBy`, which lets you supply the uniqueness constraint.
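For reference, the `TypeError`-instance trick can be sketched in miniature like this (names invented; this is not the actual `persistent` code):

```haskell
{-# LANGUAGE DataKinds, FlexibleContexts, UndecidableInstances #-}
import GHC.TypeLits (TypeError, ErrorMessage (..))

-- Records with exactly one unique key get a real instance; others get
-- an instance whose context is a custom compile-time error, so any
-- upsert-style function constrained by this class won't type-check.
class OnlyOneUniqueKey record where
    uniqueKeyName :: record -> String

data HasOneUnique = HasOneUnique

instance OnlyOneUniqueKey HasOneUnique where
    uniqueKeyName _ = "UniqueEmail"

data HasNoUniques = HasNoUniques

-- Merely defining this instance is fine; *using* it anywhere reports
-- the custom message instead of a confusing missing-instance error.
instance TypeError ('Text "HasNoUniques has no unique keys, so upsert would only ever insert")
        => OnlyOneUniqueKey HasNoUniques where
    uniqueKeyName _ = ""
```

The custom message fires only at use sites, which is what makes this pattern pleasant for library consumers.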
But in this new model, where `record` contains the primary key (along with any other unique columns), it's a little more tricky. What should our signature be?

```haskell
upsert
    :: ToPartial record partial
    => partial
    -> [Update record]
    -> IO record
```

Well, we can definitely do an `insert` with the whole `record`. We can use the `ToPartial` machinery from `insert` to accept a `New record`, a `record`, or a `Partial record`, and let the database fill in the boring values.
But we need to know how we're going to handle `update`s. If there are no unique keys, then this is just `insert`. If there are multiple unique keys, then we should require that the user specify which uniqueness key to use. This might depend on the precise runtime value provided.
Consider the `User` given above:

```
User
    Id UUID default=uuid_generate_v4()
    name Text
    age Int
    email Text
    createdAt UTCTime default=now()
    UniqueEmail email
```
Because there is no `default=` expression for `email`, even the `New User` will contain a unique key. But it will contain only one unique key, so we can call `upsert NewUser {..}` without fear. However, calling `upsert User {..}` will involve two uniqueness constraints: the `Id` and the `UniqueEmail`. This should be forbidden, and the user directed to write `upsertBy`. Using `upsert PartialUser {..}` would trigger the same warning, which is somewhat unfortunate, because we might be providing `Nothing` for the `Id`, and then we wouldn't really have a conflict. This makes me wonder: should `PartialUser` be an `HList`-y construct that extends `NewUser`, so that additional fields can be supplied?

```haskell
-- sketch: a GADT pairs each extra field's label with its type
data With rest (field :: (Symbol, Type)) where
    With :: rest -> val -> With rest '(sym, val)

type PartialUser =
    NewUser
        `With` '("createdAt", Maybe UTCTime)
        `With` '("id", Maybe UUID)
```
This feels overly complicated to require for an initial release, but if the demand for the feature is high enough, I suppose it could be added in.
So this is the main big change that I'd like to roll out for `persistent-3`, and it will really change the library. There are a few other changes, and I've documented them on the GitHub issue tracker.
I am not sure if this is a valid concern, but I am not a big fan of the `new` and `partial` prefixes on the field names. I don't know enough about the dangers of overlapping field names, and I guess it's just part of my general dissatisfaction with Haskell records.
So it looks like all of the call sites of `insert` and `upsert` will need to be changed. What if, instead of having separate `New User` and `Partial User` types, there were a `newUser :: Partial User` value? This would be created by the QQ and would set defaults of `Nothing` for all the `Maybe`s and `TypeError` for all the required fields. Then the user would override like:

```haskell
newUser { _userName = "Bob" }
```

I don't know if this would work, though. Would the type errors trigger if the fields were overridden? Would they even trigger if they weren't overridden? Would this even play nicely with inference?

This `newUser` could actually be added in 2.11 and be implemented in terms of the existing `User` type. It could then be the migration path, since you could migrate all insertion code over to the `newUser` functionality. We of course still have the issue with multiple uniqueness constraints for `upsert`, but I think the general `insert` case is much easier to migrate with this approach.
Another concern is the migration path for users that opted not to use the TH approach. I guess they will be prompted by the compiler to define `Partial` types? Is there a way to have a default associated type on a type class, like default functions on a type class? E.g.:

```haskell
class PersistEntity typ where
    type Partial typ = <typ where id is Maybe, somehow>
```

That way they wouldn't need to modify their `PersistEntity` instances and their inserts. I don't know how many people avoid the QQ, so maybe this migration path is less of a concern. I guess they could manually add the `newUser`-style indirection.
Overall, this is amazing work and I think persistent-3 + esqueleto-4 will be the best in class typed-db solution.
@thematten points out in the FP Chat Slack that making `New` and `Partial` data families (instead of type families) would be more consistent. This is true, and also more resilient to a changing schema. Consider:

```
Foo
    name String
```

This does not generate a `PartialFoo` or `NewFoo` datatype; it sets `type instance Partial Foo = Foo` and `type instance New Foo = Foo`. We do `insert Foo { fooName = "Hello" }`, and all is well. But then we add a defaultable column:
```
Foo
    name String
    createdAt UTCTime default=now()
```
Suddenly, we generate the `NewFoo` and `PartialFoo` types, and the type family equations change. The previous call to `insert` will now fail, because `Foo` has uninitialized fields. If we always generated new datatypes as a `data family`, then we'd be allowing the user to write:
```haskell
insert Foo { fooName = "Hello" }
-- or,
insert NewFoo { newFooName = "Hello" }
-- or,
insert PartialFoo { partialFooName = "Hello" }
```
and you'd only need to modify your code if you wrote `Foo` or `PartialFoo`. Writing `insert Foo {..}` signals the intent of providing all values anyway, so that's a good break.
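A miniature, compilable sketch of the data-family version (class and `fromNew` helper are invented just to make the demo self-contained):

```haskell
{-# LANGUAGE TypeFamilies #-}

-- With an associated *data* family, NewFoo is always a distinct
-- constructor, so call sites don't silently change meaning when the
-- schema later gains a defaultable column.
class PersistEntitySketch record where
    data New record
    fromNew :: New record -> record  -- demo helper, not proposed API

data Foo = Foo { fooName :: String }
    deriving (Eq, Show)

instance PersistEntitySketch Foo where
    data New Foo = NewFoo { newFooName :: String }
    -- No defaultable columns yet, so a NewFoo determines a whole Foo.
    fromNew (NewFoo name) = Foo name
```

Adding a defaultable column would change the `New Foo` constructor's fields, but `NewFoo {..}` call sites would keep meaning "the insertable subset", which is the resilience being argued for.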
> I am not sure if this is a valid concern, but I am not a big fan of the `new` and `partial` prefixes on the field names. I don't know enough about the dangers of overlapping field names, and I guess it's just part of my general dissatisfaction with Haskell records.
I definitely agree with this - I don't like them either! But I do like them more than `DuplicateRecordFields`. All ears for a better convention there.
> So it looks like all of the call sites of `insert` and `upsert` will need to be changed. What if, instead of having separate `New User` and `Partial User` types, there were a `newUser :: Partial User` value? This would be created by the QQ and would set defaults of `Nothing` for all the `Maybe`s and `TypeError` for all the required fields. Then the user would override like...
Can you elaborate on that? Would this allow you to require that fields are overridden at the use site, but not blow up at the definition site?
> Another concern is what is the path for users that opted to not use the TH approach.
I think in this case I am going to rely on extensive documentation on the `PersistEntity` class to provide a good set of instructions on how to upgrade. I've never used the `persistent` library without the QQ to generate the boilerplate, and it's not really a well-supported workflow with the library.
> Can you elaborate on that? Would this allow you to require that fields are overridden at the use site, but not blow up at the definition site?
Yeah, I don't know -- I figure as long as it's never evaluated, it's fine, kind of like `undefined`. I need to test it. But basically the ideal would be:
```haskell
newRecord :: Partial RecordType
newRecord =
    PartialRecordType
        { _recordId = Nothing
        , _recordOptionalField = Nothing
        , _recordRequiredField = TypeError (Text "requiredField is required")
        }
```
Then `insert` would be strict in the fields, so that if they weren't overridden, they would blow up. Of course, I need to check that this won't blow up the minute the record is defined, because that would leave the idea dead in the water.
EDIT: it looks like lazy type errors are hard and I will need to spend time figuring out how to make a value that only blows up when you evaluate it.
If we were supporting composite primary keys properly, what are the plans for `Database.Persist.Types.EntityDef.entityId`?
My primary concern here is that the dissolution of `Entity`, with identifier information moving into the record, produces a breaking behavioral change with no type errors. A simple example is the difference between `Set Foo` and `Set (Entity Foo)`. Today, `Set Foo` is a set of unique values contained in the `Foo` table. With the dissolution of `Entity`, `Set Foo` becomes an ordered list of all rows in the `Foo` table. These are the kinds of changes that can lead to insidious bugs in downstream consumers of this library.
```haskell
-- today: no id on the record, so the Set deduplicates by value
data Foo = Foo { fooVal :: Int }
    deriving (Eq, Ord, Show)

fooVals :: Set Foo
fooVals = Set.fromList [Foo 1, Foo 2, Foo 3, Foo 4, Foo 1, Foo 3]
-- fromList [Foo 1, Foo 2, Foo 3, Foo 4]
```

```haskell
-- after the change: the id is part of the record, so nothing deduplicates
data Foo = Foo { fooId :: Int64, fooVal :: Int }
    deriving (Eq, Ord, Show)

fooVals :: Set Foo
fooVals = Set.fromList [Foo 1 1, Foo 2 2, Foo 3 3, Foo 4 4, Foo 5 1, Foo 6 3]
-- fromList [Foo 1 1, Foo 2 2, Foo 3 3, Foo 4 4, Foo 5 1, Foo 6 3]
```
Arguing with myself here: intention will be required in removing `entityVal` calls, so this might not be as bad as I was initially thinking.
I am planning on continuing to maintain `persistent-2.11` with bug fixes and minor improvements, so if you can't migrate to `3.0`, that's fine. A big part of why I want to do `3.0` instead of `2.12` is to signal that this is a big change.
I'm hearing that you have a use case for ``Foo `Without` Key Foo``, whatever `Without` ends up looking like. I think you could get there using `Partial Foo`, doing:

```haskell
Set.fromList . map (\foo -> foo { partialFooId = Nothing }) . map toPartial
```
It's kind of ugly, but it works? Supporting this in more detail might look like:

```haskell
newtype WithoutId a = WithoutId a

instance (Ord (Partial a), PersistEntity a) => Ord (WithoutId a) where
    compare (WithoutId a0) (WithoutId a1) =
        comparing (nullifyId . toPartial) a0 a1
```
@parsonsmatt Wouldn't `New Foo` already give you `Foo` without the `id`?
`New Foo` would also be missing any other `default`-able fields (e.g. `createdAt` or `updatedAt` timestamps, if so specified). The problem with generating a generic `FooWithoutId` type is: what do you do with natural/composite keys?
My library `record-wrangler` could pretty easily remove a value from an entity, if this use case is common.
I kind of assumed people wanting full backwards compatibility would not have any `default` fields anyway, since they can't meaningfully use them outside the initial migration. But yes, other `default` fields would change things.
Yeah, in general, in our case `FooWithoutId` is no more fundamental than `FooWithoutTimestamp`. It seems like nice extensible records underneath these types would be best, which seems similar to what you are aiming for with `record-wrangler`. Unfortunately, Haskell's extensible-records story is still in fairly early stages, although `RecordDotSyntax` is a nice step forward.
@parsonsmatt About creating a completely new data type for partial records: perhaps that could be avoided if we use type families for columns with default values. We should then be able to pass an extra flag depending on the operation (read/write), allowing users to either override a default value or leave it empty so the DB deals with that column. Given the following template Haskell:

```
User
    firstName Text
    createdAt UTCTime default=now()
    updatedAt UTCTime default=now()
```
It could output something like this:

```haskell
data User (op :: Operation) = User
    { userFirstName :: Text
    , userCreatedAt :: HasDefault op UTCTime
    , userUpdatedAt :: HasDefault op UTCTime
    }

data Operation = Read | Write

type family HasDefault (op :: Operation) a where
    HasDefault 'Read a = a
    HasDefault 'Write a = Maybe a
```
`insert` would look something like this:

```haskell
type instance Partial (User 'Read) = User 'Write

insert :: Partial record -> m record
```
Then, for a `HasDefault` field, if the value is set to `Nothing`, persistent would let the DB deal with that column by excluding it from the `INSERT INTO` statement:

```haskell
insert $ User "John" Nothing Nothing
-- INSERT INTO users (first_name) VALUES (?)
```
Otherwise, the value should be passed directly to the `INSERT INTO` statement:

```haskell
insert $ User "John" (Just createdAt) Nothing
-- INSERT INTO users (first_name, created_at) VALUES (?, ?)
```
I think the main benefit of using a single data type is consistency, in the sense that record fields would look the same for read and write operations, which I think is more intuitive for developers. Any thoughts?
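A runnable miniature of this single-datatype idea, with the timestamp type simplified to `String` and an invented `fakeInsert` standing in for the database filling defaults:

```haskell
{-# LANGUAGE DataKinds, TypeFamilies #-}

data Operation = Read | Write

-- Defaultable columns become Maybe on the write side only.
type family HasDefault (op :: Operation) a where
    HasDefault 'Read  a = a
    HasDefault 'Write a = Maybe a

-- One record type serves both reads and writes.
data User (op :: Operation) = User
    { userFirstName :: String
    , userCreatedAt :: HasDefault op String
    }

-- Stand-in for insert: Nothing means "let the database fill in the
-- default", here represented by the supplied `now` value.
fakeInsert :: String -> User 'Write -> User 'Read
fakeInsert now (User name mCreated) =
    User name (maybe now id mCreated)
```

The appeal is exactly the consistency argued above: `userCreatedAt` is the same field name on both sides, just with a `Maybe` wrapper for writes.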
I would really like to avoid the Higher Kinded Data pattern. That's a level of complexity that I'd rather avoid.
> `New Foo` would also be missing any other default-able fields (e.g. `createdAt` or `updatedAt` timestamps if so specified).
These are actually soft identifiers that break an ideal value-based `Eq`, so `New Foo` might actually cover the use case. Is `New` maybe an over-fit name? I like `Partial`, but it seems you intend a different use for that.
So, following on from @eborden: `New` is the minimal version of the type, so perhaps it could be `MinimalUser` or `BaseUser`. I think `Base` fits really nicely with the `HList` approach to the partial type.
In general, I think this looks really solid. It solves the main problem that we've had to work around in `graphula`, which fully depends on `persistent`'s `Entity` model. I suspect we'll be able to radically simplify `graphula` once this is released.
I'd also like to second using a `data family` instead of a `type family` with an injectivity annotation. I think the only thing you lose there is the ability to reuse the base constructor as the `Partial`/`New` constructor, which shouldn't be a big deal.
Fixes #1074
~I'd like to roll up as many other big changes as possible. If you have something you'd like to change or break about the library, now is a great time to mention it.~
~Thought I would link https://github.com/yesodweb/persistent/issues/849 here, as it talks about making `SqlPersistT` a `newtype` wrapper. Would `3.0` be a good time to make a change like that?~
Ok scratch that, I've just seen https://github.com/yesodweb/persistent/issues/1038 :) I will link those issues.
Really looking forward to having this at some point. A better (CQRS) separation between read models and insert/upsert models seems very helpful.
This issue tracks features and bug fixes that will be released as part of `persistent-3.0.0`. I want to bump the major-major version number because I want to implement a Big Breaking Change:

## The Desolation of `Entity`

There are many problems in `persistent` relating to the design of `Entity`. It works very well in the common case of "I want a datatype `record` with an autogenerated surrogate `Key record`," but it doesn't handle composite primary keys or non-autogenerated primary keys particularly well. It also special-cases the primary key, when really any number of columns could have database-specified defaults. See #1035, #1016, #994, #992, #985, #973, #974.

The solution that I want to implement is detailed in #995. I'll open a new issue to focus on gutting `Entity`, and further conversation can go there.

## Moar Breaking Changes

Because this is a big change, I'd like to roll up as many other big changes as possible. If you have something you'd like to change or break about the library, now is a great time to mention it.

I want to roll up the functional/bug-fix breaking changes and do `persistent-2.11`. After that, I'll start working on `persistent-3.0`, which will be a bigger change. The `3.0` milestone is in place for this update.