yesodweb / persistent

Persistence interface for Haskell allowing multiple storage methods.
MIT License
467 stars 296 forks source link

persistent-3.0.0 #1037

Open parsonsmatt opened 4 years ago

parsonsmatt commented 4 years ago

This issue tracks features and bug fixes that will be released as part of persistent-3.0.0. I want to bump the major-major version number because I want to implement a Big Breaking Change:

The Desolation of Entity

There are many problems in persistent relating to the design of Entity:

data Entity record = Entity 
  { entityKey :: Key record
  , entityVal :: record
  }

This works very well in the common case of "I want a datatype record with an autogenerated surrogate Key record," but it doesn't handle composite primary keys or non-autogenerated primary keys particularly well. It also special cases the primary key, when really any number of columns could have database specified defaults.

1035, #1016, #994, #992, #985, #973, #974,

The solution that I want to implement is detailed in #995. I'll open a new issue to focus on gutting Entity and further conversation can go there.

Moar Breaking Changes

Because this is a big change, I'd like to roll up as many other big changes as possible. If you have something you'd like to change or break about the library, now is a great time to mention it.

I want to roll up the functional/bug fix breaking changes and do persistent-2.11. After that, I'll start working onpersistent-3.0, which will be a bigger change.

The milestone 3.0 is in place for this update.

willisplummer commented 4 years ago

I looked through the issues you linked -- would this also simplify implementing created_at and updated_at type timestamp columns?

parsonsmatt commented 4 years ago

My current design would have a syntax like:

User
  Id         Serial
  name       Text
  createdAt  UTCTime  default=current_time()
  updatedAt  UTCTime  default=current_time()

This would result in the following SQL table:

create table "user" (
  id   serial primary key,
  name varchar not null,
  created_at timestamp default current_time(),
  updated_at timestamp default current_time()
);

The data type would look like:

data User = User 
  { userId :: Serial
  , userName :: Text
  , userCreatedAt :: UTCTime
  , userUpdatedAt :: UTCTime
  }

The class instance would look like:

data NewUser = NewUser { newUserName :: Text }

class PersistEntity User where
  type New User = NewUser

So you'd write:

userKey <- insert NewUser { newUserName = "Bob" }
Just user <- get userKey
print user
-- User { userId = 0, userName = "Bob", userCreatedAt = ..., userUpdatedAt = ... }
willisplummer commented 4 years ago

Nice -- this seems like a huge improvement to me! Thanks for your work on this

parsonsmatt commented 4 years ago

More details"

Persistent 3.0

persistent is an awesome database library, and with esqueleto for more SQL features, we have a compelling database story. However, in my time spent using and evolving codebases with it, I've discovered a problem, and fixing it will require a relatively big change to how persistent works.

Notably, we have to refactor Entity. The Yesod book talks about why Entity was created, and it's a great idea that works well for the most common cases. Many applications use datatypes where the Entity pattern fits perfectly.

Unfortunately, it's not perfect. Not all models have an ID column with an easy default. Currently, persistent fails with a runtime error if you try to insert such a model. You need to use insertKey instead, which accepts the key parameter to insert instead of relying on an assumed default.

Additionally, the id column isn't the only one you want the database to generate. It's really common to want a createdAt and/or updatedAt field that the database generates for you. This is awkward to handle in persistent - we must define our models to have Maybe UTCTime for those fields, even though we only need Nothing for writes and all reads should return a Just timestamp.

It's also awkward in the case of composite primary keys. In this case, an Entity record does not contain any more information than a mere record.

So, I propose this solution: a PersistEntity record implies that the type record completely describes the table, as it exists in the database. To solve the problems related to multiple different uses of the type, we create separate datatypes to represent those uses. All of the code here is a proposal, nothing is concrete or comitted.

To solve "The Insert Problem", we create a type family:

type family New record = insert | insert -> record

This type family uses the TypeFamilyDependencies language extension, which allows us to infer the type of record if we know the type of insert. This is important to preserve type inference.

And the type of insert becomes (modulo SqlPersistT stuff)

insert :: New record -> IO record

Without the type family dependency on New, we'd need to manually specify the type of record, which would be annoying!

Let's consider the easiest thing here: a record that has no default columns.

NoDefault
    Id    Text
    name  Text
    age   Int

This model expects that a unique textual ID will be generated by the application. This will now generate a Haskell record and an instance for the New type family.

data NoDefault = NoDefault
    { noDefaultId :: Text
    , noDefaultName :: Text
    , noDefaultAge :: Int
    }

type instance New NoDefault = NoDefault

Now, we can call insert NoDefault {..} and it Just Works, no errors, provided we supply all of the fields.

Let's get a little more complex. This is a User datatype that has a database generated ID.

User
    Id    UUID  default=uuid_generate_v4()
    name  Text
    age   Int

The default= expression tells the database what to use for generating columns. Because we have a default= clause, we don't need to actually provide this when inserting a value.

So we generate a datatype for the table and a datatype for inserting:

data User = User
    { userId :: UUID
    , userName :: Text
    , userAge :: Int
    }

data NewUser = NewUser
    { newUserName :: Text
    , newUserAge :: Int
    }

type instance New User = NewUser

insert :: NewUser -> IO User

We don't need insertKey anymore! The API to insert is now totally safe and cannot fail at runtime.

Let's get even fancier: timestamps!

User
    Id        UUID    default=uuid_generate_v4()
    name      Text
    age       Int
    createdAt UTCTime default=CURRENT_TIMESTAMP()

Because createdAt has a default= expression, it is not included on NewUser. So the NewUser type does not change with this addition. However, we do get a new field on User:

data User = User
    { userId :: UUID
    , userName :: Text
    , userAge :: Int
    , userCreatedAt :: UTCTime
    }

data NewUser = NewUser
    { newUserName :: Text
    , newUserAge :: Int
    }

type instance New User = NewUser

insert :: NewUser -> IO User

The value is supplied by the database. All is well.

But what if we want to supply these things? Then we should have a function:

insertRecord :: record -> IO record

that allows us to provide all of the values.

But, what if we want to supply some of those things, but not all? Then we'll have a new type for that:

type family Partial record = partial | partial -> record

data PartialUser = PartialUser 
    { partialUserId :: Maybe UUID
    , partialUserName :: Text
    , partialUserAge :: Int
    , partialUserCreatedAt :: Maybe UTCTime
    }

type instance Partial User = UserPartial

insertPartial :: Partial record -> IO record

This allows you to specify whatever subset of fields you'd like, and leave the Nothings to be generated automatically by the database backend.

You may note that there is a function from New record to Partial record where the Maybe fields are given a Nothing. Likewise, there is a function from record to Partial record where the default-able fields are given Just field. This suggests that we can accept any of a New record, record, or Partial record for the insert:

class ToPartial record part | part -> record where
    toPartial :: part -> Partial record

instance ToPartial User User where
    toPartial User {..} = PartialUser 
        { partialUserName = 
            userName
        , partialUserId = 
            Just userId
        , partialUserCreatedAt = 
            Just userCreatedAt
        , partialUserAge = 
            userAge
        }

instance ToPartial User (New User) where
    toPartial NewUser {..} = PartialUser
        { partialUserName = 
            newUserName
         partialUserAge =
            newUserAge
        , partialUserId = 
            Nothing
        , partialUserCreatedAt =
            Nothing
        }

instance ToPartial User (Partial User) where
    toPartial = id

Now, we can generalize insert:

class PersistEntity record where
    insert 
        :: (ToPartial record partial)
        => partial
        -> IO record
    insert = 
        insertPartial . toPartial

So that you can call:

insertUsers = do
    insert User 
        { userName, userAge, userId, userCreatedAt 
        }

    insert NewUser 
        { newUserName, newUserAge 
        }

    insert PartialUser 
        { partialUserName, partialUserAge
        , partialUserCreatedAt, partialUserId
        }

without needing a whole family of functions.

The Fate of Entity

The datatype Entity will become:

newtype Entity record = Entity { entityVal :: record }

entityKey 
    :: PersistEntity record 
    => Entity record 
    -> Key record
entityKey = 
    keyFromRecord . entityVal

Much like Single or Value, it'll be primarily useful as a newtype wrapper for providing various instances for PersistEntity related functionality.

What about reads and modifications?

Let's look at get:

get :: Key record -> SqlPersistT m (Maybe record)

We need a Key. Currently, Key is defined as a data family on the PersistEntity class. There's no reason we can't continue to do that. It works out quite well.

However, we have a large number of variations of these functions that operate on non-primary keys. The function getBy takes a Unique record. But a Key record is a Unique record.

Wouldn't it be nice to just have a function get that takes a uniqueness constraint of any sort (be it a Key or a Unique)? Does it really make sense for a Key to be separate from a Unique in any sense? Perhaps what we want is really a function:

get :: Unique record -> SqlPersistT m (Maybe record)

And one of the Unique constructors will promote the Key record -> Unique record. To simplify usage, we can write:

get :: (ToUnique record uniq) => uniq -> SqlPersistT m (Maybe record)

class ToUnique record uniq | uniq -> record where
  toUnique :: uniq -> Unique record

Almost all entities have a primary key, but some don't. So get doesn't make sense for these records, and you would need to use selectList for them.

An entity with a primary key defined would have a single Unique constructor:

instance ToUnique User (Key User) where
    toUnique = UniqueUserKey

But a User may also have another uniqueness constraint:

User
    Id          UUID    default=uuid_generate_v4()
    name        Text
    age         Int
    email       Text
    createdAt   UTCTime default=now()

    UniqueEmail email

This creates a constructor UniqueEmail :: Text -> Unique User. But it feels a bit weird to me that we have this single type Key User that gets to be all these nice instances (Ord, Eq, Show, etc). We can create a Map (Key User) User that indexes users by their ID column, but we can't write Map UserUniqueEmail User, even though we totally should be able to.

Perhaps all unique keys should be their own type. Then, UniqueEmail email creates:

newtype UserUniqueEmail = UserUniqueEmail
    { userUniqueEmail :: Text
    }

instance ToUnique User UserUniqueEmail where
    toUnique = UniqueEmail

And then we can just as easily write get (UserUniqueEmail "foobar@example.com") as get (UserKey "asdfasdfasdfasdf").

Basically, I want to make it as easy to use any random Uniqueness constraint as it currently is to use the Key record, without sacrificing the utility of Key.

Upsert

OK so upsert used to be unsafe because it required that you have exactly one unique key on a record. We accomplished this with a type class OnlyOneUniqueKey and by defining a regular instance for appropriate records and a TypeError instance for records with either 0 or multiple unique keys.

Why? Well, let's say you don't have any unique constraints with persistent-2. Then upsert is only ever going to insert! That's probably not what you want.

Now, what if you have two unique constraints? Suppose you do upsert with a value containing a clash for two different keys. You're definitely not doing an insert, so you're going to update one of the records. Which record? Who knows?!

So we just say upsert only works on a single uniqueness constraint, and if you have more, then you need to use upsertBy, which lets you supply the uniqueness constraint.

But in this new model, where record contains the primary key (along with any other unique columns), it's a little more tricky.

What should our signature be?

upsert 
    :: ToPartial record partial
    => partial 
    -> [Update record] 
    -> IO record

Well, we can definitely do an insert with the whole record. We can use the ToPartial machinery from insert to use one of New record, record, or a Partial record, and let the database fill in the boring values. But we need to know how we're going to handle updates.

If there are no unique keys, then this is just insert. If there are multiple unique keys, then we should require the user specify which uniqueness key. This might depend on the precise runtime value provided. Consider the User given above:

User
    Id          UUID    default=uuid_generate_v4()
    name        Text
    age         Int
    email       Text
    createdAt   UTCTime default=now()

    UniqueEmail email

Because there is no default= expression for email, even the New User will contain a unique key. But it will only contain one unique key, and so we can call upsert NewUser {..} without fear.

However, calling upsert User{..} will have two uniqueness constraints: the Id and the UniqueEmail. This should be forbidden and the user directed to write upsertBy. Using upsert PartialUser{..} would trigger the same warning, which is somewhat unfortunate, because we might be providing a Nothing for the Id, and then we wouldn't really have a conflict.

This makes me wonder: Should PartialUser be an HList-y construct that extends a NewUser, so that additional fields can be supplied?

data With rest (sym, val) = With rest val

type PartialUser =
    NewUser
    `With`
    ("createdAt", Maybe UTCTime)
    `With`
    ("id", Maybe UUID)

This feels overly complicated to require for an initial release, but if the demand for the feature is high enough, I suppose it could be added in.

Other Changes

So this is the main big change that I'd like to roll out for persistent-3 that will really change the library. There are a few other changes, and I've documented them on the GitHub issue tracker.

belevy commented 4 years ago

I am not sure if this is a valid concern but I am not a big fan of the new and partial prefixes on the field names. I don't know enough about the dangers of overlapping field names and I guess its just part of my general dissatisfaction with Haskell records.

So it looks like all of the cases of insert and upsert will need to be changed. What if instead of having a separate New User and Partial User there was a newUser :: Partial User value? This would be created by the QQ and would set defaults of Nothing for all the Maybes and TypeError for all the required fields. Then the user would override like

newUser { _userName = "Bob" }

I don't know if this would work though. Would the type errors trigger if they were overridden? Would they even trigger if they weren't overridden? Would this even play nice with inference?

This newUser could actually be added in 2.11 and just be implemented in terms of the existing User type. This could then be the migration path since you could migrate all insertion code over to using the newUser functionality. We of course still have the issue with the multiple uniqueness constraints with upsert but I think the general insert case is much easier to migrate with this approach.

Another concern is what is the path for users that opted to not use the TH approach. I guess they will be prompted to define Partial types by the compiler? Is there a way to have a default associated type on a type class like default functions on a type class, e.g.

class PersistEntity typ where 
    type Partial typ = <typ where id is maybe somehow>

that way they don't need to modify their PersistEntity instances and their inserts. I don't know how many people avoid the QQ so maybe this migration path is less of a concern. I guess they could manually add the newUser style indirection.

Overall, this is amazing work and I think persistent-3 + esqueleto-4 will be the best in class typed-db solution.

parsonsmatt commented 4 years ago

@thematten points out in FPCHat slack that making New and Partial data families (instead of type families) would be more consistent. This is true and also more resilient to changing schema. Consider:

Foo
    name String

This does not generate a PartialFoo or NewFoo datatype, and it sets type instance Partial Foo = Foo and type instance New Foo = Foo. We do insert Foo { fooName = "Hello" }, and all is well. But then we add a defaultable column:

Foo
    name String
    createdAt UTCTime default=now()

Suddenly, we generate the NewFoo and PartialFoo types and the type family equations are changed. The previous call to insert will now fail because Foo has uninitialized fields.

If we always generated new datatypes as a data family, then we'd be allowing the user to write:

insert Foo { fooName = "Hello" }
-- or,
insert NewFoo { newFooName = "Hello" }
-- or,
insert PartialFoo { partialFooName = "Hello" }

and you'd only need to modify your code if you wrote Foo or PartialFoo, and writing insert Foo {..} signals the intent of providing all values anyway, so that's a good break.

parsonsmatt commented 4 years ago

I am not sure if this is a valid concern but I am not a big fan of the new and partial prefixes on the field names. I don't know enough about the dangers of overlapping field names and I guess its just part of my general dissatisfaction with Haskell records.

I definitely agree with this - I don't like them either! But I do like them more than DuplicateRecordFields. All ears for a better convention there.

So it looks like all of the cases of insert and upsert will need to be changed. What if instead of having a separate New User and Partial User there was a newUser :: Partial User value? This would be created by the QQ and would set defaults of Nothing for all the Maybes and TypeError for all the required fields. Then the user would override like

Can you elaborate on that? Would this allow you to require that fields are overriden at the use site, but not blow up at the definition site?

Another concern is what is the path for users that opted to not use the TH approach.

I think in this case I am going to rely on extensive documentation on the PersistEntity class to provide a good set of instructions on how to upgrade. I've never used the persistent library without the QQ to generate the boilerplate, and it's not really a well supported workflow with the library.

belevy commented 4 years ago

Can you elaborate on that? Would this allow you to require that fields are overriden at the use site, but not blow up at the definition site?

Yah, I don't know, I figure as long as its never evaluated its fine, kinda like undefined. I need to test it. But basically the ideal would be.

newRecord :: Partial RecordType 
newRecord = 
    PartialRecordType 
      { _recordId = Nothing 
      , _recordOptionalField = Nothing
      , _recordRequiredField = TypeError (Text "requiredField is required")
     }

Then insert would be strict in the fields maybe so that if they weren't overridden they would blow. Of course I need to check that this won't blow the minute the record is defined because that would put the idea dead in the water.

EDIT: it looks like lazy type errors are hard and I will need to spend time figuring out how to make a value that only blows up when you evaluate it.

mmhat commented 4 years ago

If we were supporting composite primary keys properly what are the plans for Database.Persist.Types.EntityDef.entityId?

eborden commented 4 years ago

My primary concern here is that the dissolution of Entity and moving identifier information into the record produces a breaking behavioral change with no type errors.

A simple example is the difference between Set Foo and Set (Entity Foo). Today Set Foo is a set of unique values contained in the Foo table. With the dissolution of Entity Set Foo becomes an ordered list of all rows in the Foo table. These are the types of changes that can lead to insidious bugs in downstream consumers of this library.

eborden commented 4 years ago

Today

data Foo = Foo { fooVal :: Int }

fooVals :: Set Foo
fooVals = Set.fromList [Foo 1, Foo 2, Foo 3, Foo 4, Foo 1, Foo 3]

-- fromList [Foo 1, Foo 2, Foo 3, Foo 4]

Dissolution

data Foo = Foo { fooId :: Int64, fooVal :: Int }

fooVals :: Set Foo
fooVals = Set.fromList [Foo 1 1, Foo 2 2, Foo 3 3, Foo 4 4, Foo 5 1, Foo 6 3]

-- fromList [Foo 1 1, Foo 2 2, Foo 3 3, Foo 4 4, Foo 5 1, Foo 6 3]
eborden commented 4 years ago

Arguing with myself here. Intention will be required in removing entityVal calls, so this might not be as bad as I was initially thinking.

parsonsmatt commented 4 years ago

I am planning on continuing to maintain persistent-2.11 with bug fixes and minor improvements, so if you can't migrate to 3.0 then that's fine. A big part of why I want to do the 3.0 instead of 2.12 is to signal that this is a big change.

I'm hearing that you have a use case for Foo `Without` Key Foo, whatever Without ends up looking like. I think you could get there using Partial Foo, doing:

Set.fromList . map (\foo -> foo { partialFooId = Nothing }) . map toPartial 

It's kind of ugly, but it works? Supporting this in more detail might look like:

newtype WithoutId a = WithoutId a

instance (Ord (Partial a), PersistEntity a) => Ord (WithoutId a) where
  compare (WithoutId a0) (WithoutId a1) =
    comparing (nullifyId . toPartial) a0 a1
tysonzero commented 4 years ago

@parsonsmatt Wouldn't New Foo already give you Foo without the id?

parsonsmatt commented 4 years ago

New Foo would also be missing any other default-able fields (eg createdAt or updatedAt timestamps if so specified).

parsonsmatt commented 4 years ago

The problem with a generic FooWithoutId type being made is "What do you do with natural/composite keys?"

My library record-wrangler could pretty easily remove a value from an entity if this use case is common.

tysonzero commented 4 years ago

I kind of assumed people wanting full backwards compatibility would not have any default fields anyway, since they get meaningfully use them outside the initial migration. But yes other default fields would change things.

Yeah in general in our case FooWithoutId is no more fundamental than FooWithoutTimestamp.

It seems like nice extensible records underneath these types would be best, which seems similar to what you are aiming for with record-wrangler. Unfortunately Haskell's extensible records story is still in fairly early stages, although RecordDotSyntax is a nice step forward.

sestrella commented 4 years ago

@parsonsmatt about creating a completely new data type for partial records, perhaps that is something that could be avoided if we use TypeFamilies for columns with default values, then we should be able to pass an extra flag depending on the operation (read / write) in order to allow users to override a default value, or just leave it empty so the DB deals with that column:

Given the following template Haskell:

User
  firstName Text
  createdAt UTCTime default=now()
  updatedAt UTCTime default=now()

It could output something like this:

data User (op :: Operation) = User
  { userFirstName :: Text
  , userCreatedAt :: HasDefault op UTCTime
  , userUpdatedAt :: HasDefault op UTCTime
  }

data Operation = Read | Write

type family HasDefault (op :: Operation) a where
  HasDefault 'Read a = a
  HasDefault 'Write a = Maybe a

insert would look something like the one below:

type instance Partial (User 'Read) = User 'Write

insert :: Partial record -> m record

Then for a HasDefault field if the value is set to Nothing persistent would let the DB to deal with that by excluding the column from the INSERT INTO statement:

insert $ User "John" Nothing Nothing
-- INSERT INTO users (first_name) VALUES (?)

Otherwise, the value should be passed directly to the INSERT INTO statement:

insert $ User "John" (Just createdAt) Nothing
-- INSERT INTO users (first_name, created_at) VALUES (?, ?)

I think the main benefit of using a single data type is consistency, in the sense that record fields would look the same for read and write operations, which I think is more intuitive for developers. Any thoughts?

parsonsmatt commented 4 years ago

I would really like to avoid the Higher Kinded Data pattern. That's a level of complexity that I'd rather avoid.

eborden commented 4 years ago

New Foo would also be missing any other default-able fields (eg createdAt or updatedAt timestamps if so specified).

These are actually soft identifiers that break an ideal value based Eq, so New Foo might actually cover the use case.

eborden commented 4 years ago

Is New maybe an over-fit name? I like Partial, but it seems you intend a different use for that.

belevy commented 4 years ago

So following on from @eborden, New is the minimal version of the type, perhaps it could be MinimalUser or BaseUser. I think Base fits really nicely with the HList approach to the partial type.

cdparks commented 4 years ago

In general, I think this looks really solid. It solves the main problem that we've had to work around in graphula, which fully depends on persistent's Entity model. I suspect we'll be able to radically simplify graphula once this is released.

I'd also like to second using a data family instead of a type family with an injectivity annotation. I think the only thing you lose there is the ability to re-use the base constructor as the Partial/New constructor, which shouldn't be a big deal.

parsonsmatt commented 4 years ago

Fixes #1074

danbroooks commented 3 years ago

~I'd like to roll up as many other big changes as possible. If you have something you'd like to change or break about the library, now is a great time to mention it.~

~Thought I would link https://github.com/yesodweb/persistent/issues/849 here as it talks about making SqlPersistT a newtype wrapper, would 3.0 be a good time to make a change like that?~

Ok scratch that, I've just seen https://github.com/yesodweb/persistent/issues/1038 :) I will link those issues.

malteneuss commented 1 month ago

Really looking forward to have this at some point. A better (CQRS) separation between read models and insert, upsert etc. models seems very helpful.