yesodweb / persistent

Persistence interface for Haskell allowing multiple storage methods.
MIT License
467 stars 297 forks source link

Generalizing `insert` and gracefully handling defaults #995

Open parsonsmatt opened 4 years ago

parsonsmatt commented 4 years ago

Raise your hand if you've been frustrated by needing to specify a time or use Maybe for createdAt UTCTime fields.

If you're one of the few folks that hasn't, well, let me introduce the problem.

User
  name String

This is fine and easy. To insert a new user into the database, we write insert User { userName = "Matt" }. There is an implied surrogate key that is associated, and it should be an auto-incrementing integer. Because it has a default in the database, we don't need to specify it. This pattern is so common that we have the Entity type, which includes the Key entity for the entity.

Then we want to record when a user is created.

User
  name String
  createdAt UTCTIme

Database users are accustomed to writing a schema like:

CREATE TABLE user (
  id   SERIAL PRIMARY KEY,
  name VARCHAR NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT=NOW()
);

And the persistent library happily supports the default= syntax, which does The Right Thing with migrations.

User
  name String
  createdAt UTCTIme default=NOW()

Unfortunately, we reach a problem when we go to insert a new value. insert's type is insert :: entity -> SqlPersistT m (Key entity). And userCreatedAt :: UTCTime - it's a required field!! So now we have two options:

  1. Make the timestamp in Haskell and forego the database default, writing:

    newUser :: String -> SqlPersistT m UserId
    newUser userName = do
      userCreatedAt <- liftIO getCurrentTime
      insert User {..}

    But this gets really annoying as the User gets additional arguments, and people really don't like writing this out when the database defaulting mechanism is designed to provide exactly this.

  2. Make the definition nullable and provide Nothing:

    insert User { userName = "Matt", userCreatedAt = Nothing }

    The database defaulting mechanism works out here, hooray. But now we have to care about Maybe at every use site! Gross.

So here's my plan:

  1. Extend the PersistEntity class with an associated type New:
    class PersistEntity entity where
        type New entity :: *
  2. Change the signature of insert to be: `insert :: (PersistEntity entity) => New entity -> SqlPersistT m (Key entity)
  3. For a definition with no default= clauses, define type New User = User
  4. For a definition with default= clauses,
    1. define a datatype NewUser with the required fields of a User and Maybe fields for any defaultable types
    2. define type New User = NewUser

In the QQ syntax, we can introduce a new attribute !default-only, and any field with a default-only attribute does not include that field in the New type. So we could write this:

User
  name String
  createdAt UTCTime default=NOW() !default-only

and we'd be able to write simply insert NewUser { newUserName = "Matt" } and it Just Works, precisely like you'd want it to.

Alternatively, we might want to default to default= things not being in the NewUser, and an attribute !allow-override, which puts a Maybe in the New record.

This also helps solve some of the issues with custom Id and Primary declarations. For example, consider this Person type with a UUID:

Person
  Id    UUID
  name String

With this example, it's a SQL-time error to do insert Person { personName "Matt" } - there won't be a default given for the UUID. So in this case, we actually want to define type New Person = NewPerson:

data NewPerson = NewPerson
  { newPersonId :: UUID
  , newPersonName :: String
  }

But if we specify a default, then we can have this pairing:

Person
  Id    UUID default=uuid_generate_v4()
  name String
instance PersistEntity Person where
  type New Person = Person

This design seems to work pretty well to solve all the pain points I experience with this stuff. I'm curious if anyone else has any input on pain points that may be addressed by this, or if there are design flaws that I haven't considered.

MaxGabriel commented 4 years ago

Not commenting on the design, but this would definitely improve persistent for us. We currently use triggers to set created/updated at:

      CREATE TRIGGER payment_purpose_document_join_table_insert BEFORE INSERT ON payment_purpose_document_join_table FOR EACH ROW EXECUTE PROCEDURE create_timestamps();
      CREATE TRIGGER payment_purpose_document_join_table_update BEFORE UPDATE ON payment_purpose_document_join_table FOR EACH ROW EXECUTE PROCEDURE update_timestamps();

and insert zeroed out UTCTimes:

-- | Placeholder value to use when you need an unused `UTCTime` to build a Persistent model.
-- This is necessary for columns that the DB auto-populates (& overrides) such as `createdAt` and `updatedAt`.
fakeUTCTime :: UTCTime
fakeUTCTime =  UTCTime (fromGregorian 1 1 1) 0

Which is not too bad, but hacky. Because of it, we often don't include the createdAt field on our .persistentmodels definition unless we actually need it in the code.

I think this feature could also be useful when you add new fields with a default, that you usually don't care about but sometimes do. For example if I add an isAdmin Bool then I have to go update everywhere that I insert a new user. But if I do isAdmin Bool default=false !default-only I'm saved the hassle. This use case suggests I may want a way to insert records to the database and override their default values, though (which is also plausible for e.g. createdAt type values if you are inserting historical data or other weird edge cases).

Potentially if this feature is well received, and this would be a huge breaking change and have some downsides, it would obviate Entity entirely?

Also as a side note just in case it was copied from real code, if you have:

Id    UUID default=uuid_generate_v4()

You may get improved performance from uuid_generate_v1() or uuid_generate_v1mc(), since subsequent UUIDs will sort next to each in the database table (is my understanding)

hdgarrood commented 4 years ago

I like this idea! However, it would be a pretty huge breaking change as described at the moment. Do you think there might be a sensible route to having New entity always be the same as entity for existing code, and having New entity be a separate data type only in cases where new annotations associated with this feature are being used?

Also, it should always be possible to go from User to NewUser, right? You're essentially just taking a subset of the fields? I think it might be useful to include a function entity -> New entity in the type class so that this is encoded more clearly.

I'd suggest keeping the Entity type around even though this feature could potentially obviate it, too, as removing or deprecating Entity would make this change a much bigger breaking change than it otherwise needs to be. I think libraries are using Entity for hanging instances on to as well, e.g. I think Entity helps esqueleto select instances for some of its own classes.

parsonsmatt commented 4 years ago

Yeah - a function like insertRecord :: PersistEntity record => record -> SqlPersistT m (Key record) would be useful.

I think it might be useful to include a function entity -> New entity in the type class so that this is encoded more clearly.

Yes, definitely. Good call!

Potentially if this feature is well received, and this would be a huge breaking change and have some downsides, it would obviate Entity entirely?

I'm not inclined to remove Entity. I'd actually really like to figure out a way to "annotate" datatypes like that - it should be possible to write eg

User
  WithTimestamps
  name String

where data WithTimestamps record = WithTimestamps { createdAt :: UTCTime, updatedAt :: UTCTime, record :: record }, and then the "full" type would decompose to Entity (WithTimestamps User). I'd like for these to be user-definable and customizable, as well.

But that's only one way to approach data factorization.

MaxGabriel commented 4 years ago

The breaking change could be phased in over time:

V1: Add insertRecord and insertNewRecord (example names). Deprecate insert. Suggest using insertRecord as direct replacement, or insertNew V2: Undeprecate insert, make insert = insertNewRecord

But I think given that for most cases existing code will just work, I'd be in favor of a straight breaking change

tysonzero commented 4 years ago

I would love to see this feature implemented.

However I think it's important for this to effectively replace Entity, as once I have New User it doesn't make much sense to me why User wouldn't have an id field.

Currently we only deal with User for initial insertion, as after that we often need the id. We also never use Entity User, due to the unwrapping noise. Instead we use Esqueleto to grab the individual fields, and put them in a custom local type that includes all the fields we need.

Keeping around Entity for a while for backwards compatibility seems reasonable. However I have to say it is my least favorite part of persistent. It leads to a significant amount of verbosity and also means that there are two versions of basically every function.

parsonsmatt commented 4 years ago

I think that we can keep Entity, but it'll be replaced by:

newtype Entity record = Entity { entityVal :: record }

entityKey :: (PersistEntity record) => Entity record -> Key record

It is still useful as an instance hanger (as @hdgarrood mentioned), and this should be backwards compatible with most uses of it.

ocharles commented 4 years ago

In your description you outline two options, but that's not exhaustive. A third option is that when you're inserting you should be providing SQL expressions, not Haskell values. When you do that, you can insert a User as a full record, with no need to generate timestamps in Haskell:

insert User { userCreatedAt = nowExpr }

This is like writing INSERT INTO user(createdAt) VALUES (now()).

I'm not sure this fits in with the design of persistent though, but thought it should be mentioned.

My other thoughts are that having Nothing mean default is gross, as you write in:

The database defaulting mechanism works out here, hooray. But now we have to care about Maybe at every use site! Gross.

But I only think it's Maybe that is gross here, an intentional type for defaults would be clearer:

insert User { userCreatedAt = Default }

With that, I'd probably look at having some kind of "create default with essential data function":

defaultUser :: NonDefaultFields -> User

I should clarify that I'm not a persistent user, so please don't put too much weight into my comments. I just like API design :sweat_smile:

retnuh commented 4 years ago

I'm also not a persistent user currently, but am curious to see how this evolves. I'm just wondering how this interacts with upserts?

tysonzero commented 4 years ago

In your description you outline two options, but that's not exhaustive. A third option is that when you're inserting you should be providing SQL expressions, not Haskell values.

This is probably more in scope for a library like Esqueleto. But yes in my personal use case, the most ideal situation would be an API pretty much identical to what PostgreSQL gives me for these tables.

But I only think it's Maybe that is gross here, an intentional type for defaults would be clearer:

Yeah I'd agree with that. Perhaps instead of using Maybe for either default or null we could use a GADT that handles both cases. Particularly since Maybe and null don't really work the same way, only the former can be nested.

data Insert n d a where
    Null :: Insert Nullable d
    Default :: Insert n Defaultable
    Explicit :: a -> Insert n d

With that, I'd probably look at having some kind of "create default with essential data function":

defaultUser :: NonDefaultFields -> User

I wish Haskell had better structural typing support in general such as { foo = bar } desugaring directly to an anonymous record.

As it stands I don't see a way to make this overly clean:

insert User { foo = 5, bar = Explicit 7, baz = Default }
-- vs
insert (defaultUser NonDefaultFields { foo = 5 }) { bar = Explicit 7 }

The latter scales better when you have lots of default fields, but it's still kinda ugly.

Something like:

insert $ User { foo = 5, bar = Explicit 7 }

Would be much nicer but would require proper first class anonymous record support.

parsonsmatt commented 4 years ago

Based on some preliminary testing, we can define:

type family New record = result | result -> record

and this has the same inference properties that we want, so that:

data User = User { userName :: String, userAge :: Int, userId :: Int }
data NewUser = NewUser { newUserName :: String, newUserAge :: Int }

type instance New User = NewUser

class PersistEntity record where
  insert :: New record -> IO record

instance PersistEntity User where
  insert NewUser {..} = pure User 
    { userName = newUserName
    , userAge = newUserAge
    , userId = random
    }

infers nicely and correctly. We can also have:

data NoAutogen = NoAutogen { a :: Int, b :: Char }

type instance New NoAutogen = NoAutogen

and the inference is A-OK.

eborden commented 4 years ago

Marking myself in agreement with what has already been stated.

However I would like to raise my support for keeping Entity. I actually like the separation of identity from value that Entity provides. An Entity Foo is a reference to a Foo that exists in the database, where as a Foo is just a simple value with all the sane Eq, Ord, etc instances.

tysonzero commented 4 years ago

That’s what New will now indicate:

Foo -> New Foo

Entity Foo -> Foo

eborden commented 4 years ago

@tysonzero moved my comment over to https://github.com/yesodweb/persistent/issues/1037#issuecomment-611136459

goolord commented 4 years ago

I think this is a great change. It would make adding "#281: syntax for updatedAt or other database performed updates" a breeze