yesodweb / persistent

Persistence interface for Haskell allowing multiple storage methods.
MIT License

Documentation re intended use and best practices #1115

Open parsonsmatt opened 4 years ago

parsonsmatt commented 4 years ago

It has been reported that persistent encourages you to use your database types in your domain.

We should investigate the documentation for persistent and ensure that is not the case. We should instead encourage a separation between domain types and database access types.

MaxGabriel commented 4 years ago

In the general case, I don't really buy that separating database types from some "business logic type" is helpful. For most applications, if I have a users table in my database, I can have a Persistent Entity called User and my code can interact directly with that. In most apps, the "business logic" is pretty much direct interaction with the database anyway (e.g. "CRUD" apps), so a layer of indirection is likely useless.

Also, how you interact with a database is often really important to the semantics of your program (e.g. operating in a transaction so that either all writes execute or none do, or locking rows for writes when necessary), as you said here: https://twitter.com/mattoflambda/status/1296526091952111616?s=20

googleson78 commented 4 years ago

In any application with a relational db where I have an A containing a [B], and in which I want to make some queries based on the Bs, it's usually not a good idea to represent your A with an actual list of Bs in the db (e.g. for db performance reasons).

Similar things apply to sum types - there are multiple ways to represent them in a database, but I would definitely prefer for those direct representations not to appear in my API.

These two cases alone are so broad that I personally would always default to creating a "domain" type and a "db" type, and converting between them, and only think about optimisations if this conversion actually turns out to be slow.
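
For the sum-type half of this, here is a minimal sketch of what I mean. All of the names are hypothetical, and the "tag column plus nullable payload columns" layout is just one of several possible database encodings:

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Data.Time (UTCTime)

-- Domain type: the sum structure is explicit and only valid states exist.
data Subscription
  = Trial UTCTime   -- trial, expiring at the given time
  | Paid Text       -- paid, with a plan name
  | Cancelled
  deriving (Show)

-- DB-shaped type: a tag column plus nullable payload columns, roughly what a
-- persistent entity with two Maybe fields would look like.
data SubscriptionRow = SubscriptionRow
  { rowKind      :: Text           -- "trial" | "paid" | "cancelled"
  , rowExpiresAt :: Maybe UTCTime
  , rowPlanName  :: Maybe Text
  }

-- The conversion checks the representation once, at the boundary, so the rest
-- of the code never sees an inconsistent row.
rowToSubscription :: SubscriptionRow -> Either Text Subscription
rowToSubscription row =
  case (rowKind row, rowExpiresAt row, rowPlanName row) of
    ("trial",     Just expiry, _)         -> Right (Trial expiry)
    ("paid",      _,           Just plan) -> Right (Paid plan)
    ("cancelled", _,           _)         -> Right Cancelled
    _                                     -> Left "inconsistent subscription row"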

joelmccracken commented 4 years ago

I think the thing is to be aware of the tradeoffs and, if they become a problem, address them by converting; I would definitely default to using them in most things, though (and that is what I am doing in my own Yesod app right now).

codygman commented 4 years ago

We should instead encourage a separation between domain types and database access types.

How do you motivate that, though, @parsonsmatt? I'd be willing to bet most persistent users use the database access type directly.

I think it would be hard to actually get people to do this. Let me start by asking how we prove to them (and ourselves) that this:

share [mkPersist sqlSettings] [persistLowerCase|
    PersonDBO
      age Int
|]

data Person = Person { _id :: Int64, age :: Int }

mkPerson :: Int -> Maybe Person
mkPerson personAge = if personAge > 0 then ... else Nothing

personDBOToPerson :: PersonDBO -> Maybe Person
personDBOToPerson pdbo = mkPerson (personDBOAge pdbo)

getEndpoint personId = do
    person <- (personDBOToPerson =<<) <$> get personId
    case person of
       Just p -> HttpResponse ("found person with age " <> show (age p))
       Nothing -> HttpResponse "found no person"

is better than:

share [mkPersist sqlSettings] [persistLowerCase|
    PersonDBO
      age Int
|]

getEndpoint personId = do
    person <- get personId
    case person of
       Just p
         | personDBOAge p <= 0 -> HttpResponse "person has invalid age"
         | otherwise -> HttpResponse ("found person with age " <> show (personDBOAge p))
       Nothing -> HttpResponse "found no person"

Note that I made these two slightly different for a reason.

This is the opposite of "Parse, don't validate", and I think that's a negative.

I think that persistent, for whatever reason, currently does encourage using entities directly, and that it encourages code with more logic in the handlers.

I think it's because people are used to putting more logic in their handlers when coming from other languages and not encoding invariants in their data.

Basically they think:

I want a site that displays a user's age. I'll write a handler to grab it from the database, do some verification, and then print it out.

Rather than:

I want a site that displays a user's age. First I need a type to represent valid users. How can I make impossible users unrepresentable? Okay, only now that I have my code to create users can I start working on my handler.

I think it's partially because starting to work on the handler feels more productive. The problem is that until they experience the benefit of the second approach first-hand, there's not much anyone can do to convince them that it is truly better.

Back to the example. I left those as Maybe because I think that's the default when people start to write smart constructors. So imagine you were entertaining "separate your domain types from your db types" after having had a default that looks like option number 2.

You'd probably think... great... now I don't get as good of an error and I have all of this extra boilerplate. To recover the error you have to add a little more boilerplate in the form of Either.

So if that person hasn't already felt like they spent enough time messing with this for seemingly nebulous benefits, the least-effort version using Either String would look like:

share [mkPersist sqlSettings] [persistLowerCase|
    PersonDBO
      age Int
|]

data Person = Person { _id :: Int64, age :: Int }

mkPerson :: Int -> Either String Person
mkPerson personAge = if personAge > 0 then ... else Left "person has invalid age"

personDBOToPerson :: PersonDBO -> Either String Person
personDBOToPerson pdbo = mkPerson (personDBOAge pdbo)

getEndpoint personId = do
    person <- maybe (Left "found no person") personDBOToPerson <$> get personId
    case person of
       Right p -> HttpResponse ("found person with age " <> show (age p))
       Left err -> HttpResponse err

Then the question might become:

"What did I gain from moving the error out of the handler and into the mkPerson function? It's the same, right?"

To which the answer is: you gained a safe way to construct Person, and the type guarantees certain properties hold.

Or put more directly: you don't have to do an if check in every handler where you use Person.

@MaxGabriel I think this example might answer why the separation is helpful. As for "all writes execute or none", nothing would prevent you from doing that when the domain and database types are separated, right? You just convert from the domain type to the database type when you need that sort of guarantee.

parsonsmatt commented 4 years ago

A common antipattern in database schema is to have a "God Object," usually a User, that has all the fields you might want to render on a user's profile page. This means you have a toooon of nullable fields and a ton of mixed concerns and a ton of references.

The datatype to properly render a user's profile page - data UserProfile = ... - is a valid and useful datatype. But it almost certainly isn't the same as a good database representation for these details!

I think most domain types aren't properly represented by a single database table. So you can either a) request database tables from the API and stitch them together on the front-end, or b) request an API type which the server builds by combining multiple tables appropriately.

@googleson78 noted it very well. A type for a Twitter user might be something like:

data User = User
  { followers :: [User]
  , retweets :: [Tweet]
  , tweets :: [Tweet]
  , favorites :: [TweetId]
  , name :: Text
  , atHandle :: Text
  }

But having a persistent-sql entity that directly mirrors this would be awful! It's totally denormalized. A database table structure that would be better for this would be:

User
  name Text
  atHandle Text

Tweet
  body Text
  author UserId

Retweet
  user UserId
  tweet TweetId
  Primary user tweet

Favorite
  user UserId
  tweet TweetId
  Primary user tweet

Now, with this database schema, our front-end can call the users/#{userId} route to get the User in question, then the tweets?by_user_id=#{userId} route, then retweets?by_user_id, etc., until the datatype is present on the front-end.

But this isn't great. Really, the front-end would do better to fire a single request and receive the full payload. So instead we want to summon the necessary tables via SQL, form the API type, and send it over the wire. Aside from the advantages of doing this in Haskell vs JavaScript or whatever, we can also take advantage of transactions, so the view of the data can be assumed to be consistent. Otherwise, there could be varying data depending on how quickly those requests came in. This may or may not be a dealbreaker.
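
To make that read path concrete, here is a minimal sketch of assembling an API payload from the tables above inside one SqlPersistT action (and therefore one transaction when run with runSqlConn or runSqlPool). ApiUser and loadApiUser are hypothetical names, the entities are assumed to have been generated from the schema above with mkPersist, and followers are left out because that schema has no follow relation:

import Control.Monad.IO.Class (MonadIO)
import Data.Maybe (catMaybes)
import Data.Text (Text)
import Database.Persist
import Database.Persist.Sql (SqlPersistT)

-- The API payload we actually want to send over the wire.
data ApiUser = ApiUser
  { apiName      :: Text
  , apiAtHandle  :: Text
  , apiTweets    :: [Tweet]
  , apiRetweets  :: [Tweet]
  , apiFavorites :: [TweetId]
  }

loadApiUser :: MonadIO m => UserId -> SqlPersistT m (Maybe ApiUser)
loadApiUser userId = do
  mUser <- get userId
  case mUser of
    Nothing -> pure Nothing
    Just user -> do
      -- Each query runs against the normalized tables; they all see the same
      -- consistent snapshot because they share one transaction.
      tweets    <- selectList [TweetAuthor ==. userId] []
      retweets  <- selectList [RetweetUser ==. userId] []
      retweeted <- catMaybes <$> traverse (get . retweetTweet . entityVal) retweets
      favorites <- selectList [FavoriteUser ==. userId] []
      pure $ Just ApiUser
        { apiName      = userName user
        , apiAtHandle  = userAtHandle user
        , apiTweets    = map entityVal tweets
        , apiRetweets  = retweeted
        , apiFavorites = map (favoriteTweet . entityVal) favorites
        }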

But! You definitely want to have transactions when writing or saving data. And if your front-end wants to save a User by posting to each endpoint separately, then you're going to have a bad time. Posting the Big API Type allows you to save the whole thing in a single transaction, which won't leave your database in an inconsistent state.

Even better is a datatype representing the action that updates the database state. data NewTweet contains basically the body and author ID, which is fine. But data ModifyProfile = ... instead contains the new user profile information - some of which may modify fields in the UserBio table while some modifies the User table, and some modifies the UserLogin table, and some modify the... well, you get it.
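
Here is a minimal sketch of that last idea, sticking to the tables defined above. NewTweet is from the paragraph above; ModifyProfile and the apply functions are hypothetical, and ModifyProfile only touches User here because the sketch schema has no UserBio or UserLogin tables; extra tables would just mean extra writes in the same do-block, still inside one transaction:

import Control.Monad.IO.Class (MonadIO)
import Data.Text (Text)
import Database.Persist
import Database.Persist.Sql (SqlPersistT)

-- An action type describing one change to the database state.
data NewTweet = NewTweet
  { newTweetBody   :: Text
  , newTweetAuthor :: UserId
  }

applyNewTweet :: MonadIO m => NewTweet -> SqlPersistT m TweetId
applyNewTweet nt =
  insert Tweet { tweetBody = newTweetBody nt, tweetAuthor = newTweetAuthor nt }

-- Another action type; applying it is one or more writes that either all
-- happen or all roll back, because they run in one SqlPersistT action.
data ModifyProfile = ModifyProfile
  { newName     :: Text
  , newAtHandle :: Text
  }

applyModifyProfile :: MonadIO m => UserId -> ModifyProfile -> SqlPersistT m ()
applyModifyProfile uid mp =
  update uid [UserName =. newName mp, UserAtHandle =. newAtHandle mp]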

Database types are serialization and query types. They're probably not exactly what you want in your API. Though they certainly may be! Datatypes are cheap, and writing a custom datatype for a use-case is probably a pattern that should be used more.

charukiewicz commented 4 years ago

It has been reported that persistent encourages you to use your database types in your domain.

Any idea why the author of that Tweet (or anyone else) thinks that persistent encourages this? Is it the API itself? Is it the module documentation? Is it the persistent chapter in the Yesod Book?

I think addressing the issue of people thinking persistent encourages this requires figuring out where persistent encourages this. Personally, I think some of the language in the linked Yesod Book chapter might be part of it (emphasis mine):

Some other nice features are:

  • Convenient data modeling. Persistent lets you model relationships and use them in type-safe ways. The default type-safe persistent API does not support joins, allowing support for a wider number of storage layers. Joins and other SQL specific functionality can be achieved through using a raw SQL layer (with very little type safety). An additional library, Esqueleto, builds on top of the Persistent data model, adding type-safe joins and SQL functionality.

Perhaps what's missing is an explicit statement that conveys something like: Sometimes you'll feel that your persistent model types won't serve the needs of your domain logic. This is common, and if you find yourself in this situation, it makes sense to create additional data types that do fit well with your domain. Converting to and from persistent data types is a common practice when using the persistent library.
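
For instance, the kind of conversion that wording describes might be sketched like this (UserProfile, toProfile, and fromProfile are hypothetical names; User is assumed to be the generated entity from the schema earlier in the thread):

import Data.Text (Text)

-- A domain type shaped for the profile page rather than for storage.
data UserProfile = UserProfile
  { profileName   :: Text
  , profileHandle :: Text
  }

-- Converting from the persistent type...
toProfile :: User -> UserProfile
toProfile u = UserProfile
  { profileName   = userName u
  , profileHandle = userAtHandle u
  }

-- ...and back, when it's time to persist changes.
fromProfile :: UserProfile -> User
fromProfile p = User
  { userName     = profileName p
  , userAtHandle = profileHandle p
  }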

Either way, I think this is a good thing to put effort into, but I think we need to identify where it is that people reading about persistent get the impression that it encourages this. Addressing the issue will become a lot easier from that point.

joelmccracken commented 4 years ago

so there are a few questions here:

  1. Does persistent indeed encourage developers to use its database types as domain objects?
  2. Is that a good idea to do?

Personally, I think the answer to 2 is only sometimes yes, and I have never seen 1, except perhaps in minimal example tutorials which "omit" this conversion to domain objects (as is often a problem in programming tutorials), because it would otherwise distract from the "main" point of the tutorial, which is to show how persistent works.

edit: Ahh @charukiewicz just said the same thing basically

codygman commented 4 years ago

I'm beginning to think "don't use db types directly in prod code" is the best thing to say. Basically using persistent types directly seems to have a social effect of discouraging actual domain modelling.

Will post at length about this idea later.