prisma / prisma

Next-generation ORM for Node.js & TypeScript | PostgreSQL, MySQL, MariaDB, SQL Server, SQLite, MongoDB and CockroachDB
https://www.prisma.io
Apache License 2.0
39.37k stars 1.54k forks source link

[Schema] Please rethink model block syntax #90

Closed kartikthapar closed 3 years ago

kartikthapar commented 5 years ago

model User {
  id    Int    @id
  email String @unique
}

model User {
  id       Int    @id
  email    String @unique
  username String @unique
}

My biggest gripe with the current syntax is how spacing can change every single field in the model. See adding username changes spacing for other fields. This hurts blame for absolutely no reason. I don't get why:

No pull requests with different spacing schemes.

was such a big deal.

Like gofmt and unlike prettier, there are no options for configurability here. There is exactly one way to format a prisma file.

I am not sure I buy this completely given I don't know the drawbacks for Schema syntax in v1.34 (latest stable) and what this particular new syntax really adds.

(Early enter. Writing the post)

matthewmueller commented 5 years ago

Thanks for bringing this up @kartikthapar! You bring up some very good points that I'll try and address and then we can discuss further 😊

My biggest gripe with the current syntax is how spacing can change every single field in the model.

We spent a lot of time debating this internally and I was the one who pushed for this change. We ultimately thought that the readability of the schema outweighed the occasional "over-format". Let me walk you through our thinking.

We first heard about this issue while talking to some Facebook employees. The reason GraphQL isn't space-aligned is because they rely heavily on git blame tooling. This seemed like a compelling reason to ditch space-alignment. However when we probed further, we realized that they would often run codemods across their whole codebase that would also mess up their git blame's.

We also see very large Go projects (docker, kubernetes, terraform, google cloud) running gofmt across their whole codebase everyday. This suffers the same problem you described and since we haven't heard the community complain about it, this gave us the confidence to make this call.

One feature that will help reduce this "over-formatting" problem is that formatting can be "reset" with new lines. For example,

model User {
  id        Int    @id
  firstName String @default
  lastName  String @default
}

Can be re-formatted as:

model User {
  id Int @id

  firstName String @default
  lastName  String @default
}

With all this being said, Prisma 2 is still in preview so any decision can be walked back and we'd love to your thoughts!

No pull requests with different spacing schemes.

This is ends up being a big deal for large engineering organizations and large open source projects.

Maintainers and project owners are already strapped for time and can't afford to spend it requesting aesthetic changes. I learned this first-hand both working in a team of 30 at Yahoo and also managing https://github.com/cheeriojs/cheerio (before auto-formatters like prettier came along)

I am not sure I buy this completely given I don't know the drawbacks for Schema syntax in v1.34 (latest stable) and what this particular new syntax really adds.

I'd say the main motivation for this change was that a lot of people were confusing the prisma schema with GraphQL's schema. It makes sense because they use the same syntax, but they live on different layers of the application stack. Additionally, we wanted a single file to configure prisma and model-level attributes (for things like composite indexes).

Hope this helps! As I mentioned, during the Preview period every decision we've made so far can be revisited and I look forward to hearing your perspective!

kartikthapar commented 5 years ago

Thanks @matthewmueller for addressing this.

Question:

One feature that will help reduce this "over-formatting" problem is that formatting can be "reset" with new lines. ...

Are you saying that adding a new line with a larger length variable will not auto-reset all the other lines to accommodate? If so, I am happy with that as that will solve the issue as I understand it and you can ignore the rest of this comment.


Now some other thoughts.

This is ends up being a big deal for large engineering organizations and large open source projects. Maintainers and project owners are already strapped for time and can't afford to spend it requesting aesthetic changes.

I have seen different sides of this in my (small) engineering experience. However, I don't consider this as a preference problem - I see this as a tooling problem. At Facebook for example we have 2-space indents and Instagram we use 4 (uh). It is what it is. Linter, tooling will take care of it before committing code.

Now I don't expect similar enforcement in other open source projects but I believe in strict enforcement of such patterns. If the requirement is 2 spaces - just use 2 spaces or your code is rejected.

Re: Blame

We first heard about this issue while talking to some Facebook employees. The reason GraphQL isn't space-aligned is because they rely heavily on git blame tooling. This seemed like a compelling reason to ditch space-alignment. However when we probed further, we realized that they would often run codemods across their whole codebase that would also mess up their git blame's.

We do use git blame/mercurial blame quite significantly but yes that blame is heavily overwritten everyday through extensive code-mods. I am unsure how Github handles blame but we can skip different commits and continue blaming. It gets much easier to read blame like:

- modified by abc (skip) // don't delete this code
- modified by thapar (skip) - (void)doSomething {

vs

- modified by abc (skip) model User {
- modified by thapar (skip)  id        Int    @id
- modified by thapar (skip)  firstName String @default
- modified by thapar (skip)  lastName  String @default
- modified by abc (skip)}

Another point I failed to mention before is conflicts. One of the biggest issues in codegen or such auto-format mechanisms is crazy conflicts in code. Just imagine two engineers working on the schema and adding fields to the same model. If they are using different length'd fields, they will always conflict. That is poor devEx.

matthewmueller commented 5 years ago

Are you saying that adding a new line with a larger length variable will not auto-reset all the other lines to accommodate? If so, I am happy with that as that will solve the issue as I understand it and you can ignore the rest of this comment.

I think so! Let me clarify in code. Given this example:

model User {
  id        Int    @id
  firstName String
  lastName  String
}

Option 1: Big field auto-formats all the other lines. This is not what we want.

model User {
  id                                 Int    @id
  firstName                    String
  lastName                     String
  superLongFieldName String
}

The above will mess up git blame. What you can do in this case is add a newline:

model User {
  id        Int    @id
  firstName String
  lastName  String

  superLongFieldName String
}

I'll admit, it's a bit weird that you need to please the formatter for this case, but I think it's worthwhile compromise for readability.

Despite the slight weirdness, would that solve your concern?


Also, I didn't realize you worked for Facebook! I'm glad I got my facts right 😅. I think the (skip) in the git commit is a great idea and something we should consider incorporating. Basically move formatting to a separate commit step for large codebases.

Also great point about conflicts. That's an even bigger concern than git blame. Hopefully the above formatting will alleviate that.

sapkra commented 5 years ago

I totally agree with @kartikthapar.

@matthewmueller I think the new line formatting will also mess up the datamodel. If you want to handle the formatting like this you can also use the following formatting because then the indent will make no difference in readability anymore.

model User {
  id Int @id
  firstName String
  lastName String
  superLongFieldName String
}
matthewmueller commented 5 years ago

@sapkra thanks for your thoughts. I agree that for this small datamodel, no formatting is just as readable. Plus it's simpler.

While evaluating our decision, we've been looking at large datamodels that you'd find in a real app:

1 Column 3 Column

I think the column alignment improves readability quite a bit.

Akkuma commented 5 years ago

Putting in some feedback as someone who just started playing with prisma2.

The formatting feels like a tabs vs spaces sort of argument, while achieving something more akin to Smart Tabs.

I prefer the formatting that exists today and while that does seem to come at the cost of potential blames it seems like there are a few existing/potential solutions to alleviate some problems:

  1. Introduce a new line and organize by length of field.
    • Comes at the cost of conflicting with one's preferred organization of said fields and "false" grouping
  2. Introduce a new line as needed to prevent reformatting other lines
    • Comes at the potential cost of "random" new lines and "false" grouping
  3. Separate formatting commit on generate
    • Prisma would now have to be made git aware or it would essentially be a format command like 4
  4. Allow formatting to be skipped entirely & add format command
    • Solves most problems in that you could turn off formatting and automate formatting on merging a PR via prisma2 format. Multiple people could work in unison on a branch without a bunch of conflicts in a schema file.
  5. Remove formatting altogether
    • Would be tossing the baby out with the bathwater
  6. Allow formatting only indentation
    • Bypasses conflicts due to spacing/alignment from generate. Bypasses blame issues from dramatically changed alignment.

I feel like 4 & 6 solve the most problems. 4 seems the most robust and 6 sounds simplest to appease people for now.

dortamiguel commented 5 years ago

Can this be optional?

matthewmueller commented 5 years ago

@Akkuma thanks for the thoughtful feedback! We are indeed aiming for number 4. From a syntactic perspective, we've designed the schema where formatting is 100% optional.

We don't have an opinion on if you should format or not, just that there should only be one way to format files. Rather than a hundred options and prismarc.json's flying around everywhere.

For VSCode we'll auto-format by default, but you will be able to turn that off with:

"[prisma]": {
    "editor.formatOnSave": false
},
NEO97online commented 5 years ago

I notice a pattern here that I've noticed in many other areas of software development: separation of concerns.

The formatted version is sliced vertically.

The non-formatted version is sliced horizontally.

Which one is more useful when actually working in a large dataset? I would argue that horizontal organization is far more important. When glancing at a screenshot of code, columns look far more aesthetically pleasing; but that's not how we actually read our data when editing, adding, or debugging.

We don't vertically scan columns of types or tags. We scan field names, then look ahead horizontally to see which types and tags apply to that field. This is markedly more difficult in a column format, especially when there's a large gap between the longest and shortest field names.

How does this relate to separation of concerns? It reminds me of organizing HTML/CSS/JS by file-type instead of by component. Each line in the model is like a component - it has a name, type, and decorators. These three values should be closely coupled together and independently organized; not spread out based on the position of values in other rows.

I definitely see how column formatting appears more readable, but in practice it is not. If I have to drag my cursor across the line or hold a ruler up to my screen just to see which type/decorators apply to a field, it's a bad design. Not to mention the terrible diffs and merge conflicts that this causes.

steebchen commented 5 years ago

@auderer I want to reference a few things to prove that this works even in large scale applications. I think this was inspired by Go's fmt and other languages/tools such as Terraform/HCL. They already made this decision for their language and in my opinion, it works very well. For example, if you just think about kubernetes with 80k commits and 1000+ PRs I guess you can say it's not too bad.

Also, if you have some fields with drag other fields too much with them, you can always separate them with an extra new line, so it forms another group (see @matthewmueller's picture and the User's icon & id fields).

(edit: I'm not saying the formatted solution is better, I just wanted to add this here)

jasonkuhrt commented 5 years ago

We don't vertically scan columns of types or tags. We scan field names, then look ahead horizontally to see which types and tags apply to that field. This is markedly more difficult in a column format, especially when there's a large gap between the longest and shortest field names.

I definitely see how column formatting appears more readable, but in practice it is not.

There is a tradeoff though I think. There is a visual vibration caused by preceding and succeeding lines. Columns can be a tool for mitigation... Additional factors are at play too: typographic font in use and its leading (or your override of it); syntax colouring, cursor line highlight, ... I am also not sure that code reading/scanning in practice is as linear/logical/idealized as we might imagine. fwiw Code Complete 2 by Steve McConnell makes your case.

For myself, I would be interested to try both styles on some practical projects presently.

cryptiklemur commented 5 years ago

We don't vertically scan columns of types or tags. We scan field names, then look ahead horizontally to see which types and tags apply to that field. This is markedly more difficult in a column format, especially when there's a large gap between the longest and shortest field names.

I disagree with this pretty heavily.

With the formatting that @matthewmueller posted above, my first instinct is indeed to look for the key that i'm trying to find, but now, because everything is aligned by columns, my eye naturally moved to the exact spot where i know all of the types are, or where all of the metadata is. Without that, my eyes and brain spend more time trying to figure out exactly where it needs to look to find the data I want.

If I have to drag my cursor across the line or hold a ruler up to my screen just to see which type/decorators apply to a field, it's a bad design. Not to mention the terrible diffs and merge conflicts that this causes.

This just makes me think the code in question is over-cluttered and contains too much stuff, or that things aren't logically clustered into sections.

Mentioum commented 5 years ago

I definitely see how column formatting appears more readable, but in practice it is not. If I have to drag my cursor across the line or hold a ruler up to my screen just to see which type/decorators apply to a field, it's a bad design.

Highly recommend active line highlighting:

image

  1. Search for keyword
  2. The row is highlighted across the columns.

or with vim bindings

/searchterm [enter]

I'm a fan of gofmt and universal consistent formatting so just having an option to disable the one format in whatever IDE people are using is my preference :)

rajington commented 5 years ago

This seems like the right issue, but my concern is less over formatting and more over syntax (although relying on a popular syntax could also simplify formatting decisions).

Was there more on the decision to use a non-SDL format for the schema? it feels like everything in data modeling (including the datasource) could be defined using directives:

@datasource(
  type: sqlite
  url: "file:data.db"
  provider: "sqlite"
)
type User {
  id: ID!
  createdAt: DateTime! @default(time: now())
  email: String! @unique
  name: String
  role: Role! @default(role: USER)
  posts: [Post!]!
  profile: Profile
}

type Profile {
  id: ID!
  user: User!
  bio: String!
}

type Post {
  id: ID!
  createdAt: DateTime! @default(time: now())
  updatedAt: DateTime! @updatedAt
  author: User!
  title: String!
  published: Boolean! @default(boolean: false)
  categories: [Category!]!
}

type Category {
  id: ID!
  name: String!
  posts [Post!]!
}

enum Role {
  USER
  ADMIN
}

one thing that drew me to Prisma was that datamodel, and it's departure seems surprising and less standardized/shareable

matthewmueller commented 5 years ago

@rajington that's an interesting approach! Is @datasource valid GraphQL SDL syntax? And if so, would it attach to the User or the file?

rajington commented 5 years ago

Sorry I just updated it with argument names and positioned it closer to the User.

Directives don't apply to the whole document (yet), but there could be benefits to this (different types/fields with different datasources) or it could be moved into a nearby file: datasource.gql which uses Prisma's own (archived) file format and could enable things like datasource.production.gql.

I just don't see much value in letting datasource (or inline embeds might be another tricky one) be enough to create a new type system and deviate from being as close to GQL as possible. It also forces things like declarative migrations to use less GraphQL. I'd like to see more directives become part of the spec (e.g. @length(max:255)) and by staying as close to the spec this means that it can be validated at the database, api, and even client easily.

I apologize if this discussion already happened somewhere or if there's some advanced requirements that I'm missing, but I worry it will decrease adoption, considering it started as very GraphQL-like.

matthewmueller commented 5 years ago

I worry it will decrease adoption, considering it started as very GraphQL-like.

This is definitely top of mind and a concern of ours! We were looking for a couple solutions where the GraphQL syntax broke down. As you mentioned embeds were one of them, also attaching indexes to the models also was pretty messy. Another non-technical reason was that a lot of people were confusing Prisma with GraphQL, when we sit at very different layers. This confusion is 100% understandable but the GraphQL SDL syntax wasn't helping. Hope this clarifies the reasoning a bit.

rajington commented 5 years ago

Thanks for the replies!

embeds were one of them

I really appreciate GraphQL is implementation-agnostic and doesn't leak implementation details. Previously it seemed embed was limited to MongoDB which I feel breaks that.

Implementation-agnosticism is extremely powerful, the backing store could be not just MongoDB and PG but imagine a website that has an "admin overlay" to manipulate/preview local app state changes, that can get pushed to git, then CloudFlare Workers KV at the CDN edge. Currently I'm struggling with the complexities of Apollo local state verbosity and wish I could leverage Prisma's datamodel. That was always the most exciting potential I saw for Prisma, a GraphQL-based spec across all layers. If something like OpenCRUD could be generated from these models, then Admin UIs could be fully generated from introspection, and a single Admin UI could interact with any Prisma datamodel compatible APIs, not just local or locally owned data.

There is some clarity to embeds, and I think if it's a priority it could also be accomplished with directives:

Previous syntax


type City {
id: ID! @id
name: String!
coordinates: Coordinates
}

type Coordinates @embedded { latitude: Float! longitude: Float! }

> New syntax
```graphql
model City {
  id    String  @id
  name  String
  coordinates embed {
    latitude   Float
    longitude  Float
  }?
}

Potential syntax

type City {
id: ID! @id
name: String!
coordinates: Coordinates @embed(model: `
latitude: Float!
longitude: Float!
`)
}

I would actually suggest above should create a CityCoordinates or City_Coordinates type to reconcile conflicts from multiple locally declared embeds. embeds are absolutely a worthy goal, "content trees" for website copy is a very popular yet emerging pattern for headless CMSs.

Apollo Federation open-sourced their federation spec and make a lot of use of directives, including gql-in-gql like above.

What was some of the messiness around indexes?

samburgers commented 4 years ago

Our team for one would require both reliable autoformatting and a bulletproof git/blame history like all of the other files in our codebase... no ifs or buts. This would mean yes to autoformatting support and no to extra spaces for columns. All personal readability preferences aside i'm genuinely surprised at the resilience around this as it seems like many other teams here have the same hard requirements.

timsuchanek commented 4 years ago

Internal Note: This needs more investigation.

beeplin commented 4 years ago

I am sad to say with the new default fomatter, the trick using new line to break vertical alignments is not working any more. Any way to handle it? It is really PAINFUL to fight git diff...

matthewmueller commented 4 years ago

@beeplin I agree that the current regression makes this incredibly painful. I've opened this issue to hopefully resolve this soon: https://github.com/prisma/prisma/issues/2487


Since this discussion is getting long, I'm going to re-iterate what I said in the beginning:

We're not trying to do anything controversial here. We're simply trying to follow Go's rules for formatting. Formatting rules that have been applied to massive projects like Kubernetes, Docker, and Terraform.

Here's an example: https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/types.go#L449-L460

You'll notice that you can use comments & newlines to reset formatting. When we get this working properly, I promise that you won't ever think about the formatting, but you'll be happy that it's there.

Californian commented 4 years ago

The most future-proof solution to this would be to use tabs, and implement/encourage/wait for elastic tabstops to be implemented in various editors. This means people who want their columns aligned can have that, and those who don't want confusing git blames or crazy merge conflicts (currently experiencing the latter because of two small changes to our schema by two members) don't have to deal with that either. It's really the best of every world, and the only real blocker is that editors aren't fully on board yet.

Relevant issues: https://github.com/neovim/neovim/issues/1419 https://github.com/microsoft/vscode/issues/3932

matthewmueller commented 4 years ago

@Californian very cool concept! This would be an elegant solution for the diff issue.

I noticed they talked about gofmt implementing this in http://nickgravgaard.com/elastic-tabstops/. Unfortunately, as you mentioned, not many editors support this yet, so while they implemented the algorithm (as did we, inadvertently), the final output is spaces.

Exploring the solution space further, there's a concept of a custom git diff driver: https://stackoverflow.com/questions/34885397/using-custom-diff-tool-with-git-show/34934452 that could tell git to do a custom diff for .prisma files.

Fortunately, it's not a huge deal to adjust the formatter over time (prettier is constantly making tweaks to its JS formatting), so none of these decisions or solutions are set in stone.

Californian commented 4 years ago

@Californian very cool concept! This would be an elegant solution for the diff issue.

I noticed they talked about gofmt implementing this in http://nickgravgaard.com/elastic-tabstops/. Unfortunately, as you mentioned, not many editors support this yet, so while they implemented the algorithm (as did we, inadvertently), the final output is spaces.

Exploring the solution space further, there's a concept of a custom git diff driver: https://stackoverflow.com/questions/34885397/using-custom-diff-tool-with-git-show/34934452 that could tell git to do a custom diff for .prisma files.

Fortunately, it's not a huge deal to adjust the formatter over time (prettier is constantly making tweaks to its JS formatting), so none of these decisions or solutions are set in stone.

Awesome to hear! A combination of flexibility and a mindset that allows for changes to incorporate good solutions in the future when viable is unfortunately rare in software. One day I think we'll look back and wonder how we lived without elastic tabstops!

pantharshit00 commented 3 years ago

Thanks for the discussion everyone.

Prisma has been production ready for a while now so it wouldn't be feasible to change this now. I am going to close this discussion now.

FelixZY commented 1 year ago

I must say I'm a bit disappointed to find there is no way to opt-out of the aligned indentation model.

I agree that there's a case to be made for readability but to me, git blame is an incredibly important part of readability. Based on @matthewmueller 's response (https://github.com/prisma/prisma/issues/90#issuecomment-510383292), there seems to have been some agreement on this initially:

The reason GraphQL isn't space-aligned is because they rely heavily on git blame tooling. This seemed like a compelling reason to ditch space-alignment.

From what I can see, this reasoning was discredited and eventually dropped "because other tools messed with git blame anyways":

we realized that they would often run codemods across their whole codebase that would also mess up their git blame's.

This is not a valid excuse IMO - rather it means you or your tools are doing something wrong. I have worked in projects with many developers and years of readable git blame; it most certainly is possible.

We also see very large [...] projects [...] [that] suffers the same problem [...] and since we haven't heard the community complain about it, this gave us the confidence to make this call.

I'm complaining about it; this thread is full of people complaining about it and I'm sure there are others complaining in these other projects as well, struggling to get their voices heard in the flood of bug reports and discussions. Personally, I also think a lot of people are not using git to its full potential and may simply be unaware of the impact unrelated changes can have.


In the end, I agree with @Akkuma (https://github.com/prisma/prisma/issues/90#issuecomment-515097961) - this seems like a tabs vs spaces debate. I don't think there will be a consensus. Therefore, I would like to highlight @dortamiguel 's underappreciated comment (https://github.com/prisma/prisma/issues/90#issuecomment-515783311) and make the same request:

Can this be optional?

Could the formatter please be extended to support an opt-in "git blame friendly" mode, where column alignment can be disabled?

Sorry to revive this old thread - this is an important feature to me.

benevbright commented 10 months ago

please let us use git-blame...