prisma / prisma

Next-generation ORM for Node.js & TypeScript | PostgreSQL, MySQL, MariaDB, SQL Server, SQLite, MongoDB and CockroachDB
https://www.prisma.io
Apache License 2.0
38.23k stars 1.48k forks

Additional type safety via field-level type branding #12860

Open chrisj-back2work opened 2 years ago

chrisj-back2work commented 2 years ago

Problem

Prisma type safety doesn't help with dev confusion at a field level.

Imagine a schema with many models including both Contract (with R) and Contact (no R).

The dev is writing queries and mutations, e.g. a sendContactWelcomeMsg wrapper that calls db.emailRequest.create. The screen or function s/he's working on involves both contacts and contracts and due to a typo, copy-paste sloppiness, or plain confusion s/he accidentally passes the contract_id (with R) to the email-request function that requires a no-R contact ID.

What will happen?
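In TypeScript terms, nothing happens at compile time. A minimal sketch of the failure mode follows; the row shapes and the body of sendContactWelcomeMsg are hypothetical stand-ins for the generated types and for the real db.emailRequest.create call:

```typescript
// Hypothetical row shapes: with plain primitive typing, both IDs are just strings.
type Contact = { id: string; email: string };
type Contract = { id: string; contact_id: string };

// Hypothetical wrapper standing in for a call to db.emailRequest.create.
function sendContactWelcomeMsg(contactId: string): { contact_id: string } {
  return { contact_id: contactId };
}

const contact: Contact = { id: "contact-1", email: "a@example.com" };
const contract: Contract = { id: "contract-9", contact_id: contact.id };

// Intended call: the contact's ID.
const intended = sendContactWelcomeMsg(contract.contact_id);

// Typo: the contract's own ID. This compiles without complaint,
// because both values have the type string.
const mistaken = sendContactWelcomeMsg(contract.id);
```

The mistaken call only fails later, if at all: on a foreign-key violation, or silently if the wrong ID happens to exist in the target table.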

A similar problem arises when ID fields are confused in a create (which could consume a PK value that a later, correct insert will need), an upsert (disastrous: it merges unrelated data onto an existing record), or a delete (also a disaster).

Different problem patterns exist for non-key fields. It would probably be much harder to detect if a dev accidentally writes the user ID to an account balance field; or accidentally writes an employee's social security number to an annual salary field; or accidentally writes dollars to a euros field.

This is a real problem. I've seen the "upsert onto the wrong record", "delete the wrong record", and "write numeric value to an unrelated field" errors in production applications. The result is severe data integrity and customer support problems.

Suggested solution

Prisma has wonderful type safety at the type level. Extend the same protection (optionally) to the field level by defining field types that go beyond primitive data types. This could be done using some form of type branding and a @branded modifier.

ID fields

ID fields are a special case -- probably the most important case to solve, and probably simpler than the general case.

An interface that requires an ID field (as PK or FK) shouldn't accept just any generic string or number that matches the primitive data type -- each ID field needs to be a distinct type with no automatic type conversion / casting. A simple parent / PK and child / FK example:

model User {
  id String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid @branded
}

model UserEvent {
  id String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid @branded
  user_id String @db.Uuid @branded
  user User @relation(fields: [user_id], references: [id], ...)
}

Then in the Prisma client/index.d.ts file, generate a branded type, and propagate it through all definitions:

type Brand<K, T> = K & { __brand: T }

export type User_ID = Brand<string, 'User_ID'>

export type User = {
  id: User_ID
}

export type UserEvent_ID = Brand<string, 'UserEvent_ID'>

export type UserEvent = {
  id: UserEvent_ID
  user_id: User_ID
}
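To illustrate what these generated types would buy, here is a small sketch. It assumes the brand wraps the string primitive; describeUser and the sample row are hypothetical, and the casts mimic what a generated client would do internally:

```typescript
type Brand<K, T> = K & { __brand: T };

type User_ID = Brand<string, "User_ID">;
type UserEvent_ID = Brand<string, "UserEvent_ID">;

type UserEvent = { id: UserEvent_ID; user_id: User_ID };

// Hypothetical function that only accepts a user's ID.
function describeUser(id: User_ID): string {
  return `user:${id}`;
}

// Stand-in for a row returned by a query.
const event: UserEvent = {
  id: "e-1" as UserEvent_ID,
  user_id: "u-1" as User_ID,
};

// The FK carries the User_ID brand, so this is accepted.
const described = describeUser(event.user_id);

// @ts-expect-error UserEvent_ID is not assignable to User_ID
describeUser(event.id);
```

The brand exists only in the type system. At runtime the values are ordinary strings, so nothing changes in what is sent to the database.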

Prisma would consider it a schema error if the @branded modifier appears on a PK but not on all of its FKs, or on any FK but not on the PK. Prisma already enforces the rule that a PK and its FKs must share the primitive type; it could now apply the same rule to the branded type instead.

When creating local variables for use in Prisma operations (as opposed to typed / branded results from queries and mutations), devs would need to import these types and use them when declaring variables that would be passed to Prisma operations, like let userId : User_ID = whatever. Note that what I'm proposing is backwards-compatible with all existing Prisma schemas and apps, because the schema developer isn't obliged to use @branded, but once s/he does, declaring variable types like this will be required to satisfy the narrower / more powerful types (depending on eslint / tsc settings).
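For values that enter the app from outside (request parameters, config, tests), one possible pattern is a single validating helper that turns a raw string into the branded type. This is only a sketch: toUserId, findUser and the UUID check are hypothetical and not part of any Prisma API.

```typescript
type Brand<K, T> = K & { __brand: T };
type User_ID = Brand<string, "User_ID">;

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: the one place where an untyped string becomes a User_ID.
function toUserId(raw: string): User_ID {
  if (!UUID_RE.test(raw)) {
    throw new Error(`not a UUID: ${raw}`);
  }
  return raw as User_ID;
}

// Hypothetical consumer that requires the branded type.
function findUser(id: User_ID): string {
  return `lookup ${id}`;
}

const rawFromRequest = "3f2b8c1e-7a4d-4e9b-9c2a-5d6f7e8a9b0c";

// Accepted: the raw value goes through the helper first.
const found = findUser(toUserId(rawFromRequest));

// @ts-expect-error a plain string is not a User_ID
findUser(rawFromRequest);
```

Keeping the cast inside one helper means the rest of the codebase never needs an `as User_ID` of its own.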

Please consider this as an illustrative example -- I'm not trying to be prescriptive. The branding function I gave is too simple -- maybe something like this would be more suitable: https://stackoverflow.com/a/70262876/763269. I don't have a strong opinion about the internal naming conventions. All those can be debated later if the idea appeals to you. The main naming requirement is that the type name for the ID-field brand should have the same uniqueness quality / scope as the model name itself, so the uniqueness of model names would ensure the uniqueness of per-model ID type names.
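As one example of a sturdier branding helper, a common TypeScript idiom keys the brand on a `unique symbol` instead of a `__brand` string property. This is a sketch of that general idiom, not necessarily what the linked answer proposes; Contract_ID and deleteUser are hypothetical names.

```typescript
// The brand key is a unique symbol, so it cannot collide with a real
// property that happens to be named __brand. It exists only in the type system.
declare const brand: unique symbol;

type Brand<K, T extends string> = K & { readonly [brand]: T };

type User_ID = Brand<string, "User_ID">;
type Contract_ID = Brand<string, "Contract_ID">;

const userId = "u-1" as User_ID;
const contractId = "c-1" as Contract_ID;

// Hypothetical destructive operation that must only receive a user's ID.
function deleteUser(id: User_ID): string {
  return `deleted ${id}`;
}

const outcome = deleteUser(userId);

// @ts-expect-error Contract_ID is not assignable to User_ID
deleteUser(contractId);
```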

I don't know if this concept should also apply to @unique fields or just @id fields.

The same solution pattern applies no matter what the primitive type is for the ID.

Non-ID fields

Non-ID fields are also an important case but somewhat more complicated.

ID fields are self-defining (from the core model's @id field) and it's obvious where those types should be applied (in the core model / PK field, and in related models / FK fields). The @branded attribute doesn't need more info for ID fields.

Non-ID field types don't have a single defining field -- e.g. there could be many "US dollar"-typed fields in the model. We can't rely on a field-naming pattern to imply the type (e.g. balance, amount etc. all may be denominated in USD). The type for a non-ID field needs to be named by the schema developer in some way, e.g. maybe via parameter(s) to @branded:

model InvoiceLine {
  id String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid @branded
  unit_price Decimal @db.Decimal(23, 8) @branded("usd")
  quantity Int
  ext_price Decimal @db.Decimal(23, 8) @branded("usd")
}

model User {
  id String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid @branded
  balance Decimal @db.Decimal(23, 8) @branded("usd")
}

Then in the Prisma client/index.d.ts file:

type Brand<K, T> = K & { __brand: T }

export type Brand_usd = Brand<Prisma.Decimal, 'usd'>

export type InvoiceLine_ID = Brand<string, 'InvoiceLine_ID'>

export type InvoiceLine = {
  id: InvoiceLine_ID
  unit_price: Brand_usd
  quantity: number
  ext_price: Brand_usd
}

export type User_ID = Brand<string, 'User_ID'>

export type User = {
  id: User_ID
  balance: Brand_usd
}
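A sketch of how a currency brand would behave in application code. To stay self-contained it uses number in place of Prisma.Decimal, which is an assumption made only for this example; Brand_eur and extendPrice are hypothetical as well.

```typescript
type Brand<K, T> = K & { __brand: T };

// Assumption: number stands in for Prisma.Decimal to keep the sketch self-contained.
type Brand_usd = Brand<number, "usd">;
type Brand_eur = Brand<number, "eur">; // hypothetical second brand

type InvoiceLine = {
  unit_price: Brand_usd;
  quantity: number;
  ext_price: Brand_usd;
};

// Hypothetical helper. Arithmetic yields a plain number, so the result
// has to be re-branded explicitly.
function extendPrice(unitPrice: Brand_usd, quantity: number): Brand_usd {
  return (unitPrice * quantity) as Brand_usd;
}

const unitPrice = 12.5 as Brand_usd;
const line: InvoiceLine = {
  unit_price: unitPrice,
  quantity: 4,
  ext_price: extendPrice(unitPrice, 4),
};

const euros = 10 as Brand_eur;

// @ts-expect-error a eur amount cannot be passed where usd is required
extendPrice(euros, 4);
```

The explicit re-branding after arithmetic is a cost of this approach: every computation on branded values has to decide which brand its result carries.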

Prisma would consider it a schema error unless all fields with the same brand name have identical primitive data types (although they could differ in nullability and maybe some other attributes).
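The consistency rule could be checked along these lines. FieldDef is a hypothetical, simplified view of a schema field and does not reflect how Prisma represents schemas internally:

```typescript
// Hypothetical, simplified description of one schema field.
type FieldDef = { model: string; name: string; type: string; brand?: string };

// Returns one message per brand whose fields disagree on the primitive type.
function checkBrandConsistency(fields: FieldDef[]): string[] {
  const typesByBrand = new Map<string, string[]>();
  for (const f of fields) {
    if (f.brand === undefined) continue; // unbranded fields are not checked
    const seen = typesByBrand.get(f.brand) ?? [];
    if (seen.indexOf(f.type) === -1) seen.push(f.type);
    typesByBrand.set(f.brand, seen);
  }

  const errors: string[] = [];
  typesByBrand.forEach((types, brandName) => {
    if (types.length > 1) {
      errors.push(`brand "${brandName}" is used with types: ${types.join(", ")}`);
    }
  });
  return errors;
}

const consistent: FieldDef[] = [
  { model: "InvoiceLine", name: "unit_price", type: "Decimal", brand: "usd" },
  { model: "InvoiceLine", name: "quantity", type: "Int" },
  { model: "User", name: "balance", type: "Decimal", brand: "usd" },
];

// Same fields plus one that reuses the brand with a different primitive type.
const inconsistent: FieldDef[] = consistent.concat([
  { model: "Refund", name: "amount", type: "Int", brand: "usd" },
]);

const noErrors = checkBrandConsistency(consistent);
const someErrors = checkBrandConsistency(inconsistent);
```

A real check would also compare native type attributes such as @db.Decimal(23, 8), which this sketch ignores.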

The naming pattern is a more interesting problem in this case. My initial reaction is that brands should follow the same conventions as model names -- distinct PascalCase names. But I'm not convinced of that. The closest analog to branding in the DB layer is Postgres domains, which follow the normal PG naming convention, so with quoted identifiers you could have distinct usd, Usd and USD domains all in a single PG schema. Also the Prisma-provided type brands are TypeScript types, which in general aren't required to have any particular name pattern, and an app could have several similarly-named types that exist prior to Prisma type brands and need to coexist with or be refactored to use the type brands. So I suspect the rule should be: any number of brands can be defined in a Prisma schema, and the precise name provided (i.e. any valid TS identifier) defines / identifies the branded type.

Alternatives

Brand as a first-class entity

While the ID-field approach seems neat and clean to me, the non-ID field approach I outlined is a little messy because each field that shares the brand needs to have identical typing in other respects too. It could make more sense to define the brand as a new top-level entity in the schema and apply that as a type, not just a modifier, in the Prisma schema:

brand Usd { Decimal @db.Decimal(23, 8) }
//          ^^ could be any valid Prisma field type definition

model InvoiceLine {
  id String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid @branded
  unit_price brand.Usd
  quantity Int
  ext_price brand.Usd
}

Multiple fields that share a type brand must certainly share the more fundamental types. That can be defined implicitly for ID fields (it is fully specified by the model's existing @id field definition), but would need to be explicit for non-ID fields.

Multi-field types

Prisma models can have multi-field IDs specified by the @@id attribute. The problem is essentially the same (using a wrong field value in a 2-field PK reference is every bit as bad as doing it with a 1-field PK). While Prisma does generate types that represent multi-field IDs and unique constraints, we would still need the branding concept for individual fields within those multi-field types to help prevent the motivating problems like the Contract vs. Contact confusion.
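A sketch of the multi-field case, assuming a hypothetical Membership model with @@id([org_id, user_id]). Membership_ID and membershipKey are illustrative names, not types Prisma generates:

```typescript
type Brand<K, T> = K & { __brand: T };

type Org_ID = Brand<string, "Org_ID">;
type User_ID = Brand<string, "User_ID">;

// Hypothetical composite key type: each part keeps its own brand.
type Membership_ID = { org_id: Org_ID; user_id: User_ID };

// Hypothetical consumer of the composite key.
function membershipKey(id: Membership_ID): string {
  return `${id.org_id}/${id.user_id}`;
}

const orgId = "org-1" as Org_ID;
const userId = "user-7" as User_ID;

const key = membershipKey({ org_id: orgId, user_id: userId });

// @ts-expect-error the two parts are swapped, which the per-field brands catch
membershipKey({ org_id: userId, user_id: orgId });
```

Without per-field brands, the swapped call would type-check, because both parts are plain strings.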

Native DBMS support

This seems like an anti-requirement to me.

When the target DBMS supports a custom domain or type construct, it could be tempting to generate the DDL that corresponds to the branded type -- e.g. Postgres CREATE DOMAIN or MS SQL Server CREATE TYPE. This seems like a can of worms -- what happens if a field's branded type is removed or re-assigned in the Prisma schema? Is the ripple effect in the DB layer small or large? At first blush supporting the construct seems appealing, but the main benefit can be obtained via Prisma types only with zero impact on schema generation, migration, or introspection.

Relationship to GraphQL

GQL has the concept of custom scalar types -- e.g. https://www.apollographql.com/docs/apollo-server/schema/custom-scalars/. That seems like a decent inspiration for this, although I don't think in and of itself that would deliver the level of TypeScript integration I'm envisioning. I prefer how Prisma presents the type info. But maybe this helps in some way.

Additional context

This would help make Prisma more powerful than the DBMSs it layers on top of.

olalonde commented 10 months ago

Wrote another example for a kind of bug this could catch here: https://github.com/prisma/prisma/issues/21024#issue-1889031437