prisma / prisma1

πŸ’Ύ Database Tools incl. ORM, Migrations and Admin UI (Postgres, MySQL & MongoDB) [deprecated]
https://v1.prisma.io/docs/

Database migrations #1263

Closed: schickling closed this issue 2 years ago

schickling commented 6 years ago

Motivation

So far database migrations have been almost entirely implicit (except for the temporary directives @rename and @migrationValue). Especially when working in multi-stage/multi-branch environments, a deterministic (and imperative) migration solution is necessary.

Goals

Proposal

The basic idea is that whenever the data model (types.graphql) is changed, the CLI automatically creates a migration file with the missing migration steps when running prisma deploy.

Migration files are YAML based and contain a list of all migration steps and an optional before/after script. The filename starts with a UTC timestamp (format: YYYYMMDDHHMMSS) and optionally contains a short "description" (e.g. YYYYMMDDHHMMSS_create_products.yml).
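
For illustration, here is how the timestamp prefix could be derived from the current UTC time in Node; only the filename format itself comes from the proposal above:

// Sketch: build a YYYYMMDDHHMMSS-prefixed migration filename from the current UTC time.
const ts = new Date().toISOString().replace(/\D/g, '').slice(0, 14)
const filename = `${ts}_create_products.yml` // e.g. 20080906120000_create_products.yml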

The missing migration steps can always be calculated locally (without any "knowledge" of a deployed service) as the delta between the current types.graphql file and the combination of all existing migration steps. If the data model described by types.graphql and the data model described by all migrations line up, the migrations are "in sync".

Minimal example

Assuming you're starting out with an empty service definition (0 models/types) and adding a new User model with an id and a name field, this is what your project should look like:

# types.graphql

type User @model {
  id: ID! @unique
  name: String!
}
# migrations/20080906120000.yml

steps:
  - operation: type.create
    definition: 'type User @model'

  - operation: field.create
    type: User
    definition: 'id: ID! @unique'

  - operation: field.create
    type: User
    definition: 'name: String!'

Terminology

List of all possible migration steps

steps:

    # Types
  - operation: type.create
    definition: 'type User @model'

  - operation: type.update
    id: User
    definition: 'type NewUser @model implements Human'

  - operation: type.delete
    id: NewUser

  - operation: type.create
    definition: 'type User @model'

    # Fields
  - operation: field.create
    type: User
    definition: 'firstName: String! @defaultValue(value: "John")'

  - operation: field.update
    id: User.firstName
    definition: 'name: String! @defaultValue(value: "John Doe")'

  - operation: field.delete
    id: User.name

    # Relations
  - operation: relation.create
    definition:
    - 'posts: [Post!]! @relation(name: "UserPosts")'
    - 'user: User! @relation(name: "UserPosts")'

  - operation: relation.update
    id: UserPosts
    definition:
    - 'posts: [Post!]! @relation(name: "UsersOnPosts")'
    - 'users: [User!]! @relation(name: "UsersOnPosts")'

  - operation: relation.delete
    id: UsersOnPosts

    # Multi-field unique constraints
  - operation: field.create
    type: User
    definition: 'firstName: String! @unique(group: "UserName")'

  - operation: field.create
    type: User
    definition: 'lastName: String! @unique(group: "UserName")'

    # Interfaces
  - operation: interface.create
    definition: |
      interface Human {
        name: String!
      }

  - operation: interface.update
    id: Human
    definition: |
      interface DoubleHuman {
        firstName: String!
        lastName: String!
      }

  - operation: interface.delete
    id: DoubleHuman

    # Enums
  - operation: enum.create
    definition: |
      enum RGBColor {
        Blue
        Green
        Red
      }

  - operation: enum.update
    id: RGBColor
    definition: |
      enum Color {
        Blue
        Green
        Red
        Yellow
      }

  - operation: enum.delete
    id: Color

  - operation: api.lock

  - operation: api.unlock

Migration scripts

You can run a script before and/or after a migration. This allows you to migrate not only the structure of your data but your data itself (e.g. iterate over all User nodes and calculate a value for a newly created field).

A migration script can be written in any language and is simply invoked through a bash command. When a migration script fails (i.e. returns a non-zero exit code), the migration will be rolled back. Migration scripts are executed locally (or in your CI environment), not on the Prisma server.

# migrations/20080906120000.yml

scripts:
  before: node before-migration.js
  # after: node after-migration.js

steps:
# ...
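
To make the rollback contract concrete, here is a minimal sketch of what a before-migration.js could look like. Everything in it is assumed for illustration (the endpoint, the updateManyUsers mutation, the node-fetch dependency); only the "non-zero exit code rolls the migration back" rule comes from the proposal:

// before-migration.js - hypothetical backfill script (sketch, names assumed)
const fetch = require('node-fetch')

async function main() {
  const res = await fetch('http://localhost:60000/my-service/dev', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      query: 'mutation { updateManyUsers(where: { name: null }, data: { name: "Unknown" }) { count } }',
    }),
  })
  const { errors } = await res.json()
  if (errors) throw new Error(JSON.stringify(errors))
}

main().catch(err => {
  console.error(err)
  process.exit(1) // a non-zero exit code makes the CLI roll the migration back
})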

Ambiguous cases

Some data model changes result in ambiguous migration steps where the developer needs to choose the desired behaviour from an interactive CLI prompt.

In the following cases it is ambiguous whether an element should be renamed or deleted and recreated:

  1. Renaming fields/types
  2. Renaming enum values
  3. Renaming relation names/relation fields
  4. Renaming interfaces

Here is an example of changing a type and its corresponding fields:

type Person {
  name: String
  description: String
}

# ...migrate to...

type User {
  firstName: String
  lastName: String
  comment: String
  age: Float
}

type Member {
  name: String
  title: String
  other: String
}

(two screenshots of the interactive CLI prompt, omitted)

Deployment workflow

WIP

Implementation should incorporate changes described in Improved infrastructure for service deployment.

This is what the deployment API could look like:

type Schema {
  mutation: Mutation
}

input MigrationStep {
  operation: String!
  definition: String
  id: String
  type: String
}

input Migration {
  id: String! # UTC timestamp
  hash: String! # based on id + file content
  steps: [MigrationStep!]!
}

type MigratePayload {
  success: Boolean!
  messages: [String!]!
}

type Mutation {
  migrate(migrations: [Migration!]!): MigratePayload!
}
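
For illustration, a CLI could assemble that input roughly like this. The hash rule (id + file content) is taken from the comment above; the hash algorithm and the parseSteps helper are assumptions:

// Sketch: build the Migration input for the migrate mutation from a migration file.
const crypto = require('crypto')
const fs = require('fs')

function buildMigration(filename) {
  const id = filename.slice(0, 14) // UTC timestamp prefix, e.g. 20080906120000
  const content = fs.readFileSync(`migrations/${filename}`, 'utf8')
  const hash = crypto
    .createHash('sha256') // algorithm assumed; the spec only says "based on id + file content"
    .update(id + content)
    .digest('hex')
  const steps = parseSteps(content) // hypothetical: YAML -> [MigrationStep]
  return { id, hash, steps }
}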

Design decisions

Open questions

References

marktani commented 6 years ago

Ambiguous deltas

Update: this is now accounted for in the proposal πŸŽ‰

old comment

> The missing migration steps can always be calculated locally (without any "knowledge" of a deployed service) which is the delta of the current types.graphql file and all existing migration steps combined.

Deltas are sometimes ambiguous - how can the developer distinguish between different 'migration paths' in such a case? Example:

Base version

type User @model {
  id: ID! @isUnique
  title: String!
}

Latest version

type User @model {
  id: ID! @isUnique
  name: String!
}

Possible migrations

Solely based on the two types.graphql versions, at least two migration paths match. One is to rename title to name:

- operation: field.update
  id: User.title
  definition: 'name: String!'

Another is to remove title and add name:

- operation: field.delete
  id: User.title

- operation: field.create
  type: User
  definition: 'name: String!'

Note that applying the former migration when the latter was intended leads to undesired data loss.

dpetrick commented 6 years ago

Thanks, some thoughts from my side:

Migration Scripts:

Migration .yml files: This is more me thinking out loud about how things must work, trying to fill the gaps in the spec (as you didn't talk about the client-server interaction yet).

Okay, so files are apparently ordered by timestamp. One goal for migrations is being resistant to merge conflicts. I assume the merge conflicts in question are git merge conflicts. I also assume that migration files are checked into version control, as the other case would be too complicated for multiple devs to handle, with wiring up all the correct script files and whatnot. There will simply be next to no merge conflicts if two developers happily develop on different branches and then merge. Why? Git will just take all those separate files and throw them into the migration folder, as there are no (or rarely any) name conflicts thanks to the timestamp-based naming. The only potential for merge conflicts here is of course the types.graphql file, which is certainly an indicator for developers that they need to take a hard look at their migrations as well, but even this is not a given.

Now imagine a scenario where two branches from two devs diverged from master. One branch is short-lived and one is a longer, larger feature. The short-lived branch gets merged and its migrations are run. The larger branch is merged later, and now master has an interleaving of applied and unapplied migrations.

I assume state keeping is done on the server, which knows which migrations have been executed already (e.g. a traditional migrations table), or else this scenario can't work, as there is no way to detect an issue with C. It also can't be computed locally (i.e. there is no state available), so the CLI is out here and has to rely on errors from the server during migrate or deploy, which is a good thing. Will the server just error out immediately as soon as it finds an unapplied migration dated before an already applied one, or will it try to resolve the conflict (more complex, more sophisticated, higher critical-bug potential)? What is the flow if there is such an error? Does the developer create a new migration and delete the old one, or re-date the conflicting migrations to a newer date? The tooling and the (error) messages certainly have to be on point to reduce friction as much as possible. Most of the things I mentioned are more or less solved by Rails migrations already and should give us a good starting point for what developers expect.
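
To make the interleaving case concrete, here is a sketch of the check that such a migrations table would enable; the function and data shapes are made up:

// Sketch: find pending migrations dated before an already applied one.
function findInterleaved(localIds, appliedIds) {
  const applied = new Set(appliedIds)
  const newestApplied = [...appliedIds].sort().pop()
  return [...localIds].sort().filter(id => !applied.has(id) && id < newestApplied)
}

// findInterleaved(
//   ['20180101120000', '20180102120000', '20180103120000'],
//   ['20180101120000', '20180103120000']
// ) returns ['20180102120000']: it was merged after a newer migration had already run.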

One random thought: in Rails it was common to delete or squash all migrations into one base migration at some point if the migration folder grew too large. The schema.rb file (which was more or less used as the base migration) was sort of our types.graphql. We should definitely consider this scenario, as serious projects will amass migrations fairly quickly.

One last thing here: @marktani has a good point about migration ambiguity. I wonder how this is complicated by the flow I described above with parallel branches / parallel changes to the types; I have to think more about these edge cases.

"Apologies for being picky" points:

sorenbs commented 6 years ago

@dom

Some comments :-D

Input to migration scripts

In Rails, each migration script is unrestricted Ruby code; it could send an email if it wanted to. I assume this is what you mean by input? What are the primary use cases for this? I assume loading or manipulating existing data?

Script execution

The current idea is that the CLI will invoke the scripts. During development, a single developer is assumed to work exclusively on a branch. We recommend spinning up a database for each feature branch. On shared databases, typically staging and production, the CLI should be executed from a CI environment that can control exclusive access to running migrations. I see a future where Graphcool provides the CI environment for Graphcool database migrations, but for now people are encouraged to use their existing CI setup. Additionally, I think that Graphcool should prevent multiple migrations from running at the same time. As this involves arbitrary scripts before and after the actual migration, we need a way for the CLI to inform the server about a migration beginning or ending. We can either make this implicit (the CLI sends messages to the server) or explicit (the scripts will have to run mutations to mark beginning and end).

Script - connection to Graphcool

I agree that this has to be tied to input from the CLI. Maybe we should have a concept of a special connection for migrations.

Nested deploys

I agree that only one deploy should be allowed at a time.

Learnings from Rails

Migrations from two branches with interleaving timestamps

Rails simply runs all migrations in order, skipping any that have already been executed. We should investigate whether this has any severe downsides in our context. Otherwise, that would probably be the solution that is easiest to understand.

Merge Conflicts

Rails includes the timestamp of the latest migration in db/schema.rb in order to ensure that the developer is made aware that another branch has introduced changes and should verify that there are no issues.

Relationship between migrations and db/schema.rb

In Rails, individual migrations are created manually (either by writing code or giving input to the CLI). After running a migration, Rails automatically inspects the resulting database and recreates the db/schema.rb file. This file is auto-generated, and manual changes will not be preserved. So even though migrations are the source of the current state of the database, db/schema.rb is the authoritative source for the database schema. This is a bit weird, but comes out of a practical realisation:

There is no need (and it is error prone) to deploy a new instance of an app by replaying the entire migration history. It is much simpler and faster to just load into the database a description of the current schema.

In Graphcool this works differently. Our equivalent of db/schema.rb is types.graphql which developers are expected to modify directly. This file is never automatically generated. Instead migrations are automatically generated by diffing against the current state.

Also worth noting that db/schema.rb is database independent (works with MySQL, Postgres etc.), but if engine-specific features are used, Rails supports switching to a SQL-based format instead.

Transactional guarantees

I don't think we can rely on database transactions for migrations. For one, transactional DDL isn't supported by MySQL, and secondly there is no good way to ensure that we use a single connection for both the migration steps and any migration scripts. Instead we should add primitives to temporarily disable writes and potentially reads.

sorenbs commented 6 years ago

Developer actions

A list of actions a developer will perform when working with Graphcool. Please point out any actions not on this list yet so we can make it exhaustive.

Init empty database

This happens when you use Graphcool the first time or when you start working on a new project

  1. gc init: Create types.graphql and first migration file to init empty database
  2. gc deploy: run first migration (or whatever we do to create empty db)

Create migration from diff

This happens on a daily basis as you make changes to your db structure

  1. Make changes in types.graphql
  2. gc deploy: generate current database schema; diff against new database schema from types.graphql; iff they are different, create migration file
  3. gc deploy: run all pending migrations against server

Recreate migration from diff

When you realise that your latest migration wasn't quite right before you have actually performed the migration. Don't do this if the migration has been executed against a shared database (i.e. staging or prod) or the branch has been merged.

  1. Make corrective changes in types.graphql
  2. gc deploy --recreate: generate current database schema, excluding last migration; diff against new database schema generated from types.graphql; iff they are different, overwrite last migration file
  3. gc deploy: run all pending migrations against server

Rollback and recreate migration from diff

When you realise that your latest migration wasn't quite right and you have already performed the migration against your local database. Don't do this if the migration has been executed against a shared database (i.e. staging or prod) or the branch has been merged.

  1. Make corrective changes in types.graphql
  2. gc deploy --recreate: generate current database schema excluding last migration; diff against new database schema generated from types.graphql; iff they are different, overwrite last migration file
  3. gc deploy --redo: roll back latest migration and then run all pending migrations against server

Run migration

When you integrate changes from the master branch that includes new migrations, or when running migrations against a shared database (typically staging and prod). Migrations have already been created and tested by developers in the feature branch where they were introduced.

  1. gc deploy: run all pending migrations against server

Init new database from existing types.graphql

This happens whenever you need to spin up a new test database and when you start working on a new feature branch as well as when a new developer joins the project.

  1. gc deploy: Should we replay all old migrations or simply create a new giant migration that goes directly to the finished schema? Rails does the latter for performance and robustness, and I think we should do the same.

Get types.graphql from existing database

If you have lost access to your migration files and types.graphql you need a way to get it back from the server

  1. gc pull-types: The server generates types from the existing database structure; the CLI stores them in a local file.

Note

In all cases above, gc deploy performs the following check to decide whether it should a) generate a new migration file or b) perform migrations on the database: generate the current database schema by iterating over all migration files; diff it against the new database schema generated from types.graphql; are there any changes?
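
A sketch of that check; every helper here (readSteps, applyStep, emptySchema, parseDatamodel, schemasEqual) is hypothetical:

// Sketch: decide what gc deploy should do.
function deployAction(migrationFiles, typesGraphql) {
  const current = migrationFiles
    .sort() // timestamp prefixes make a lexicographic sort chronological
    .flatMap(readSteps) // hypothetical: read the steps from a .yml migration file
    .reduce(applyStep, emptySchema()) // hypothetical: apply one step to a schema
  const desired = parseDatamodel(typesGraphql) // hypothetical: parse types.graphql
  return schemasEqual(current, desired)
    ? 'perform pending migrations on the database'
    : 'generate a new migration file'
}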

antho1404 commented 6 years ago

Awesome! In case it helps, I have a script that manages migration scripts. Like Rails, there are different migration files prefixed with the date; all applied migrations are stored in the database as a new type (which can be included in the system API), and from that the script applies every migration that is not yet in the database: https://gist.github.com/antho1404/576642a12f07f09f853a42ecf67a45f3
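
The pattern described might look roughly like this; this is a sketch under assumptions (a Migration type with a name field, prisma-binding-style client calls), not the actual gist:

const fs = require('fs')

// Sketch: apply every local migration file that is not yet recorded in the database.
async function runPendingMigrations(client) {
  const applied = await client.query.migrations({}, '{ name }') // Migration type in the system API (assumed)
  const appliedNames = new Set(applied.map(m => m.name))

  const files = fs.readdirSync('migrations').sort() // date-prefixed, so sorted = chronological
  for (const file of files) {
    if (appliedNames.has(file)) continue
    await require(`./migrations/${file}`)(client) // each file exports an async migration (assumed)
    await client.mutation.createMigration({ data: { name: file } }) // record it as applied
  }
}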

Also, in Rails I occasionally had to migrate the structure, then run a script, then migrate the structure again (for example when adding a unique, non-null field). Of course it's still possible to generate three different migrations (create the field, migrate the data, mark it as unique and not null), but it would be nice to have this in a single file, allowing scripts between operations and not only before or after the migration.

I also think that types.graphql should only ever be generated. If a project has migrations, it will be really messy to have both migrations and a types.graphql file: which one is applied if both are modified with different data? Maybe have a script that can generate all the initial migrations based on the file and then ignore it.

sorenbs commented 6 years ago

Thanks @antho1404! This is very helpful.

One thing I want to clear up right away: The types.graphql file is never generated. It is and will continue to be the primary way you change your Graphcool database schema. If, for example you want to add a field to an existing model, the workflow looks like this:

  1. Add the field to the model in your types.graphql file
  2. Generate the migration (this happens automatically by diffing new types.graphql with previous state)
  3. Run migration

The generated migrations should not be changed manually, as this will result in your types.graphql being behind, causing the CLI to create a new correcting migration that reverts your manual change to the migration file.

Does that make sense?

sorenbs commented 6 years ago

We should consider if migrations is the right way for us to do schema change deployments.

Redgate and Microsoft SQL Server Data Tools use a state-based diffing approach rather than Rails-style migrations. This is described in Critiquing two different approaches to delivering databases: Migrations vs state.

This might be a more natural fit for Graphcool.

antho1404 commented 6 years ago

Hmm, my bad, I didn't read carefully:

> whenever the data model (types.graphql) is changed, the CLI automatically creates a migration file

I feel it's kind of weird to do it that way. Maybe it's because I've worked too much with Rails, but if I want to update something in my database (types, data, relations, whatever...), I don't want to go into the big file that contains everything and lose time figuring out where to put my modification (which may even end up at both the beginning and the end of the file). I just want to write one file with only the updates I want to apply, exactly like the migration files you describe, then run a command to dump the new migration into the types.graphql file and deploy the modification.

I guess maybe the architecture doesn't suit this idea, and maybe I'm just really biased by Rails migrations. Maybe I just have to try to think differently :)

sorenbs commented 6 years ago

It's really interesting to learn from developers with different experiences. To me, the fact that I can have one coherent view of my data model in the types.graphql file is the magic behind the Graphcool migration system.

Having to manually write migration files would be a big step backwards for me. I'd be very curious to hear from other people how they feel about this :-)

antho1404 commented 6 years ago

I think the thing that "bothers" me a bit is calling it migrations; maybe calling it "diff" or something like that would be less confusing for me. Another idea: if the goal is really to have the state of the database and just update that state, maybe we could keep all the full states (i.e. different versions of types.graphql). What's confusing is having both migrations and states; I'd find it less confusing to have only one. Again, these are just some thoughts.

tamlyn commented 6 years ago

This sounds really good. I especially like that it's all static and doesn't introspect the DB.

Some thoughts:

sorenbs commented 6 years ago

@tamlyn

> Presumably it would be possible to edit the generated migrations provided the end result of running all the migrations matches types.graphql? For example to change one of the "Ambiguous change" decisions made in the CLI.

Yes

> Would the before/after scripts be stored in the migration file or referenced or auto loaded based on a naming convention?

Migration scripts have to be explicitly referenced from within the migration file. Migration scripts will be invoked by the CLI.

> In development you could end up with a whole load of small migrations while you tinker with the schema. In this case could you just delete all the uncommitted migrations and have the CLI regenerate a single migration?

This is a great point. As long as no migration scripts are involved, this will be possible.

mavilein commented 6 years ago

Migrations

Sorry for chiming in so late. I thought a bit about the CLI-centered approach for migrations and I don't think it's the right approach. I agree that migrations should be stored in the Git repo of a Graphcool project. However, I don't think that this means that migrations need to be generated by the CLI.

Disadvantages of the CLI centered approach:

Ideas for a server centered approach:

The server stores migrations in an immutable log. The server is responsible for generating migration proposals, which can then be edited by the developer. Once the editing is done, the migrations can be pushed to the server.

The process of deploying would look like this:

When deploying to a new env:

General unanswered questions:

schickling commented 6 years ago

Not directly related to this proposal but more on the implementation side: @cjoudrey shared a couple of thoughts with me. He suggested looking into online schema changes, similar to these projects, to continuously allow read/write access to the database while performing migrations:

sorenbs commented 6 years ago

Step Granularity

The proposed steps are fairly coarse. Given the following schema:

Initial Schema

type User {
  id: ID! @unique
  name: String
}

If a developer wants to make it a unique field, they would generate the following migration script:

Migration A:

- operation: field.update
  id: User.name
  definition: 'name: String @unique'

This includes the intended change (User.name should be unique) as well as the current state of the field (called name, of type String, not a list, and optional).

Merge conflicts

If a second branch created from the same initial schema introduces a change to make the field required, that migration will look like this:

Migration B:

- operation: field.update
  id: User.name
  definition: 'name: String!'

Again, the desired change as well as the current state of the field is contained in the migration.

In this situation two branches introduce changes that could in theory be compatible. Before we dive into the possible conflict resolution strategies I want to clarify how Graphcool and Git branches play together:

Graphcool and branches

In a team with multiple developers it is common to have the following environments:

Local development can use the local Docker-based version of the GraphQL Database or connect to a hosted development cluster. Crucially, changes to the database schema during development of a feature should be applied to an environment available only to the current developer.

It is suggested that a separate branch is maintained throughout the development of the feature. When the feature is done, the branch is merged to a dedicated staging branch and continuous integration will deploy the database changes to a dedicated staging database.

When tested in staging, migrations can be performed on production.

Conflict resolution

In the scenario above, two developers created a feature branch from the same starting point, and now merge their branches to staging at the same time.

As the migrations are stored in files with unique names, there will be no git-level conflict. So we will have to deal with the conflict at deploy time.

We have 3 possible conflict resolution strategies:

  1. Reject the second deploy
  2. Let the second deploy override the first (last write wins)
  3. For each deploy, apply only the actual change, preventing the second deploy from overriding the first

1. Reject the second deploy

To do this we would need to store extra information allowing us to keep track of the origin of a deploy. This would be very similar to the tree structure git maintains for branches. Maybe we could find a trick that allows us to implement this in a simpler way.

2. Last write wins

This would be very easy to implement. If we go for this, I believe we need some other way to make the developer aware that there is a potential issue. Rails deals with this by always forcing a source-control conflict when branches simultaneously change the database schema (link).

This is easy to implement, but probably not the developer experience we want to deliver.

3. Perform more granular migrations

In this particular case the conflict could be avoided if we perform only the intended change instead of overriding all settings for the field. This can be implemented in two ways:

- operation: field.update
  id: User.name
  required: true

sorenbs commented 6 years ago

We should consider adding a migration status command: https://orator-orm.com/docs/0.9/migrations.html#getting-migrations-status

mavilein commented 6 years ago

Overview of possible Conflicts

With this comment I would like to work towards a common understanding of the conflicts that can occur with regard to migrations. When we discuss actual solutions, we can check how a given conflict would be handled within each solution.

Parts of a Migration

It is important to note that migrations consist of 2 parts based on our current understanding:

  1. The change that is applied to the schema of a project.
  2. The script that gets executed before and/or after the changes to the schema are made.

A conflict might affect only one or both parts of a migration. The affected part is mentioned in each of the following scenarios.

Standard Scenario

The following is the standard scenario for the potential conflicts. Developers A and B branch off master from the same commit. Both developers perform their changes on their respective branches. Developer A gets to merge first. Developer B merges afterwards and therefore might be exposed to a conflict.

(diagram: developers A and B branch from the same commit on master; A merges first, B merges afterwards)

1. Renames

When does the conflict occur? This conflict can occur whenever developer A renames something (model, field, enum, ...) and developer B does not know about that change. If a migration of developer B refers to the old name, it will fail when it is applied after the changes of developer A.

Example: Developer A changes the name of the type User to Customer. Developer B adds a new field to the User type on their branch. Developer A merges to master first and the change gets applied to the staging environment. Developer B merges afterwards, and when their changes get applied, the system does not know the type User anymore and therefore fails.

Affected migration part: schema & script

Consequences

2. Deleting something

When does the conflict occur? This conflict can occur whenever developer A deletes something (model, field, enum, ...) and developer B does not know about that change. If a migration of developer B refers to the deleted thing, it fails.

Examples: Developer A deletes a field and developer B refers to it (updating or deleting it).

Affected migration part: schema & script

Consequences

3. Name Clashes

When does the conflict occur? This conflict can occur whenever developers A and B use the same name for something new (model, field, enum).

Examples: Developers A and B create a field with the same name.

Affected migration part: schema

Consequences

4. Simultaneous Updates

When does the conflict occur? This conflict can occur whenever developers A and B update the same thing (model, field, enum).

Examples: Developer A changes the type of a field and developer B makes that field required.

Affected migration part: schema & script

Consequences

Conclusion

The fundamental problem is that developer B performs their changes based on knowledge about a certain starting point. That starting point is the commit this developer used for branching. When developer B merges their pull request, however, this starting point might have changed drastically. Therefore B's migrations might fail or won't have the intended effect. What needs to happen here is that the changes of B need to be adapted to the new starting point. In Git speak, developer B needs to rebase their migrations. In a fully manual approach that would mean going through all the migrations they have created and reevaluating them against the new starting point.

mavilein commented 6 years ago

There are 2 ways to store migrations

With this comment I want to make us aware that there are 2 fundamental ways to store migrations. The goal of a migration is to apply it to a service in a given state and thereby transition it to the next state in a reproducible fashion. We can express this with the following formula: P(n) + M(n, n+1) = P(n+1). This can be read as:

Given a project in state n: if we apply the migration steps that capture the transition from state n to state n+1, we end up in state n+1.

The initial project state P(n) is always given. In a migration file we can either store the migration steps M(n, n+1) or the next state of the project P(n+1) in order to capture what needs to be done to migrate a project into the next state. Therefore we have 2 options to store a migration:

  1. Storing the migration steps: The approach that @schickling outlined initially falls into this category. In this case the server would apply the migration steps M(n, n+1) to the current project state P(n) and calculate the next project state P(n+1).
  2. Storing the next project state: In this approach a migration would store the next project state P(n+1). In this case the server would calculate the required migration steps M(n, n+1) and apply them to the project.
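
Restated as a sketch (applyStep and diffSchemas are hypothetical helpers):

// Option 1: the migration file stores the steps M(n, n+1);
// the server applies them to P(n) to calculate P(n+1).
function nextStateFromSteps(currentState, steps) {
  return steps.reduce(applyStep, currentState)
}

// Option 2: the migration file stores the next state P(n+1);
// the server diffs it against P(n) to calculate the steps to apply.
function stepsFromNextState(currentState, nextState) {
  return diffSchemas(currentState, nextState)
}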

Conclusion

There are 2 ways to store a migration in a file. We either store the concrete steps or the next project state.

Open Question: Would it be beneficial to store both the migration steps and the next project state?

dpetrick commented 6 years ago

Pro and Con of the migration approaches

Applying explicit steps

Pro:

Con:

Applying the target schema, aka inferring migration steps

Pro:

Con:

alireza-ahmadi commented 6 years ago

Is there any news on this?

falconmick commented 6 years ago

Coming from a dotnet/EF background I highly value this, especially the ability to roll forward and backward in both data and schema.

robbyemmert commented 6 years ago

Here's an example which I think summarizes a lot of the concerns:

  1. Developer A renames a field and deploys to his local environment
  2. Developer A removes the temporary rename directive, and renames another field.
  3. Developer B pulls all of Developer A's changes and deploys them to her local environment
  4. Developer B adds a few changes of her own (possibly more rename directives)
  5. Developer C deploys Developer A and B's changes to staging.
  6. After testing, and a few bug fixes, Developer C deploys to production.

This is a very common process. There are a few things that would currently break (as I understand it).

To summarize: while a static schema is necessary for maintaining simplicity and the "magic" of Prisma (of which I'm a huge fan), some history of these changes needs to be recorded in the (version-controllable) code base. The server then needs to be able to accurately interpret that history and identify where its state fits into that history, so that it can apply the migration steps necessary to automatically update its state to be current with the code base. The server will also need to be able to roll back in time (manually or automatically) to a previous state in the case of a failed update.

So far, the discussion seems to surround how the history "log" gets generated, what generates it, and what format it should take.

Is this a fair summary? Has the conversation progressed at all?

For the record, I'm sitting on the edge of my seat. Some of the stuff you guys are doing is blowing my mind, and I can't wait until it can blow my clients' minds as well.

postformeravi commented 6 years ago

Hi, I want to migrate a DB2 schema to a MySQL database. Is there any open-source tool or script which can help to migrate the schema as well as the data? Please help!

Regards, Ravi

Weakky commented 6 years ago

Hey there, here's my two cents:

(This opinion does not give any solutions regarding conflicts (e.g. a migration trying to access a deleted field), but rather a general view of what a migration file could look like, and a way to give developers flexibility.)

As @dpetrick said, we really need to give developers flexibility regarding migrations, and break the magic of Prisma a little for the sake of freedom.

As soon as you go into production, you will inevitably run into situations where you have to customize migrations to maintain backward compatibility with old stuff, or to provide default values on targeted nodes based on some very specific needs.

Let's say you're making a non-required field required when it already has nodes. You will have to provide default values to make sure your schema stays in sync with your database.

For now, the advised way to do so is to run GraphQL mutations in the Playground, such as:

mutation {
  updateManyUsers(
    where: { firstName: null }
    data: { firstName: "DEFAULT_FIRST_NAME" }
  ) {
    count
  }
}

I think leveraging the power of GraphQL bindings and JavaScript to offer flexible scripted migrations could be super powerful.

In PHP using Doctrine, a migration looks like this:

/**
 * Auto-generated Migration: Please modify to your needs!
 */
class Version20180507073532 extends AbstractMigration
{
    /**
     * @param Schema $schema
     */
    public function up(Schema $schema)
    {
        // this up() migration is auto-generated, please modify it to your needs
        $this->abortIf('mysql' !== $this->connection->getDatabasePlatform()->getName(), 'Migration can only be executed safely on \'mysql\'.');

        $this->addSql('ALTER TABLE company_metadata ADD return_policy LONGTEXT DEFAULT NULL, ADD warranty_policy LONGTEXT DEFAULT NULL');
    }

    /**
     * @param Schema $schema
     */
    public function down(Schema $schema)
    {
        // this down() migration is auto-generated, please modify it to your needs
        $this->abortIf('mysql' !== $this->connection->getDatabasePlatform()->getName(), 'Migration can only be executed safely on \'mysql\'.');

        $this->addSql('ALTER TABLE company_metadata DROP return_policy, DROP warranty_policy');
    }
}

As you might have noticed, it clearly says: Please modify to your needs!

That means you're allowed to script a migration to your needs while leveraging the """power""" (or not πŸ˜›) of PHP.

Use case

Let's say we introduced a new required field called allowedToAccessWebsite.

Using JavaScript, that would allow us to script a migration like this:

class Version20180507073532 extends AbstractMigration
{
    async up(db: Prisma)
    {
      const minorUsers = await db.query.users({ where: { age_lt: 18 } });
      const majorUsers = await db.query.users({ where: { age_gte: 18 } });

      const minorUsersIds = minorUsers.map(minorUser => minorUser.id);
      const majorUsersIds = majorUsers.map(majorUser => majorUser.id);

      await db.mutation.updateManyUsers({
        where: { id_in: minorUsersIds },
        data: { allowedToAccessWebsite: false },
      });
      await db.mutation.updateManyUsers({
        where: { id_in: majorUsersIds },
        data: { allowedToAccessWebsite: true },
      });
    }

    async down(db: Prisma)
    {
      ...
    }
}

Those migrations should indeed have records of the schema updates as well. The end format could look like this:

class Version20180507073532 extends AbstractMigration
{
   /* Please, do not change this */
   schemaUpdate() {
     return {
        schema: {},
        steps: {},
        ...
     }
   }

    async up(db: Prisma)
    {
      ...
    }

    async down(db: Prisma)
    {
      ...
    }
}

Or even have them separated into two files with the same timestamp, one containing the schema changes and the other containing the scripts.

falconmick commented 6 years ago

Coming from C# and Entity Framework to Node, I've really found that this is the biggest thing I miss. If Prisma could make something similar to Entity Framework (which looks a lot like what @Weakky just laid out), it would make Prisma a real asset to the Node community.

marktani commented 6 years ago

Thanks a lot everyone for chiming in. Migrations are a super interesting and tough problem, and we would love to learn more about your use cases to arrive at a solution that covers them.

We set up a working group to further discuss this; if you're interested, I would love to have you in the #migrations channel of our Slack community.

terion-name commented 6 years ago

I've written in the Slack channel and just now found this issue. I agree with @mavilein, the CLI-centered approach is a bad idea. The CLI workflow is good for prototypes and quick starts, but for heavy continuous/team development with proper workflows, CI/CD pipelines, etc., it is unsuitable.

This should be server-centric

@mavilein

There are 2 ways to store a migration in a file. We either store the concrete steps or the next project state.

It's kind of imperative vs. declarative. Storing the next project state (i.e. the declarative way) will lack transition steps (how to go from one state to another). And this is a huge problem. One should not rely on diffing algorithms; that gives a very high chance of data loss or corruption. IMO migrations should always be imperative, i.e. they should instruct the server exactly how to transition to the new state, rather than what the state should be.

As the Slack channel is very inactive, I'll also write here. I see two ways to work with versioned migrations:

The second variant I see as more solid and controllable, but it fundamentally changes the approach to data modeling and requires developing a special datamodel-builder syntax and/or a special cluster API for this. An example of a GraphQL API for migrations could be something like this:

mutation {
  model(
    name: "Article", 
    addField: [{ name: "slug", default: "", index: INDEX_UNIQUE }],
    changeField: [{ name: "old_name", rename: "new_name", nullable: true }]
    removeField: [{name: "old_name"}]
  ) {
    status
  }
}

So a migration will be a file with mutations like this. This approach also simplifies adding basic data: not seeds with test data, but system-critical data, like an initial admin account or initial settings, etc.

ntziolis commented 6 years ago

Without advanced migrations for migrating up AND down (incl. hooks to invoke data-migration scripts), Prisma can NOT be considered production-ready for any team using CI/CD. Basically this means any larger customer, especially enterprise, will find the lack of this a showstopper.

I don't really care about the format, but it has to be on disk and editable, including being able to make any changes to the data we need.

What is the current status, and how can we help (the Slack channel seems to be dead)?

valstu commented 6 years ago

Is this still active? Is there something the community could help with?

Gdewilde commented 6 years ago

Can we get an update on this? We're currently looking to move away from Prisma due to the lack of migrations support. Thanks.

goulderb commented 6 years ago

I second @Gdewilde's sentiment. The lack of proper scripted migrations is a huge problem for any production application built on Prisma right now, and is one of many issues that are causing my team to reconsider using Prisma going forward.

ntziolis commented 5 years ago

@marktani Do you guys have any update on your thoughts here?

jhalborg commented 5 years ago

@schickling @sorenbs - friendly ping πŸ˜„

It would be great to get confirmation that this is still planned for Q4 and being worked on. Not having migrations in Prisma is the single largest problem, IMO, for production applications, effectively rendering CI/CD impossible.

robbyemmert commented 5 years ago

Agreed. This is why I'm not using Prisma on any client projects yet.


sorenbs commented 5 years ago

Thank you for all the input on this topic!

We are making progress towards a more flexible migration system, and I'd like to describe our approach. From talking to many users of Prisma we realised that the declarative migration system is tremendously valuable to a large group of users. Thank you @terion-name for describing the difference between declarative and imperative migrations above.

Short term improvements

There are two major problems with the current declarative migration system:

We are currently taking steps to address both issues without abandoning the idea of declarative migrations.

First, we introduced the passive mode earlier this year, allowing you to use Prisma with an existing database. We are addressing some limitations and bringing this capability to MySQL in addition to Postgres. If you need full control over database migrations or want to run scripts in CI/CD, it is a viable option to simply not use the declarative migration system.

Second, we want to make it seamless to use the declarative migration system most of the time, but opt out for specific migrations that require complex logic. This way you will be able to do most migrations like you do today, and write a SQL migration when you have to. We are streamlining the datamodel in order to support this. When this work is complete, the declarative migration system will also become more flexible and, for example, support relations without a relation table.

Work on this is underway but we are not able to give a concrete timeline yet.

In addition to these larger changes, we are also working on small items to make the declarative migration system more useful: https://github.com/prisma/prisma/issues/2323

Long term vision

Over the past year we have seen Prisma be adopted for more different use cases than we could have imagined. It has become clear that the current monolithic design is not serving us well. In the future we will split Prisma into individual projects each with a very clear focus. Declarative migrations is likely to be the first feature that is promoted to its own project. When this has happened we will be in a better position to deliver on the vision laid out in the original feature request.

Feedback

I would love to hear your thoughts on this approach, as well as specific feedback on the new datamodel described in this spec.

valstu commented 5 years ago

Great news! Thanks for the update!

jhalborg commented 5 years ago

It's great to hear that this is actively being worked on! However, I'm a bit disappointed that you're leaning towards the declarative direction, with raw SQL as the 'escape hatch'/fallback. I'd much prefer the imperative and versioned migration model that @terion-name described, with specific migration code run on the DB for each up/downgrade of the database.

It's not usable in CI/CD if it only works 90% of the time; that defeats the purpose and will not meet the requirements of larger projects. I think it should be fully controllable, meaning a version for the DB and coded up/down migration scripts for each version, preferably not in SQL.

Or maybe I misunderstood?

sorenbs commented 5 years ago

@jhalborg - I like that vision too. When we have extracted the migration system to a separate project we will be able to better iterate on advanced use cases like this. I definitely think it is possible that the Prisma migration system will evolve to support imperative, non-sql migrations as suggested by @terion-name and @schickling in the initial feature request.

terion-name commented 5 years ago

@sorenbs Separating the migration system into a separate project that interacts with the Prisma server via a clear API is a good idea IMO, because it opens up the possibility of choosing whichever migration system suits the project/setup. And maybe it could even work in a pipeline manner, stacking them (e.g. overlaying some programmatic migration scripts on top of those generated by the declarative builder/differ).

As this is a huge amount of work and obviously will not be implemented soon, it is very important to push #2323 forward (please!); this really will make life easier in many cases.

And the biggest problem for now is migrating data when the structure changes. This is really a huge problem, but there are so many pitfalls that I can't design an approach that could be easily fit into the current system. The obvious part is to run some queries during the process (after raw SQL was added to the query engine this became more useful), but how do you hook these queries into the process?

One thing that keeps going around in my mind is callbacks on warnings. Currently, when the migrator faces data loss, it emits warnings with specific info about what and where it sees problems. Theoretically this info can be used to form a pattern-based string that matches a "fixing" query from a list passed along with the schema.

For example, we change the User schema by removing FullName and splitting it into FirstName and LastName. The migrator creates the new columns and encounters data loss due to the dropping of FullName. So it searches the provided payload for a "fixing" or "migrating" query named something like User_FullName_drop. If found, it runs; there we can write a query that takes data from fullName and moves it into the firstName and lastName columns.
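
A sketch of that lookup; the warning shape, the key format, and the Postgres SQL are all assumptions on top of the concept described above:

// Sketch: pattern-based "fixing" queries passed along with the schema.
const fixingQueries = {
  // runs when the migrator warns about dropping User.FullName
  User_FullName_drop: `
    UPDATE "User"
    SET "firstName" = split_part("fullName", ' ', 1),
        "lastName"  = split_part("fullName", ' ', 2)`,
}

function onDataLossWarning(warning, runRawSql) {
  const key = `${warning.model}_${warning.field}_${warning.kind}` // e.g. "User_FullName_drop"
  if (fixingQueries[key]) return runRawSql(fixingQueries[key])
  throw new Error(`potential data loss and no fixing query matches ${key}`)
}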

It's just a concept that obviously will not fix everything about declarative migrations, but again, it can resolve a lot of situations IMO.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.

nikolasburk commented 5 years ago

The specification for the new migration system can be found here.

brillout commented 5 years ago

> Declarative migrations is likely to be the first feature that is promoted to its own project

Does this mean that we will be able to use this declarative migrations library independently of Prisma? That would be super neat. I've been looking for / considering implementing a do-one-thing-do-it-well database-agnostic migration library.

cihadturhan commented 5 years ago

I wonder if there is an update on the new migration system and whether there is a release date for it.

Right now, I'm planning to create a Node script that creates a migration file under a migrations folder, which I'd fill in manually. Such as:

const { prisma } = require('./generated/prisma-client')

module.exports = async () => {
  await prisma.$graphql('mutation {executeRaw(query: "UPDATE `default@default`.User SET foo=\'bar\'")}')
}

However, this process is error-prone and not maintainable in the long term.

I want to know if any of you have any other solutions.

ben-foster commented 4 years ago

I believe this is the latest update -- Lift in Prisma 2 preview: https://www.prisma.io/blog/announcing-prisma-2-zq1s745db8i5#prisma-2-next-generation-database-tooling https://lift.prisma.io/