hasura / graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
https://hasura.io
Apache License 2.0
31.18k stars 2.77k forks source link

Regression in v2.0: Migration State was Stripped from the Database #7610

Open joanrodriguez opened 3 years ago

joanrodriguez commented 3 years ago

Version Information

Server Version: 2.0.3

Environment

Cloud

What is the expected behaviour?

Keywords

What is the current behaviour?

How to reproduce the issue?

  1. Start with a local environment and a PROD environment that are on the same database version (all migrations applied)
  2. Create a new migration locally, run it against PROD
  3. Create a second migration locally, but do not run it on PROD
  4. Now dump the PROD DB and restore it locally
  5. Run hasura migrate apply locally => Error, table/column already exists

Any possible solutions?

Two solutions:

Why?

We are a team of 20 developers, everyone is writing migrations. Sometimes they fail and they must be rewritten. Sometimes they conflict with each other. By dumping a copy of PROD, restoring locally and running only the latest migrations we can enforce consistency and improve developer's workflow. Also, every time someone does it locally, it serves as a dry run for when we actually deploy the migrations on PROD.

Can you identify the location in the source code where the problem exists?

No because this is specific to Hasura Cloud.

If the bug is confirmed, would you be willing to submit a PR?

No because this is specific to Hasura Cloud.

BenoitRanque commented 3 years ago

One possible solution would be to respect the HASURA_GRAPHQL_METADATA_DATABASE_URL if it is provided. Then, users could opt-in to storing metadata themselves (and be responsible for managing it) or by default avoid hasura modifying their database at all.

One issue with this is that ideally migration status should be stored per-database, on the database, to allow proper management of multiple sources.

olup commented 3 years ago

Our teams are paying hasura cloud enterprise clients, and we are facing the same issue.

Without being able to dump the migration state with the database itself when duplicating the staging db on local dev environment we are unable to apply new ones as needed for a proper development workflow.

Is there an ETA on resolution ?

BenoitRanque commented 3 years ago

Without being able to dump the migration state with the database itself when duplicating the staging db on local dev environment we are unable to apply new ones as needed for a proper development workflow.

@olup Please note that if you do know the migration state of your production DB when you create the dump, you can mark the existing migrations as applied with the --skip-excution flag. This is a workaround of course, but should be workable until a better solution is available.

I am sadly not able to provide an ETA on resolution.

olup commented 3 years ago

We went that route, but it means checking on the remote staging branch what migrations are there (the coder might be on a wip branch and wanting to reset but still apply local migrations) and apply migrations with a go-to argument which very sadly cannot seem to be usable with the skip execution argument.

Edit : We could apply them one by one with a skip execution, but this would take a long while.

So now we are planning to build the json migration object by hand and insert that in HDB catalog version on local. A bit messy to say the least.

vidrepar commented 3 years ago
  1. Can we get an answer to why is hdb_catalog.hdb_version.cli_state empty on Cloud instances?
  2. Can someone tell us where does hasura CLI read migration status from exactly when cli_state is empty?
  3. JSON approach seems fine, it's just a list of migrations after all, but: 3.1. Is this change mentioned in the documentation or any related document (schema_migrations -> JSON cli_state)? 3.2. What was the reason behind this change? Was it because of the multiple databases support?
  4. Please provide ETA on resolution - this is critical.
joanrodriguez commented 2 years ago

Any update on this @BenoitRanque ?

RoryO commented 2 years ago

This has been an issue for a little over a year now. I finally get fed up enough with this that I took it upon myself to do something about it.

tl:dr use/adapt this script to copy the migration status from a hasura prod instance to a local hasura instance. This uses the hasura management api.

// not needed if running on node >= 17.5
import fetch from 'node-fetch';

const admin_secret = process.env['HASURA_GRAPHQL_ADMIN_SECRET'];

if (admin_secret === undefined) {
    console.error('HASURA_GRAPHQL_ADMIN_SECRET not set');
    process.exit(1);
}

const prod_endpoint_url = process.env['HASURA_PROD_ADDRESS'];
if (prod_endpoint_url === undefined) {
    console.error('HASURA_PROD_ADDRESS not set');
    process.exit(1);
}

(async () => {
    const prod_migration_status = await fetch(`https://${prod_endpoint_url}/v1/metadata`, {
        method: 'POST',
        headers: {
            'X-Hasura-Admin-Secret': admin_secret
        },
        body: JSON.stringify({
            type: 'get_catalog_state',
            args: {}
        })
    });
    const prod_migration_status_resp = await prod_migration_status.json();
    if (!prod_migration_status.ok) {
        console.error('error fetching prod metadata');
        console.error(prod_migration_status_resp);
        process.exit(1);
    }
    const prod_cli_state = prod_migration_status_resp.cli_state;

    const set_local_cli_state = await fetch('http://localhost:8080/v1/metadata', {
        method: 'POST',
        body: JSON.stringify({
            type: 'set_catalog_state',
            args: {
                type: 'cli',
                state: prod_cli_state
            }
        })
    });

    console.log(await set_local_cli_state.json());
})()

Explanation

This was quite a journey. This API is completely undocumented in the API documentation. Seems likely that's intentional. It is a bit raw as it required digging into both the server side haskell and the hasura cli go code. One layer is that internally hasura refers to this has the 'catalog', and externally that terminology wasn't used. This lead to lots of false dead ends in searching. Another is that the hasura cli go code is... "well abstracted" is the most charitable way I can think of explaining it. Lastly, it took some digging to figure out how the commands in the API map to Haskell code through some magic compiler directives.

There are two undocumented metadata api commands: get_catalog_state and set_catalog_state, which are self explanatory. get_catalog_state requires an empty object as arguments. It simply returns hdb_catalog.cli_state and hdb_catalog.console_state columns. set_catalog_state is a little tricky. It requires two arguments, the type and the state. The state is just a json blob of the value of the column. It does look like the server type checks the posted state when de-serializing before updating the column, so that adds a bit of safety. type was a bit of a journey to find. Essentially the metadata handler figures out the type through some string manipulation into a haskell type through a compiler directive. You can see the exact definition here.

Although undocumented, these should be fairly stable APIs, as changing these would break the hasura cli tool and probably every automated deploy tool built. Still, it would be nice if there was an actual migration API.

gmlion commented 1 year ago

I would like to add to the discussions some use cases where having migrations detached from the relevant db is cumbersome.