hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Proposal: mysql_schema resource #10740

Closed · joshgarnett closed this issue 7 years ago

joshgarnett commented 7 years ago

With Terraform being responsible for creating my instances, databases, and container services, the biggest gap right now is schema management. You can use a local-exec provisioner on a mysql_database resource to apply a schema, but this is only executed when the database is first created and does not handle updates to the schema.
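
For illustration, the workaround looks roughly like this (the schema file and mysql invocation here are hypothetical):

resource "mysql_database" "app" {
  name = "my_awesome_app"

  # Hypothetical bootstrap: this runs only once, when the database is
  # first created, so later changes to initial.sql are never applied.
  provisioner "local-exec" {
    command = "mysql my_awesome_app < initial.sql"
  }
}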

To address this my idea is to create a mysql_schema resource. It would take two attributes, name and schemas. Name would be the database name and could be a reference to mysql_database. Schemas would be a list of schema definitions to apply.

Schemas would be applied in order. The hash of each schema and the index of the last schema applied would be stored in the Terraform state file. If the hash of any already-applied schema changes, Terraform would exit with an error. If the list grows, Terraform would apply any new schemas that have been appended. If applying any schema fails, Terraform would persist the indexes and hashes that succeeded, and then exit with an error.

Example:

resource "mysql_database" "app" {
  name = "my_awesome_app"
}

resource "mysql_schema" "app" {
  name = "${mysql_database.app.name}"
  schemas = ["${file("initial.sql")}", "${file("add_column.sql")}", "${file("add_index.sql")}"]
}

If there are no objections or concerns with the design, I'm going to start working on a PR to add the functionality.

apparentlymart commented 7 years ago

Hi @joshgarnett,

Thanks for taking the time to write up this proposal!

As the original implementer of the MySQL provider I made a very conscious decision to draw the line with "database" being the deepest abstraction that Terraform attempts to manage.

The reason for this is that schema management can be a very hairy topic. Terraform's job is to compare the current state to a desired state and automatically figure out the changes to get there, but when it comes to migrating a database schema there is often not one obvious, safe path to reach the desired state.

There is already a wealth of software out there for dealing with this problem in various ways, so I felt it would be a mistake for Terraform to try to tread into this territory; it would never be as robust and usable as a specialized implementation.

In your proposal you've addressed this by essentially making the list of scripts append-only, assuming that the user will take on the task of maintaining an ever-growing list of migration scripts that gets from an empty database to the current schema. This is a clever compromise that would certainly work, though I worry that its limitations could eventually paint users into a corner.

One might ask then why mysql_database does get to be a Terraform resource: what's so special about that concept? My rationale was that in my experience the database often acts as the boundary between "infrastructure" and "application" concerns. Many applications are able to populate any tables, views, and stored procedures they need, but most expect a database to already have been provisioned by a separate process. Terraform is therefore attempting to accommodate the "infrastructure" side of this divide, while leaving the "application" side up to other software to solve.

I hope all of this feels reasonable. I'm certainly not closed to having my opinion changed on this one if you have some compelling arguments :grinning: but I do think we should tread carefully here and not get Terraform into an area that might pull its design in a strange direction or create an "attractive nuisance" that could end up enticing users into tricky situations in the long term.


A possible (though admittedly ugly) way to achieve what you're trying to achieve with Terraform's current functionality is to use a third-party DB migration harness but trigger it with a Terraform provisioner:

resource "null_resource" "db_migrate" {
  triggers = {
    # Trigger on the hash of one or more files that will change whenever there's a new
    # migration to execute.
    "migration_hash" = "${sha1(file("example_migration_config_file"))}"
  }

  provisioner "local-exec" {
    command = "example-external-migration-tool"
  }
}

This way you can get Terraform to take charge of executing the migration while delegating the details of how exactly that gets done to an external program that is better equipped to behave safely and robustly. I expect that in future Terraform will get more robust ways to trigger external functionality like this, rather than hacking around it with null_resource.

joshgarnett commented 7 years ago

Hi @apparentlymart,

I'm with you that it's a hairy subject. My hope was this would be a good-enough solution to bootstrap development environments and handle minor changes to schemas over time. There are tools like rake, flyway, evolution, etc. that are tailored for schema management, but they can be heavyweight. In my ideal world I generate my build, pass the artifacts to Terraform, and then let it do everything that's needed to get things up and running.

The null_resource could be a way to hack around that. Do you know if there's an easy way to capture the output of a shell script and use it as a variable? Digging around the docs and Google, nothing jumps out at me.

apparentlymart commented 7 years ago

In 0.8 there is a new external data source that can run external programs to gather data. It's not as straightforward as just running a shell command, but a small wrapper script can bridge the gap.
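
A minimal sketch (the wrapper script and its current_version key are hypothetical; the program must read a JSON object on stdin and print a flat JSON object of strings on stdout):

data "external" "migration_state" {
  # Hypothetical wrapper script that prints e.g. {"current_version": "2"}
  program = ["sh", "${path.module}/migration-state.sh"]
}

The returned value is then available as "${data.external.migration_state.result.current_version}".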

In suggesting this I'm assuming that you want to use it to compute a value to put in the triggers map, as opposed to using it to actually run the migration tool. A data source is not an appropriate way to run something with side-effects since Terraform assumes that there is no risk in refreshing a data source.

There are some other open issues about capturing results from provisioners or running arbitrary actions in ways other than data sources; something like that will probably show up eventually. For now, a null_resource wrapping a provisioner is the well-trodden detour.

joshgarnett commented 7 years ago

Actually, I was thinking that if the migrator had some way to return its current state and whether it had work to do, that would be a useful trigger. That would sidestep having to maintain a separate file to flag changes. I'd still use local-exec to actually run the migration.

I guess stepping back, what really is the ideal long term solution here, outside of using a schemaless database or having the application responsible for setting up the schema (not easy in a multi-server environment)?

apparentlymart commented 7 years ago

Right... using the external data source to ask the migration tool for some value that represents its current state is what I meant above; the important thing is that the value changes whenever there are new schema changes to apply.
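
Concretely, a sketch combining that with the earlier null_resource pattern (reusing the hypothetical data source from above):

resource "null_resource" "db_migrate" {
  triggers = {
    # Re-create the resource, and so re-run the provisioner, whenever
    # the migrator reports a new state value.
    "schema_state" = "${data.external.migration_state.result.current_version}"
  }

  provisioner "local-exec" {
    command = "example-external-migration-tool"
  }
}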

I'm not sure I'm really ready to make a statement about the long-term approach here; I'd leave that sort of long-term visioning to the HashiCorp team! :grinning: In the short term I see Terraform already trying to tackle a lot of problems, and I think managing database schemas might be a bridge too far. Delegating to an external program that specializes in schema management feels right to me for the foreseeable future, whether that program is something specific to the application itself or a third-party tool like sqlalchemy-migrate or db-migrate.

joshgarnett commented 7 years ago

Thanks again for the feedback. I think I'm still going to go down this path, but I'll build it out as a custom plugin instead. My plan is to probably rework the provider/resource configuration as well so that multiple MySQL endpoints can be managed at the same time.

Example:

provider "sql" {}

// apply creates the database
// destroy drops the database
resource "sql_database" "app" {
  endpoint = "${var.mysql_endpoint}"
  username = "${var.mysql_username}"
  password = "${var.mysql_password}"

  name = "my_awesome_app"
}

// apply follows the rules described for apply_to below
// destroy migrates down
// (optional) compute a hash of the SQL files; a change in the hash of already-applied SQL forces a new resource
resource "sql_schema" "app" {
  endpoint = "${var.mysql_endpoint}"
  username = "${var.mysql_username}"
  password = "${var.mysql_password}"

  database = "${sql_database.app.name}"

  // The user can specify how far to apply
  // If greater than current applied, migrate up
  // If lower than current applied, migrate down
  // If not specified, migrate up to max index
  apply_to = 1

  // TBD: 0 or 1 based indexes?
  // If a migration step fails, it saves the successful state changes, and exits with an error
  schema {
    index = 1
    up = "${file("${path.module}/001_create_table.up.sql")}"
    down = "${file("${path.module}/001_create_table.down.sql")}"
  }

  schema {
    index = 2
    up = "${file("${path.module}/002_alter_table.up.sql")}"
    down = "${file("${path.module}/002_alter_table.down.sql")}"
  }
}

elad commented 7 years ago

@apparentlymart @joshgarnett please reopen this for discussion. It's very unfortunate that Terraform can't bring MySQL up to a point where it's usable by apps. I understand the arguments against dealing with schemas, but even a compromise where the schema is created only if it doesn't already exist would let one provision a database without resorting to an additional tool between Terraform and the app itself. Again, I'm not proposing this as the way to go, just saying that I believe it warrants more discussion and at least keeping the issue open to signal there's something to be resolved here.

joshgarnett commented 7 years ago

@elad we just ended up writing our own custom plugin that gave us this functionality. I can double check with my company if we can share the work that we've done.

elad commented 7 years ago

@joshgarnett please do! Sharing it is the first step towards pushing it upstream. :)

joshgarnett commented 7 years ago

I'll see if I can get something packaged up. The other main change we made was no longer setting the credentials in the provider itself. This allows you to stand up the database and schema in the same Terraform config. With the current plugin, it attempts to connect immediately and will fail if the database is not already up. That would be a breaking change from the previous behavior, but I think it is critical for an all-in-one dev environment setup.

elad commented 7 years ago

Yeah, there are a few rough edges. For example, I came across a corner case where the provider would hang when trying to refresh resources because aws_db_instance was configured with publicly_accessible = false. I haven't dug in yet to figure out whether there's a way around that issue.

kenden commented 5 years ago

There is now an SQL provider listed in the Terraform Community Providers. It uses rubenv/sql-migrate to do database migrations.

ghost commented 5 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.