Provisioners belonging to providers

apparentlymart commented 9 years ago

Currently there is a strict separation between providers and provisioners, which makes sense given the current set of available provisioners.

However I have some use-cases where a provisioner and a provider would be more closely related:

For AWS opsworks (working on that in hashicorp/terraform#1892), certain lifecycle events are triggered via the API and could be useful to use as provisioners on an opsworks stack.
With Rundeck (see hashicorp/terraform#2412) one could trigger a rundeck job as a provisioner, allowing rundeck to handle the details of SSHing into the necessary machines and retaining the audit logs of what was done.

While these certainly could be implemented as standalone provisioners that happen to interact with the same APIs as the provider, this is inconvenient both as an implementer (need to re-implement things such as client instantiation, credentials handling) and as a user (need to duplicate all of the provider settings inside the provisioner block, rather than just having them inherit from the provider as we see with resources).

It feels to me like it would be most convenient for providers to be able to provide provisioners as well as resources, and then provider-provided provisioners would get access to the same "meta" value that the resource definitions get access to, which most providers use to stash their API client. Presumably in the dependency graph such a provisioner would depend both on the resource it's provisioning and on the provider it came from.

I'm mainly just opening this ticket to start a discussion about the issue and see if folks have other similar use-cases or alternative approaches.

apparentlymart commented 9 years ago

Having delved into the code some I think I have some idea of how this could work:

terraform.ResourceProvider interface extended with a new method Provisioner(name string)that returns terraform.ResourceProvisionerFactory if it has a provisioner with the given name.
terraform.BuiltinEvalContext.InitProvisioner behavior changed slightly:
- first check ctx.Provisioners[name]. If a factory is present, it is used just like today.
- otherwise, look in the name for the first dash and take all characters before it as a potential provider name. If that string exists in ctx.Providers:
- instantiate the provider
- call provider.Provisioner, passing in as the name the remainder of the provided provisioner name after the first dash (for example, "aws-opsworks-deploy" would result in provider.Provisioner("opsworks-deploy")
- if a provisioner factory function is returned, then call it to obtain the provisioner
- if neither path produces a provisioner factory, fail with an error saying the provisioner doesn't exist
A new terraform.ResourceProviderProvisioner interface extends terraform.ResourceProvisioner with an additional Provider() method that returns the provisioner's provider, so that the graph builder knows to create the dependency between the provisioner and the provider's Configure step.
New struct schema.Provisioner to allow providers using the schema helper to implement provisioners. Contains a Schema for the provisioner's config and an Apply callback of type func (c *schema.ProvisionerConfig, d *schema.ResourceData, uio terraform.UIOutput, meta interface {}) which is the main implementation of the provisioner.
The schema.Provider struct includes a new map[string]Provisioner with which it implements terraform.ResourceProvider.Provisioner(n string).

With the above in place, providers can then declare provisioners under a private namespace (as long as no top-level provisioner starts with a provider name) and the a schema-based provider can implement a provisioner in a manner that should be familiar and intuitive to folks who are familiar with resource implementation via the schema helper.

The two use-cases in the initial issue write-up can then be implemented:

aws-opsworks-deploy would be implemented by the aws provider and use the opsworks connection available on the AWS client.
rundeck-exec-job and rundeck-exec-command would be implemented by the rundeck provider and use the already-configured Rundeck client to run either a configured job or an arbitrary command (respectively) over a set of configured nodes.

Unresolved detail: How would this interact with the concept of provider aliases, allowing multiple instances of the same provider? Would the provisioner configuration block need to include a special "provider" attribute just like resources do? Presumably that would then prevent a provisioner from having its own configuration attribute called "provider", but that's likely for the best to reduce confusion anyway.

(Sidebar: how did it end up that resources are named with underscore-separated words but provisioners are named with dash-separated words? Given that there are currently only two provisioners with multiple words, would it be worth renaming them local_exec and remote_exec to improve consistency, and then allow the provider-scoped-provisioners to be named like aws_opsworks_deploy, rundeck_exec_job, etc?)

glenjamin commented 9 years ago

I've been doing some work with creating AWS users, and I think that CreateLoginProfile would work really well as a provisioner on aws_iam_user.

When creating a user for a person, I'd want to generate their initial password and flag it as to-be-changed - but only when first creating the user.

apparentlymart commented 9 years ago

That's a great additional use-case @glenjamin. Thanks!

apparentlymart commented 8 years ago

Another use-case: A provisioner that makes cache invalidation requests to Amazon CloudFront, so that caches can be purged when a new app version is deployed.

apparentlymart commented 8 years ago

I've been thinking more about this as I start to run into cases where I'd want to use rundeck jobs as provisioners.

Specifically I've been considering a architecture shift where the concept of standalone provisioners goes away and all provisioners belong to providers, transforming the "provider" concept into containers for sets of functionality that relate to a particular use-case or service.

In this hypothetical model, providers would be able to provide a few different objects:

resources (just like today): things that support a "CRUD" lifecycle
provisioners: arbitrary actions executed for their side-effects
connection config schemas: validation for particular types of connection block

Just like with resources, the provisioners and connection schemas would start with the provider name, so the Rundeck provider might expose rundeck_job as a provisioner, which would be distinct from the existing rundeck_job resource.

Some new providers would be created to absorb the existing standalone provisioners and connection schemas:

A local provider hosts a local_exec provisioner, replacing the local-exec provisioner.
An ssh provider gets ssh_file and ssh_exec provisioners, replacing the SSH mode of the file and remote-exec provisioners respectively, and a connection schema called ssh.
A winrm provider gets winrm_file and winrm_exec provisioners, replacing the WinRM mode of the file and remote-exec provisioners respectively, and a connection schema called winrm.
A chef provider (e.g. hashicorp/terraform#3084) gets a chef provisioner, replacing the existing standalone chef provisioner.

For backward-compatibility, deprecated aliases would be provided that make the old names still work, which I expect would just be hard-coded within Terraform core rather than retaining the concept of and mechanisms for standalone provisioner plugins. (This would require some special handling to correctly delegate file to either ssh_file or winrm_file, which requires a little more thought.)

"Connection schemas" generalize the existing connection block by defining which arguments are valid for a given type of connection. So the ssh provider's ssh schema would define the arguments like host, port, private_key, agent etc. Each resource and provisioner can have multiple connection blocks of different types, with each provisioners using whichever one is appropriate for it.

Just as today, resources can provide "default" connection information. In this new architecture, they may provide a default connection config for each connection schema. The aws_instance resource would, for example, provide either an "ssh" or a "winrm" connection config depending on the instance type. The aws_db_instance resource might provide a "mysql" or "postgresql" connection config that could be used by hypothetical provisioners from the providers proposed at hashicorp/terraform#3122 and hashicorp/terraform#3653 respectively.

Putting this all together, here's a hypothetical configuration showing some of these ideas:

provider "chef" {
    // If the server_url is specified at the provider level then it's no longer necessary to
    // specify it in each chef provisioner block.
    server_url = "http://chef.example.com/"
    // Likewise environment and version which is likely to be the same for most/all nodes in
    // a given configuration.
    environment = "production"
    version = "12.4.1"
    secret_key = "..."

    // client credentials, etc, etc...
}

provider "rundeck" {
    url = "http://rundeck.example.com/"
    auth_token = "SuperSecureToken"
}

provider "aws" {
    region = "us-west-2"
}

resource "chef_role" "es_server" {
    name = "elasticsearch-server"
    run_list = ["elasticsearch"]
}

provider "aws_instance" "elasticsearch" {
    // (all the usual aws_instance stuff)

    count = 5

    // aws_instance sets a default "ssh" connection config, but we'll
    // override it here so we can specify the private key, set a bastion
    // host, etc...
    connection {
        // This now means that the "ssh" provider gets to validate and
        // normalize the arguments.
        type = "ssh"
        host = "${self.private_ip}"
        private_key = "${file("${path.module}/provisioning_key.pem")}"
    }

    provisioner "chef" {
        // The provisioner looks for a connection of type "ssh" or "winrm"
        // to decide how to reach the instances.

        node_name = "${self.private_dns}"
        run_list = ["role[${chef_role.es_server.name}]"]

        // (+ all the same stuff the chef provisioner supports today, but with
        // server_url, environment and version now optional when
        // specified on the provider.)
    }
}

resource "null_resource" "es_cluster" {
    // Each time the set of ES servers changes, use a Rundeck job
    // to join all of the servers into a cluster.
    triggers = {
        hosts = "${join(" ", aws_instance.elasticsearch.*.id)}"
    }

    provisioner "rundeck_job" {
        project = "elasticsearch"
        job = "Create Server Cluster"
    }
}

resource "aws_s3_bucket" "website" {
    // ...
}

resource "aws_s3_bucket_object" "homepage" {
    bucket = "${aws_s3_bucket.website.name}"
    key = "index.html"
    source = "website/index.html"

}

resource "null_resource" "website_invalidate" {
    triggers = {
        // The "source" actually interpolates as a hash of the content due to the statefunc,
        // so this triggers each time the file contents change.
        "index.html" = "${aws_s3_bucket_object.homepage.source}"
    }

    // Invalidate some paths in cloudfront whenever we change the content.
    provisioner "aws_cloudfront_invalidate" {
        distribution_id = "${aws_cloudfront_distribution.website.id}"
        paths = ["/", "/index.html"]
    }
}

resource "aws_cloudfront_distribution" "website" {
    origin_domain_name = "${aws_s3_bucket.website.website_endpoint}"
    // ...
}

As the above example shows, the UX doesn't really change at all except that there are more provisioners to choose from and the UX of the existing "chef" provisioner is improved by it being able to inherit settings from the provider block.

The documentation IA already has room for providers to have additional concepts besides resources, as shown by this mock of how the Rundeck provider's provisioners might be presented:

terraform_rundeck_provisioners

Mainly I'm just dropping this here to note my latest design work for future reference. It seems like the Hashicorp team doesn't have an opinion yet on this topic, so I'm going to hold off on implementation until I get some more concrete design feedback.

apparentlymart commented 8 years ago

One further simplification, which I'm considering but not so sure about, is to unify the idea of connection blocks with provider configurations.

Under that model, a connection block of type "ssh" would in fact just be a locally-scoped provider "ssh" block, which overrides any global SSH provider config for any provisioners within its area of influence.

This would allow a different formulation for the rundeck provisioners in configurations where the Rundeck provider is only used to provision a single resource:

resource "null_resource" "foo" {
    connection {
        type = "rundeck"
        url = "http://rundeck.example.com/"
        auth_token = "abcd1234"
    }

    provisioner "rundeck_job" {
        // as before
    }
}

I think the primary benefit of this unification would be implementation simplicity rather than anything users would care about, since it would eliminate connection block schemas as a distinct concept.

I remain ambivalent about this particular aspect since I'm not sure how I'd explain it within the documentation in a way that speaks to user needs rather than implementation details.

apparentlymart commented 6 years ago

In hashicorp/terraform#4824, @partamonov offered the additional use-case of running an AWS Lambda function in a provisioner-like way.

A Lambda-based provisioner could monitor for the exit status of the Lambda function and mark the resource as tainted if it fails, just like we can do for the shell-based execution provisioners. Capturing the output of a Lambda provisioner might be tricky since we'd probably need to interact with Cloudwatch Logs, but we could prototype that and see if it's reasonable to do that or if we'd need to accept just showing the final result of the function.

4dz commented 6 years ago

An AWS lambda (or generically, serverless function) would be a great option for database provisioning. For example, creating users and passwords - and even keeping those secrets outside of Terraform if the Lambda code author wishes.

One workaround is to create a lambda resource then trigger it with a cloudwatch event cron(...) based on ${timeadd(timestamp, “1m”)} - but the function and event will live until destroyed. You also can’t obtain the result.

Instead of or as well as provisioners:

An aws_lambda_exec data source would be much like an S3 data source? Ie execute and read result of a lambda function - see also https://github.com/terraform-providers/terraform-provider-aws/issues/2385

An aws_lambda_exec resource could execute a lambda when the resource is created and/or destroyed.

4dz commented 6 years ago

I've discovered that it is actually possible to execute a Lambda 'on apply' and get its result by using a CloudFormation stack, and a "CustomResource". I've put together some examples here. https://registry.terraform.io/modules/connect-group/lambda-exec/

radeksimko commented 3 years ago

Am I right that this is no longer relevant since vendor provisioners were deprecated and built-in provisioners are planned to stay in core?

i.e. I think this issue can now be closed?

bflad commented 2 years ago

I agree, @radeksimko, for the same exact reasons. Provisioner support was also purposefully not added to protocol version 6. If for some reason we would intend on re-introducing this type of functionality across the plugin protocol, it is probably best as a new design issue if/when that time comes. 👍

apparentlymart commented 2 years ago

Indeed... the main new insight that we've become aware of in the meantime is that a provisioner block is functionally equivalent to a resource which either has only a "create" action or only a "destroy" action, depending on the when argument, and there are real examples in the public provider registry of resource types doing such actions in that way, instead of as provisioner plugins.

I think the main thing we're missing to complete that story are some official providers that can more-or-less replace local-exec, remote-exec and file, where the first two would essentially be hooks for shell-powered custom actions during create and destroy while the last would (I think) ideally be a declarative declaration of the existence of a file on a remote system, with a similar meaning to local_file in the hashicorp/local provider.

However, none of that requires any changes in the plugin SDK, since it could all be implemented today with either this SDK or the new framework, either by other teams at HashiCorp or by third-party provider developers. (and indeed, some of those use-cases already have third-party providers available to meet them)

github-actions[bot] commented 2 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

hashicorp / terraform-plugin-sdk

Provisioners belonging to providers #58