hashicorp / terraform-plugin-framework

A next-generation framework for building Terraform providers.
https://developer.hashicorp.com/terraform/plugin/framework
Mozilla Public License 2.0
304 stars 93 forks source link

Feature Request: "Sequential" Resource / Data Source Flag #907

Open tgoodsell-tempus opened 10 months ago

tgoodsell-tempus commented 10 months ago

Module version

v1.5.0

Use-cases

Certain resources/APIs do not like "concurrent" operations, where the API is used at the same time as other operations (or in "too close" succession).

An example of this is the GCP Networking Peering API: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_network_peering, which can be found to fail when multiple peering operations against a project are being performed.

These API errors can either be put through provider specific retry logic, or just results in operation failure if the API errors have that result in the resource.

Attempted Solutions

Generally the "workarounds" for this I've found involve:

Plugin / Provider Specific:

Proposal

For APIs where we know this limitation in usage exists, it would be nice if the Resource schema supported marking these resources (or data sources) as "Sequential" (or pick your favorite term for this).

This is something I would expect to be added to the Resource Schema and configured by a provider developer (either in plugin-sdk-v2 or framework-sdk).

The results of this flag would be that the Terraform provider would not concurrently perform operations on resources of that type, globally within the full configuration. However, that would only apply to those specific resources, and standard rules would apply to everything else. This would have to be smart enough to sequentially order those resources globally and also not try to perform operations on resources that depend on those resources until they've gone through the process.

Ultimately, the "why" is to make a more semantic and standard mechanism for enforcing "sequential" operations on APIs which require it, rather than using workarounds or "provider specific" workaround which may be inconsistent and vary in quality.

Example of implementation expectation:

From https://github.com/hashicorp/terraform-provider-scaffolding-framework/blob/main/internal/provider/example_resource.go

func (r *ExampleResource) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) {
    resp.Schema = schema.Schema{
        // This description is used by the documentation generator and the language server.
        MarkdownDescription: "Example resource",

                // EXAMPLE IMPLEMENTATION: This would mark this as "Sequential"
                Sequential: True,

        Attributes: map[string]schema.Attribute{
            "configurable_attribute": schema.StringAttribute{
                MarkdownDescription: "Example configurable attribute",
                Optional:            true,
            },
            "defaulted": schema.StringAttribute{
                MarkdownDescription: "Example configurable attribute with default value",
                Optional:            true,
                Computed:            true,
                Default:             stringdefault.StaticString("example value when not configured"),
            },
            "id": schema.StringAttribute{
                Computed:            true,
                MarkdownDescription: "Example identifier",
                PlanModifiers: []planmodifier.String{
                    stringplanmodifier.UseStateForUnknown(),
                },
            },
        },
    }
}

References

chrismarget-j commented 10 months ago

There's discussion here about solving a problem like this using a provider-managed sync.Mutex.

Would that solve the problem without requiring any changes to the schema?

If something like Sequential were to be implemented in the framework schema, my vote would be for a string (or []string) rather than a bool which names a shared mutex so that resources of different types could be de-conflicted with each other:

func (r *ExampleResourceA) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) {
        resp.Schema = schema.Schema{
                Sequential: []string{"ExampleResource A and B cannot run concurrently"},
        }
}

func (r *ExampleResourceB) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) {
        resp.Schema = schema.Schema{
                Sequential: []string{
                        "ExampleResource A and B cannot run concurrently",
                        "ExampleResource B and C cannot run concurrently",
                },
        }
}

func (r *ExampleResourceC) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) {
        resp.Schema = schema.Schema{
                Sequential: []string{"ExampleResource B and C cannot run concurrently"},
        }
}

*danger! deadlock potential with the []string approach :)

tgoodsell-tempus commented 9 months ago

There's discussion here about solving a problem like this using a provider-managed sync.Mutex.

Would that solve the problem without requiring any changes to the schema?

If something like Sequential were to be implemented in the framework schema, my vote would be for a string (or []string) rather than a bool which names a shared mutex so that resources of different types could be de-conflicted with each other:

func (r *ExampleResourceA) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) {
        resp.Schema = schema.Schema{
                Sequential: []string{"ExampleResource A and B cannot run concurrently"},
        }
}

func (r *ExampleResourceB) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) {
        resp.Schema = schema.Schema{
                Sequential: []string{
                        "ExampleResource A and B cannot run concurrently",
                        "ExampleResource B and C cannot run concurrently",
                },
        }
}

func (r *ExampleResourceC) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) {
        resp.Schema = schema.Schema{
                Sequential: []string{"ExampleResource B and C cannot run concurrently"},
        }
}

*danger! deadlock potential with the []string approach :)

You're example isn't quite what I'm trying to solve for:

Also while yes each provider can implement something on their own, the reason I'm asking for a addition to the framework SDK is to avoid each provider needing to reinvent the wheel for a similar class of problems, and the varying quality of those inventions I've seen in the providers that do use them (such as dramatically altering performance, or inconsistent functionality of these workarounds).

bflad commented 9 months ago

Hi @tgoodsell-tempus šŸ‘‹ Thank you for this feature request and @chrismarget-j for contributing to the discussion. Our main attention is elsewhere at the moment (finishing up Terraform 1.8 features such as provider-defined functions and the ability to state move across resource types), but to keep the discussion going here, there are a few upfront considerations here which make choosing how to expose this sort of functionality harder than it might seem on the surface. This is not to say that these details/decisions are insurmountable, but any design or potential implementation proposal should account for at least these:

Ideally, it would be the API SDKs themselves that would implement this sort of concurrency handling as they are the closest to the API implementation details causing the issues. In practice though, this does tend to be a problem left for API consumers to figure out, which is quite unfortunate. The prior terraform-plugin-sdk (v1 only šŸ˜‰ ) implemented a mutexkv package as a "baked in" solution so provider developers could create key-specific mutex, e.g. have a mutex per API call. While this was ultimately removed because there are a other Go concurrency libraries that exist with similar functionality, the design was flexible enough to enable concurrency handling in spite of the nuance of many of considerations above.

I'm going to stop here because that was quite a lot of words on this topic, but we are happy to discuss this more in detail in the future. Thanks again for raising this.

chrismarget-j commented 9 months ago

You're example isn't quite what I'm trying to solve for:

  • To prevent concurrent operations of the SAME resource (aka multiple instances of resource A in the same root module).

I think "a provider-managed sync.Mutex" can deconflict multiple instances of the same resource (though I'm not clear on what you thought I was suggesting ... deconflict a single instance from ... itself?) Having the ability to deconflict multiple resource types is just a bonus.

You'd create the mutex in the provider's Configure() method, and make it available (via pointer) to all instances of the resource via the resource Configure() methods. This way, each instance of the resource waits for its turn talking to the API.

KylePeterDavies commented 8 months ago

Hi. I'm in this same boat. I'm building a Terraform Provider with the Terraform Provider Framework for a Bot Defender Platform. The API of the Bot Defender has a Race Condition where two Bot Defender Rules created at the same time can prevent the other from being created. To work around this I have been using a Rate Limiter to limit the create operation to one create every two seconds.