Open tgoodsell-tempus opened 10 months ago
There's discussion here about solving a problem like this using a provider-managed sync.Mutex
.
Would that solve the problem without requiring any changes to the schema?
If something like Sequential
were to be implemented in the framework schema, my vote would be for a string
(or []string
) rather than a bool
which names a shared mutex so that resources of different types could be de-conflicted with each other:
func (r *ExampleResourceA) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) {
resp.Schema = schema.Schema{
Sequential: []string{"ExampleResource A and B cannot run concurrently"},
}
}
func (r *ExampleResourceB) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) {
resp.Schema = schema.Schema{
Sequential: []string{
"ExampleResource A and B cannot run concurrently",
"ExampleResource B and C cannot run concurrently",
},
}
}
func (r *ExampleResourceC) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) {
resp.Schema = schema.Schema{
Sequential: []string{"ExampleResource B and C cannot run concurrently"},
}
}
*danger! deadlock potential with the []string
approach :)
There's discussion here about solving a problem like this using a provider-managed
sync.Mutex
.Would that solve the problem without requiring any changes to the schema?
If something like
Sequential
were to be implemented in the framework schema, my vote would be for astring
(or[]string
) rather than abool
which names a shared mutex so that resources of different types could be de-conflicted with each other:func (r *ExampleResourceA) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) { resp.Schema = schema.Schema{ Sequential: []string{"ExampleResource A and B cannot run concurrently"}, } } func (r *ExampleResourceB) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) { resp.Schema = schema.Schema{ Sequential: []string{ "ExampleResource A and B cannot run concurrently", "ExampleResource B and C cannot run concurrently", }, } } func (r *ExampleResourceC) Schema(ctx context.Context, req resource.SchemaRequest, resp *resource.SchemaResponse) { resp.Schema = schema.Schema{ Sequential: []string{"ExampleResource B and C cannot run concurrently"}, } }
*danger! deadlock potential with the
[]string
approach :)
You're example isn't quite what I'm trying to solve for:
Also while yes each provider can implement something on their own, the reason I'm asking for a addition to the framework SDK is to avoid each provider needing to reinvent the wheel for a similar class of problems, and the varying quality of those inventions I've seen in the providers that do use them (such as dramatically altering performance, or inconsistent functionality of these workarounds).
Hi @tgoodsell-tempus š Thank you for this feature request and @chrismarget-j for contributing to the discussion. Our main attention is elsewhere at the moment (finishing up Terraform 1.8 features such as provider-defined functions and the ability to state move across resource types), but to keep the discussion going here, there are a few upfront considerations here which make choosing how to expose this sort of functionality harder than it might seem on the surface. This is not to say that these details/decisions are insurmountable, but any design or potential implementation proposal should account for at least these:
resource.Resource
type that does what any framework implementation might potentially do. In fact, I would actually recommend doing something like this as part of an implementation proposal. It is a great way to test how it works in practice and could potentially be exposed for other developers to try/use.PlanResourceChange
vs ApplyResourceChange
vs ImportResourceState
. Should these rules be further granular to the framework abstractions, e.g. Create
vs Read
vs Update
vs Delete
? Should these rules be further granular to particular API calls themselves, which is ultimately the source of these concurrency rules? It is not guaranteed that a restricted API call only occurs within one resource. Will it be more or less frustrating to have a concurrency solution that is partially detached from the problematic calls? What if a resource implements a concurrency restricted API call followed by concurrency safe API calls (e.g. a restricted, but quick "creation" call, followed by polling calls to wait until its ready)?Ideally, it would be the API SDKs themselves that would implement this sort of concurrency handling as they are the closest to the API implementation details causing the issues. In practice though, this does tend to be a problem left for API consumers to figure out, which is quite unfortunate. The prior terraform-plugin-sdk (v1 only š ) implemented a mutexkv
package as a "baked in" solution so provider developers could create key-specific mutex, e.g. have a mutex per API call. While this was ultimately removed because there are a other Go concurrency libraries that exist with similar functionality, the design was flexible enough to enable concurrency handling in spite of the nuance of many of considerations above.
I'm going to stop here because that was quite a lot of words on this topic, but we are happy to discuss this more in detail in the future. Thanks again for raising this.
You're example isn't quite what I'm trying to solve for:
- To prevent concurrent operations of the SAME resource (aka multiple instances of resource A in the same root module).
I think "a provider-managed sync.Mutex
" can deconflict multiple instances of the same resource (though I'm not clear on what you thought I was suggesting ... deconflict a single instance from ... itself?) Having the ability to deconflict multiple resource types is just a bonus.
You'd create the mutex in the provider's Configure()
method, and make it available (via pointer) to all instances of the resource via the resource Configure()
methods. This way, each instance of the resource waits for its turn talking to the API.
Hi. I'm in this same boat. I'm building a Terraform Provider with the Terraform Provider Framework for a Bot Defender Platform. The API of the Bot Defender has a Race Condition where two Bot Defender Rules created at the same time can prevent the other from being created. To work around this I have been using a Rate Limiter to limit the create operation to one create every two seconds.
Module version
Use-cases
Certain resources/APIs do not like "concurrent" operations, where the API is used at the same time as other operations (or in "too close" succession).
An example of this is the GCP Networking Peering API: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_network_peering, which can be found to fail when multiple peering operations against a project are being performed.
These API errors can either be put through provider specific retry logic, or just results in operation failure if the API errors have that result in the resource.
Attempted Solutions
Generally the "workarounds" for this I've found involve:
Plugin / Provider Specific:
Proposal
For APIs where we know this limitation in usage exists, it would be nice if the Resource schema supported marking these resources (or data sources) as "Sequential" (or pick your favorite term for this).
This is something I would expect to be added to the Resource Schema and configured by a provider developer (either in
plugin-sdk-v2
orframework-sdk
).The results of this flag would be that the Terraform provider would not concurrently perform operations on resources of that type, globally within the full configuration. However, that would only apply to those specific resources, and standard rules would apply to everything else. This would have to be smart enough to sequentially order those resources globally and also not try to perform operations on resources that depend on those resources until they've gone through the process.
Ultimately, the "why" is to make a more semantic and standard mechanism for enforcing "sequential" operations on APIs which require it, rather than using workarounds or "provider specific" workaround which may be inconsistent and vary in quality.
Example of implementation expectation:
From https://github.com/hashicorp/terraform-provider-scaffolding-framework/blob/main/internal/provider/example_resource.go
References