hashicorp / terraform-plugin-sdk

Terraform Plugin SDK enables building plugins (providers) to manage any service providers or custom in-house solutions
https://developer.hashicorp.com/terraform/plugin
Mozilla Public License 2.0

Determine provider SDK strategy for minimizing API calls #7

Open radeksimko opened 5 years ago

radeksimko commented 5 years ago

Data sources

Data sources currently can't take advantage of Etag-based (or really any other) caching. Whenever a data source is refreshed, the SDK does not read any existing state, so the Etag is ignored, even if it is stored there as a computed attribute.

Resources

While resources can technically be cached on an Etag basis, I think we could do better and make it easier for arbitrary APIs which follow the convention and use the Etag header, perhaps through some helper functions that can be plugged in as HTTP transports or something like that.

To start with, we could adapt https://github.com/terraform-providers/terraform-provider-github/blob/2eaf170dbd1efa403d3bce08ade826c827a4063b/github/transport.go#L21-L40 into the SDK in some form.
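
For illustration, a minimal sketch of what such a pluggable transport could look like (assuming Go and `net/http`; the `etagTransport` name and cache shape are invented here, not an existing SDK helper): it remembers the ETag and body of earlier GET responses, sends `If-None-Match` on repeat requests, and replays the cached body when the API answers `304 Not Modified`.

```go
package provider

import (
	"bytes"
	"io"
	"net/http"
	"sync"
)

type cachedResponse struct {
	etag string
	body []byte
}

// etagTransport is a hypothetical helper that a provider could plug into
// its HTTP client. It only touches GET requests.
type etagTransport struct {
	next  http.RoundTripper
	mu    sync.Mutex
	cache map[string]cachedResponse // keyed by request URL
}

func (t *etagTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if req.Method != http.MethodGet {
		return t.next.RoundTrip(req)
	}

	key := req.URL.String()
	t.mu.Lock()
	cached, ok := t.cache[key]
	t.mu.Unlock()
	if ok {
		req.Header.Set("If-None-Match", cached.etag)
	}

	resp, err := t.next.RoundTrip(req)
	if err != nil {
		return nil, err
	}

	switch resp.StatusCode {
	case http.StatusNotModified:
		// The API confirmed nothing changed: replay the cached body.
		resp.StatusCode = http.StatusOK
		resp.Body = io.NopCloser(bytes.NewReader(cached.body))
	case http.StatusOK:
		// Remember the new ETag and body for the next refresh.
		if etag := resp.Header.Get("ETag"); etag != "" {
			body, readErr := io.ReadAll(resp.Body)
			resp.Body.Close()
			if readErr != nil {
				return nil, readErr
			}
			t.mu.Lock()
			t.cache[key] = cachedResponse{etag: etag, body: body}
			t.mu.Unlock()
			resp.Body = io.NopCloser(bytes.NewReader(body))
		}
	}
	return resp, nil
}
```

A provider would wire this in with something like `&http.Client{Transport: &etagTransport{next: http.DefaultTransport, cache: map[string]cachedResponse{}}}`; the open question for the SDK is where such a cache lives across refresh walks, which is exactly the gap described under "Data sources" above.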

radeksimko commented 4 years ago

May be related to https://github.com/hashicorp/terraform-plugin-sdk/issues/134

paultyng commented 4 years ago

Merging all the caching, batching, etc. issues into one to think through at once.

armsnyder commented 3 years ago

Hi, I'm coming here from #66, which was merged with this issue.

My problem is that I am trying to write a new Terraform provider for an upstream API that ONLY supports batch API calls. I could code each resource to make a batch API call containing a single entry; however, this is not what my API is designed for. #blockchainreasons

I also wanted to highlight this comment from @apparentlymart which summarized the design challenges nicely: https://github.com/hashicorp/terraform-plugin-sdk/issues/66#issuecomment-355162724

bcsgh commented 3 years ago

Where do things actually get throttled? If the core isn't throttling (or can be told not to) and the plugin controls things, then a plugin should be able to merge requests on its own without any further support. (If a request would block on capacity, queue it up. If there is already a batchable request of a compatible type queued, merge with it.)
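
As a rough sketch of that idea in provider-side Go (every name and shape here is hypothetical, not SDK API): keep an in-flight capacity limit, queue requests that would exceed it, and merge queued requests that share a batch key into one upstream call.

```go
package provider

import "sync"

type result struct {
	value any
	err   error
}

type request struct {
	key  string // requests sharing a key can ride in the same batch
	id   string // the individual item to fetch or mutate
	done chan result
}

type merger struct {
	mu       sync.Mutex
	inFlight int
	capacity int
	pending  map[string][]*request // queued requests grouped by batch key
	doBatch  func(key string, ids []string) (map[string]any, error)
}

func (m *merger) Do(key, id string) (any, error) {
	req := &request{key: key, id: id, done: make(chan result, 1)}

	m.mu.Lock()
	if m.inFlight >= m.capacity {
		// Would block on capacity: queue it up. Compatible requests
		// queued behind the same key get merged into one batch.
		m.pending[key] = append(m.pending[key], req)
		m.mu.Unlock()
	} else {
		m.inFlight++
		m.mu.Unlock()
		go m.run(key, []*request{req})
	}

	res := <-req.done
	return res.value, res.err
}

func (m *merger) run(key string, batch []*request) {
	ids := make([]string, len(batch))
	for i, r := range batch {
		ids[i] = r.id
	}
	values, err := m.doBatch(key, ids) // one upstream call for the whole batch
	for _, r := range batch {
		r.done <- result{value: values[r.id], err: err}
	}

	m.mu.Lock()
	// The slot is free again: run any queued batch, merged, in one call.
	for k, reqs := range m.pending {
		delete(m.pending, k)
		m.mu.Unlock()
		m.run(k, reqs)
		return
	}
	m.inFlight--
	m.mu.Unlock()
}
```

Everything here stays inside the plugin, which is the point: the only thing core contributes is the parallelism ceiling.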

paultyng commented 3 years ago

Parallelism defaults to 10 in the CLI, so a provider can get up to that number of requests in parallel (or whatever the user overrides it to be). There are no guarantees it's the same "resource type" or anything, though.

Some providers (I think Google Cloud?) use some naive batching for these kinds of reasons, which can be taken better advantage of by increasing that parallelism via the CLI flag, but there is no definitive protocol method to achieve this (or a "commit"-type step). Yet, at least. And there is a possibility you could cause deadlocking or other problems if not careful in the implementation.

bcsgh commented 3 years ago

As long as the naive provider implementation doesn't create new deadlock risks when increasing that parallelism, then that seems like not-the-SDK's-problem. Dealing with any potential for deadlocks sounds like something that providers will already need to worry about given any parallelism, and they are also in a way better position to deal with it than the CLI or SDK would be.

How hard would it be to allow providers to override that default? That by itself would open up the ability for providers to solve this for themselves and would avoid the CLI/SDK needing to understand the domain-specific details.

jason-swissre commented 3 years ago

My issue was also merged here. My feature request was that terraform keep a bit more knowledge about the resources, so that on a destroy it will know it doesn't have to destroy every individual resource. If, for example, you destroy a keyvault, that will destroy all secrets, policies, etc. without having to delete them all individually, as is done now. Likewise, if you destroy the resource group, then everything in the resource group is destroyed.

This would radically improve the speed of the destroy command on Azure.

jason-swissre commented 2 years ago

This issue is mentioned in regard to batching, but in the delete case no batching is needed, just some kind of meta system which detects which things must be explicitly deleted and which will be deleted implicitly by something higher up that is scheduled for deletion.
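
A sketch of what such a meta system might look like, assuming providers could declare which resource types cascade deletes to their children (the `node` type and `cascades` flag are invented for illustration; nothing like this exists in the SDK today):

```go
package provider

type node struct {
	addr     string
	parent   *node // nil for top-level resources such as a resource group
	cascades bool  // provider declares: deleting this also deletes children
}

// prune returns only the deletions that must be issued explicitly: anything
// covered by a cascading ancestor that is also being destroyed is skipped.
func prune(destroying map[string]*node) []*node {
	var explicit []*node
	for _, n := range destroying {
		covered := false
		for p := n.parent; p != nil; p = p.parent {
			if _, also := destroying[p.addr]; also && p.cascades {
				covered = true // e.g. a secret inside a keyvault being destroyed
				break
			}
		}
		if !covered {
			explicit = append(explicit, n)
		}
	}
	return explicit
}
```

On a full destroy of a resource group and everything inside it, only the resource group's own delete call would survive the pruning.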

AlexHunterCodes commented 2 years ago

detects which things must be explicitly deleted and which will be deleted implicitly by something higher up that is scheduled for deletion

This would be risky if you have any child resources deployed which the Terraform config or state doesn't know about, either because:

  1. They're automatically created by the cloud API, in which case you either don't care and probably do want them to be automatically destroyed, OR maybe you do want to keep them (VM disks, static IPs) but you don't know they exist or don't know they will be destroyed as opposed to simply detached/orphaned.
  2. They are deployed by a different Terraform state, perhaps by a different team, or as part of cross-team/cross-environment shared infrastructure.

It would be great if Terraform could understand these relationships to reduce API calls and save time, but it would also create an opportunity to identify out-of-state resources and warn the user in the plan.

jason-swissre commented 2 years ago

But cases 1 and 2 are not a new risk, because that is already how terraform behaves. For example, if I make a keyvault with terraform and someone else makes secrets that I do not know about and that are not in terraform, then when I run terraform destroy it will destroy the secrets it knows about and then destroy the keyvault. Destroying the keyvault will destroy the secrets it was not aware of.

I do wish there were more options for out-of-state resources. At the moment terraform just fails or ignores them (depending on what it was doing). It would be nice to have options for things like "import if already exists".

AlexHunterCodes commented 2 years ago

They're not a new risk, but if this issue turns into some new Terraform feature, they are a new way to break your apply.

A lot of cloud services I work with won't let you delete parent resources at all if they still have children, but those providers have no way of knowing that in advance when they decide to cancel out a batch of child API calls in favour of one parent call.

Terraform would have to make it very clear when it is about to merge deletions, and let you configure that behaviour on a per-resource basis.

jason-swissre commented 2 years ago

There is already precedent for this. There is a flag for telling terraform not to remove a resource group if it's not empty.

gk-drw commented 7 months ago

Why not focus on Batch Get requests? They are the ones likely to make the most difference (due to extra-long refresh times when running terraform plan), and their dependency issues are the easiest to figure out - no deletion involved.

bcsgh commented 7 months ago

@gk-drw I second that point. I strongly suspect that API calls are heavily biased (10x or greater) toward read-only calls, and also that for non-read-only calls the overhead (quota, throttling, wall-clock time, etc.) of a non-batched call is likely negligible compared to the work that's done.

It's probably worth thinking through, in general terms, how RW operations would be batched, to avoid SDK mismatches down the line, but the RO cases shouldn't block on the details of something that may never even be worth implementing.
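
To make the read-only case concrete, here is a provider-side sketch (the `batchGet` upstream call and the item shapes are assumptions): the individual Read calls that core issues in parallel during a refresh join a short batching window, and one Batch Get serves the whole group.

```go
package provider

import (
	"sync"
	"time"
)

type batchRead struct {
	ids   []string
	done  chan struct{} // closed once this window's Batch Get returns
	items map[string]any
	err   error
}

type readCoalescer struct {
	mu       sync.Mutex
	window   time.Duration
	current  *batchRead
	batchGet func(ids []string) (map[string]any, error) // one call per window
}

// Read blocks until the shared Batch Get for its window completes, then
// returns just this caller's item. Each resource's Read handler calls it.
func (c *readCoalescer) Read(id string) (any, error) {
	c.mu.Lock()
	b := c.current
	if b == nil {
		b = &batchRead{done: make(chan struct{})}
		c.current = b
		time.AfterFunc(c.window, func() {
			c.mu.Lock()
			c.current = nil // later Reads start a fresh window
			ids := b.ids
			c.mu.Unlock()
			b.items, b.err = c.batchGet(ids)
			close(b.done)
		})
	}
	b.ids = append(b.ids, id)
	c.mu.Unlock()

	<-b.done
	if b.err != nil {
		return nil, b.err
	}
	return b.items[id], nil
}
```

Each resource still gets its own item back, so nothing about resource schemas has to change; only the fan-in to the upstream API does, and no deletion ordering is involved.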