upbound / provider-terraform

A @crossplane provider for Terraform

--max-reconcile-rate doesn’t seem to be working on v0.13.0 #234

Open cachaldora opened 5 months ago

cachaldora commented 5 months ago

What happened?

I’ve updated the terraform provider from version 0.11.0 to 0.13.0 to be able to use pluginCache and parallelism without concurrency issues. With version 0.13.0 I’ve enabled the plugin cache and everything seems to be working; however, reconciliation was taking too long.

As I have the -d flag enabled, I was looking into the pod logs and noticed that after the upgrade the provider seems to be slower and picking up fewer workloads. The resources used by the pod are also significantly lower.

How can we reproduce it?

Configure terraform provider:

apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: controller-terraform-config
  labels:
    app: provider-terraform
spec:
  args:
    - -d
    - --poll=5m
    - --max-reconcile-rate=40
---
apiVersion: tf.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: provider-terraform-config
  namespace: upbound-system
spec:
  pluginCache: true

Create several workspaces, then look at the pod logs and the consumed resources.
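For example, workspaces along these lines (a minimal sketch using the inline-module form of the Workspace resource; the name and module body are just illustrative):

apiVersion: tf.upbound.io/v1beta1
kind: Workspace
metadata:
  name: example-workspace-1
spec:
  providerConfigRef:
    name: provider-terraform-config
  forProvider:
    source: Inline
    module: |
      output "hello" {
        value = "world"
      }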

What environment did it happen in?

ytsarev commented 5 months ago

@cachaldora I do not observe any related change in the provider codebase.

If we look at https://github.com/upbound/provider-terraform/blame/main/cmd/provider/main.go, the related changes to maxReconcileRate happened 2 years ago.

If we look at https://github.com/upbound/provider-terraform/tree/main?tab=readme-ov-file#known-limitations , there is a statement:

Setting --max-reconcile-rate to a value greater than 1 will potentially cause the provider to use up to the same number of CPUs. Add a resources section to the ControllerConfig to restrict CPU usage as needed.

Is it possible that you are overloading your system with a high value of 40, and that this produces the undesirable 'long reconciliation' side effect? Could you try setting a CPU limit as per the recommendation?
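Something like this in the ControllerConfig would cap the CPU the provider can consume (a sketch only; the request/limit values are illustrative and should be tuned to your nodes):

apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: controller-terraform-config
spec:
  args:
    - -d
    - --poll=5m
    - --max-reconcile-rate=40
  resources:
    requests:
      cpu: "2"
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi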

bobh66 commented 5 months ago

The description in #233 mentions having cpu.requests == 1000m, which is essentially the same as setting max-reconcile-rate to 1. If you want to use max-reconcile-rate=10 then you also have to set cpu.requests to 10, AND the node must have 10 vCPUs available to give to the pod. It's not very efficient, but it's the only way to run multiple terraform CLI commands in parallel.
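In other words, to actually get 10 parallel terraform invocations the ControllerConfig needs roughly the following (a sketch; it assumes the node really does have 10 vCPUs to spare):

apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: controller-terraform-config
spec:
  args:
    - --max-reconcile-rate=10
  resources:
    requests:
      cpu: "10"
    limits:
      cpu: "10"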

toastwaffle commented 5 months ago

@bobh66 is there anything which enforces the CPU count limiting the reconcile rate (i.e. by somehow reserving 1 CPU per terraform invocation), or is your comment based on the assumption that any terraform invocation will just use 1 CPU continuously? AIUI most long-running terraform operations will be IO-bound on calls to resource APIs (or in some cases sleeping between such calls while polling for status).

bobh66 commented 5 months ago

Each goroutine that gets spawned for a reconciliation calls the terraform CLI several times for workspace selection and the Observe step (terraform plan), and may call it again for either apply or delete, either of which may be a long-running process. Each invocation of the terraform CLI goes through exec.Command and blocks for the duration of that execution. When only a single CPU is allocated, only one terraform plan/apply/delete can run at a time. Even if the CLI command is blocked waiting for the remote API to finish - which I agree happens a lot - it still won't allow the other CLI commands to run until it is finished. We have talked about using async execution, but that would mean bypassing the CLI, which would require a lot of rework.

bobh66 commented 5 months ago

This may be related to the Lock behavior described in #239

cachaldora commented 5 months ago

It seems to be related. We also experienced what @toastwaffle describes when testing with --max-reconcile-rate=40, resources.requests.cpu=4 and resources.limits.cpu=4: CPU load usually didn't go above 2, and the running processes appeared to be handling only one workspace at a time most of the time.

project-administrator commented 1 month ago

Is this still an issue with v0.16? We're still using v0.12, where concurrency works OK, and can't upgrade to any newer version because we need concurrency to work properly. Newer versions have an issue with TF plugin cache locking: https://github.com/upbound/provider-terraform/issues/239

toastwaffle commented 1 month ago

@project-administrator (nice username) This should be fixed from v0.14.1 onwards - see #240 which more or less fixed #239. If you are using high concurrency, I strongly recommend setting up a PVC for your TF workspaces.
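For example, something like this on the ControllerConfig (a sketch; I'm assuming the provider keeps its workspaces under /tf, and the PVC name is illustrative and has to be created separately):

apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: controller-terraform-config
spec:
  volumes:
    - name: tf-workspaces
      persistentVolumeClaim:
        claimName: tf-workspaces-pvc
  volumeMounts:
    - name: tf-workspaces
      mountPath: /tf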

project-administrator commented 1 month ago

With higher concurrency values like 10 we need to reserve an appropriate amount of RAM and CPU for the terraform pod to run multiple "terraform apply" instances. For us that's 1 CPU core and 1 GB of RAM per terraform invocation. The provider-terraform pod stays idle 99% of the time, but it really needs those resources when we apply a change globally across multiple TF workspaces and it has to reconcile all of them. Given the above, it looks like we're reserving resources for provider-terraform that it isn't using most of the time. It would be really nice if we could run "terraform apply" as a Kubernetes Job with its own requests and limits instead of running everything in one single pod.

project-administrator commented 1 month ago

I wonder if we can use the DeploymentRuntimeConfig replicas setting to run several instances of the provider? Has anyone tested this configuration?
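Something like this is what I had in mind (a sketch; the exact required fields may differ between Crossplane versions):

apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: provider-terraform
spec:
  deploymentTemplate:
    spec:
      replicas: 2
      selector: {}
      template: {}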

bobh66 commented 1 month ago

@project-administrator You can run multiple replicas of the provider but it will not help with scaling. The provider is a kubernetes controller (or multiple controllers), and by design a controller cannot run more than one active instance. There is (currently) no way to ensure that events for a specific resource instance will be processed in order by the same controller instance, so all controllers run as single instances. If multiple replicas are defined they will do leader election, and the non-leader instances will wait until the leader is unavailable before they try to become the leader and process the workload.