upbound / provider-terraform

A @crossplane provider for Terraform
Apache License 2.0

The node was low on resource: ephemeral-storage #233

Open cachaldora opened 5 months ago

cachaldora commented 5 months ago

What happened?

Earlier this week I migrated Crossplane resources from an old cluster to a new one. During this process we needed to reconcile about 400 Terraform workspaces (half of them with remote state).

I adjusted the TF provider pod resources (requests.memory: 2G, limits.memory: 2G, requests.cpu: 1000m, limits.cpu: 5000m), but the pod was still evicted every 20 minutes with the following error:

Warning  Evicted              117s  kubelet  The node was low on resource: ephemeral-storage. Threshold quantity: 7859887835, available: 7344104Ki. Container package-runtime was using 51370076Ki, request is 0, has larger consumption of ephemeral-storage.
Normal   Killing              117s  kubelet  Stopping container package-runtime
Warning  ExceededGracePeriod  107s  kubelet  Container runtime did not kill the pod within specified grace period.
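For scale, the quantities in the eviction message can be converted to GiB. The following is a minimal Python sketch, not part of the original report: the helper name `to_gib` is hypothetical, and it assumes the unitless threshold is in bytes and that `Ki` means 1024 bytes, as in Kubernetes quantity notation.

```python
import re

# Fragment of the eviction message, copied from the kubelet event above.
MESSAGE = (
    "Threshold quantity: 7859887835, available: 7344104Ki. "
    "Container package-runtime was using 51370076Ki, request is 0"
)

GIB = 1024 ** 3


def to_gib(quantity: str) -> float:
    """Convert a plain-byte or Ki-suffixed quantity to GiB (hypothetical helper)."""
    if quantity.endswith("Ki"):
        return int(quantity[:-2]) * 1024 / GIB
    # Assumption: a quantity without a suffix is a byte count.
    return int(quantity) / GIB


# Pull the three quantities out of the message text.
threshold, available, used = re.search(
    r"Threshold quantity: (\d+), available: (\d+Ki)\. .* was using (\d+Ki)",
    MESSAGE,
).groups()

for label, value in (("threshold", threshold), ("available", available), ("used", used)):
    print(f"{label}: {to_gib(value):.2f} GiB")
```

Under those assumptions, the package-runtime container was using roughly 49 GiB of node-local storage, and the node's available storage (about 7.00 GiB) had dropped below the eviction threshold (about 7.32 GiB).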

As a workaround, I set the pod's resources.requests.ephemeral-storage to 60Gi, which increased the time until eviction.
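The settings described above could be expressed roughly as follows. This is only a sketch under assumptions: the issue does not say how the pod resources were applied, so the use of a Crossplane `ControllerConfig` (`pkg.crossplane.io/v1alpha1`), the function name `controller_config`, and the object name `provider-terraform-config` are all hypothetical. The script prints a JSON manifest built from the values quoted in this report.

```python
import json


def controller_config(name: str, ephemeral_storage: str) -> dict:
    """Build a ControllerConfig-style manifest (assumed mechanism, hypothetical names)."""
    return {
        "apiVersion": "pkg.crossplane.io/v1alpha1",
        "kind": "ControllerConfig",
        "metadata": {"name": name},
        "spec": {
            # Reconcile rate flag quoted in the report.
            "args": ["--max-reconcile-rate=10"],
            "resources": {
                "requests": {
                    "cpu": "1000m",
                    "memory": "2G",
                    # Workaround value from the report; it delayed eviction.
                    "ephemeral-storage": ephemeral_storage,
                },
                "limits": {"cpu": "5000m", "memory": "2G"},
            },
        },
    }


manifest = controller_config("provider-terraform-config", "60Gi")
print(json.dumps(manifest, indent=2))
```

A request only reserves node storage for scheduling and eviction ranking; it does not reduce how much the container writes, which is consistent with the eviction being delayed rather than avoided.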

The TF provider (v0.11.0) was configured with the plugin cache disabled because it was running with --max-reconcile-rate=10.

What environment did it happen in?

ytsarev commented 5 months ago

@cachaldora I think you can try enabling the plugin cache again once https://github.com/upbound/provider-terraform/pull/215 is merged. It should help with overall performance.

ytsarev commented 5 months ago

@cachaldora the above-mentioned change was released in https://github.com/upbound/provider-terraform/releases/tag/v0.12.0. I recommend upgrading to the latest release, https://github.com/upbound/provider-terraform/releases/tag/v0.13.0

cachaldora commented 5 months ago

Regarding performance, I've upgraded to 0.13.0 and opened a related issue: https://github.com/upbound/provider-terraform/issues/234