hashicorp / terraform-provider-kubernetes

Terraform Kubernetes provider
https://www.terraform.io/docs/providers/kubernetes/
Mozilla Public License 2.0

kubernetes_manifest listing all CRDs each time #1651

Open bchess opened 2 years ago

bchess commented 2 years ago

Terraform Version, Provider Version and Kubernetes Version

Terraform version: v0.15.5
Kubernetes provider version: v2.8.0
Kubernetes version: v1.21.3

Affected Resource(s)

Steps to Reproduce

  1. terraform plan

Expected Behavior

The Kubernetes provider should query the API server once to get the list of known CRDs, and cache the result for subsequent resource reads.
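The expected behavior above can be illustrated with a minimal Go sketch, using only the standard library. Everything named here (crdLister, cachedCRDs, Kinds) is hypothetical and not taken from the provider's code; the function value stands in for the LIST call that the provider makes through the client-go dynamic client.

```go
package main

import (
	"fmt"
	"sync"
)

// crdLister is a hypothetical stand-in for the expensive LIST of
// /apis/apiextensions.k8s.io/v1/customresourcedefinitions.
type crdLister func() ([]string, error)

// cachedCRDs memoizes the first successful result of a crdLister so that
// many resource reads share a single LIST call.
type cachedCRDs struct {
	mu     sync.Mutex
	list   crdLister
	loaded bool
	kinds  []string
}

// Kinds returns the cached CRD names, calling the lister only if no
// successful result has been stored yet.
func (c *cachedCRDs) Kinds() ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.loaded {
		return c.kinds, nil
	}
	kinds, err := c.list()
	if err != nil {
		return nil, err // errors are not cached; the next call retries
	}
	c.kinds, c.loaded = kinds, true
	return c.kinds, nil
}

func main() {
	calls := 0
	cache := &cachedCRDs{list: func() ([]string, error) {
		calls++ // count how often the "API server" is hit
		return []string{"example-crd"}, nil
	}}

	// Simulate 20 kubernetes_manifest resources being read in one plan.
	for i := 0; i < 20; i++ {
		if _, err := cache.Kinds(); err != nil {
			panic(err)
		}
	}
	fmt.Println("LIST calls:", calls) // prints "LIST calls: 1"
}
```

This sketch never refreshes the cache, so it would miss a CRD created after the first call. It only illustrates the request made in this issue, not a complete fix.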

Actual Behavior

It appears that the provider executes a LIST query of /apis/apiextensions.k8s.io/v1/customresourcedefinitions for each kubernetes_manifest resource. The results are not cached, which makes running plan unnecessarily slow. This occurs even for kubernetes_manifest resources of basic built-in types such as ConfigMap.

Here is the stack trace:

2 @ 0x100a1f140 0x100a1f1cc 0x100a504c8 0x100a5ef24 0x100f4a258 0x100f5a81c 0x100b5525c 0x100b56098 0x100da1618 0x100d9ff58 0x100d9f2d0 0x100da3de8 0x100f5d590 0x100a93558 0x100ce5488 0x101ef08ac 0x101ef0410 0x101ef0128 0x101eeff08 0x101eefb04 0x101ef0334 0x10221780c 0x10272c46c 0x10272a520 0x102727f88 0x10118f3f0 0x1011548cc 0x1011397d8 0x101027b34 0x10102c61c 0x101024c04 0x100a54344
#   0x100a504c7 sync.runtime_notifyListWait+0x157                                           /opt/homebrew/Cellar/go/1.17.6/libexec/src/runtime/sema.go:513
#   0x100a5ef23 sync.(*Cond).Wait+0x93                                                  /opt/homebrew/Cellar/go/1.17.6/libexec/src/sync/cond.go:56
#   0x100f4a257 golang.org/x/net/http2.(*pipe).Read+0x387                                       /Users/bchess/terraform-provider-kubernetes/vendor/golang.org/x/net/http2/pipe.go:65
#   0x100f5a81b golang.org/x/net/http2.transportResponseBody.Read+0xbb                                  /Users/bchess/terraform-provider-kubernetes/vendor/golang.org/x/net/http2/transport.go:2110
#   0x100b5525b bufio.(*Reader).fill+0x25b                                              /opt/homebrew/Cellar/go/1.17.6/libexec/src/bufio/bufio.go:101
#   0x100b56097 bufio.(*Reader).ReadByte+0xb7                                               /opt/homebrew/Cellar/go/1.17.6/libexec/src/bufio/bufio.go:253
#   0x100da1617 compress/flate.(*decompressor).huffSym+0x97                                     /opt/homebrew/Cellar/go/1.17.6/libexec/src/compress/flate/inflate.go:719
#   0x100d9ff57 compress/flate.(*decompressor).huffmanBlock+0x87                                    /opt/homebrew/Cellar/go/1.17.6/libexec/src/compress/flate/inflate.go:494
#   0x100d9f2cf compress/flate.(*decompressor).Read+0x21f                                       /opt/homebrew/Cellar/go/1.17.6/libexec/src/compress/flate/inflate.go:347
#   0x100da3de7 compress/gzip.(*Reader).Read+0xa7                                           /opt/homebrew/Cellar/go/1.17.6/libexec/src/compress/gzip/gunzip.go:251
#   0x100f5d58f golang.org/x/net/http2.(*gzipReader).Read+0x1af                                     /Users/bchess/terraform-provider-kubernetes/vendor/golang.org/x/net/http2/transport.go:2578
#   0x100a93557 io.ReadAll+0x1a7                                                    /opt/homebrew/Cellar/go/1.17.6/libexec/src/io/io.go:633
#   0x100ce5487 io/ioutil.ReadAll+0x47                                                  /opt/homebrew/Cellar/go/1.17.6/libexec/src/io/ioutil/ioutil.go:27
#   0x101ef08ab k8s.io/client-go/rest.(*Request).transformResponse+0xbb                                 /Users/bchess/terraform-provider-kubernetes/vendor/k8s.io/client-go/rest/request.go:1067
#   0x101ef040f k8s.io/client-go/rest.(*Request).Do.func1+0x4f                                      /Users/bchess/terraform-provider-kubernetes/vendor/k8s.io/client-go/rest/request.go:1039
#   0x101ef0127 k8s.io/client-go/rest.(*Request).request.func2.1+0x47                                   /Users/bchess/terraform-provider-kubernetes/vendor/k8s.io/client-go/rest/request.go:996
#   0x101eeff07 k8s.io/client-go/rest.(*Request).request.func2+0x3a7                                    /Users/bchess/terraform-provider-kubernetes/vendor/k8s.io/client-go/rest/request.go:1021
#   0x101eefb03 k8s.io/client-go/rest.(*Request).request+0x6b3                                      /Users/bchess/terraform-provider-kubernetes/vendor/k8s.io/client-go/rest/request.go:1023
#   0x101ef0333 k8s.io/client-go/rest.(*Request).Do+0xa3                                        /Users/bchess/terraform-provider-kubernetes/vendor/k8s.io/client-go/rest/request.go:1038
#   0x10221780b k8s.io/client-go/dynamic.(*dynamicResourceClient).List+0x1ab                                /Users/bchess/terraform-provider-kubernetes/vendor/k8s.io/client-go/dynamic/simple.go:254
#   0x10272c46b github.com/hashicorp/terraform-provider-kubernetes/manifest/provider.(*RawProviderServer).lookUpGVKinCRDs+0x42b     /Users/bchess/terraform-provider-kubernetes/manifest/provider/resource.go:218
#   0x10272a51f github.com/hashicorp/terraform-provider-kubernetes/manifest/provider.(*RawProviderServer).TFTypeFromOpenAPI+0x1ef   /Users/bchess/terraform-provider-kubernetes/manifest/provider/resource.go:91
#   0x102727f87 github.com/hashicorp/terraform-provider-kubernetes/manifest/provider.(*RawProviderServer).ReadResource+0x1417       /Users/bchess/terraform-provider-kubernetes/manifest/provider/read.go:93
#   0x10118f3ef github.com/hashicorp/terraform-plugin-mux.SchemaServer.ReadResource+0xdf                        /Users/bchess/terraform-provider-kubernetes/vendor/github.com/hashicorp/terraform-plugin-mux/schema_server.go:265
#   0x1011548cb github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server.(*server).ReadResource+0x4cb               /Users/bchess/terraform-provider-kubernetes/vendor/github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server/server.go:744
#   0x1011397d7 github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/tfplugin5._Provider_ReadResource_Handler+0x2d7      /Users/bchess/terraform-provider-kubernetes/vendor/github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/tfplugin5/tfplugin5_grpc.pb.go:349
#   0x101027b33 google.golang.org/grpc.(*Server).processUnaryRPC+0x1253                                 /Users/bchess/terraform-provider-kubernetes/vendor/google.golang.org/grpc/server.go:1282
#   0x10102c61b google.golang.org/grpc.(*Server).handleStream+0x80b                                 /Users/bchess/terraform-provider-kubernetes/vendor/google.golang.org/grpc/server.go:1616
#   0x101024c03 google.golang.org/grpc.(*Server).serveStreams.func1.2+0xa3                              /Users/bchess/terraform-provider-kubernetes/vendor/google.golang.org/grpc/server.go:921

And here is a pprof sorted by cumulative time:

(pprof) top100 -cum
Showing nodes accounting for 33.86s, 79.39% of 42.65s total
Dropped 319 nodes (cum <= 0.21s)
Showing top 100 nodes out of 169
      flat  flat%   sum%        cum   cum%
         0     0%     0%     25.92s 60.77%  google.golang.org/grpc.(*Server).processUnaryRPC
         0     0%     0%     25.83s 60.56%  github.com/hashicorp/terraform-provider-kubernetes/manifest/provider.(*RawProviderServer).TFTypeFromOpenAPI
         0     0%     0%     25.83s 60.56%  google.golang.org/grpc.(*Server).handleStream
         0     0%     0%     25.73s 60.33%  google.golang.org/grpc.(*Server).serveStreams.func1.2
         0     0%     0%     25.09s 58.83%  k8s.io/client-go/dynamic.(*dynamicResourceClient).List
         0     0%     0%     25.06s 58.76%  github.com/hashicorp/terraform-provider-kubernetes/manifest/provider.(*RawProviderServer).lookUpGVKinCRDs
         0     0%     0%     23.87s 55.97%  k8s.io/apimachinery/pkg/util/json.Unmarshal
         0     0%     0%     23.77s 55.73%  k8s.io/apimachinery/pkg/apis/meta/v1/unstructured.unstructuredJSONScheme.decode
         0     0%     0%     23.75s 55.69%  k8s.io/apimachinery/pkg/apis/meta/v1/unstructured.unstructuredJSONScheme.Decode
         0     0%     0%     23.74s 55.66%  k8s.io/apimachinery/pkg/runtime.Decode
         0     0%     0%     19.68s 46.14%  k8s.io/apimachinery/pkg/apis/meta/v1/unstructured.unstructuredJSONScheme.decodeToList
         0     0%     0%     14.90s 34.94%  encoding/json.(*Decoder).Decode
         0     0%     0%     13.57s 31.82%  encoding/json.(*decodeState).value
         0     0%     0%     13.56s 31.79%  encoding/json.(*decodeState).object
         0     0%     0%     13.49s 31.63%  encoding/json.(*decodeState).unmarshal
         0     0%     0%     11.46s 26.87%  runtime.systemstack
         0     0%     0%      9.86s 23.12%  runtime.gcBgMarkWorker.func2
     0.15s  0.35%  0.35%      9.86s 23.12%  runtime.gcDrain
     0.14s  0.33%  0.68%      9.54s 22.37%  encoding/json.(*decodeState).objectInterface
     0.02s 0.047%  0.73%      9.49s 22.25%  encoding/json.(*decodeState).valueInterface
         0     0%  0.73%      9.16s 21.48%  encoding/json.(*decodeState).array
         0     0%  0.73%      8.96s 21.01%  github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server.(*server).UpgradeResourceState
         0     0%  0.73%      8.95s 20.98%  github.com/hashicorp/terraform-provider-kubernetes/manifest/provider.(*RawProviderServer).UpgradeResourceState
         0     0%  0.73%      8.94s 20.96%  github.com/hashicorp/terraform-plugin-mux.SchemaServer.UpgradeResourceState
         0     0%  0.73%      8.89s 20.84%  github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/tfplugin5._Provider_UpgradeResourceState_Handler
         0     0%  0.73%      8.89s 20.84%  github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server.(*server).PlanResourceChange
         0     0%  0.73%      8.88s 20.82%  github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/tfplugin5._Provider_PlanResourceChange_Handler
         0     0%  0.73%      8.88s 20.82%  github.com/hashicorp/terraform-plugin-mux.SchemaServer.PlanResourceChange
         0     0%  0.73%      8.88s 20.82%  github.com/hashicorp/terraform-provider-kubernetes/manifest/provider.(*RawProviderServer).PlanResourceChange

alexsomesan commented 2 years ago

Hi, thanks for opening this conversation.

In fact, the provider needs to make sure it knows about any CRDs that may have been created during the same operation, so that it can handle any CRs of that type.

However, I think there is room to reduce the number of these calls. We previously avoided introducing any optimisations until the provider had stabilized enough. In fact, we had to roll back some caching that we introduced too early because it was causing hard-to-diagnose issues.

At this point, I think we can take a look at reducing the number of CRD retrieval calls.

github-actions[bot] commented 1 year ago

Marking this issue as stale due to inactivity. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. This helps our maintainers find and focus on the active issues. Maintainers may also remove the stale label at their discretion. Thank you!

varunthakur2480 commented 1 year ago

This is starting to hurt us badly. We have a set of 20 resources (Flux) being created, and it takes ~20 minutes to run a plan, which often times out. Could this please be prioritised ASAP?

bchess commented 1 year ago

FWIW, we moved to kubectl_manifest, and so far it has been much better.

varunthakur2480 commented 1 year ago

I can try that, but it would be good to address this issue too.

cjgibson commented 4 months ago

Just noting that this is still an active issue with the upstream Hashi provider, so that the bot doesn't close it. Generally speaking, if you're managing CRDs or CRs at all (or anything else applied via bare YAML), you shouldn't use Hashi's provider.

vihangm commented 3 months ago

This continues to be quite slow. Is there any ETA on when this will be addressed?