Closed bothra90 closed 7 months ago
You didn't have any config changes for the discovery service? Also could you share this config?
Nothing was changed in the discovery service config. Here's the full config we use:
version: v3
teleport:
  data_dir: /var/lib/teleport
  join_params:
    method: iam
    token_name: outpost-token
  proxy_server: fennel.teleport.sh:443
  log:
    output: stderr
    severity: INFO
    format:
      output: text
  ca_pin: sha256:bc2783105140465fa95eac5e3748d1ad7bb12c39e39b40f0fb3d3727ff01d286
  diag_addr: ""
ssh_service:
  enabled: "yes"
  commands:
  - name: "fennel.ai/cluster-id"
    command: ['echo', '%%FENNEL_CLUSTER_ID%%']
    period: 1m0s
discovery_service:
  enabled: "yes"
  discovery_group: "aws-prod"
  aws:
  - types: ["eks"]
    regions: ["%%REGION%%"]
    tags:
      "managed-by": "fennel.ai"
      "fennel.ai/cluster-id": "%%FENNEL_CLUSTER_ID%%"
kubernetes_service:
  enabled: "yes"
  resources:
  - labels:
      fennel.ai/cluster-id: %%FENNEL_CLUSTER_ID%%
app_service:
  enabled: "yes"
  apps:
  - name: "%%FENNEL_CLUSTER_ID%%-aws-console"
    uri: "https://console.aws.amazon.com/ec2/v2/home"
    labels:
      fennel.ai/cluster-id: %%FENNEL_CLUSTER_ID%%
# Explicitly disabled
auth_service:
  enabled: "no"
proxy_service:
  enabled: "no"
  https_keypairs: []
  https_keypairs_reload_interval: 0s
  acme: {}
@bothra90 I see that you have two kube agents connected to the auth server. Is that intentional? Maybe when you upgraded the discovery server you started a new one but left the old one running?
We had two nodes, both running almost the same config as above. I have shut down one of them, but I'm still seeing some errors:
2024-02-17T01:17:42Z INFO [KUBERNETE] Starting Kube service via proxy reverse tunnel. pid:112890.1 service/kubernetes.go:252
2024-02-17T01:17:42Z INFO [DISCOVERY] kube_cluster eks-cluster-eksCluster-85868e8-us-west-1-824489454832 matches, creating. kind:kube_cluster pid:112890.1 services/reconciler.go:162
2024-02-17T01:17:42Z WARN [DISCOVERY] Unable to reconcile resources. error:[
ERROR REPORT:
Original Error: trace.aggregate failed to create kube_cluster eks-cluster-eksCluster-85868e8-us-west-1-824489454832
kubernetes cluster "eks-cluster-eksCluster-85868e8-us-west-1-824489454832" doesn't exist
Stack Trace:
github.com/gravitational/teleport/lib/services/reconciler.go:131 github.com/gravitational/teleport/lib/services.(*Reconciler[...]).Reconcile
github.com/gravitational/teleport/lib/srv/discovery/kube_watcher.go:99 github.com/gravitational/teleport/lib/srv/discovery.(*Server).startKubeWatchers.func4
runtime/asm_arm64.s:1197 runtime.goexit
User Message: failed to create kube_cluster eks-cluster-eksCluster-85868e8-us-west-1-824489454832
kubernetes cluster "eks-cluster-eksCluster-85868e8-us-west-1-824489454832" doesn't exist] pid:112890.1 discovery/kube_watcher.go:100
Even if we have multiple discovery servers running, shouldn't the "discovery_group" setting cause the resources to be deduplicated?
@AntonAM: I got some debug logs from the teleport agent. There's not much new information here, but I'm sharing them anyway.
2024-02-17T06:16:08Z DEBU [DISCOVERY] EKS cluster status is valid: ACTIVE cluster_name:eks-cluster-eksCluster-85868e8 pid:6577.1 fetchers/eks.go:228
2024-02-17T06:16:08Z DEBU [DISCOVERY] Reconciling 0 current resources with 1 new resources. kind:kube_cluster pid:6577.1 services/reconciler.go:112
2024-02-17T06:16:08Z INFO [DISCOVERY] kube_cluster eks-cluster-eksCluster-85868e8-us-west-1-824489454832 matches, creating. kind:kube_cluster pid:6577.1 services/reconciler.go:162
2024-02-17T06:16:08Z DEBU [DISCOVERY] Creating kube_cluster eks-cluster-eksCluster-85868e8-us-west-1-824489454832. pid:6577.1 discovery/kube_watcher.go:112
2024-02-17T06:16:08Z DEBU [DISCOVERY] Updating kube_cluster eks-cluster-eksCluster-85868e8-us-west-1-824489454832. pid:6577.1 discovery/kube_watcher.go:141
2024-02-17T06:16:08Z WARN [DISCOVERY] Unable to reconcile resources. error:[
ERROR REPORT:
Original Error: trace.aggregate failed to create kube_cluster eks-cluster-eksCluster-85868e8-us-west-1-824489454832
kubernetes cluster "eks-cluster-eksCluster-85868e8-us-west-1-824489454832" doesn't exist
Stack Trace:
github.com/gravitational/teleport/lib/services/reconciler.go:131 github.com/gravitational/teleport/lib/services.(*Reconciler[...]).Reconcile
github.com/gravitational/teleport/lib/srv/discovery/kube_watcher.go:99 github.com/gravitational/teleport/lib/srv/discovery.(*Server).startKubeWatchers.func4
runtime/asm_arm64.s:1197 runtime.goexit
User Message: failed to create kube_cluster eks-cluster-eksCluster-85868e8-us-west-1-824489454832
kubernetes cluster "eks-cluster-eksCluster-85868e8-us-west-1-824489454832" doesn't exist] pid:6577.1 discovery/kube_watcher.go:100
@bothra90 Yes, it should deduplicate, or rather not try to change identical resources. But it looks like one of the discovery services didn't actually see the EKS clusters for some reason, so one service ended up creating the cluster while the other kept deleting it.
Regarding the further errors, could you run the command tctl get kube_clusters and show its result here? (Run it as a user that has sufficient permissions to get this data.)
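To illustrate the create/delete loop described above, here is a minimal sketch of two reconcilers that share one backend but disagree about what exists. It is not Teleport's reconciler code: the function names, the dict-based backend, and the assumption that the second service's fetcher returns an empty list are all hypothetical and only meant to show the mechanism.

```python
# Simplified model: each discovery service compares the shared backend
# (standing in for the auth server) with what its own fetcher returned.
# All names below are illustrative, not Teleport APIs.

CLUSTER = "eks-cluster-eksCluster-85868e8-us-west-1-824489454832"


def reconcile(current, discovered):
    """Return (action, name) pairs that make `current` match `discovered`."""
    actions = []
    for name in sorted(discovered.keys() - current.keys()):
        actions.append(("create", name))
    for name in sorted(current.keys() - discovered.keys()):
        actions.append(("delete", name))
    return actions


def apply_actions(backend, actions, discovered):
    """Apply the reconcile actions to the shared backend in place."""
    for action, name in actions:
        if action == "create":
            backend[name] = discovered[name]
        else:
            backend.pop(name, None)


def simulate(rounds, view_a, view_b):
    """Run both services in turn and record every action they take."""
    backend = {}
    history = []
    for _ in range(rounds):
        for label, view in (("A", view_a), ("B", view_b)):
            actions = reconcile(backend, view)
            apply_actions(backend, actions, view)
            history.extend((label, action, name) for action, name in actions)
    return history


if __name__ == "__main__":
    sees_cluster = {CLUSTER: {"region": "us-west-1"}}
    sees_nothing = {}  # assumption: this service's fetcher found no clusters
    for entry in simulate(2, sees_cluster, sees_nothing):
        print(entry)
```

When both services are given the same view, the second one finds nothing to change, which is the deduplication the shared discovery group is expected to provide. The flapping only appears when the views differ.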
Closing due to inactivity.
Expected behavior: After discovery, the cluster should be accessible via the kubernetes service
Current behavior: The cluster is repeatedly added and removed (see logs below)
Bug details:
Feb 14 19:01:19 ip-10-42-224-38.us-west-1.compute.internal teleport[7259]: 2024-02-14T19:01:19Z INFO [KUBERNETE] kube_cluster eks-cluster-eksCluster-85868e8-us-west-1-824489454832 matches, creating. pid:7259.1 services/reconciler.go:162
Feb 14 19:01:48 ip-10-42-224-38.us-west-1.compute.internal teleport[7259]: 2024-02-14T19:01:48Z INFO [KUBERNETE] kube_cluster eks-cluster-eksCluster-85868e8-us-west-1-824489454832 removed, deleting. pid:7259.1 services/reconciler.go:144
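For anyone trying to confirm the same add/remove pattern in their own agent logs, a rough sketch like the following can count the reconciler's create and delete messages per cluster. The regular expression is derived only from the two log lines quoted in this issue, so it is an assumption that other Teleport versions use the same wording; `count_events` and `SAMPLE` are hypothetical names.

```python
import re
from collections import Counter

# Matches the reconciler messages as they appear in the logs quoted above:
#   "... INFO [COMPONENT] kube_cluster <name> matches, creating. ..."
#   "... INFO [COMPONENT] kube_cluster <name> removed, deleting. ..."
EVENT_RE = re.compile(
    r"INFO\s+\[\w+\]\s+kube_cluster (?P<name>\S+) "
    r"(?P<action>matches, creating|removed, deleting)\."
)

SAMPLE = (
    "Feb 14 19:01:19 ip-10-42-224-38.us-west-1.compute.internal teleport[7259]: "
    "2024-02-14T19:01:19Z INFO [KUBERNETE] kube_cluster "
    "eks-cluster-eksCluster-85868e8-us-west-1-824489454832 matches, creating. "
    "pid:7259.1 services/reconciler.go:162\n"
    "Feb 14 19:01:48 ip-10-42-224-38.us-west-1.compute.internal teleport[7259]: "
    "2024-02-14T19:01:48Z INFO [KUBERNETE] kube_cluster "
    "eks-cluster-eksCluster-85868e8-us-west-1-824489454832 removed, deleting. "
    "pid:7259.1 services/reconciler.go:144\n"
)


def count_events(log_text):
    """Count create/delete reconciler messages per kube_cluster name."""
    counts = Counter()
    for match in EVENT_RE.finditer(log_text):
        action = "create" if match.group("action").endswith("creating") else "delete"
        counts[(match.group("name"), action)] += 1
    return counts


if __name__ == "__main__":
    for (name, action), n in sorted(count_events(SAMPLE).items()):
        print(f"{name}: {action} x{n}")
```

A cluster whose create and delete counts both keep growing over time is being flapped, rather than discovered once and left alone.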