Closed: simnalamburt closed this issue 3 years ago
/reopen
@olemarkus: Reopened this issue.
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
Tested with terraform:0.14.3 and kops Version 1.18.2 (git-84495481e4).
Still capturing the .k8s.local address instead of the correct ELB address. The workarounds don't seem to work.
Validation failed: unexpected error during validation: error listing nodes: an error on the server ("") has prevented the request from succeeding (get nodes)
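As a quick sanity check (a hedged sketch; it only reads the current kubeconfig context), you can print the API server address kubectl is actually using:

# Prints the server address of the current context; with this bug it shows the
# non-resolvable .k8s.local gossip name instead of the ELB hostname.
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'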
Tried re-exporting the ELB endpoint:
kops export kubecfg --name ${CLUSTER_NAME} && \
kops update cluster ${CLUSTER_NAME} \
--out=. \
--target=terraform && \
terraform apply -auto-approve && \
kops rolling-update cluster ${CLUSTER_NAME} --cloudonly --force --yes
Doing this makes the master node appear stuck in the initializing status on AWS on a few occasions, though it eventually becomes okay.
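To double-check whether the re-export actually picked up the right endpoint, this sketch (assuming the AWS CLI is configured and that kops created a classic ELB whose name starts with api-) prints the ELB DNS name that ~/.kube/config should reference:

# List the DNS name of the kops-created API ELB (classic ELB named api-...).
aws elb describe-load-balancers \
  --query "LoadBalancerDescriptions[?starts_with(LoadBalancerName, 'api-')].DNSName" \
  --output text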
I also tried creating the internet gateway and ELB first, before running the full terraform apply, with the same result:
kops create ...
...
terraform apply -target=aws_internet_gateway.${CLUSTER_PREFIX}-k8s-local -auto-approve && \
terraform apply -target=aws_elb.api-${CLUSTER_PREFIX}-k8s-local -auto-approve
kops update cluster \
--out=. \
--target=terraform
terraform apply -auto-approve && \
kops rolling-update cluster --cloudonly --force --master-interval=1s --node-interval=1s --yes
I am using t3a.small for the nodes and t3a.medium for the master node.
Still experiencing this with gossip-based clusters. I'm abandoning the infrastructure-as-code approach (via terraform) for now and will deploy via kops only.
Hopefully you can reopen this for tracking. Thank you!
The issue still persists. Great feature, but not usable at the moment.
/reopen
@alen-z: You can't reopen an issue/PR unless you authored it or you are a collaborator.
/remove-lifecycle rotten
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
If you create a cluster with both the terraform and gossip options enabled, all kubectl commands will fail.
How to reproduce the error
My environment
Setting up the cluster
Spoiler alert: creating the self-signed certificate before creating the actual Kubernetes cluster is the root cause of this issue. Please read on to see why.
Scenario 1. Looking up non-existent domain
This is basically because of an erroneous ~/.kube/config file. If you run kops create cluster with both the terraform and gossip options enabled, you'll get a wrong ~/.kube/config file.
Let's manually correct that file (a sketch follows below). Alternatively, you'll get a good config file if you explicitly export the configuration once again. Then the non-existent domain will be replaced with the DNS name of the master nodes' ELB.
And you'll end up in scenario 2 when you retry.
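For reference, the manual correction is roughly this (a sketch: kops-temp.k8s.local and the ELB hostname below are placeholders for the actual cluster name and API ELB address):

# Repoint the kubeconfig cluster entry from the gossip name to the API ELB.
kubectl config set-cluster kops-temp.k8s.local \
  --server=https://api-kops-temp-k8s-local-0123456789.us-east-1.elb.amazonaws.com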
Scenario 2. Invalid certificate
This is simply because the DNS name of the ELB is not included in the certificate. This scenario occurs only when you create the cluster with the terraform option enabled. If you create the cluster with only the gossip option, without the terraform target, the self-signed certificate will properly contain the ELB's DNS name.
(Sorry for the Korean; this is the certificate's list of DNS alternative names.)
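You can check the SAN list yourself with something like this (a sketch; the ELB hostname is a placeholder for the API ELB created by kops):

# Dump the subject alternative names of the certificate served by the API ELB;
# the gossip name is present but the ELB's own DNS name is missing.
openssl s_client -connect api-kops-temp-k8s-local-0123456789.us-east-1.elb.amazonaws.com:443 </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'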
The only way to work around this problem is forcing "kops-temp.k8s.local" to point to the proper IP address by manually editing /etc/hosts (a rough sketch follows), which is undesirable for many people.
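For example, the entries would look roughly like this (the IP address is a placeholder for the master's or API ELB's actual address, and the exact hostnames depend on what your kubeconfig references):

# /etc/hosts entries forcing the gossip name to resolve; addresses are placeholders.
203.0.113.10  api.kops-temp.k8s.local
203.0.113.10  kops-temp.k8s.local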
I'm not very familiar with kops internals, but I expect a huge change is needed to properly fix this issue. Maybe using AWS Certificate Manager could be a solution. (#834) Any ideas?