kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

kops validate cluster failing with error on AWS #10717

Closed nambyats closed 3 years ago

nambyats commented 3 years ago

1. What kops version are you running? The command kops version, will display this information.

Version 1.19.0 (git-04d36d7d92c72601efd918877fc180c846129ffb)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:28:09Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?

3. What cloud provider are you using? AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

5. What happened after the commands executed?

root@ip-:/home/ubuntu# kops validate cluster
Validating cluster dev.k8s.sajeer.in

Validation failed: cannot load kubecfg settings for "dev.k8s.sajeer.in": context "dev.k8s.sajeer.in" does not exist
root@ip-172-31-37-224:/home/ubuntu#

6. What did you expect to happen?

validation failed

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2021-02-02T19:46:29Z"
  name: dev.k8s.sajeer.in
spec:
  api:
    dns: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://dev.k8s.sajeer.in/dev.k8s.sajeer.in
  containerRuntime: docker
  dnsZone: sajeer.in
  etcdClusters:


apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-02-02T19:44:06Z"
  labels:
    kops.k8s.io/cluster: dev.k8s.sajeer.in
  name: master-us-east-2c
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-2c
  role: Master
  subnets:


apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-02-02T19:44:06Z"
  labels:
    kops.k8s.io/cluster: dev.k8s.sajeer.in
  name: nodes-us-east-2c
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-us-east-2c
  role: Node
  subnets:

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

root@ip-172-31-37-224:/home/ubuntu# kops validate cluster -v 10
I0203 10:03:59.383300 5706 factory.go:68] state store s3://dev.k8s.sajeer.in
I0203 10:03:59.383664 5706 s3context.go:334] product_uuid is "ec2227c4-6f80-3a0c-9147-3e6623392e73", assuming running on EC2
I0203 10:03:59.387134 5706 s3context.go:166] got region from metadata: "us-east-2"
I0203 10:03:59.466983 5706 s3context.go:213] found bucket in region "us-east-2"
I0203 10:03:59.467168 5706 s3fs.go:290] Reading file "s3://dev.k8s.sajeer.in/dev.k8s.sajeer.in/config"
I0203 10:03:59.496041 5706 aws_cloud.go:1501] Querying EC2 for all valid zones in region "us-east-2"
I0203 10:03:59.500413 5706 request_logger.go:45] AWS request: ec2/DescribeAvailabilityZones
I0203 10:03:59.543172 5706 s3fs.go:327] Listing objects in S3 bucket "dev.k8s.sajeer.in" with prefix "dev.k8s.sajeer.in/instancegroup/"
I0203 10:03:59.560942 5706 s3fs.go:355] Listed files in s3://dev.k8s.sajeer.in/dev.k8s.sajeer.in/instancegroup: [s3://dev.k8s.sajeer.in/dev.k8s.sajeer.in/instancegroup/master-us-east-2c s3://dev.k8s.sajeer.in/dev.k8s.sajeer.in/instancegroup/nodes-us-east-2c]
I0203 10:03:59.561097 5706 s3fs.go:290] Reading file "s3://dev.k8s.sajeer.in/dev.k8s.sajeer.in/instancegroup/master-us-east-2c"
I0203 10:03:59.585023 5706 s3fs.go:290] Reading file "s3://dev.k8s.sajeer.in/dev.k8s.sajeer.in/instancegroup/nodes-us-east-2c"
Validating cluster dev.k8s.sajeer.in

I0203 10:03:59.596540 5706 validate_cluster.go:130] instance group: kops.InstanceGroupSpec{Role:"Master", Image:"099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1", MinSize:(*int32)(0xc000b1793c), MaxSize:(*int32)(0xc000b17930), MachineType:"t3.medium", RootVolumeSize:(*int32)(nil), RootVolumeType:(*string)(nil), RootVolumeIops:(*int32)(nil), RootVolumeThroughput:(*int32)(nil), RootVolumeOptimization:(*bool)(nil), RootVolumeDeleteOnTermination:(*bool)(nil), RootVolumeEncryption:(*bool)(nil), RootVolumeEncryptionKey:(*string)(nil), Volumes:[]kops.VolumeSpec(nil), VolumeMounts:[]kops.VolumeMountSpec(nil), Subnets:[]string{"us-east-2c"}, Zones:[]string(nil), Hooks:[]kops.HookSpec(nil), MaxPrice:(*string)(nil), SpotDurationInMinutes:(*int64)(nil), AssociatePublicIP:(*bool)(nil), AdditionalSecurityGroups:[]string(nil), CloudLabels:map[string]string(nil), NodeLabels:map[string]string{"kops.k8s.io/instancegroup":"master-us-east-2c"}, FileAssets:[]kops.FileAssetSpec(nil), Tenancy:"", Kubelet:(*kops.KubeletConfigSpec)(nil), Taints:[]string(nil), MixedInstancesPolicy:(*kops.MixedInstancesPolicySpec)(nil), AdditionalUserData:[]kops.UserData(nil), SuspendProcesses:[]string(nil), ExternalLoadBalancers:[]kops.LoadBalancer(nil), DetailedInstanceMonitoring:(*bool)(nil), IAM:(*kops.IAMProfileSpec)(nil), SecurityGroupOverride:(*string)(nil), InstanceProtection:(*bool)(nil), SysctlParameters:[]string(nil), RollingUpdate:(*kops.RollingUpdate)(nil), InstanceInterruptionBehavior:(*string)(nil), CompressUserData:(*bool)(nil), InstanceMetadata:(*kops.InstanceMetadataOptions)(nil)}

I0203 10:03:59.596795 5706 validate_cluster.go:130] instance group: kops.InstanceGroupSpec{Role:"Node", Image:"099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1", MinSize:(*int32)(0xc000b3026c), MaxSize:(*int32)(0xc000b30260), MachineType:"t3.medium", RootVolumeSize:(*int32)(nil), RootVolumeType:(*string)(nil), RootVolumeIops:(*int32)(nil), RootVolumeThroughput:(*int32)(nil), RootVolumeOptimization:(*bool)(nil), RootVolumeDeleteOnTermination:(*bool)(nil), RootVolumeEncryption:(*bool)(nil), RootVolumeEncryptionKey:(*string)(nil), Volumes:[]kops.VolumeSpec(nil), VolumeMounts:[]kops.VolumeMountSpec(nil), Subnets:[]string{"us-east-2c"}, Zones:[]string(nil), Hooks:[]kops.HookSpec(nil), MaxPrice:(*string)(nil), SpotDurationInMinutes:(*int64)(nil), AssociatePublicIP:(*bool)(nil), AdditionalSecurityGroups:[]string(nil), CloudLabels:map[string]string(nil), NodeLabels:map[string]string{"kops.k8s.io/instancegroup":"nodes-us-east-2c"}, FileAssets:[]kops.FileAssetSpec(nil), Tenancy:"", Kubelet:(*kops.KubeletConfigSpec)(nil), Taints:[]string(nil), MixedInstancesPolicy:(*kops.MixedInstancesPolicySpec)(nil), AdditionalUserData:[]kops.UserData(nil), SuspendProcesses:[]string(nil), ExternalLoadBalancers:[]kops.LoadBalancer(nil), DetailedInstanceMonitoring:(*bool)(nil), IAM:(*kops.IAMProfileSpec)(nil), SecurityGroupOverride:(*string)(nil), InstanceProtection:(*bool)(nil), SysctlParameters:[]string(nil), RollingUpdate:(*kops.RollingUpdate)(nil), InstanceInterruptionBehavior:(*string)(nil), CompressUserData:(*bool)(nil), InstanceMetadata:(*kops.InstanceMetadataOptions)(nil)}

Validation failed: cannot load kubecfg settings for "dev.k8s.sajeer.in": context "dev.k8s.sajeer.in" does not exist

9. Anything else do we need to know?

I followed the steps below to set up a k8s cluster on AWS with kops: https://github.com/ValaxyTech/DevOpsDemos/blob/master/Kubernetes/k8s-setup.md

ghost commented 3 years ago

Did you try adding --admin to kops update cluster dev.k8s.valaxy.in --yes? Otherwise the local connection information (in ~/.kube) will no longer be updated.

On an existing cluster, you can export the settings with kops export kubecfg --admin
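
The suggestion above could be sketched roughly as follows: check whether kubectl already knows a context with the cluster's name, and only export one if it is missing. This is a minimal sketch, not the reporter's setup. The helper names `has_context` and `ensure_context` are hypothetical, and the cluster name is simply the one from this report. Only `kubectl config get-contexts -o name` and `kops export kubecfg --admin` are real commands.

```shell
#!/usr/bin/env bash
# Sketch only: the function names are illustrative and not part of kops or kubectl.

# has_context NAME: reads context names (one per line) on stdin and
# succeeds only if NAME matches a whole line exactly.
has_context() {
  grep -Fxq -- "$1"
}

# ensure_context NAME: if kubectl has no context called NAME,
# ask kops to write admin credentials for that cluster into the kubeconfig.
ensure_context() {
  if kubectl config get-contexts -o name | has_context "$1"; then
    echo "context $1 already present"
  else
    kops export kubecfg --name "$1" --admin
  fi
}

# Example (assumes kubectl and kops are installed and the state store is configured):
#   ensure_context dev.k8s.sajeer.in
```

Matching the whole line with `grep -Fx` avoids treating a longer name that merely contains the cluster name as a hit.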

nambyats commented 3 years ago

Hi Michael,

Many thanks for your response. I tried running kops update cluster with --admin, but the result is the same. I can see the master and nodes created in the AWS console, but kops validate cluster still fails with the error "cannot load kubecfg settings for "dev.k8s.sajeer.in": context "dev.k8s.sajeer.in" does not exist".

root@ip-/home/ubuntu# kops update cluster --name dev.k8s.sajeer.in --yes --admin
I0203 11:12:20.379015 5853 dns.go:96] Private DNS: skipping DNS validation
I0203 11:12:20.652904 5853 executor.go:111] Tasks: 0 done / 79 total; 44 can run
W0203 11:12:20.750731 5853 vfs_castore.go:604] CA private key was not found
I0203 11:12:20.763087 5853 keypair.go:195] Issuing new certificate: "etcd-clients-ca"
I0203 11:12:20.798939 5853 keypair.go:195] Issuing new certificate: "master"
W0203 11:12:20.858137 5853 vfs_castore.go:604] CA private key was not found
I0203 11:12:20.858306 5853 keypair.go:195] Issuing new certificate: "ca"
I0203 11:12:20.878066 5853 keypair.go:195] Issuing new certificate: "apiserver-aggregator-ca"
I0203 11:12:20.898637 5853 keypair.go:195] Issuing new certificate: "etcd-manager-ca-main"
I0203 11:12:20.918778 5853 keypair.go:195] Issuing new certificate: "etcd-peers-ca-events"
I0203 11:12:21.060766 5853 keypair.go:195] Issuing new certificate: "etcd-manager-ca-events"
I0203 11:12:21.080691 5853 keypair.go:195] Issuing new certificate: "etcd-peers-ca-main"
I0203 11:12:23.321651 5853 executor.go:111] Tasks: 44 done / 79 total; 15 can run
I0203 11:12:24.110323 5853 executor.go:111] Tasks: 59 done / 79 total; 18 can run
I0203 11:12:24.445299 5853 executor.go:111] Tasks: 77 done / 79 total; 2 can run
I0203 11:12:25.265486 5853 executor.go:137] Task "AutoscalingGroup/nodes-us-east-2c.dev.k8s.sajeer.in" not ready: waiting for the IAM Instance Profile to be propagated
I0203 11:12:25.265708 5853 executor.go:137] Task "AutoscalingGroup/master-us-east-2c.masters.dev.k8s.sajeer.in" not ready: waiting for the IAM Instance Profile to be propagated
I0203 11:12:25.265868 5853 executor.go:155] No progress made, sleeping before retrying 2 task(s)
I0203 11:12:35.266209 5853 executor.go:111] Tasks: 77 done / 79 total; 2 can run
I0203 11:12:36.716786 5853 executor.go:111] Tasks: 79 done / 79 total; 0 can run
I0203 11:12:36.716989 5853 dns.go:156] Pre-creating DNS records
I0203 11:12:37.170574 5853 update_cluster.go:313] Exporting kubecfg for cluster
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2c56de4]

goroutine 1 [running]:
k8s.io/kops/upup/pkg/fi/cloudup/awstasks.FindElasticLoadBalancerByNameTag(0x476a180, 0xc000972600, 0xc0004afb00, 0xc000de7da0, 0x11, 0xc00121db80, 0x15)
    upup/pkg/fi/cloudup/awstasks/dnsname.go:156 +0x84
k8s.io/kops/upup/pkg/fi/cloudup/awstasks.FindDNSName(0x476a180, 0xc000972600, 0xc0004afb00, 0x476a180, 0xc000972600, 0x38501, 0xc000e8d6c8)
    upup/pkg/fi/cloudup/awstasks/dnsname.go:135 +0x90
k8s.io/kops/pkg/commands.(*CloudDiscoveryStatusStore).GetApiIngressStatus(0x660ae18, 0xc0004afb00, 0xa, 0xc000de7e00, 0x15, 0xc000e8d790, 0x16)
    pkg/commands/status_discovery.go:56 +0x10e
k8s.io/kops/pkg/kubeconfig.BuildKubecfg(0xc0004afb00, 0x46d3c80, 0xc000e0e930, 0x472d8e0, 0xc000e11160, 0x469a980, 0x660ae18, 0x3aef6cfb4000, 0x0, 0x0, ...)
    pkg/kubeconfig/create_kubecfg.go:74 +0x258
main.RunUpdateCluster(0x46e69c0, 0xc0000540d0, 0xc00057ece0, 0x7ffdd78327ee, 0x11, 0x4695560, 0xc00000e020, 0xc0003e06e0, 0x5, 0x4131974, ...)
    cmd/kops/update_cluster.go:317 +0x17d8
main.NewCmdUpdateCluster.func1(0xc0002d42c0, 0xc000548cc0, 0x0, 0x4)
    cmd/kops/update_cluster.go:113 +0x109
k8s.io/kops/vendor/github.com/spf13/cobra.(*Command).execute(0xc0002d42c0, 0xc000548c40, 0x4, 0x4, 0xc0002d42c0, 0xc000548c40)
    vendor/github.com/spf13/cobra/command.go:846 +0x2c2
k8s.io/kops/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x65bfe40, 0x660ae18, 0x0, 0x0)
    vendor/github.com/spf13/cobra/command.go:950 +0x375
k8s.io/kops/vendor/github.com/spf13/cobra.(*Command).Execute(...)
    vendor/github.com/spf13/cobra/command.go:887
main.Execute()
    cmd/kops/root.go:97 +0x8f
main.main()
    cmd/kops/main.go:24 +0x25

root@ip-:/home/ubuntu# kops export kubecfg --admin
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2c56de4]

goroutine 1 [running]:
k8s.io/kops/upup/pkg/fi/cloudup/awstasks.FindElasticLoadBalancerByNameTag(0x476a180, 0xc000e34d80, 0xc0002cdb00, 0xc000e20c20, 0x11, 0xc0002d6160, 0x15)
    upup/pkg/fi/cloudup/awstasks/dnsname.go:156 +0x84
k8s.io/kops/upup/pkg/fi/cloudup/awstasks.FindDNSName(0x476a180, 0xc000e34d80, 0xc0002cdb00, 0x476a180, 0xc000e34d80, 0x1, 0xc000aff810)
    upup/pkg/fi/cloudup/awstasks/dnsname.go:135 +0x90
k8s.io/kops/pkg/commands.(*CloudDiscoveryStatusStore).GetApiIngressStatus(0x660ae18, 0xc0002cdb00, 0xa, 0xc000e20d00, 0x15, 0xc000aff8d8, 0x16)
    pkg/commands/status_discovery.go:56 +0x10e
k8s.io/kops/pkg/kubeconfig.BuildKubecfg(0xc0002cdb00, 0x46d3c80, 0xc000e284e0, 0x472d8e0, 0xc000e25dc0, 0x469a980, 0x660ae18, 0x3aef6cfb4000, 0x0, 0x0, ...)
    pkg/kubeconfig/create_kubecfg.go:74 +0x258
main.RunExportKubecfg(0x46e69c0, 0xc0000540d0, 0xc0005a9ce0, 0x4695560, 0xc00000e020, 0xc000486300, 0xc00021f390, 0x0, 0x1, 0x0, ...)
    cmd/kops/export_kubecfg.go:142 +0x412
main.NewCmdExportKubecfg.func1(0xc00028a840, 0xc00021f390, 0x0, 0x1)
    cmd/kops/export_kubecfg.go:78 +0x85
k8s.io/kops/vendor/github.com/spf13/cobra.(*Command).execute(0xc00028a840, 0xc00021f380, 0x1, 0x1, 0xc00028a840, 0xc00021f380)
    vendor/github.com/spf13/cobra/command.go:846 +0x2c2
k8s.io/kops/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x65bfe40, 0x660ae18, 0x0, 0x0)
    vendor/github.com/spf13/cobra/command.go:950 +0x375
k8s.io/kops/vendor/github.com/spf13/cobra.(*Command).Execute(...)
    vendor/github.com/spf13/cobra/command.go:887
main.Execute()
    cmd/kops/root.go:97 +0x8f
main.main()
    cmd/kops/main.go:24 +0x25

root@ip-:/home/ubuntu# kops validate cluster
Validating cluster dev.k8s.sajeer.in

Validation failed: cannot load kubecfg settings for "dev.k8s.sajeer.in": context "dev.k8s.sajeer.in" does not exist
root@ip-:/home/ubuntu# kops validate cluster
Validating cluster dev.k8s.sajeer.in

rifelpet commented 3 years ago

@nambyats this should be fixed for an upcoming 1.19.1 release.

If you're able to confirm this that would be great. There is a linux amd64 kops build here you could use, or you could build the kops CLI yourself from the release-1.19 branch.
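
Since the fix is tied to a specific release, one way to check whether an installed kops is new enough is to compare version strings. The following is a rough sketch under stated assumptions: `parse_kops_version` and `version_at_least` are hypothetical helper names, GNU `sort -V` is assumed to be available, and the `kops version` output is assumed to have the `Version 1.19.0 (git-...)` shape shown in this report (other kops releases may print a different format).

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative; requires GNU sort for -V.

# parse_kops_version: reads `kops version` output on stdin and prints the
# bare version number, assuming a line like "Version 1.19.0 (git-...)".
parse_kops_version() {
  sed -n 's/^Version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# version_at_least HAVE WANT: succeeds when HAVE >= WANT in version order.
# The smaller of the two sorts first, so HAVE >= WANT exactly when WANT is first.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example (assumes kops is on PATH):
#   have="$(kops version | parse_kops_version)"
#   version_at_least "$have" 1.19.1 || echo "kops $have predates 1.19.1"
```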

nambyats commented 3 years ago

@rifelpet ,

I am on Version 1.19.0 (git-04d36d7d92c72601efd918877fc180c846129ffb). Can you guide me on how to transfer the build file to the AWS instance and install it?

Ebdulmomen1 commented 3 years ago

I encountered the same problem. After upgrading to 1.19.1, I can confirm that it works.

olemarkus commented 3 years ago

Thanks for confirming.

/close

k8s-ci-robot commented 3 years ago

@olemarkus: Closing this issue.

In response to [this](https://github.com/kubernetes/kops/issues/10717#issuecomment-815519520):

>Thanks for confirming.
>
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

pradee6 commented 3 months ago

Validating cluster pradeepmatthiasawscloud.shop

W0826 14:59:38.531013 15994 validate_cluster.go:184] (will retry): unexpected error during validation: error listing nodes: Get "https://api.pradeepmatthiasawscloud.shop/api/v1/nodes": dial tcp 13.36.237.0:443: i/o timeout
W0826 15:00:18.541420 15994 validate_cluster.go:184] (will retry): unexpected error during validation: error listing nodes: Get "https://api.pradeepmatthiasawscloud.shop/api/v1/nodes": dial tcp 13.36.237.0:443: i/o timeout

I am getting this error when I run the command kops validate cluster --10m, but all resources are created in AWS.

pradee6 commented 3 months ago

Can anyone help with this?