Closed bowqtr closed 3 years ago
Please try using a newer AMI as the one you are using is quite old: https://kops.sigs.k8s.io/operations/images/#ubuntu-2004-focal
Hi, and thank you for assisting; I very much appreciate all the help I can get as this is very much out of my comfort zone... :(
I've updated to ami-0c6b46c1a4827505d, which is k8s-1.16-debian-buster-amd64-hvm-ebs-2020-04-27.
But I still get the same issue…
Feb 21 14:03:09 ip-172-20-33-177 nodeup[625]: W0221 14:03:09.843445 625 main.go:138] got error running nodeup (will retry in 30s): error building loader: error finding containerd version
Really really grateful if somebody could let me know what the problem is...
I think docker has a problem as when I run docker version I get this....
admin@ip-xxxxx:~$ docker version
Client: Docker Engine - Community
 Version:           19.03.4
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        9013bf583a
 Built:             Fri Oct 18 15:52:16 2019
 OS/Arch:           linux/amd64
 Experimental:      false
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
admin@ip-xxxx:~$
admin@ip-xxxxx:~$ docker system info
Client:
 Debug Mode: false

Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
admin@ip-xxxxx:~$
But I can't tell why the docker daemon is not running...or if it should be :(
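For anyone else hitting this, a quick way to check whether the daemon is running on a systemd-based image is something like the following (a diagnostic sketch, not kOps-specific):

```shell
# Check the Docker daemon's state and recent logs (requires root/sudo).
sudo systemctl status docker
sudo journalctl -u docker --no-pager -n 50
# If it is simply stopped, try starting it by hand:
sudo systemctl start docker
```

Note that on a node provisioned by kOps, nodeup is responsible for installing and starting Docker, so if nodeup itself is failing the daemon may legitimately not be up yet.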
My cloud-init-output.log: cloud-init-output.log
Syslog file: syslog.txt
Hi. I meant an official Debian or Ubuntu image, not kope.io one. As of version 1.18, kOps recommends using Ubuntu 20.04.
Ok, will try one like ami-0de4bebf376ac28f1 and see what happens.
Thanks,
Nick
👋 all,
I've just encountered this same issue this morning.
I'm running kops 1.18.3 to run k8s on AWS with the SpotInst integration. My master node is using the 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1 AMI.
My cluster config looks like this
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
creationTimestamp: "2020-01-06T09:23:14Z"
generation: 60
name: "****"
spec:
additionalPolicies:
master: |
"****"
node: |
"****"
api:
loadBalancer:
sslCertificate: "****"
type: Internal
authentication:
aws: {}
authorization:
rbac: {}
channel: stable
cloudConfig:
spotinstOrientation: balanced
spotinstProduct: Linux/UNIX (Amazon VPC)
cloudProvider: aws
configBase: "****"
dnsZone: "****"
encryptionConfig: true
etcdClusters:
- cpuRequest: 200m
etcdMembers:
- instanceGroup: master-eu-west-1a
name: a
memoryRequest: 100Mi
name: main
- cpuRequest: 100m
etcdMembers:
- instanceGroup: master-eu-west-1a
name: a
memoryRequest: 100Mi
name: events
fileAssets:
- content: |
"****"
name: aws-encryption-provider.yaml
path: /etc/kubernetes/manifests/aws-encryption-provider.yaml
roles:
- Master
iam:
allowContainerRegistry: true
legacy: false
kubeAPIServer:
authorizationMode: RBAC,Node
disableBasicAuth: true
enableAdmissionPlugins:
- NamespaceLifecycle
- LimitRanger
- ServiceAccount
- PersistentVolumeLabel
- DefaultStorageClass
- DefaultTolerationSeconds
- MutatingAdmissionWebhook
- ValidatingAdmissionWebhook
- NodeRestriction
- ResourceQuota
- PodSecurityPolicy
enableProfiling: false
featureGates:
TTLAfterFinished: "true"
kubeControllerManager:
featureGates:
TTLAfterFinished: "true"
kubeDNS:
provider: CoreDNS
kubelet:
anonymousAuth: false
kubernetesApiAccess:
- "****"
- "****"
kubernetesVersion: 1.18.15
masterInternalName: "****"
masterPublicName: "****"
networkCIDR: "****"
networkID: "****"
networking:
calico:
mtu: 8912
nonMasqueradeCIDR: "****"
sshAccess:
- "****"
subnets:
- cidr: 172.30.40.0/24
name: eu-west-1a
type: Private
zone: eu-west-1a
- cidr: 172.30.41.0/24
name: utility-eu-west-1a
type: Utility
zone: eu-west-1a
- cidr: 172.30.36.0/24
name: eu-west-1b
type: Private
zone: eu-west-1b
- cidr: 172.30.37.0/24
name: utility-eu-west-1b
type: Utility
zone: eu-west-1b
- cidr: 172.30.38.0/24
name: eu-west-1c
type: Private
zone: eu-west-1c
- cidr: 172.30.39.0/24
name: utility-eu-west-1c
type: Utility
zone: eu-west-1c
topology:
dns:
type: Public
masters: private
nodes: private
I was getting the same got error running nodeup (will retry in 30s): error building loader: error finding containerd version, preventing our dev cluster from spinning up.
After doing a systemctl restart kops-configuration / master node reboot, that seems to have resolved the issue for now, so my cluster can spin up. Hopefully this extra info might help.
Ta
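For reference, the restart and log inspection need root; a minimal sketch of the workaround described above, assuming a systemd-based image:

```shell
# Restart the kops bootstrap service and follow its output (requires root).
sudo systemctl restart kops-configuration
sudo journalctl -u kops-configuration -f
```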
Hmm - I don't have systemctl on my master node in AWS or on my local Mac. How can I get it installed?
Running locally
➜ ~ kops version
Version 1.19.0
➜ ~ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"darwin/amd64"}
Unable to connect to the server: dial tcp 34.251.103.65:443: i/o timeout
Sorry if a "newbie" question.....much of this detail is new to me!
Sorry - I managed to find a version of this on my node, but when I ran it I got this...
admin@ip-172-20-xx-xx:/bin$ ./systemctl restart kops-configuration
Failed to connect to bus: No such file or directory
admin@ip-172-20-xx-xx:/bin$
Wondering if this is connected to some issue with docker....
admin@ip-172-20-xx-xx:/bin$ docker version
Client:
 Version:           18.06.3-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        d7080c1
 Built:             Wed Feb 20 02:28:26 2019
 OS/Arch:           linux/amd64
 Experimental:      false
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.38/version: dial unix /var/run/docker.sock: connect: permission denied
admin@ip-172-20-xx-xx:/bin$
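As an aside, "permission denied" on the socket (as opposed to the daemon not running) usually just means the SSH user is not in the docker group. A sketch of the two usual workarounds:

```shell
# Option 1: run the client as root.
sudo docker version

# Option 2: add the current user to the "docker" group
# (takes effect on the next login session).
sudo usermod -aG docker "$USER"
```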
When I FTP into the EC2 instance I can see /var/run/docker.sock:
admin@ip-172-20-xx-xx:/var/run$ ls -l
total 44
-rw-r--r--  1 root root     4 Feb 22 14:46 acpid.pid
srw-rw-rw-  1 root root     0 Feb 22 14:46 acpid.socket
-rw-------  1 root root     0 Feb 22 14:47 agetty.reload
drwxr-xr-x  2 root root    80 Feb 22 14:46 blkid
drwxr-xr-x  2 root root   140 Feb 22 14:47 cloud-init
-rw-r--r--  1 root root     4 Feb 22 14:46 crond.pid
----------  1 root root     0 Feb 22 14:46 crond.reboot
-rw-r--r--  1 root root     4 Feb 22 14:46 dhclient.eth0.pid
prw-------  1 root root     0 Feb 22 14:46 dmeventd-client
prw-------  1 root root     0 Feb 22 14:46 dmeventd-server
drwx------  6 root root   140 Feb 22 14:47 docker
-rw-r--r--  1 root root     3 Feb 22 14:47 docker.pid
srw-rw----  1 root docker   0 Feb 22 14:46 docker.sock
lrwxrwxrwx  1 root root    25 Feb 22 14:46 initctl -> /run/systemd/initctl/fifo
drwxr-xr-x  2 root root    80 Feb 22 14:46 initramfs
drwxrwxrwt  4 root root   100 Feb 22 14:46 lock
drwxr-xr-x  2 root root    40 Feb 22 14:46 log
drwx------  2 root root    80 Feb 22 14:46 lvm
-rw-r--r--  1 root root     4 Feb 22 14:46 lvmetad.pid
-r--r--r--  1 root root    33 Feb 22 14:46 machine-id
-rw-r--r--  1 root root    80 Feb 22 16:13 motd.dynamic
drwxr-xr-x  2 root root    60 Feb 22 14:46 mount
drwxr-xr-x  2 root root   120 Feb 22 14:46 network
-rw-r--r--  1 root root     3 Feb 22 14:46 ntpd.pid
drwxr-xr-x  2 root root    40 Feb 22 14:46 rpcbind
-r--r--r--  1 root root     0 Feb 22 14:46 rpcbind.lock
srw-rw-rw-  1 root root     0 Feb 22 14:46 rpcbind.sock
dr-xr-xr-x 11 root root     0 Feb 22 14:46 rpc_pipefs
-rw-r--r--  1 root root     3 Feb 22 14:46 rsyslogd.pid
drwxrwxr-x  2 root utmp    40 Feb 22 14:46 screen
drwxr-xr-x  2 root root    40 Feb 22 14:46 sendsigs.omit.d
lrwxrwxrwx  1 root root     8 Feb 22 14:46 shm -> /dev/shm
drwxr-xr-x  2 root root    40 Feb 22 14:46 sshd
-rw-r--r--  1 root root     4 Feb 22 14:46 sshd.pid
drwxr-xr-x  2 root root    60 Feb 22 14:46 sysconfig
drwxr-xr-x 17 root root   440 Feb 22 14:47 systemd
drwxr-xr-x  2 root root    60 Feb 22 14:46 tmpfiles.d
drwxr-xr-x  7 root root   160 Feb 22 14:47 udev
drwxr-xr-x  2 root root    40 Feb 22 14:46 user
-rw-rw-r--  1 root utmp  3840 Feb 22 15:49 utmp
-rw-------  1 root root     0 Feb 22 14:47 xtables.lock
Appreciate any other guidance
To understand what happens, I would need a log from the kops-configuration service, but using the AMI that @omg-sr was using: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1.
Please use kOps v1.19.1 when trying.
Ok, let me try that @hakman. I've just changed my AMI to k8s-1.16-debian-buster-amd64-hvm-ebs-2020-04-27 - ami-0c6b46c1a4827505d, as I hoped it would be a smaller change. I still got the same problem:
Feb 22 16:54:15 ip-172-20-63-192 nodeup[615]: W0222 16:54:15.745175 615 main.go:138] got error running nodeup (will retry in 30s): error building loader: error finding containerd version
admin@ip-172-20-63-192:/bin$ ./systemctl restart kops-configuration
Failed to connect to bus: No such file or directory
admin@ip-172-20-63-192:/bin$ docker version
Client: Docker Engine - Community
 Version:           19.03.4
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        9013bf583a
 Built:             Fri Oct 18 15:52:16 2019
 OS/Arch:           linux/amd64
 Experimental:      false
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
admin@ip-172-20-63-192:/bin$
I'll now try the Ubuntu image and see what happens
Ok, I have an instance running:
ubuntu-eks/k8s_1.18/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1 - ami-0c7de20ac40893f86
Canonical, Ubuntu EKS Node OS (k8s_1.18), 20.04 LTS, amd64 focal image build on 2021-01-19
Root device type: ebs
Virtualization type: hvm
ENA Enabled: Yes
....in the end I see the same error:
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: W0222 20:23:42.539148 2500 main.go:138] got error running nodeup (will retry in 30s): error building loader: error finding containerd version
When SSH in and type a few things:
ubuntu@ip-172-20-61-25:/bin$ docker version
Client:
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.13.8
 Git commit:        afacb8b7f0
 Built:             Fri Dec 18 12:15:19 2020
 OS/Arch:           linux/amd64
 Experimental:      false
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/version: dial unix /var/run/docker.sock: connect: permission denied
ubuntu@ip-172-20-61-25:/bin$
ubuntu@ip-172-20-61-25:/bin$ ./systemctl restart kops-configuration
Failed to restart kops-configuration.service: Access denied
See system logs and 'systemctl status kops-configuration.service' for details.
ubuntu@ip-172-20-61-25:/bin$ ./systemctl status kops-configuration.service gives....
● kops-configuration.service - Run kops bootstrap (nodeup)
     Loaded: loaded (/lib/systemd/system/kops-configuration.service; disabled; vendor preset: enabled)
     Active: activating (start) since Mon 2021-02-22 20:11:15 UTC; 12min ago
       Docs: https://github.com/kubernetes/kops
   Main PID: 2500 (nodeup)
      Tasks: 7 (limit: 9538)
     Memory: 276.3M
     CGroup: /system.slice/kops-configuration.service
             └─2500 /opt/kops/bin/nodeup --conf=/opt/kops/conf/kube_env.yaml --v=8
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539037 2500 task.go:97] task nodetasks.Package does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539046 2500 task.go:97] task nodetasks.Package does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539058 2500 task.go:97] task nodetasks.Package does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539068 2500 task.go:97] task nodetasks.Package does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539077 2500 task.go:97] task nodetasks.Package does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539090 2500 update_service.go:42] UpdatePolicy not set in Cluster Spec; skipping creation of update-service
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539103 2500 volumes.go:40] Skipping the volume builder, no volumes defined for this instancegroup
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539114 2500 task.go:97] task nodetasks.File does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539124 2500 task.go:97] task *nodetasks.File does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: W0222 20:23:42.539148 2500 main.go:138] got error running nodeup (will retry in 30s): error building loader: error finding containerd version
Is there anything else that could be causing this which we should be looking at? Maybe some other YAML file?
Thanks in advance :)
I can see that there are no files under the "containerd" folder - is that normal?
This is also my kube_env.yaml
Assets:
- 2ca2a3104d4cce26db128e3a0b7a042385df4f2c51bdbe740e067fdfaa2fcdd1@https://storage.googleapis.com/kubernetes-release/release/v1.19.1/bin/linux/amd64/kubelet
- da4de99d4e713ba0c0a5ef6efe1806fb09c41937968ad9da5c5f74b79b3b38f5@https://storage.googleapis.com/kubernetes-release/release/v1.19.1/bin/linux/amd64/kubectl
- 994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5@https://storage.googleapis.com/k8s-artifacts-cni/release/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz
ClusterName: k8s-aws.engineering.acme.com
ConfigBase: s3://acme-kops-state/k8s-aws.engineering.acme.com
InstanceGroupName: master-eu-west-1a
Tags:
- _automatic_upgrades
- _aws
channels:
- s3://acme-kops-state/k8s-aws.engineering.acme.com/addons/bootstrap-channel.yaml
etcdManifests:
- s3://acme-kops-state/k8s-aws.engineering.acme.com/manifests/etcd/main.yaml
- s3://acme-kops-state/k8s-aws.engineering.acme.com/manifests/etcd/events.yaml
protokubeImage:
hash: d89a4222576a8396a65251ca93537107a9698f4eb0608b4e10e0d33e2f9a8766
name: protokube:1.18.2
sources:
- https://artifacts.k8s.io/binaries/kops/1.18.2/images/protokube.tar.gz
- https://github.com/kubernetes/kops/releases/download/v1.18.2/images-protokube.tar.gz
- https://kubeupv2.s3.amazonaws.com/kops/1.18.2/images/protokube.tar.gz
staticManifests:
- key: kube-apiserver-healthcheck
path: manifests/static/kube-apiserver-healthcheck.yaml
Would it be helpful to see others?
From the file you pasted above, I can see that you are using kOps v1.18.2 to install Kubernetes 1.19.1, which is not supported. You should download and use the kOps 1.19.1 binary from https://github.com/kubernetes/kops/releases/tag/v1.19.1. To upgrade the cluster you should follow: https://kops.sigs.k8s.io/operations/updates_and_upgrades/#automated-update
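As a sketch, installing the matching binary on Linux looks roughly like this (the release asset name below follows the usual kops naming pattern; on macOS use kops-darwin-amd64):

```shell
# Fetch the kops 1.19.1 binary and put it on the PATH.
curl -Lo kops https://github.com/kubernetes/kops/releases/download/v1.19.1/kops-linux-amd64
chmod +x kops
sudo mv kops /usr/local/bin/kops
kops version
```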
Thanks @hakman, I think we're making some progress; as a newbie I appreciate your help, and I'm starting to understand a lot more of what I'm seeing.
It still doesn't work, but I now think it's a config issue that needs to change with 1.19.1. I now get this...
ubuntu@ip-172-20-58-38:~$ docker version
Client:
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.13.8
 Git commit:        afacb8b7f0
 Built:             Fri Dec 18 12:15:19 2020
 OS/Arch:           linux/amd64
 Experimental:      false
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/version: dial unix /var/run/docker.sock: connect: permission denied
ubuntu@ip-172-20-58-38:~$ cd /bin
ubuntu@ip-172-20-58-38:/bin$ ./systemctl restart kops-configuration
Failed to restart kops-configuration.service: Access denied
See system logs and 'systemctl status kops-configuration.service' for details.
ubuntu@ip-172-20-58-38:/bin$ ./systemctl status kops-configuration.service
● kops-configuration.service - Run kops bootstrap (nodeup)
     Loaded: loaded (/lib/systemd/system/kops-configuration.service; disabled; vendor preset: enabled)
     Active: activating (start) since Tue 2021-02-23 10:44:50 UTC; 53s ago
       Docs: https://github.com/kubernetes/kops
   Main PID: 2513 (nodeup)
      Tasks: 6 (limit: 9538)
     Memory: 272.9M
     CGroup: /system.slice/kops-configuration.service
             └─2513 /opt/kops/bin/nodeup --conf=/opt/kops/conf/kube_env.yaml --v=8
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: [Socket]
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: ListenStream=/var/run/docker.sock
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: SocketMode=0660
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: SocketUser=root
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: SocketGroup=docker
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: [Install]
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: WantedBy=sockets.target
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: I0223 10:45:27.887292 2513 task.go:103] task *nodetasks.Service does not implement HasLifecycle
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: I0223 10:45:27.887323 2513 assetstore.go:106] Matching assets for "^docker/":
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: W0223 10:45:27.887351 2513 main.go:133] got error running nodeup (will retry in 30s): error building loader: unable to find any Docker binaries in assets
ubuntu@ip-172-20-58-38:/bin$
This is my EC2 user data launch script user data EC2 launch.txt
I think this is getting closer, but although I've changed the kube_env.yaml to add the new amd64 bits, this Docker thing is confusing me. Is there an example of what it should be for 1.19.1?
Many thanks in advance....sincerely appreciate the guidance.
No worries. Please add the output of kops get --name my.example.com -o yaml.
Hi @hakman - I've attached the output. I know the instancegroups are wrong and need changing; I was planning on doing that once I have a config for the main node to get up and running.
Many thanks!
Your user data doesn't match the cluster version; it matches only partially. I am not sure how you ended up in that state. How did you upgrade the cluster?
Hi @hakman
Can I make changes manually via kops edit cluster to fix this?
I'm less clear on how to make changes through "kops edit ig" etc
....but happy to try and make them, so long as I know what to make.
Did you ever run something like this or similar?
kops upgrade cluster --yes
kops update cluster --yes
kops rolling-update cluster --instance-group master-eu-west-1a --cloudonly --yes
Kinda, my usual upgrade process was
kops edit cluster
kops update cluster --yes
kops rolling-update cluster
kops rolling-update cluster --yes
This may have been only partially correct, but that's what I was handed over from the guy that left.
The rolling-update part no longer works as the cluster is down (guessing that's obvious)
Thanks
when I do a kops upgrade cluster, I now get this
➜ ~ kops upgrade cluster
Using cluster from kubectl context: k8s.staging.acme.com
I0223 11:43:20.350760 65415 upgrade_cluster.go:197] Custom image (099720109477/ubuntu-eks/k8s_1.19/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210211) has been provided for Instance Group "master-eu-west-1a"; not updating image
ITEM PROPERTY OLD NEW
Cluster KubernetesVersion 1.19.1 1.19.7
InstanceGroup/large-nodes Image kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-06-21 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1
InstanceGroup/nodes Image kope.io/k8s-1.16-debian-stretch-amd64-hvm-ebs-2020-01-17 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1
Must specify --yes to perform upgrade
➜ ~
Should I do this with the --yes?
Looks like this would update the IGs which I have for small and large worker nodes, but would it solve the Docker issue on the main node?
Upgrade cluster is not needed now, but at least update cluster should tell you that there are changes that need to be applied.
Rolling update with --cloudonly should work.
Ok, ran that and got this
➜ ~ kops rolling-update cluster --cloudonly
Using cluster from kubectl context: acme.com
NAME STATUS NEEDUPDATE READY MIN TARGET MAX
large-nodes Ready 0 0 0 0 3
master-eu-west-1a Ready 0 1 0 1 1
nodes Ready 0 0 0 0 3
No rolling-update required.
➜ ~
kops update cluster, not kops rolling-update cluster. :)
Hmm, something is amiss
~ kops update cluster --cloudonly
Error: unknown flag: --cloudonly
Usage:
kops update cluster [flags]
Examples:
# After cluster has been edited or upgraded, configure it with:
kops update cluster k8s-cluster.example.com --yes --state=s3://my-state-store --yes --admin
Flags:
--admin duration[=18h0m0s] Also export a cluster admin user credential with the specified lifetime and add it to the cluster context
--allow-kops-downgrade Allow an older version of kops to update the cluster than last used
--create-kube-config Will control automatically creating the kube config file on your local filesystem (default true)
-h, --help help for cluster
--internal Use the cluster's internal DNS name. Implies --create-kube-config
--lifecycle-overrides strings comma separated list of phase overrides, example: SecurityGroups=Ignore,InternetGateway=ExistsAndWarnIfChanges
--out string Path to write any local output
--phase string Subset of tasks to run: assets, cluster, network, security
--ssh-public-key string SSH public key to use (deprecated: use kops create secret instead)
--target string Target - direct, terraform, cloudformation (default "direct")
--user string Existing user to add to the cluster context. Implies --create-kube-config
-y, --yes Create cloud resources, without --yes update is in dry run mode
Global Flags:
--add_dir_header If true, adds the file directory to the header of the log messages
--alsologtostderr log to standard error as well as files
--config string yaml config file (default is $HOME/.kops.yaml)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_file string If non-empty, use this log file
--log_file_max_size uint Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
--logtostderr log to standard error instead of files (default true)
--name string Name of cluster. Overrides KOPS_CLUSTER_NAME environment variable
--skip_headers If true, avoid header prefixes in the log messages
--skip_log_headers If true, avoid headers when opening log files
--state string Location of state storage (kops 'config' file). Overrides KOPS_STATE_STORE environment variable
--stderrthreshold severity logs at or above this threshold go to stderr (default 2)
-v, --v Level number for the log level verbosity
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
unknown flag: --cloudonly
I did this (kops update cluster); should I now repeat with --yes?
~ kops update cluster
Using cluster from kubectl context: k8s.staging.acme.com
*********************************************************************************
A new kubernetes version is available: 1.19.7
Upgrading is recommended (try kops upgrade cluster)
More information: https://github.com/kubernetes/kops/blob/master/permalinks/upgrade_k8s.md#1.19.7
*********************************************************************************
*********************************************************************************
Kubelet anonymousAuth is currently turned on. This allows RBAC escalation and remote code execution possibilities.
It is highly recommended you turn it off by setting 'spec.kubelet.anonymousAuth' to 'false' via 'kops edit cluster'
See https://kops.sigs.k8s.io/security/#kubelet-api
*********************************************************************************
I0223 12:14:51.915048 66533 executor.go:111] Tasks: 0 done / 86 total; 46 can run
I0223 12:14:52.406811 66533 executor.go:111] Tasks: 46 done / 86 total; 18 can run
I0223 12:14:52.791025 66533 executor.go:111] Tasks: 64 done / 86 total; 19 can run
I0223 12:14:53.385868 66533 executor.go:111] Tasks: 83 done / 86 total; 3 can run
I0223 12:14:53.616314 66533 executor.go:111] Tasks: 86 done / 86 total; 0 can run
Will modify resources:
AutoscalingGroup/large-nodes.k8s.staging.acme.com
MaxSize 3 -> 2
MinSize 0 -> 2
AutoscalingGroup/master-eu-west-1a.masters.k8s.staging.acme.com
LaunchTemplate <nil> -> name:master-eu-west-1a.masters.k8s.staging.acme.com id:lt-0dd26de2bb906acb6
MinSize 0 -> 1
AutoscalingGroup/nodes.k8s.staging.acme.com
MaxSize 3 -> 2
MinSize 0 -> 2
LaunchTemplate/master-eu-west-1a.masters.k8s.staging.acme.com
ImageID ami-0c7de20ac40893f86 -> 099720109477/ubuntu-eks/k8s_1.19/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210211
Must specify --yes to apply changes
➜ ~
Yes and after rolling update.
kops update cluster --yes
kops rolling-update cluster --cloudonly --yes
Hi @hakman
Ok got this...just waiting for my new IP to propagate to see if I can connect using LENS...
➜ ~ kops update cluster --yes
Using cluster from kubectl context: k8s.staging.acme.com
*********************************************************************************
A new kubernetes version is available: 1.19.7
Upgrading is recommended (try kops upgrade cluster)
More information: https://github.com/kubernetes/kops/blob/master/permalinks/upgrade_k8s.md#1.19.7
*********************************************************************************
*********************************************************************************
Kubelet anonymousAuth is currently turned on. This allows RBAC escalation and remote code execution possibilities.
It is highly recommended you turn it off by setting 'spec.kubelet.anonymousAuth' to 'false' via 'kops edit cluster'
See https://kops.sigs.k8s.io/security/#kubelet-api
*********************************************************************************
I0223 12:41:52.833248 67376 executor.go:111] Tasks: 0 done / 86 total; 46 can run
I0223 12:41:53.312642 67376 executor.go:111] Tasks: 46 done / 86 total; 18 can run
I0223 12:41:53.749426 67376 executor.go:111] Tasks: 64 done / 86 total; 19 can run
I0223 12:41:54.525779 67376 executor.go:111] Tasks: 83 done / 86 total; 3 can run
I0223 12:41:55.869205 67376 executor.go:111] Tasks: 86 done / 86 total; 0 can run
I0223 12:41:55.869891 67376 dns.go:156] Pre-creating DNS records
I0223 12:41:56.724568 67376 update_cluster.go:313] Exporting kubecfg for cluster
kops has set your kubectl context to k8s.staging.acme.com
W0223 12:41:56.888620 67376 update_cluster.go:337] Exported kubecfg with no user authentication; use --admin, --user or --auth-plugin flags with `kops export kubecfg`
Cluster changes have been applied to the cloud.
Changes may require instances to restart: kops rolling-update cluster
➜ ~ kops rolling-update cluster --cloudonly --yes
Using cluster from kubectl context: k8s.staging.acme.com
NAME STATUS NEEDUPDATE READY MIN TARGET MAX
large-nodes Ready 0 2 2 2 2
master-eu-west-1a NeedsUpdate 1 0 1 1 1
nodes Ready 0 2 2 2 2
W0223 12:42:29.127913 67405 instancegroups.go:415] Not validating cluster as cloudonly flag is set.
W0223 12:42:29.128068 67405 instancegroups.go:341] Not draining cluster nodes as 'cloudonly' flag is set.
I0223 12:42:29.128079 67405 instancegroups.go:521] Stopping instance "i-0fcb8c5c999ef60e3", in group "master-eu-west-1a.masters.k8s.staging.acme.com" (this may take a while).
I0223 12:42:29.252741 67405 instancegroups.go:383] waiting for 15s after terminating instance
W0223 12:42:44.254679 67405 instancegroups.go:415] Not validating cluster as cloudonly flag is set.
I0223 12:42:44.254758 67405 rollingupdate.go:208] Rolling update completed for cluster "k8s.staging.acme.com"!
Ok - I can't connect using LENS.
I also can't log in, as it used an older launch config in the autoscaling group; it has changed the launch config back to the original, which has the updated AMI but the old key pair.
I have created a new launch config with the new AMI and correct key pair, called MAIN NODE (UBUNTU 1.19 AMD2), but how do I use kops or kubectl to make the auto-scaling group master-eu-west-1a use my new launch config with its new user data launch script?
Thanks
You should really read the docs, just saying. Manual changes will be rewritten by kOps and you should never do manual changes to your launch templates and user data. https://kops.sigs.k8s.io/cluster_spec/ You should use only edit cluster / ig and upgrade, update, rolling-update commands. Also you may try your luck in the Slack channel https://kubernetes.slack.com/messages/kops-users/. Good luck!
I get most of that; I just don't understand how my autoscaling group is linked to a particular launch config, or how I might be able to change that by issuing a new kops command. I don't see this listed in any of the config or even that cluster_spec URL - unless I'm so tired I'm just missing it. Sorry for being a newbie, but grateful if you're able to help.
This should help: https://kops.sigs.k8s.io/tutorial/working-with-instancegroups/
Short version: minSize and maxSize control the ASG size. You get an ASG for each instance group.
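A minimal sketch of changing those fields (the cluster name here is illustrative):

```shell
# Open the instance group spec in $EDITOR and adjust minSize/maxSize,
# then apply; kOps translates the spec into the ASG settings.
kops edit ig nodes --name k8s.example.com
kops update cluster --name k8s.example.com --yes
```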
I read that... but sorry, I'm still lost.
I have three instance groups
➜ .ssh kops get ig
Using cluster from kubectl context: k8.acme.com
NAME ROLE MACHINETYPE MIN MAX ZONES
large-nodes Node m4.2xlarge 0 3 eu-west-1a,eu-west-1b,eu-west-1c
master-eu-west-1a Master m4.large 0 1 eu-west-1a
nodes Node m4.xlarge 0 3 eu-west-1a,eu-west-1b,eu-west-1c
➜ .ssh
And my instance groups have min/max as you can see above. But in the instancegroup definition, I don't see a linkage to the launch configuration in the scaling group.
The launch config, contains the nodeup cloud-init for the EC2 instance, plus the SSH key pair. I changed that ... so trying to get the KOPS config to use the new launch configs.
This is where I'm getting lost: I can't see a reference in https://kops.sigs.k8s.io/tutorial/working-with-instancegroups/ to launch configurations in AWS.
If I change the launch config for the scaling group manually in the AWS EC2 console to the new launch config for that node, then after running kops it returns to its prior launch config... so this must be stored somewhere, but I'm unsure where and how to change it. This is what is confusing me, as nothing seems obvious.
There is no link between the launch config and the IG; kOps manages the launch config, not you. Use kOps to edit the IGs and add the options that you want, which will be translated into a launch config when running an update.
So this is where I'm totally lost... as this is my master node IG config when I type kops edit instancegroup master-eu-west-1a.
But in this I see no reference to the launch configuration attached to the auto-scaling group in AWS. Which means that it's launching with the old EC2 launch user data config, SSH keys, etc.
I can change the auto-scaling group manually in the EC2 dashboard, but this changes back, so something is amiss. Because I can't see a reference to this in the IG definition below, I'm confused.
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
creationTimestamp: "2018-05-24T14:02:57Z"
generation: 1
labels:
kops.k8s.io/cluster: k8s-staging.acme.com
name: master-eu-west-1a
spec:
image: 099720109477/ubuntu-eks/k8s_1.19/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210211
machineType: m4.large
maxPrice: "0.10"
maxSize: 1
minSize: 0
nodeLabels:
kops.k8s.io/instancegroup: master-eu-west-1a
role: Master
subnets:
- eu-west-1a
Ok, I think I managed to resolve that issue by creating a new launch template instead of a launch config.
The node still doesn't come up though - it says it's missing Docker binaries in assets:
ubuntu@ip-172-20-34-36:~$ docker version
Client:
Version: 19.03.8
API version: 1.40
Go version: go1.13.8
Git commit: afacb8b7f0
Built: Fri Dec 18 12:15:19 2020
OS/Arch: linux/amd64
Experimental: false
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/version: dial unix /var/run/docker.sock: connect: permission denied
ubuntu@ip-172-20-34-36:~$ cd /bin
ubuntu@ip-172-20-34-36:/bin$ ./systemctl restart kops-configuration
Failed to restart kops-configuration.service: Access denied
See system logs and 'systemctl status kops-configuration.service' for details.
ubuntu@ip-172-20-34-36:/bin$ ./systemctl status kops-configuration.service
● kops-configuration.service - Run kops bootstrap (nodeup)
Loaded: loaded (/lib/systemd/system/kops-configuration.service; disabled; vendor preset: enabled)
Active: activating (start) since Tue 2021-02-23 15:02:52 UTC; 5min ago
Docs: https://github.com/kubernetes/kops
Main PID: 2526 (nodeup)
Tasks: 6 (limit: 9538)
Memory: 274.6M
CGroup: /system.slice/kops-configuration.service
└─2526 /opt/kops/bin/nodeup --conf=/opt/kops/conf/kube_env.yaml --v=8
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: [Socket]
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: ListenStream=/var/run/docker.sock
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: SocketMode=0660
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: SocketUser=root
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: SocketGroup=docker
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: [Install]
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: WantedBy=sockets.target
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: I0223 15:07:36.559453 2526 task.go:103] task *nodetasks.Service does not implement HasLifecycle
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: I0223 15:07:36.559487 2526 assetstore.go:106] Matching assets for "^docker/":
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: W0223 15:07:36.559516 2526 main.go:133] got error running nodeup (will retry in 30s): error building loader: unable to find any Docker binaries in assets
ubuntu@ip-172-20-34-36:/bin$
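As a side note on the "permission denied" from the docker client seen above (which is distinct from the nodeup "unable to find any Docker binaries in assets" error): that message usually just means the login user is not in the docker group. A quick check, sketched here and guarded so it degrades gracefully when no Docker socket exists:

```shell
#!/bin/sh
set -eu

SOCK=/var/run/docker.sock

if [ -S "$SOCK" ]; then
  # The socket is normally root:docker with mode 0660, matching the
  # SocketUser/SocketGroup/SocketMode values in the unit output above.
  ls -l "$SOCK"
  # Is the current user in the docker group?
  id -nG | tr ' ' '\n' | grep -x docker \
    || echo "user $(id -un) is NOT in the docker group; try: sudo usermod -aG docker $(id -un)"
else
  echo "no Docker socket at $SOCK (daemon not running, or a different runtime)"
fi
```

Running the client with sudo is an equivalent quick test; group membership changes only take effect on a new login session.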
Is this because in the new 1.19.1 I need to change the launch script in the EC2 instance?
Attached are the cluster_spec, ig_spec, and kube_env YAML files which are created when the EC2 instance is launched.
I'm figuring that something in here is causing Docker to have an issue, or there is some other config problem. Docker exists as a command to run, although it has a permissions issue (dial unix /var/run/docker.sock: connect: permission denied), and I'm pondering whether this is what is causing the whole Kubernetes setup to fail.
Any ideas as I'm totally stuck.
From my point of view, this is not a support issue, it is just user error. You are trying to manually change things that you shouldn't touch, and you insist on doing that. I don't think anyone can help with that. If you want to learn, create a new cluster as described in the docs, and you can see how things change when you edit/update/roll the cluster.
Hi @hakman, I appreciate that perspective, but nothing had changed; I just tried to bring up the cluster one day and was getting the error. It surprised me! I have another cluster in another region (slightly different config) which still works OK. On this cluster in staging, which is failing, we originally saw this error...
865 main.go:138] got error running nodeup (will retry in 30s): error building loader: error finding containerd version
With your guidance, after changing the AMI image from Debian to Ubuntu, then having to change more things to make it 1.19.1 compatible (like adding arm64 into the YAML) and running kops upgrade cluster, the new image spins up, but we still have Docker issues with nodeup. Now it's:
2526 main.go:133] got error running nodeup (will retry in 30s): error building loader: unable to find any Docker binaries in assets
I'm not trying to debate the error; I'd just appreciate assistance to get it working, especially if there have to be more changes to ensure Docker is working in the new Ubuntu AMI by changing the cluster_spec, ig_spec, and kube_env YAML, which are specified manually in the EC2 launch instance, which is why I included them.
Maybe something is needed in 1.19.x which wasn't needed in a prior version.
The guy that used to manage this has left the company, so I'm all alone. I'm not a Kubernetes or kOps expert, hence asking for support or assistance to help figure out why Docker seems to be installed but nodeup is reporting error building loader: unable to find any Docker binaries in assets.
Truly any help since I'm not an expert in this would be very much appreciated!
You are using configs from kOps 1.18 with kOps 1.19 binaries. The only way to get there is by manually changing the user data. If you would just run all the steps for an upgrade, you would fix things.
Also, I think I asked a few times to use the Ubuntu image that has no bundled Docker. This is where kops upgrade cluster --yes would have helped.
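For completeness, the full upgrade sequence being referred to is roughly the following (a sketch; the cluster name is a placeholder from this thread, and the script is guarded so the commands only run when kops is installed):

```shell
#!/bin/sh
set -eu

CLUSTER=k8s-staging.acme.com   # placeholder name from this thread

if command -v kops >/dev/null 2>&1; then
  # Preview what an upgrade would change, then apply it to the spec:
  kops upgrade cluster "$CLUSTER"
  kops upgrade cluster "$CLUSTER" --yes
  # Regenerate cloud resources (launch configs, user data) from the new spec:
  kops update cluster "$CLUSTER" --yes
  # Roll the instances so they boot with the regenerated user data:
  kops rolling-update cluster "$CLUSTER" --yes
else
  echo "kops not installed; the commands above show the intended upgrade sequence"
fi
```

Run this way, the user data is regenerated by kOps rather than edited by hand, which avoids the 1.18-config-with-1.19-binaries mismatch described above.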
FYI, kOps 1.19.1 installs Docker 19.03.15.
I've managed to get it back up and running; a sincere thanks to you and others who helped. I know it can be frustrating to help those who are less familiar but its sincerely appreciated.
THANK YOU!
Happy it all worked out. Hope this helped a little with getting to know k8s and kOps. Would appreciate if you can write a short summary of what was the solution before closing.
I'll try to, but I think it was mainly what you said: I needed to use a different AMI (so I chose 099720109477/ubuntu-eks/k8s_1.19/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210211) and then adjust the user data of the launch script for references to amd64 in the YAML. Then I just let kops upgrade cluster do its thing. I applied that to the main node first, and then the worker nodes.
What's the best site to visually learn K8s on AWS for those of us who are happy to dip in but aren't familiar with all the complications? I don't mind learning through failure (it's the human way to learn), but something that assists with knowledge and is clear, concise, and doesn't assume too much would be great. Any recommendations?
To learn more, I would suggest some course like https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/.
Thanks for all your help on this... a further issue developed, though, with authentication.
Did something change in 1.19.7 that I need to make more changes for?
You can see the details of this here: https://stackoverflow.com/questions/66387892/jenkins-kubernetes-builds-fail-with-forbidden-user-systemanonymous-verb-get
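For that "forbidden: User system:anonymous" class of error, a common first step (a sketch; it assumes kubectl is configured, and is guarded so it degrades gracefully when kubectl is absent) is to confirm which user the kubeconfig actually sends and what that user may do:

```shell
#!/bin/sh
set -eu

VERB=get
RESOURCE=pods

if command -v kubectl >/dev/null 2>&1; then
  # Which context/user are requests sent as? "system:anonymous" means the
  # API server received no usable credentials at all.
  kubectl config view --minify
  # Can that user perform the verb on the resource? Prints yes/no.
  kubectl auth can-i "$VERB" "$RESOURCE" --all-namespaces || true
else
  echo "kubectl not installed; see comments for the intended checks"
fi
```

If the answer is "no" or the user shows as anonymous, the fix is typically on the credentials/kubeconfig side rather than in the cluster spec.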
If anybody can assist, since this is beyond my skill set, I'd be eternally grateful!
Thanks,
Nick
On 24/02/2021, 09:23, Ciprian Hacman notifications@github.com wrote:
Closed #10893.
1. What kops version are you running? The command kops version will display this information.
1.19.9
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"darwin/amd64"}
The connection to the server api.xxxxx.com was refused - did you specify the right host or port?
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Initiate my autoscaling nodes in AWS. I have 1 master node, and 2 worker nodes in each of the three availability zones.
5. What happened after the commands executed?
Nothing; my master node fails to come up.
6. What did you expect to happen?
My master node would come up and I would be able to use Lens to connect on 443 to view the nodes/pods.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
My nodes are not up, so I'm not sure what I can run.
9. Anything else do we need to know?
When I view the system.log in the EC2 instance I see the output below. I think the last error is significant, but I have no idea why this has all stopped when it was running smoothly.
However, I can SSH to the instance and run "docker version".
SYSLOG: