kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

NODEUP Error got error running nodeup (will retry in 30s): error building loader: error finding containerd version #10893

Closed bowqtr closed 3 years ago

bowqtr commented 3 years ago

1. What kops version are you running? The command kops version will display this information. 1.19.9

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"darwin/amd64"} The connection to the server api.xxxxx.com was refused - did you specify the right host or port?

3. What cloud provider are you using? AWS

4. What commands did you run? What is the simplest way to reproduce this issue? Initiate my autoscaling nodes in AWS - I have 1 master node, and 2 worker nodes in each of the three availability zones

5. What happened after the commands executed? Nothing, my master node fails to come up

6. What did you expect to happen? My master node would come up and I would be able to use LENS to connect on 443 to view the nodes/pods

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2018-05-24T14:02:57Z"
  generation: 16
  name: xxx.com
spec:
  api:
    dns: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://xxx.com
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.18.0
  masterInternalName: api.internal.xxx.com
  masterPublicName: api.xxx.com
  networkCIDR: 172.20.0.0/16
  networking:
    kubenet: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: eu-west-1a
    type: Public
    zone: eu-west-1a
  - cidr: 172.20.64.0/19
    name: eu-west-1b
    type: Public
    zone: eu-west-1b
  - cidr: 172.20.96.0/19
    name: eu-west-1c
    type: Public
    zone: eu-west-1c
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2019-09-04T10:15:09Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: xxx.com
  name: large-nodes
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-06-21
  machineType: m4.2xlarge
  maxPrice: "0.25"
  maxSize: 2
  minSize: 2
  nodeLabels:
    kops.k8s.io/instancegroup: large-nodes
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2018-05-24T14:02:57Z"
  labels:
    kops.k8s.io/cluster: xxx.com
  name: master-eu-west-1a
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13
  machineType: m4.large
  maxPrice: "0.10"
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1a
  role: Master
  subnets:
  - eu-west-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2018-05-24T14:02:57Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: xxx.com
  name: nodes
spec:
  image: kope.io/k8s-1.16-debian-stretch-amd64-hvm-ebs-2020-01-17
  machineType: m4.xlarge
  maxPrice: "0.25"
  maxSize: 2
  minSize: 2
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

My nodes are not up, so I'm not sure what I can run.
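
(For reference, a verbose run could be captured roughly like this; this is only a sketch with placeholder names, and the journalctl line applies on the failing node itself:)

kops update cluster --name <cluster-name> --state s3://<state-store> -v 10 2> kops-verbose.log
# on the failing instance, nodeup logs into the kops-configuration unit:
sudo journalctl -u kops-configuration.service --no-pager > nodeup.log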

9. Anything else we need to know?

When I view the system.log on the EC2 instance I see the output below. I think the last error is significant, but I have no idea why this has all stopped when it was running smoothly.

However, I can SSH to the instance and run "docker version":

admin@ip-172-20-55-65:~$ docker version
Client:
 Version:           18.06.3-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        d7080c1
 Built:             Wed Feb 20 02:28:26 2019
 OS/Arch:           linux/amd64
 Experimental:      false
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.38/version: dial unix /var/run/docker.sock: connect: permission denied
admin@ip-172-20-55-65:~$ 

SYSLOG:

Feb 20 14:32:34 ip-172-20-55-65 cloud-init[743]: Downloading nodeup (https://artifacts.k8s.io/binaries/kops/1.18.2/linux/amd64/nodeup https://github.com/kubernetes/kops/releases/download/v1.18.2/linux-amd64-nodeup https://kubeupv2.s3.amazonaws.com/kops/1.18.2/linux/amd64/nodeup)
Feb 20 14:32:34 ip-172-20-55-65 cloud-init[743]: Attempting download with: curl -f --ipv4 --compressed -Lo nodeup --connect-timeout 20 --retry 6 --retry-delay 10 {url}
Feb 20 14:32:34 ip-172-20-55-65 cloud-init[743]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Feb 20 14:32:34 ip-172-20-55-65 cloud-init[743]:                                  Dload  Upload   Total   Spent    Left  Speed
Feb 20 14:32:34 ip-172-20-55-65 ntpd[563]: Soliciting pool server 64.62.190.177
Feb 20 14:32:34 ip-172-20-55-65 ntpd[563]: Soliciting pool server 163.237.218.19
Feb 20 14:32:34 ip-172-20-55-65 ntpd[563]: Soliciting pool server 162.159.200.123
Feb 20 14:32:34 ip-172-20-55-65 ntpd[563]: Soliciting pool server 144.172.118.20
Feb 20 14:32:35 ip-172-20-55-65 ntpd[563]: Soliciting pool server 85.91.1.180
Feb 20 14:32:35 ip-172-20-55-65 ntpd[563]: Soliciting pool server 192.81.135.252
Feb 20 14:32:35 ip-172-20-55-65 ntpd[563]: Soliciting pool server 198.60.22.240
Feb 20 14:32:35 ip-172-20-55-65 ntpd[563]: Soliciting pool server 216.229.0.50
Feb 20 14:32:35 ip-172-20-55-65 cloud-init[743]: #015  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0#015 35 75.3M   35 26.7M    0     0  30.3M      0  0:00:02 --:--:--  0:00:02 30.3M#015100 75.3M  100 75.3M    0     0  47.1M      0  0:00:01  0:00:01 --:--:-- 47.1M
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: == Downloaded https://artifacts.k8s.io/binaries/kops/1.18.2/linux/amd64/nodeup (SHA1 = 25e9b6ddc3bc4d1a272cd9e06acf3334025f40bde02ef1fa6496d8ac105ddc23) ==
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: Running nodeup
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: nodeup version 1.18.2 (git-84495481e4)
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.398023     820 install.go:194] Built service manifest "kops-configuration.service"
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: [Unit]
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: Description=Run kops bootstrap (nodeup)
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: Documentation=https://github.com/kubernetes/kops
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: [Service]
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: Environment="AWS_REGION=eu-west-1"
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: EnvironmentFile=/etc/environment
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: ExecStart=/opt/kops/bin/nodeup --conf=/opt/kops/conf/kube_env.yaml --v=8
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: Type=oneshot
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: [Install]
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: WantedBy=multi-user.target
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.398068     820 task.go:97] task *nodetasks.Service does not implement HasLifecycle
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.398082     820 install.go:68] No package task found; won't update packages
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.436973     820 topological_sort.go:64] Dependencies:
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.437028     820 topological_sort.go:66] #011Service/kops-configuration.service:#011[]
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.437079     820 executor.go:103] Tasks: 0 done / 1 total; 1 can run
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.437138     820 executor.go:174] Executing task "Service/kops-configuration.service": Service: kops-configuration.service
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.437307     820 changes.go:81] Field changed "Definition" actual="<nil>" expected="[Unit]\nDescription=Run kops bootstrap (nodeup)\nDocumentation=https://github.com/kubernetes/kops\n\n[Service]\nEnvironment=\"AWS_REGION=eu-west-1\" \nEnvironmentFile=/etc/environment\nExecStart=/opt/kops/bin/nodeup --conf=/opt/kops/conf/kube_env.yaml --v=8\nType=oneshot\n\n[Install]\nWantedBy=multi-user.target\n"
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.437378     820 changes.go:81] Field changed "Running" actual="false" expected="true"
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.437409     820 changes.go:81] Field changed "Enabled" actual="<nil>" expected="true"
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.437432     820 changes.go:81] Field changed "ManageState" actual="<nil>" expected="true"
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.437457     820 changes.go:81] Field changed "SmartRestart" actual="<nil>" expected="true"
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.437653     820 files.go:50] Writing file "/lib/systemd/system/kops-configuration.service"
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.437779     820 service.go:266] Reloading systemd configuration
Feb 20 14:32:36 ip-172-20-55-65 systemd[1]: Reloading.
Feb 20 14:32:36 ip-172-20-55-65 ntpd[563]: Soliciting pool server 44.4.53.2
Feb 20 14:32:36 ip-172-20-55-65 ntpd[563]: Soliciting pool server 45.79.109.111
Feb 20 14:32:36 ip-172-20-55-65 cloud-init[743]: I0220 14:32:36.572542     820 service.go:329] Restarting service "kops-configuration.service"
Feb 20 14:32:36 ip-172-20-55-65 systemd[1]: Starting Run kops bootstrap (nodeup)...
Feb 20 14:32:36 ip-172-20-55-65 nodeup[865]: nodeup version 1.18.2 (git-84495481e4)
Feb 20 14:32:36 ip-172-20-55-65 nodeup[865]: I0220 14:32:36.607480     865 http.go:78] Downloading "https://storage.googleapis.com/kubernetes-release/release/v1.19.1/bin/linux/amd64/kubelet"
Feb 20 14:32:37 ip-172-20-55-65 ntpd[563]: Soliciting pool server 2001:470:1f06:56f::2
Feb 20 14:32:37 ip-172-20-55-65 ntpd[563]: Soliciting pool server 103.149.144.100
Feb 20 14:32:38 ip-172-20-55-65 nodeup[865]: I0220 14:32:38.746598     865 files.go:100] Hash matched for "/var/cache/nodeup/sha256:2ca2a3104d4cce26db128e3a0b7a042385df4f2c51bdbe740e067fdfaa2fcdd1_https___storage_googleapis_com_kubernetes-release_release_v1_19_1_bin_linux_amd64_kubelet": sha256:2ca2a3104d4cce26db128e3a0b7a042385df4f2c51bdbe740e067fdfaa2fcdd1
Feb 20 14:32:38 ip-172-20-55-65 nodeup[865]: I0220 14:32:38.747215     865 assetstore.go:225] added asset "kubelet" for &{"/var/cache/nodeup/sha256:2ca2a3104d4cce26db128e3a0b7a042385df4f2c51bdbe740e067fdfaa2fcdd1_https___storage_googleapis_com_kubernetes-release_release_v1_19_1_bin_linux_amd64_kubelet"}
Feb 20 14:32:38 ip-172-20-55-65 nodeup[865]: I0220 14:32:38.761753     865 http.go:78] Downloading "https://storage.googleapis.com/kubernetes-release/release/v1.19.1/bin/linux/amd64/kubectl"
Feb 20 14:32:39 ip-172-20-55-65 nodeup[865]: I0220 14:32:39.817525     865 files.go:100] Hash matched for "/var/cache/nodeup/sha256:da4de99d4e713ba0c0a5ef6efe1806fb09c41937968ad9da5c5f74b79b3b38f5_https___storage_googleapis_com_kubernetes-release_release_v1_19_1_bin_linux_amd64_kubectl": sha256:da4de99d4e713ba0c0a5ef6efe1806fb09c41937968ad9da5c5f74b79b3b38f5
Feb 20 14:32:39 ip-172-20-55-65 nodeup[865]: I0220 14:32:39.817581     865 assetstore.go:225] added asset "kubectl" for &{"/var/cache/nodeup/sha256:da4de99d4e713ba0c0a5ef6efe1806fb09c41937968ad9da5c5f74b79b3b38f5_https___storage_googleapis_com_kubernetes-release_release_v1_19_1_bin_linux_amd64_kubectl"}
Feb 20 14:32:39 ip-172-20-55-65 nodeup[865]: I0220 14:32:39.817718     865 http.go:78] Downloading "https://storage.googleapis.com/k8s-artifacts-cni/release/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz"
Feb 20 14:32:40 ip-172-20-55-65 nodeup[865]: I0220 14:32:40.731252     865 files.go:100] Hash matched for "/var/cache/nodeup/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz": sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5
Feb 20 14:32:40 ip-172-20-55-65 nodeup[865]: I0220 14:32:40.731304     865 assetstore.go:225] added asset "cni-plugins-linux-amd64-v0.8.6.tgz" for &{"/var/cache/nodeup/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz"}
Feb 20 14:32:40 ip-172-20-55-65 nodeup[865]: I0220 14:32:40.731549     865 assetstore.go:296] running extract command [tar zxf /var/cache/nodeup/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz -C /var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz.tmp-1613831560731356024]
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.470848     865 assetstore.go:335] added asset "bandwidth" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/bandwidth"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471464     865 assetstore.go:335] added asset "bridge" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/bridge"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471504     865 assetstore.go:335] added asset "dhcp" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/dhcp"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471529     865 assetstore.go:335] added asset "firewall" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/firewall"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471555     865 assetstore.go:335] added asset "flannel" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/flannel"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471581     865 assetstore.go:335] added asset "host-device" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/host-device"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471612     865 assetstore.go:335] added asset "host-local" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/host-local"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471636     865 assetstore.go:335] added asset "ipvlan" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/ipvlan"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471661     865 assetstore.go:335] added asset "loopback" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/loopback"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471689     865 assetstore.go:335] added asset "macvlan" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/macvlan"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471714     865 assetstore.go:335] added asset "portmap" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/portmap"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471745     865 assetstore.go:335] added asset "ptp" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/ptp"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471770     865 assetstore.go:335] added asset "sbr" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/sbr"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471794     865 assetstore.go:335] added asset "static" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/static"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471818     865 assetstore.go:335] added asset "tuning" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/tuning"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.471842     865 assetstore.go:335] added asset "vlan" for &{"/var/cache/nodeup/extracted/sha256:994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5_https___storage_googleapis_com_k8s-artifacts-cni_release_v0_8_6_cni-plugins-linux-amd64-v0_8_6_tgz/vlan"}
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.504772     865 s3context.go:213] found bucket in region "eu-west-1"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.504803     865 s3fs.go:290] Reading file "s3://acme-kops-state/xxx.com/cluster.spec"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.553987     865 s3fs.go:290] Reading file "s3://acme-kops-state/xxx.com/instancegroup/master-eu-west-1a"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.794232     865 command.go:606] Using supported docker storage "overlay2"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.794298     865 command.go:181] Config tags: [_automatic_upgrades _aws]
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.794318     865 command.go:182] OS tags: [_debian_family _systemd]
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.794330     865 command.go:195] Building SecretStore at "s3://acme-kops-state/xxx.com/secrets"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.794352     865 command.go:207] Building KeyStore at "s3://acme-kops-state/xxx.com/pki"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.794374     865 command.go:646] Doing modprobe for module br_netfilter
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796627     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796664     865 tree_walker.go:125] Descending into directory, as tag is present: "nodeup/_automatic_upgrades"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796678     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796693     865 tree_walker.go:125] Descending into directory, as tag is present: "nodeup/_automatic_upgrades/_debian_family"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796706     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/files"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796724     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/files/etc"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796743     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/files/etc/apt"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796762     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/files/etc/apt/apt.conf.d"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796781     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/files/etc/apt/apt.conf.d/20auto-upgrades"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796798     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/packages"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796815     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/packages/unattended-upgrades"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796834     865 tree_walker.go:98] visit "nodeup/resources"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796850     865 tree_walker.go:98] visit "nodeup/resources/_lyft_vpc_cni"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796867     865 tree_walker.go:121] Skipping directory "nodeup/resources/_lyft_vpc_cni" as tag "_lyft_vpc_cni" not present
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796882     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796895     865 tree_walker.go:125] Descending into directory, as tag is present: "nodeup/_automatic_upgrades"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796906     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796921     865 tree_walker.go:125] Descending into directory, as tag is present: "nodeup/_automatic_upgrades/_debian_family"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796934     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/files"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796953     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/files/etc"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796969     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/files/etc/apt"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.796986     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/files/etc/apt/apt.conf.d"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797005     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/files/etc/apt/apt.conf.d/20auto-upgrades"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797027     865 loader.go:250] path "nodeup/_automatic_upgrades/_debian_family/files/etc/apt/apt.conf.d/20auto-upgrades" -> task File: "/etc/apt/apt.conf.d/20auto-upgrades"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797058     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/packages"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797076     865 tree_walker.go:98] visit "nodeup/_automatic_upgrades/_debian_family/packages/unattended-upgrades"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797103     865 tree_walker.go:98] visit "nodeup/resources"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797120     865 tree_walker.go:98] visit "nodeup/resources/_lyft_vpc_cni"
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797136     865 tree_walker.go:121] Skipping directory "nodeup/resources/_lyft_vpc_cni" as tag "_lyft_vpc_cni" not present
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797153     865 task.go:97] task *nodetasks.Package does not implement HasLifecycle
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797171     865 task.go:97] task *nodetasks.File does not implement HasLifecycle
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797184     865 task.go:97] task *nodetasks.Service does not implement HasLifecycle
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797203     865 task.go:97] task *nodetasks.Package does not implement HasLifecycle
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797212     865 task.go:97] task *nodetasks.Package does not implement HasLifecycle
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797221     865 task.go:97] task *nodetasks.Package does not implement HasLifecycle
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797230     865 task.go:97] task *nodetasks.Package does not implement HasLifecycle
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797242     865 task.go:97] task *nodetasks.Package does not implement HasLifecycle
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797250     865 task.go:97] task *nodetasks.Package does not implement HasLifecycle
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797267     865 update_service.go:42] UpdatePolicy not set in Cluster Spec; skipping creation of update-service
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797281     865 volumes.go:40] Skipping the volume builder, no volumes defined for this instancegroup
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797292     865 task.go:97] task *nodetasks.File does not implement HasLifecycle
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: I0220 14:32:41.797302     865 task.go:97] task *nodetasks.File does not implement HasLifecycle
Feb 20 14:32:41 ip-172-20-55-65 nodeup[865]: W0220 14:32:41.797319     865 main.go:138] got error running nodeup (will retry in 30s): error building loader: error finding containerd version
hakman commented 3 years ago

Please try using a newer AMI, as the one you are using is quite old: https://kops.sigs.k8s.io/operations/images/#ubuntu-2004-focal
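
For instance, switching an instance group to an Ubuntu 20.04 image from that page could look roughly like this (a sketch, not verified against this cluster; the IG name is from the manifest above and the image alias is one used later in this thread):

kops edit ig master-eu-west-1a --name xxx.com
# in the editor, set:
#   spec:
#     image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1
kops update cluster --yes
kops rolling-update cluster --yes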

bowqtr commented 3 years ago

Hi, and thank you for assisting; I very much appreciate all the help I can get as this is very much out of my comfort zone... :(

I've updated to ami-0c6b46c1a4827505d, which is k8s-1.16-debian-buster-amd64-hvm-ebs-2020-04-27.

But I still get the same issue:

Feb 21 14:03:09 ip-172-20-33-177 nodeup[625]: W0221 14:03:09.843445 625 main.go:138] got error running nodeup (will retry in 30s): error building loader: error finding containerd version

I'd be really grateful if somebody could let me know what the problem is...

I think Docker has a problem, as when I run docker version I get this:

admin@ip-xxxxx:~$ docker version
Client: Docker Engine - Community
 Version:           19.03.4
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        9013bf583a
 Built:             Fri Oct 18 15:52:16 2019
 OS/Arch:           linux/amd64
 Experimental:      false
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
admin@ip-xxxx:~$

admin@ip-xxxxx:~$ docker system info
Client:
 Debug Mode: false

Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
admin@ip-xxxxx:~$

But I can't tell why the docker daemon is not running...or if it should be :(
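
(A generic way to tell whether the daemon is actually down, rather than just inaccessible to a non-root user, is to check it as root; a sketch, not specific to kOps:)

sudo systemctl status docker
sudo docker version
# recent daemon log messages, if any:
sudo journalctl -u docker --no-pager | tail -n 50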

My cloud-init output log is attached: cloud-init-output.log

Syslog file attached: syslog.txt

hakman commented 3 years ago

Hi. I meant an official Debian or Ubuntu image, not a kope.io one. As of version 1.18, kOps recommends using Ubuntu 20.04.

bowqtr commented 3 years ago

Ok, I will try one like ami-0de4bebf376ac28f1 and see what happens.

Thanks,

Nick

omg-sr commented 3 years ago

👋 all,

I've just encountered this same issue this morning.

I'm running kops 1.18.3 to run k8s on AWS with the SpotInst integration. My master node is using the 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1 AMI.

My cluster config looks like this:

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2020-01-06T09:23:14Z"
  generation: 60
  name: "****"
spec:
  additionalPolicies:
    master: |
      "****"
    node: |
      "****"
  api:
    loadBalancer:
      sslCertificate: "****"
      type: Internal
  authentication:
    aws: {}
  authorization:
    rbac: {}
  channel: stable
  cloudConfig:
    spotinstOrientation: balanced
    spotinstProduct: Linux/UNIX (Amazon VPC)
  cloudProvider: aws
  configBase: "****"
  dnsZone: "****"
  encryptionConfig: true
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    memoryRequest: 100Mi
    name: events
  fileAssets:
  - content: |
      "****"
    name: aws-encryption-provider.yaml
    path: /etc/kubernetes/manifests/aws-encryption-provider.yaml
    roles:
    - Master
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    authorizationMode: RBAC,Node
    disableBasicAuth: true
    enableAdmissionPlugins:
    - NamespaceLifecycle
    - LimitRanger
    - ServiceAccount
    - PersistentVolumeLabel
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - MutatingAdmissionWebhook
    - ValidatingAdmissionWebhook
    - NodeRestriction
    - ResourceQuota
    - PodSecurityPolicy
    enableProfiling: false
    featureGates:
      TTLAfterFinished: "true"
  kubeControllerManager:
    featureGates:
      TTLAfterFinished: "true"
  kubeDNS:
    provider: CoreDNS
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - "****"
  - "****"
  kubernetesVersion: 1.18.15
  masterInternalName: "****"
  masterPublicName: "****"
  networkCIDR: "****"
  networkID: "****"
  networking:
    calico:
      mtu: 8912
  nonMasqueradeCIDR: "****"
  sshAccess:
  - "****"
  subnets:
  - cidr: 172.30.40.0/24
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - cidr: 172.30.41.0/24
    name: utility-eu-west-1a
    type: Utility
    zone: eu-west-1a
  - cidr: 172.30.36.0/24
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  - cidr: 172.30.37.0/24
    name: utility-eu-west-1b
    type: Utility
    zone: eu-west-1b
  - cidr: 172.30.38.0/24
    name: eu-west-1c
    type: Private
    zone: eu-west-1c
  - cidr: 172.30.39.0/24
    name: utility-eu-west-1c
    type: Utility
    zone: eu-west-1c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

I was getting the same "got error running nodeup (will retry in 30s): error building loader: error finding containerd version" error, which prevented our dev cluster from spinning up.

After doing a systemctl restart kops-configuration / master node reboot, that seems to have resolved the issue for now, so my cluster can spin up. Hopefully this extra info helps.
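
For completeness, the workaround was roughly this (a sketch, assuming SSH access to the master on a systemd-based image, run as root):

sudo systemctl restart kops-configuration.service
# then watch nodeup retry:
sudo journalctl -u kops-configuration.service -f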

Ta

bowqtr commented 3 years ago

Hmm - I don't have systemctl on my master node in AWS or on my local Mac. How can I get it installed?

Running locally:

➜ ~ kops version
Version 1.19.0
➜ ~ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"darwin/amd64"}
Unable to connect to the server: dial tcp 34.251.103.65:443: i/o timeout

Sorry if this is a "newbie" question... much of this detail is new to me!

bowqtr commented 3 years ago

Sorry - I managed to find a version of this on my node, but when I ran it I got this...

admin@ip-172-20-xx-xx:/bin$ ./systemctl restart kops-configuration
Failed to connect to bus: No such file or directory
admin@ip-172-20-xx-xx:/bin$

bowqtr commented 3 years ago

Wondering if this is connected to some issue with docker....

admin@ip-172-20-xx-xx:/bin$ docker version
Client:
 Version:           18.06.3-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        d7080c1
 Built:             Wed Feb 20 02:28:26 2019
 OS/Arch:           linux/amd64
 Experimental:      false
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.38/version: dial unix /var/run/docker.sock: connect: permission denied
admin@ip-172-20-xx-xx:/bin$

When I FTP into the EC2 instance I can see /var/run/docker.sock there:

admin@ip-172-20-xx-xx:/var/run$ ls -l
total 44
-rw-r--r-- 1 root root 4 Feb 22 14:46 acpid.pid
srw-rw-rw- 1 root root 0 Feb 22 14:46 acpid.socket
-rw------- 1 root root 0 Feb 22 14:47 agetty.reload
drwxr-xr-x 2 root root 80 Feb 22 14:46 blkid
drwxr-xr-x 2 root root 140 Feb 22 14:47 cloud-init
-rw-r--r-- 1 root root 4 Feb 22 14:46 crond.pid
---------- 1 root root 0 Feb 22 14:46 crond.reboot
-rw-r--r-- 1 root root 4 Feb 22 14:46 dhclient.eth0.pid
prw------- 1 root root 0 Feb 22 14:46 dmeventd-client
prw------- 1 root root 0 Feb 22 14:46 dmeventd-server
drwx------ 6 root root 140 Feb 22 14:47 docker
-rw-r--r-- 1 root root 3 Feb 22 14:47 docker.pid
srw-rw---- 1 root docker 0 Feb 22 14:46 docker.sock
lrwxrwxrwx 1 root root 25 Feb 22 14:46 initctl -> /run/systemd/initctl/fifo
drwxr-xr-x 2 root root 80 Feb 22 14:46 initramfs
drwxrwxrwt 4 root root 100 Feb 22 14:46 lock
drwxr-xr-x 2 root root 40 Feb 22 14:46 log
drwx------ 2 root root 80 Feb 22 14:46 lvm
-rw-r--r-- 1 root root 4 Feb 22 14:46 lvmetad.pid
-r--r--r-- 1 root root 33 Feb 22 14:46 machine-id
-rw-r--r-- 1 root root 80 Feb 22 16:13 motd.dynamic
drwxr-xr-x 2 root root 60 Feb 22 14:46 mount
drwxr-xr-x 2 root root 120 Feb 22 14:46 network
-rw-r--r-- 1 root root 3 Feb 22 14:46 ntpd.pid
drwxr-xr-x 2 root root 40 Feb 22 14:46 rpcbind
-r--r--r-- 1 root root 0 Feb 22 14:46 rpcbind.lock
srw-rw-rw- 1 root root 0 Feb 22 14:46 rpcbind.sock
dr-xr-xr-x 11 root root 0 Feb 22 14:46 rpc_pipefs
-rw-r--r-- 1 root root 3 Feb 22 14:46 rsyslogd.pid
drwxrwxr-x 2 root utmp 40 Feb 22 14:46 screen
drwxr-xr-x 2 root root 40 Feb 22 14:46 sendsigs.omit.d
lrwxrwxrwx 1 root root 8 Feb 22 14:46 shm -> /dev/shm
drwxr-xr-x 2 root root 40 Feb 22 14:46 sshd
-rw-r--r-- 1 root root 4 Feb 22 14:46 sshd.pid
drwxr-xr-x 2 root root 60 Feb 22 14:46 sysconfig
drwxr-xr-x 17 root root 440 Feb 22 14:47 systemd
drwxr-xr-x 2 root root 60 Feb 22 14:46 tmpfiles.d
drwxr-xr-x 7 root root 160 Feb 22 14:47 udev
drwxr-xr-x 2 root root 40 Feb 22 14:46 user
-rw-rw-r-- 1 root utmp 3840 Feb 22 15:49 utmp
-rw------- 1 root root 0 Feb 22 14:47 xtables.lock

Appreciate any other guidance

hakman commented 3 years ago

To understand what happens, we would need a log from the kops-configuration service, but using the AMI that @omg-sr was using: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1. Please use kOps v1.19.1 when trying.

bowqtr commented 3 years ago

OK, let me try that @hakman. I've just changed my AMI to k8s-1.16-debian-buster-amd64-hvm-ebs-2020-04-27 (ami-0c6b46c1a4827505d), as I hoped it would be a smaller change, but I still got the same problem:

Feb 22 16:54:15 ip-172-20-63-192 nodeup[615]: W0222 16:54:15.745175 615 main.go:138] got error running nodeup (will retry in 30s): error building loader: error finding containerd version

admin@ip-172-20-63-192:/bin$ ./systemctl restart kops-configuration
Failed to connect to bus: No such file or directory
admin@ip-172-20-63-192:/bin$ docker version
Client: Docker Engine - Community
 Version:           19.03.4
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        9013bf583a
 Built:             Fri Oct 18 15:52:16 2019
 OS/Arch:           linux/amd64
 Experimental:      false
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
admin@ip-172-20-63-192:/bin$

I'll now try the Ubuntu image and see what happens

bowqtr commented 3 years ago

Ok, I have an instance running:

ubuntu-eks/k8s_1.18/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1 - ami-0c7de20ac40893f86
Canonical, Ubuntu EKS Node OS (k8s_1.18), 20.04 LTS, amd64 focal image build on 2021-01-19
Root device type: ebs
Virtualization type: hvm
ENA Enabled: Yes

...in the end I see the same error:

Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: W0222 20:23:42.539148 2500 main.go:138] got error running nodeup (will retry in 30s): error building loader: error finding containerd version

When I SSH in and type a few things:

ubuntu@ip-172-20-61-25:/bin$ docker version
Client:
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.13.8
 Git commit:        afacb8b7f0
 Built:             Fri Dec 18 12:15:19 2020
 OS/Arch:           linux/amd64
 Experimental:      false
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/version: dial unix /var/run/docker.sock: connect: permission denied
ubuntu@ip-172-20-61-25:/bin$

ubuntu@ip-172-20-61-25:/bin$ ./systemctl restart kops-configuration
Failed to restart kops-configuration.service: Access denied
See system logs and 'systemctl status kops-configuration.service' for details.

ubuntu@ip-172-20-61-25:/bin$ ./systemctl status kops-configuration.service gives:

● kops-configuration.service - Run kops bootstrap (nodeup)
   Loaded: loaded (/lib/systemd/system/kops-configuration.service; disabled; vendor preset: enabled)
   Active: activating (start) since Mon 2021-02-22 20:11:15 UTC; 12min ago
     Docs: https://github.com/kubernetes/kops
 Main PID: 2500 (nodeup)
    Tasks: 7 (limit: 9538)
   Memory: 276.3M
   CGroup: /system.slice/kops-configuration.service
           └─2500 /opt/kops/bin/nodeup --conf=/opt/kops/conf/kube_env.yaml --v=8

Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539037 2500 task.go:97] task *nodetasks.Package does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539046 2500 task.go:97] task *nodetasks.Package does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539058 2500 task.go:97] task *nodetasks.Package does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539068 2500 task.go:97] task *nodetasks.Package does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539077 2500 task.go:97] task *nodetasks.Package does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539090 2500 update_service.go:42] UpdatePolicy not set in Cluster Spec; skipping creation of update-service
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539103 2500 volumes.go:40] Skipping the volume builder, no volumes defined for this instancegroup
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539114 2500 task.go:97] task *nodetasks.File does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: I0222 20:23:42.539124 2500 task.go:97] task *nodetasks.File does not implement HasLifecycle
Feb 22 20:23:42 ip-172-20-61-25 nodeup[2500]: W0222 20:23:42.539148 2500 main.go:138] got error running nodeup (will retry in 30s): error building loader: error finding containerd version

Is there anything else which could be causing this which we should be looking at? Maybe some other YAML file?

Thanks in advance :)

bowqtr commented 3 years ago

I can see that there is no file under the "containerd" folder - is that normal?

Screenshot 2021-02-22 at 20 31 05

bowqtr commented 3 years ago

This is also my kube_env.yaml:

Assets:
- 2ca2a3104d4cce26db128e3a0b7a042385df4f2c51bdbe740e067fdfaa2fcdd1@https://storage.googleapis.com/kubernetes-release/release/v1.19.1/bin/linux/amd64/kubelet
- da4de99d4e713ba0c0a5ef6efe1806fb09c41937968ad9da5c5f74b79b3b38f5@https://storage.googleapis.com/kubernetes-release/release/v1.19.1/bin/linux/amd64/kubectl
- 994fbfcdbb2eedcfa87e48d8edb9bb365f4e2747a7e47658482556c12fd9b2f5@https://storage.googleapis.com/k8s-artifacts-cni/release/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz
ClusterName: k8s-aws.engineering.acme.com
ConfigBase: s3://acme-kops-state/k8s-aws.engineering.acme.com
InstanceGroupName: master-eu-west-1a
Tags:
- _automatic_upgrades
- _aws
channels:
- s3://acme-kops-state/k8s-aws.engineering.acme.com/addons/bootstrap-channel.yaml
etcdManifests:
- s3://acme-kops-state/k8s-aws.engineering.acme.com/manifests/etcd/main.yaml
- s3://acme-kops-state/k8s-aws.engineering.acme.com/manifests/etcd/events.yaml
protokubeImage:
  hash: d89a4222576a8396a65251ca93537107a9698f4eb0608b4e10e0d33e2f9a8766
  name: protokube:1.18.2
  sources:
  - https://artifacts.k8s.io/binaries/kops/1.18.2/images/protokube.tar.gz
  - https://github.com/kubernetes/kops/releases/download/v1.18.2/images-protokube.tar.gz
  - https://kubeupv2.s3.amazonaws.com/kops/1.18.2/images/protokube.tar.gz
staticManifests:
- key: kube-apiserver-healthcheck
  path: manifests/static/kube-apiserver-healthcheck.yaml

Would it be helpful to see others?

hakman commented 3 years ago

From the file you pasted above, I can see that you are using kOps v1.18.2 and installing Kubernetes 1.19.1, which is not supported. You should download and use the kOps 1.19.1 binary from https://github.com/kubernetes/kops/releases/tag/v1.19.1. To upgrade the cluster you should follow: https://kops.sigs.k8s.io/operations/updates_and_upgrades/#automated-update
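
For example, replacing the local binary on macOS could look something like this (a sketch, assuming an amd64 Mac; the release assets for that tag are named kops-darwin-amd64 / kops-linux-amd64):

curl -Lo kops https://github.com/kubernetes/kops/releases/download/v1.19.1/kops-darwin-amd64
chmod +x kops
sudo mv kops /usr/local/bin/kops
kops version   # should now report Version 1.19.1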

bowqtr commented 3 years ago

Thanks @hakman, I think we're making some progress; as a newbie I appreciate your help, and I'm starting to understand a lot more of what I'm seeing.

It still doesn't work, but I now think it's a config issue that needs to change with 1.19.1. I now get this...

ubuntu@ip-172-20-58-38:~$ docker version
Client:
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.13.8
 Git commit:        afacb8b7f0
 Built:             Fri Dec 18 12:15:19 2020
 OS/Arch:           linux/amd64
 Experimental:      false
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/version: dial unix /var/run/docker.sock: connect: permission denied
ubuntu@ip-172-20-58-38:~$ cd /bin
ubuntu@ip-172-20-58-38:/bin$ ./systemctl restart kops-configuration
Failed to restart kops-configuration.service: Access denied
See system logs and 'systemctl status kops-configuration.service' for details.
ubuntu@ip-172-20-58-38:/bin$ ./systemctl status kops-configuration.service
● kops-configuration.service - Run kops bootstrap (nodeup)
   Loaded: loaded (/lib/systemd/system/kops-configuration.service; disabled; vendor preset: enabled)
   Active: activating (start) since Tue 2021-02-23 10:44:50 UTC; 53s ago
     Docs: https://github.com/kubernetes/kops
 Main PID: 2513 (nodeup)
    Tasks: 6 (limit: 9538)
   Memory: 272.9M
   CGroup: /system.slice/kops-configuration.service
           └─2513 /opt/kops/bin/nodeup --conf=/opt/kops/conf/kube_env.yaml --v=8

Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: [Socket]
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: ListenStream=/var/run/docker.sock
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: SocketMode=0660
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: SocketUser=root
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: SocketGroup=docker
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: [Install]
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: WantedBy=sockets.target
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: I0223 10:45:27.887292 2513 task.go:103] task *nodetasks.Service does not implement HasLifecycle
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: I0223 10:45:27.887323 2513 assetstore.go:106] Matching assets for "^docker/":
Feb 23 10:45:27 ip-172-20-58-38 nodeup[2513]: W0223 10:45:27.887351 2513 main.go:133] got error running nodeup (will retry in 30s): error building loader: unable to find any Docker binaries in assets
ubuntu@ip-172-20-58-38:/bin$

This is my EC2 user data launch script (attached): user data EC2 launch.txt

I think this is getting closer, but although I've changed the kube_env.yaml to add the new amd64 bits, this Docker thing is confusing me. Is there an example of what it should be for 1.19.1?

Many thanks in advance....sincerely appreciate the guidance.

hakman commented 3 years ago

No worries. Please add the output of kops get --name my.example.com -o yaml.

bowqtr commented 3 years ago

Hi @hakman - I've attached the output; I know the instance groups are wrong and need changing, but I was planning on doing that once I have a config that gets the main node up and running.

cluster yaml output.txt

Many thanks!

hakman commented 3 years ago

Your user data doesn't match the cluster version; it matches only partially. I am not sure how you ended up in that state. How did you upgrade the cluster?

bowqtr commented 3 years ago

Hi @hakman

kops edit cluster

Can I make changes manually to fix it?

I'm less clear on how to make changes through "kops edit ig" etc

....but happy to try and make them, so long as I know what to make.

hakman commented 3 years ago

Did you ever run these commands, or something similar?

kops upgrade cluster --yes
kops update cluster --yes
kops rolling-update cluster --instance-group master-eu-west-1a --cloudonly --yes 
bowqtr commented 3 years ago

Kinda, my usual upgrade process was:

kops edit cluster
kops update cluster --yes
kops rolling-update cluster
kops rolling-update cluster --yes

This may have been only partially correct, but that's what was handed over from the guy that left.

The rolling-update part no longer works as the cluster is down (guessing that's obvious)

Thanks

bowqtr commented 3 years ago

When I do a kops upgrade cluster, I now get this:

➜ ~ kops upgrade cluster
Using cluster from kubectl context: k8s.staging.acme.com

I0223 11:43:20.350760   65415 upgrade_cluster.go:197] Custom image (099720109477/ubuntu-eks/k8s_1.19/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210211) has been provided for Instance Group "master-eu-west-1a"; not updating image
ITEM                            PROPERTY                OLD                                                             NEW
Cluster                         KubernetesVersion       1.19.1                                                          1.19.7
InstanceGroup/large-nodes       Image                   kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-06-21        099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1
InstanceGroup/nodes             Image                   kope.io/k8s-1.16-debian-stretch-amd64-hvm-ebs-2020-01-17        099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1

Must specify --yes to perform upgrade
➜ ~

Should I do this with the --yes?

bowqtr commented 3 years ago

Looks like this would update the IGs which I have for the small and large worker nodes, but would it solve the Docker issue on the main node?

hakman commented 3 years ago

Upgrade cluster is not needed now, but at least update cluster should tell you that there are changes that need to be applied. Rolling update with --cloudonly should work.

bowqtr commented 3 years ago

Ok, I ran that and got this:

➜  ~ kops rolling-update cluster --cloudonly
Using cluster from kubectl context: acme.com

NAME                    STATUS  NEEDUPDATE      READY   MIN     TARGET  MAX
large-nodes             Ready   0               0       0       0       3
master-eu-west-1a       Ready   0               1       0       1       1
nodes                   Ready   0               0       0       0       3

No rolling-update required.
➜  ~
hakman commented 3 years ago

update cluster, not rolling-update cluster. :)

bowqtr commented 3 years ago

Hmm, something is amiss:

~ kops update cluster --cloudonly 
Error: unknown flag: --cloudonly
Usage:
  kops update cluster [flags]

Examples:
  # After cluster has been edited or upgraded, configure it with:
  kops update cluster k8s-cluster.example.com --yes --state=s3://my-state-store --yes --admin

Flags:
      --admin duration[=18h0m0s]      Also export a cluster admin user credential with the specified lifetime and add it to the cluster context
      --allow-kops-downgrade          Allow an older version of kops to update the cluster than last used
      --create-kube-config            Will control automatically creating the kube config file on your local filesystem (default true)
  -h, --help                          help for cluster
      --internal                      Use the cluster's internal DNS name. Implies --create-kube-config
      --lifecycle-overrides strings   comma separated list of phase overrides, example: SecurityGroups=Ignore,InternetGateway=ExistsAndWarnIfChanges
      --out string                    Path to write any local output
      --phase string                  Subset of tasks to run: assets, cluster, network, security
      --ssh-public-key string         SSH public key to use (deprecated: use kops create secret instead)
      --target string                 Target - direct, terraform, cloudformation (default "direct")
      --user string                   Existing user to add to the cluster context. Implies --create-kube-config
  -y, --yes                           Create cloud resources, without --yes update is in dry run mode

Global Flags:
      --add_dir_header                   If true, adds the file directory to the header of the log messages
      --alsologtostderr                  log to standard error as well as files
      --config string                    yaml config file (default is $HOME/.kops.yaml)
      --log_backtrace_at traceLocation   when logging hits line file:N, emit a stack trace (default :0)
      --log_dir string                   If non-empty, write log files in this directory
      --log_file string                  If non-empty, use this log file
      --log_file_max_size uint           Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
      --logtostderr                      log to standard error instead of files (default true)
      --name string                      Name of cluster. Overrides KOPS_CLUSTER_NAME environment variable
      --skip_headers                     If true, avoid header prefixes in the log messages
      --skip_log_headers                 If true, avoid headers when opening log files
      --state string                     Location of state storage (kops 'config' file). Overrides KOPS_STATE_STORE environment variable
      --stderrthreshold severity         logs at or above this threshold go to stderr (default 2)
  -v, --v Level                          number for the log level verbosity
      --vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging

unknown flag: --cloudonly
bowqtr commented 3 years ago

I did this (kops update cluster); should I now repeat with --yes?

 ~ kops update cluster           
Using cluster from kubectl context: k8s.staging.acme.com

*********************************************************************************

A new kubernetes version is available: 1.19.7
Upgrading is recommended (try kops upgrade cluster)

More information: https://github.com/kubernetes/kops/blob/master/permalinks/upgrade_k8s.md#1.19.7

*********************************************************************************

*********************************************************************************

Kubelet anonymousAuth is currently turned on. This allows RBAC escalation and remote code execution possibilities.
It is highly recommended you turn it off by setting 'spec.kubelet.anonymousAuth' to 'false' via 'kops edit cluster'

See https://kops.sigs.k8s.io/security/#kubelet-api

*********************************************************************************

I0223 12:14:51.915048   66533 executor.go:111] Tasks: 0 done / 86 total; 46 can run
I0223 12:14:52.406811   66533 executor.go:111] Tasks: 46 done / 86 total; 18 can run
I0223 12:14:52.791025   66533 executor.go:111] Tasks: 64 done / 86 total; 19 can run
I0223 12:14:53.385868   66533 executor.go:111] Tasks: 83 done / 86 total; 3 can run
I0223 12:14:53.616314   66533 executor.go:111] Tasks: 86 done / 86 total; 0 can run
Will modify resources:
  AutoscalingGroup/large-nodes.k8s.staging.acme.com
        MaxSize                  3 -> 2
        MinSize                  0 -> 2

  AutoscalingGroup/master-eu-west-1a.masters.k8s.staging.acme.com
        LaunchTemplate           <nil> -> name:master-eu-west-1a.masters.k8s.staging.acme.com id:lt-0dd26de2bb906acb6
        MinSize                  0 -> 1

  AutoscalingGroup/nodes.k8s.staging.acme.com
        MaxSize                  3 -> 2
        MinSize                  0 -> 2

  LaunchTemplate/master-eu-west-1a.masters.k8s.staging.acme.com
        ImageID                  ami-0c7de20ac40893f86 -> 099720109477/ubuntu-eks/k8s_1.19/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210211

Must specify --yes to apply changes
➜  ~
hakman commented 3 years ago

Yes and after rolling update.

hakman commented 3 years ago
kops update cluster --yes
kops rolling-update cluster --cloudonly --yes 
bowqtr commented 3 years ago

Hi @hakman

Ok got this...just waiting for my new IP to propagate to see if I can connect using LENS...

➜  ~ kops update cluster --yes
Using cluster from kubectl context: k8s.staging.acme.com

*********************************************************************************

A new kubernetes version is available: 1.19.7
Upgrading is recommended (try kops upgrade cluster)

More information: https://github.com/kubernetes/kops/blob/master/permalinks/upgrade_k8s.md#1.19.7

*********************************************************************************

*********************************************************************************

Kubelet anonymousAuth is currently turned on. This allows RBAC escalation and remote code execution possibilities.
It is highly recommended you turn it off by setting 'spec.kubelet.anonymousAuth' to 'false' via 'kops edit cluster'

See https://kops.sigs.k8s.io/security/#kubelet-api

*********************************************************************************

I0223 12:41:52.833248   67376 executor.go:111] Tasks: 0 done / 86 total; 46 can run
I0223 12:41:53.312642   67376 executor.go:111] Tasks: 46 done / 86 total; 18 can run
I0223 12:41:53.749426   67376 executor.go:111] Tasks: 64 done / 86 total; 19 can run
I0223 12:41:54.525779   67376 executor.go:111] Tasks: 83 done / 86 total; 3 can run
I0223 12:41:55.869205   67376 executor.go:111] Tasks: 86 done / 86 total; 0 can run
I0223 12:41:55.869891   67376 dns.go:156] Pre-creating DNS records
I0223 12:41:56.724568   67376 update_cluster.go:313] Exporting kubecfg for cluster
kops has set your kubectl context to k8s.staging.acme.com
W0223 12:41:56.888620   67376 update_cluster.go:337] Exported kubecfg with no user authentication; use --admin, --user or --auth-plugin flags with `kops export kubecfg`

Cluster changes have been applied to the cloud.

Changes may require instances to restart: kops rolling-update cluster

➜  ~ kops rolling-update cluster --cloudonly --yes
Using cluster from kubectl context: k8s.staging.acme.com

NAME            STATUS      NEEDUPDATE  READY   MIN TARGET  MAX
large-nodes     Ready       0       2   2   2   2
master-eu-west-1a   NeedsUpdate 1       0   1   1   1
nodes           Ready       0       2   2   2   2
W0223 12:42:29.127913   67405 instancegroups.go:415] Not validating cluster as cloudonly flag is set.
W0223 12:42:29.128068   67405 instancegroups.go:341] Not draining cluster nodes as 'cloudonly' flag is set.
I0223 12:42:29.128079   67405 instancegroups.go:521] Stopping instance "i-0fcb8c5c999ef60e3", in group "master-eu-west-1a.masters.k8s.staging.acme.com" (this may take a while).
I0223 12:42:29.252741   67405 instancegroups.go:383] waiting for 15s after terminating instance
W0223 12:42:44.254679   67405 instancegroups.go:415] Not validating cluster as cloudonly flag is set.
I0223 12:42:44.254758   67405 rollingupdate.go:208] Rolling update completed for cluster "k8s.staging.acme.com"!
bowqtr commented 3 years ago

Ok - I can't connect using LENS.

I also can't log in, because the autoscaling group used an older launch config; it has changed the launch config back to the original, which has the updated AMI but still the old key pair.

I have created a new launch config with the new AMI and the correct key pair, called MAIN NODE (UBUNTU 1.19 AMD2), but how do I use kops or kubectl to make the auto-scaling group master-eu-west-1a use my new launch config with its new user data launch script?

Thanks

hakman commented 3 years ago

You should really read the docs, just saying. Manual changes will be overwritten by kOps, so you should never make manual changes to your launch templates and user data. https://kops.sigs.k8s.io/cluster_spec/ You should only use the edit cluster / ig and upgrade, update, rolling-update commands. You may also try your luck in the Slack channel https://kubernetes.slack.com/messages/kops-users/. Good luck!

bowqtr commented 3 years ago

I get most of that, I just don't understand how my autoscaling group is linked to a particular launch config, or how I can change that by issuing a new kops command. I don't see this listed in any of the config or even that cluster_spec URL, unless I'm so tired that I'm just missing it. Sorry for being a newbie, but I'd be grateful if you're able to help.

hakman commented 3 years ago

This should help: https://kops.sigs.k8s.io/tutorial/working-with-instancegroups/ Short version: minSize and maxSize control the ASG size, and you get an ASG for each instance group.
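As a rough sketch (the IG name and sizes here are just illustrative), the relevant fields live in the InstanceGroup spec, and kOps pushes them onto the matching ASG on the next update:

kops edit ig nodes --name k8s.staging.acme.com
# then, in the editor, adjust the spec, for example:
#   spec:
#     minSize: 2    # becomes MinSize on the ASG
#     maxSize: 3    # becomes MaxSize on the ASG
kops update cluster --yes   # pushes the new sizes to the AutoscalingGroup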

bowqtr commented 3 years ago

I read that... but sorry, I'm still lost.

I have three instance groups

 .ssh kops get ig                                                     
Using cluster from kubectl context: k8.acme.com

NAME            ROLE    MACHINETYPE MIN MAX ZONES
large-nodes     Node    m4.2xlarge  0   3   eu-west-1a,eu-west-1b,eu-west-1c
master-eu-west-1a   Master  m4.large    0   1   eu-west-1a
nodes           Node    m4.xlarge   0   3   eu-west-1a,eu-west-1b,eu-west-1c
➜  .ssh 

And my instance groups have min/max, as you can see above. But in the instancegroup definition, I don't see any linkage to the launch configuration in the scaling group.

The launch config contains the nodeup cloud-init for the EC2 instance, plus the SSH key pair. I changed that, so I'm trying to get the kOps config to use the new launch configs.

This is where I'm getting lost: I can't see any reference to AWS launch configurations at https://kops.sigs.k8s.io/tutorial/working-with-instancegroups/.

If I manually change the scaling group in AWS EC2 to use the new launch config for that node, then after running kops it reverts to its prior launch config... so this must be stored somewhere, but I'm unsure where, and how to change it. This is what is confusing me, as nothing seems obvious.

hakman commented 3 years ago

There is no link between the launch config and the IG; kOps manages the launch config, not you. Use kOps to edit the IGs and add the options that you want, and they will be translated into a launch config when running an update.
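A sketch of that workflow, using the IG name from this thread (exact --name/--state flags depend on your setup):

kops edit ig master-eu-west-1a      # change image, machineType, etc. in the IG spec
kops update cluster --yes           # kOps regenerates the launch template / launch config from the spec
kops rolling-update cluster --yes   # replaces the running instances so they pick up the new template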

bowqtr commented 3 years ago

So this is where I'm totally lost... this is my master node IG config when I type kops edit instancegroup master-eu-west-1a

But in this I see no reference to the launch configuration attached to the auto-scaling group in AWS, which means that it's launching with the old EC2 user data config, SSH keys, etc.

I can change the auto-scaling group manually in the EC2 dashboard, but it changes back, so something is amiss. And because I can't see a reference to this in the IG definition below, I'm confused.

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2018-05-24T14:02:57Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: k8s-staging.acme.com
  name: master-eu-west-1a
spec:
  image: 099720109477/ubuntu-eks/k8s_1.19/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210211
  machineType: m4.large
  maxPrice: "0.10"
  maxSize: 1
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1a
  role: Master
  subnets:
  - eu-west-1a
hakman commented 3 years ago

Try using https://kops.sigs.k8s.io/secrets/#create-secret or https://kops.sigs.k8s.io/cluster_spec/#sshkeyname.
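For example (the key name and path here are illustrative), per those docs you can either register a new SSH public key as a kOps secret, or point the cluster spec at an existing EC2 key pair:

kops create secret --name k8s.staging.acme.com sshpublickey admin -i ~/.ssh/new_key.pub
kops update cluster --yes
kops rolling-update cluster --yes

# or, via kops edit cluster, reference an existing EC2 key pair:
#   spec:
#     sshKeyName: my-existing-keypair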

bowqtr commented 3 years ago

Ok, I think I managed to resolve that issue by creating a new launch template instead of a launch config.

The node still doesn't come up though - it says it's missing Docker binaries in assets.

ubuntu@ip-172-20-34-36:~$ docker version
Client:
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.13.8
 Git commit:        afacb8b7f0
 Built:             Fri Dec 18 12:15:19 2020
 OS/Arch:           linux/amd64
 Experimental:      false
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/version: dial unix /var/run/docker.sock: connect: permission denied

ubuntu@ip-172-20-34-36:~$ cd /bin
ubuntu@ip-172-20-34-36:/bin$ ./systemctl restart kops-configuration
Failed to restart kops-configuration.service: Access denied
See system logs and 'systemctl status kops-configuration.service' for details.

ubuntu@ip-172-20-34-36:/bin$ ./systemctl status kops-configuration.service
● kops-configuration.service - Run kops bootstrap (nodeup)
     Loaded: loaded (/lib/systemd/system/kops-configuration.service; disabled; vendor preset: enabled)
     Active: activating (start) since Tue 2021-02-23 15:02:52 UTC; 5min ago
       Docs: https://github.com/kubernetes/kops
   Main PID: 2526 (nodeup)
      Tasks: 6 (limit: 9538)
     Memory: 274.6M
     CGroup: /system.slice/kops-configuration.service
             └─2526 /opt/kops/bin/nodeup --conf=/opt/kops/conf/kube_env.yaml --v=8

Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: [Socket]
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: ListenStream=/var/run/docker.sock
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: SocketMode=0660
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: SocketUser=root
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: SocketGroup=docker
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: [Install]
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: WantedBy=sockets.target
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: I0223 15:07:36.559453    2526 task.go:103] task *nodetasks.Service does not implement HasLifecycle
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: I0223 15:07:36.559487    2526 assetstore.go:106] Matching assets for "^docker/":
Feb 23 15:07:36 ip-172-20-34-36 nodeup[2526]: W0223 15:07:36.559516    2526 main.go:133] got error running nodeup (will retry in 30s): error building loader: unable to find any Docker binaries in assets
ubuntu@ip-172-20-34-36:/bin$ 

Is this because, with the new 1.19.1, I need to change the launch script in the EC2 instance?

Attached are the cluster_spec, ig_spec, and kube_env YAML files which are created when the EC2 instance is launched.

I'm figuring that something in here is causing Docker to have an issue, or there's some other config problem. Docker exists as a command to run, although it has a permissions issue (dial unix /var/run/docker.sock: connect: permission denied), and I'm pondering whether this is what is causing the whole Kubernetes setup to fail.
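(For what it's worth, the permission denied on /var/run/docker.sock usually just means the ubuntu user isn't in the docker group, and is most likely unrelated to the nodeup asset error. Assuming the stock ubuntu user, a quick way around it is:)

sudo docker version                # run the client as root, or
sudo usermod -aG docker ubuntu     # add the user to the docker group (takes effect after re-login)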

Any ideas? I'm totally stuck.

hakman commented 3 years ago

From my point of view, this is not a support issue, it is just user error. You are trying to manually change things that you shouldn't touch, and you insist on doing that. I don't think anyone can help with that. If you want to learn, create a new cluster as described in the docs and you can see how things change when you edit/update/roll the cluster.

bowqtr commented 3 years ago

Hi @hakman I appreciate that perspective, but nothing had changed - I just tried to bring up the cluster one day and was getting the error. It surprised me! I have another cluster in another region (slightly different config) which still works OK. On this cluster in staging, which is failing, we originally saw this error....

865 main.go:138] got error running nodeup (will retry in 30s): error building loader: error finding containerd version

With your guidance (changing the AMI image from Debian to Ubuntu, then changing more things to make it 1.19.1 compatible, like adding arm64 into the yaml, and running kops upgrade cluster), the new image spins up, but we still have Docker issues with nodeup; now it's

2526 main.go:133] got error running nodeup (will retry in 30s): error building loader: unable to find any Docker binaries in assets

I'm not trying to define the error, I'd just appreciate assistance to get it working, especially if more changes are needed to make Docker work in the new Ubuntu AMI by changing the cluster_spec, ig_spec, and kube_env YAML, which are specified manually in the EC2 launch instance and which is why I included them.

Maybe something is needed in 1.19.x which wasn't needed in a prior version.

The guy that used to manage this has left the company, so I'm all alone. I'm not a Kubernetes or kOps expert, hence asking for support or assistance to help figure out why Docker seems to be installed but nodeup is reporting error building loader: unable to find any Docker binaries in assets.

Truly any help since I'm not an expert in this would be very much appreciated!

hakman commented 3 years ago

You are using configs from kOps 1.18 with kOps 1.19 binaries. The only way to get there is by manually changing the user data. If you would just run all the steps for an upgrade, you would fix things. Also, I think I asked a few times to use the Ubuntu image, which has no bundled Docker. This is where upgrade cluster --yes would have helped. FYI, kOps 1.19.1 installs Docker 19.03.15.
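For reference, the standard upgrade sequence from the kOps docs (a sketch; add --name/--state as appropriate for your setup) is roughly:

kops upgrade cluster --yes          # bumps kubernetesVersion and related defaults in the cluster spec
kops update cluster --yes           # regenerates launch templates and the nodeup user data
kops rolling-update cluster --yes   # replaces instances so they boot with the new configuration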

bowqtr commented 3 years ago

I've managed to get it back up and running; a sincere thanks to you and others who helped. I know it can be frustrating to help those who are less familiar, but it's sincerely appreciated.

THANK YOU!

Screenshot 2021-02-23 at 18 03 45

hakman commented 3 years ago

Happy it all worked out. Hope this helped a little with getting to know k8s and kOps. Would appreciate it if you could write a short summary of what the solution was before closing.

bowqtr commented 3 years ago

Will try to, but I think it was mainly what you said: I needed to use a different AMI (so I chose 099720109477/ubuntu-eks/k8s_1.19/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210211) and then I had to adjust the user data of the launch script for references to amd64 in the yaml. Then I just let kops upgrade cluster do its thing. I then applied that to the main node, and then to the worker nodes.

What's the best site to visually learn K8s on AWS for those of us who are happy to dip in, but not so familiar with all the complications? I don't mind learning through failure (it's the human way to learn), but something clear and concise that assists with knowledge and doesn't assume too much would be great. Any recommendations?

hakman commented 3 years ago

To learn more, I would suggest some course like https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/.

bowqtr commented 3 years ago

Thanks for all your help on this… a further issue has developed though, with authentication.

Did something change in 1.19.7 that I need to make more changes for?

You can see the details of this here…https://stackoverflow.com/questions/66387892/jenkins-kubernetes-builds-fail-with-forbidden-user-systemanonymous-verb-get

If anybody can assist, since this is beyond my skill set, I'd be eternally grateful!

Thanks,

Nick
