kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

Unable to create a cluster inside LXD container #455

Closed GrosLalo closed 5 years ago

GrosLalo commented 5 years ago

What happened:

I have been following the instructions at https://kind.sigs.k8s.io/docs/user/quick-start/ and I am unable to get the cluster created with kind. The kind create cluster process fails at the "Starting control-plane" step.

What you expected to happen:

I expected the same outcome as described on https://kind.sigs.k8s.io/docs/user/quick-start/

How to reproduce it (as minimally and precisely as possible):

For the environment given below, I just ran the command kind create cluster --loglevel debug and then observed the following issues:

Preflight verification error:

[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 5.0.0-13-generic
DOCKER_VERSION: 18.06.3-ce
DOCKER_GRAPH_DRIVER: overlay2
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
        [WARNING SystemVerification]: [unsupported kernel release: 5.0.0-13-generic, failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/5.0.0-13-generic/modules.dep.bin'\nmodprobe: FATAL: Module configs not found in directory /lib/modules/5.0.0-13-generic\n", err: exit status 1]

However, given that preflight errors are ignored at that stage, I assume that the kernel version above was not deemed to be a problem. So, the subsequent problems worth noting are:

[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
I0426 10:24:30.849958     752 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s  in 20000 milliseconds
I0426 10:24:50.847667     752 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s  in 19497 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.

Anything else we need to know?:

Environment:

WARNING: No swap limit support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled


- OS:

NAME="Ubuntu" VERSION="18.04.2 LTS (Bionic Beaver)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 18.04.2 LTS" VERSION_ID="18.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=bionic UBUNTU_CODENAME=bionic

aojea commented 5 years ago

@GrosLalo it seems you are using a proxy:

HTTP Proxy: http://10.144.1.10:8080
HTTPS Proxy: http://10.144.1.10:8080

Please configure the NO_PROXY environment variable with localhost and your docker subnet (I assume it is 172.17.0.0/16) and try again.
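
A minimal sketch of that, assuming the default docker0 subnet and the proxy values shown above:

export NO_PROXY=localhost,127.0.0.1,172.17.0.0/16
export no_proxy=$NO_PROXY
kind create cluster --loglevel debug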

GrosLalo commented 5 years ago

@aojea: I am still getting the following error after setting NO_PROXY:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s                                                        
I0426 11:47:27.345616     750 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s  in 0 milliseconds                                                                                              
I0426 11:47:27.846864     750 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s  in 0 milliseconds                                                                                              
I0426 11:47:28.346869     750 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s  in 0 milliseconds                                                                                              
I0426 11:47:28.846899     750 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s  in 0 milliseconds 

I then tried to run the journalctl command in the control-plane container and obtained this:
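
(Presumably via something like the following, run from the host; the exact command isn't recorded here:)

docker exec kind-control-plane journalctl -u kubelet --no-pager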

Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.917788    4908 server.go:999] Started kubelet
Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.917841    4908 server.go:137] Starting to listen on 0.0.0.0:10250
Apr 26 12:22:55 kind-control-plane kubelet[4908]: E0426 12:22:55.918090    4908 event.go:212] Unable to write event: 'Post https://172.17.0.2:6443/api/v1/namespaces/default/events: dial tcp 172.17.0.2:6443: connec
t: connection refused' (may retry after sleeping)
Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.918328    4908 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.918350    4908 status_manager.go:152] Starting to sync pod status with apiserver
Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.918371    4908 kubelet.go:1829] Starting kubelet main sync loop.
Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.918390    4908 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet PLEG is not healthy: pleg 
has yet to be successful]
Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.918473    4908 server.go:333] Adding debug handlers to kubelet server.
Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.918475    4908 volume_manager.go:248] Starting Kubelet Volume Manager
Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.918497    4908 desired_state_of_world_populator.go:130] Desired state populator starts to run
Apr 26 12:22:55 kind-control-plane kubelet[4908]: W0426 12:22:55.921436    4908 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 26 12:22:55 kind-control-plane kubelet[4908]: E0426 12:22:55.925886    4908 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin 
is not ready: cni config uninitialized
Apr 26 12:22:55 kind-control-plane kubelet[4908]: W0426 12:22:55.935038    4908 manager.go:349] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.950959    4908 kubelet_node_status.go:278] Setting node annotation to enable volume controller attach/detach
Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.952644    4908 cpu_manager.go:155] [cpumanager] starting with none policy
Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.952660    4908 cpu_manager.go:156] [cpumanager] reconciling every 10s
Apr 26 12:22:55 kind-control-plane kubelet[4908]: I0426 12:22:55.952670    4908 policy_none.go:42] [cpumanager] none policy: Start
Apr 26 12:22:55 kind-control-plane kubelet[4908]: F0426 12:22:55.953144    4908 kubelet.go:1384] Failed to start ContainerManager [open /proc/sys/vm/overcommit_memory: permission denied, open /proc/sys/kernel/pani
c: permission denied, open /proc/sys/kernel/panic_on_oops: permission denied]
Apr 26 12:22:55 kind-control-plane systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Apr 26 12:22:55 kind-control-plane systemd[1]: kubelet.service: Failed with result 'exit-code'.

For instance, for the permission-denied issues above (e.g. /proc/sys/vm/overcommit_memory), I checked that the default root user can open those:

root@kind-control-plane:/# cat /proc/sys/vm/overcommit_memory 
0

Any ideas?

aojea commented 5 years ago

Do you have enough memory? free -m

GrosLalo commented 5 years ago

Yes, plenty of memory: 14GB available.

BenTheElder commented 5 years ago
Apr 26 12:22:55 kind-control-plane kubelet[4908]: F0426 12:22:55.953144    4908 kubelet.go:1384] Failed to start ContainerManager [open /proc/sys/vm/overcommit_memory: permission denied, open /proc/sys/kernel/pani
c: permission denied, open /proc/sys/kernel/panic_on_oops: permission denied]
Apr 26 12:22:55 kind-control-plane systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a

That is definitely the relevant part.

Are you running with selinux or apparmor by any chance? Normally I'd expect a --privileged container to be able to read these (note kubelet is only reading in the default mode).

I checked that the default root user can open those

the question is: can a container open those? I.e., from docker run --privileged -it --rm ubuntu

GrosLalo commented 5 years ago

I can run docker run --privileged -it --rm ubuntu on the host. Perhaps I should mention that the host is itself an LXD container, but it is configured correctly (at least enough to run the above-mentioned --privileged command). E.g.:

d run -it --privileged --rm ubuntu
root@4ee4aacd1f88:/# 

Would there be some other test I could perform to see if my host (i.e. the LXD container) is inadequate? Or another test to see if something is buggy in kind on my setup?

Thanks in advance.

BenTheElder commented 5 years ago

Can you access those paths from within the privileged container?

Not working inside an LXD container is not surprising. Kubernetes still needs access to the host, and the container is probably too restrictive.

BenTheElder commented 5 years ago

I.e., the test would be:

d run -it --privileged --rm ubuntu
root@4ee4aacd1f88:/# cat /proc/sys/kernel/panic_on_oops
...
root@4ee4aacd1f88:/# cat /proc/sys/vm/overcommit_memory
...

BenTheElder commented 5 years ago

Per the docs at https://help.ubuntu.com/lts/serverguide/lxd.html.en, the permissions are likely too restrictive even when the trivial "privileged" setting is enabled.

These guides cover some more settings, including read-write proc and sys, and may be relevant (see the sketch below): https://github.com/charmed-kubernetes/bundle/wiki/Deploying-on-LXD#the-profile https://github.com/corneliusweig/kubernetes-lxd
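
For reference, here is a sketch of the kind of settings those guides describe, expressed as lxc commands (the container name k8s-host is a placeholder, and the exact keys vary with the LXD/LXC version):

lxc config set k8s-host security.privileged true
lxc config set k8s-host security.nesting true
lxc config set k8s-host linux.kernel_modules ip_tables,ip6_tables,netlink_diag,nf_nat,overlay
lxc config set k8s-host raw.lxc "lxc.apparmor.profile=unconfined
lxc.cap.drop=
lxc.cgroup.devices.allow=a
lxc.mount.auto=proc:rw sys:rw"
lxc restart k8s-host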

GrosLalo commented 5 years ago

Yes, I did those tests inside the container and they do not report anything suspicious:

root@kind-control-plane:/# cat /proc/sys/vm/overcommit_memory
0
root@kind-control-plane:/# 

GrosLalo commented 5 years ago

@BenTheElder : The links helped. I realized that apparmor was interfering; configuring the container's apparmor profile (per the guides above) fixed it. This issue can be closed.

BenTheElder commented 5 years ago

Excellent! Will note this for the next user, thanks for figuring it out and reporting back! :-)

aojea commented 5 years ago

@GrosLalo it would be nice if you could explain what changes were needed so other users can benefit from your experience

ghost commented 2 years ago

Having the same problem, and I'm not sure what you guys are talking about, but it's not fixed. It's likely because /proc/sys/vm/overcommit_memory is being opened read-write, and here it's read-only: it's an LXD container running under shared tenancy, and it would be silly for any tenant to just arbitrarily turn that on for the whole system. The /dev/kmsg problem is a separate issue entirely, but I got past it by simply linking /dev/kmsg to /dev/console; I only mention it because I'm not sure why you linked the two. Yes, both are characteristic of what you can expect to run into trying to run k8s in an LXD container. It depends on how the LXD container is set up, but I think the implication is almost always shared tenancy; I can't really imagine why one would do it otherwise, and that's definitely the case for Njal.la.
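
That workaround, roughly (the link direction is my reading of the sentence above; run inside the container before kubelet starts):

ln -s /dev/console /dev/kmsg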

yes overcommit_memory is readable

root@mmx-sv-1-1:~/k8s/kubeadm# cat /proc/sys/vm/overcommit_memory
0

no it's not writable

root@mmx-sv-1-1:~/k8s/kubeadm# echo 1 > /proc/sys/vm/overcommit_memory
-bash: /proc/sys/vm/overcommit_memory: Permission denied

kubelet output:
I0427 18:42:07.294984    3123 server.go:446] "Kubelet version" kubeletVersion="v1.23.5"
I0427 18:42:07.295508    3123 server.go:606] "Standalone mode, no API client"
I0427 18:42:07.295760    3123 server.go:662] "Failed to get the kubelet's cgroup. Kubelet system container metrics may be missing." err="cpu and memory cgroup hierarchy not unified.  cpu: /user.slice, memory: /user.slice/user-0.slice/session-19012.scope"
W0427 18:42:07.441128    3123 fs.go:220] stat failed on /dev/rbd1 with error: no such file or directory
I0427 18:42:07.462861    3123 server.go:494] "No api server defined - no events will be sent to API server"
I0427 18:42:07.462924    3123 server.go:693] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
I0427 18:42:07.464830    3123 container_manager_linux.go:281] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
I0427 18:42:07.465107    3123 container_manager_linux.go:286] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerPolicyOptions:map[] ExperimentalTopologyManagerScope:container ExperimentalCPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}
I0427 18:42:07.465231    3123 topology_manager.go:133] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
I0427 18:42:07.465279    3123 container_manager_linux.go:321] "Creating device plugin manager" devicePluginEnabled=true
I0427 18:42:07.465377    3123 state_mem.go:36] "Initialized new in-memory state store"
I0427 18:42:07.666259    3123 server.go:799] "Failed to ApplyOOMScoreAdj" err="write /proc/self/oom_score_adj: permission denied"
I0427 18:42:07.666362    3123 kubelet.go:313] "Using dockershim is deprecated, please consider using a full-fledged CRI implementation"
I0427 18:42:07.666477    3123 client.go:80] "Connecting to docker on the dockerEndpoint" endpoint="unix:///var/run/docker.sock"
I0427 18:42:07.666531    3123 client.go:99] "Start docker client with request timeout" timeout="2m0s"
I0427 18:42:07.689625    3123 docker_service.go:571] "Hairpin mode is set but kubenet is not enabled, falling back to HairpinVeth" hairpinMode=promiscuous-bridge
I0427 18:42:07.689712    3123 docker_service.go:243] "Hairpin mode is set" hairpinMode=hairpin-veth
I0427 18:42:07.703239    3123 docker_service.go:258] "Docker cri networking managed by the network plugin" networkPluginName="kubernetes.io/no-op"
I0427 18:42:07.727016    3123 docker_service.go:264] "Docker Info" dockerInfo=&{ID:5TPK:ALOO:5AUM:GKD6:G3V2:UPVO:WPT7:ZSQA:VS5O:BDNA:ZMK6:6RJQ Containers:0 ContainersRunning:0 ContainersPaused:0 ContainersStopped:0 Images:0 Driver:btrfs DriverStatus:[[Build Version Btrfs v5.4.1 ] [Library Version 102]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host ipvlan macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:false KernelMemory:true KernelMemoryTCP:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:25 OomKillDisable:true NGoroutines:34 SystemTime:2022-04-27T18:42:07.705303502Z LoggingDriver:json-file CgroupDriver:cgroupfs CgroupVersion:1 NEventsListener:0 KernelVersion:5.4.0-96-generic OperatingSystem:Ubuntu 20.04.4 LTS OSVersion:20.04 OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc000152a10 NCPU:1 MemTotal:1536000000 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:mmx-sv-1-1 Labels:[] ExperimentalBuild:false ServerVersion:20.10.14 ClusterStore: ClusterAdvertise: Runtimes:map[io.containerd.runc.v2:{Path:runc Args:[] Shim:<nil>} io.containerd.runtime.v1.linux:{Path:runc Args:[] Shim:<nil>} runc:{Path:runc Args:[] Shim:<nil>}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil> Warnings:[]} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:3df54a852345ae127d1fa3092b95168e4a88e2f8 Expected:3df54a852345ae127d1fa3092b95168e4a88e2f8} RuncCommit:{ID:v1.0.3-0-gf46b6ba Expected:v1.0.3-0-gf46b6ba} InitCommit:{ID:de40ad0 Expected:de40ad0} SecurityOptions:[name=apparmor name=seccomp,profile=default] ProductLicense: DefaultAddressPools:[] Warnings:[WARNING: No swap limit support]}
I0427 18:42:07.727095    3123 docker_service.go:279] "Setting cgroupDriver" cgroupDriver="cgroupfs"
I0427 18:42:07.781940    3123 kubelet.go:422] "Kubelet is running in standalone mode, will skip API server sync"
I0427 18:42:07.816287    3123 kuberuntime_manager.go:249] "Container runtime initialized" containerRuntime="docker" version="20.10.14" apiVersion="1.41.0"
I0427 18:42:07.816749    3123 volume_host.go:75] "KubeClient is nil. Skip initialization of CSIDriverLister"
W0427 18:42:07.817320    3123 csi_plugin.go:189] kubernetes.io/csi: kubeclient not set, assuming standalone kubelet
W0427 18:42:07.817372    3123 csi_plugin.go:268] Skipping CSINode initialization, kubelet running in standalone mode
I0427 18:42:07.817972    3123 server.go:1231] "Started kubelet"
E0427 18:42:07.818842    3123 kubelet.go:1351] "Image garbage collection failed once. Stats initialization may not have completed yet" err="failed to get imageFs info: unable to find data in memory cache"
I0427 18:42:07.818918    3123 kubelet.go:1461] "No API server defined - no node status update will be sent"
I0427 18:42:07.820899    3123 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
I0427 18:42:07.827882    3123 server.go:150] "Starting to listen" address="0.0.0.0" port=10250
I0427 18:42:07.829990    3123 server.go:177] "Starting to listen read-only" address="0.0.0.0" port=10255
I0427 18:42:07.832574    3123 volume_manager.go:291] "Starting Kubelet Volume Manager"
I0427 18:42:07.832905    3123 desired_state_of_world_populator.go:147] "Desired state populator starts to run"
I0427 18:42:07.862412    3123 server.go:410] "Adding debug handlers to kubelet server"
I0427 18:42:07.955695    3123 kubelet_network_linux.go:57] "Initialized protocol iptables rules." protocol=IPv4
I0427 18:42:07.990693    3123 kubelet_network_linux.go:57] "Initialized protocol iptables rules." protocol=IPv6
I0427 18:42:07.990754    3123 status_manager.go:155] "Kubernetes client is nil, not starting status manager"
I0427 18:42:07.990780    3123 kubelet.go:1977] "Starting kubelet main sync loop"
E0427 18:42:07.990926    3123 kubelet.go:2001] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
I0427 18:42:08.040298    3123 reconciler.go:157] "Reconciler: start to sync state"
E0427 18:42:08.044270    3123 manager.go:1123] Failed to create existing container: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7b1e277fe04ab5a7cc183178f1390653.slice/crio-c7f48c13e416593348a8af9ce329d167c097094a590e9f656dbadc303e2e88c6.scope: Error finding container c7f48c13e416593348a8af9ce329d167c097094a590e9f656dbadc303e2e88c6: Status 404 returned error &{%!s(*http.body=&{0xc000a32780 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x840ee0) %!s(func() error=0x840fe0)}
E0427 18:42:08.050245    3123 manager.go:1123] Failed to create existing container: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4c8aa9f48623623000a647abc411dc89.slice/crio-f0f4112818332324729fe67ab9d82c76135c02e73cdc0d1a39c21aba6af8b679.scope: Error finding container f0f4112818332324729fe67ab9d82c76135c02e73cdc0d1a39c21aba6af8b679: Status 404 returned error &{%!s(*http.body=&{0xc000a32a68 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x840ee0) %!s(func() error=0x840fe0)}
W0427 18:42:08.064917    3123 fs.go:599] Unable to get btrfs mountpoint IDs: stat failed on /dev/rbd1 with error: no such file or directory
E0427 18:42:08.066476    3123 manager.go:1123] Failed to create existing container: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podcc201ef9fcb4aee2702f7bc84f26bc75.slice/crio-d11d1443aff23756853b381315dd081924d6ed95505a829f89adbe164e934258.scope: Error finding container d11d1443aff23756853b381315dd081924d6ed95505a829f89adbe164e934258: Status 404 returned error &{%!s(*http.body=&{0xc000a33278 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x840ee0) %!s(func() error=0x840fe0)}
W0427 18:42:08.083684    3123 fs.go:599] Unable to get btrfs mountpoint IDs: stat failed on /dev/rbd1 with error: no such file or directory
E0427 18:42:08.085737    3123 manager.go:1123] Failed to create existing container: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2d4e01f79cbfae2a75046c3cc9635c82.slice/crio-ce43f8914e57a4d763d1b6f16725e5903d2cf7eeb4623ab652b0a7f9d45fc4c9.scope: Error finding container ce43f8914e57a4d763d1b6f16725e5903d2cf7eeb4623ab652b0a7f9d45fc4c9: Status 404 returned error &{%!s(*http.body=&{0xc000a33a40 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x840ee0) %!s(func() error=0x840fe0)}
E0427 18:42:08.092983    3123 kubelet.go:2001] "Skipping pod synchronization" err="container runtime status check may not have completed yet"
W0427 18:42:08.116499    3123 fs.go:599] Unable to get btrfs mountpoint IDs: stat failed on /dev/rbd1 with error: no such file or directory
W0427 18:42:08.165700    3123 fs.go:599] Unable to get btrfs mountpoint IDs: stat failed on /dev/rbd1 with error: no such file or directory
E0427 18:42:08.227954    3123 container_manager_linux.go:104] "Unable to ensure the docker processes run in the desired containers" err="[errors moving \"dockerd\" pid: failed to apply oom score -999 to PID 1785: write /proc/1785/oom_score_adj: permission denied, errors moving \"containerd\" pid: failed to apply oom score -999 to PID 1777: write /proc/1777/oom_score_adj: permission denied]"
E0427 18:42:08.286951    3123 manager.go:1123] Failed to create existing container: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2d4e01f79cbfae2a75046c3cc9635c82.slice/crio-ce43f8914e57a4d763d1b6f16725e5903d2cf7eeb4623ab652b0a7f9d45fc4c9.scope: Error finding container ce43f8914e57a4d763d1b6f16725e5903d2cf7eeb4623ab652b0a7f9d45fc4c9: Status 404 returned error &{%!s(*http.body=&{0xc00031c840 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x840ee0) %!s(func() error=0x840fe0)}
E0427 18:42:08.290334    3123 manager.go:1123] Failed to create existing container: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7b1e277fe04ab5a7cc183178f1390653.slice/crio-c7f48c13e416593348a8af9ce329d167c097094a590e9f656dbadc303e2e88c6.scope: Error finding container c7f48c13e416593348a8af9ce329d167c097094a590e9f656dbadc303e2e88c6: Status 404 returned error &{%!s(*http.body=&{0xc00031d560 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x840ee0) %!s(func() error=0x840fe0)}
E0427 18:42:08.293177    3123 kubelet.go:2001] "Skipping pod synchronization" err="container runtime status check may not have completed yet"
E0427 18:42:08.293281    3123 manager.go:1123] Failed to create existing container: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podcc201ef9fcb4aee2702f7bc84f26bc75.slice/crio-d11d1443aff23756853b381315dd081924d6ed95505a829f89adbe164e934258.scope: Error finding container d11d1443aff23756853b381315dd081924d6ed95505a829f89adbe164e934258: Status 404 returned error &{%!s(*http.body=&{0xc00031df20 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x840ee0) %!s(func() error=0x840fe0)}
E0427 18:42:08.296170    3123 manager.go:1123] Failed to create existing container: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4c8aa9f48623623000a647abc411dc89.slice/crio-f0f4112818332324729fe67ab9d82c76135c02e73cdc0d1a39c21aba6af8b679.scope: Error finding container f0f4112818332324729fe67ab9d82c76135c02e73cdc0d1a39c21aba6af8b679: Status 404 returned error &{%!s(*http.body=&{0xc000b14468 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x840ee0) %!s(func() error=0x840fe0)}
W0427 18:42:08.305712    3123 manager.go:1176] Failed to process watch event {EventType:0 Name:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2d4e01f79cbfae2a75046c3cc9635c82.slice/crio-ce43f8914e57a4d763d1b6f16725e5903d2cf7eeb4623ab652b0a7f9d45fc4c9.scope WatchSource:0}: Error finding container ce43f8914e57a4d763d1b6f16725e5903d2cf7eeb4623ab652b0a7f9d45fc4c9: Status 404 returned error &{%!s(*http.body=&{0xc000b14858 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x840ee0) %!s(func() error=0x840fe0)}
W0427 18:42:08.308772    3123 manager.go:1176] Failed to process watch event {EventType:0 Name:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4c8aa9f48623623000a647abc411dc89.slice/crio-f0f4112818332324729fe67ab9d82c76135c02e73cdc0d1a39c21aba6af8b679.scope WatchSource:0}: Error finding container f0f4112818332324729fe67ab9d82c76135c02e73cdc0d1a39c21aba6af8b679: Status 404 returned error &{%!s(*http.body=&{0xc000b14b40 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x840ee0) %!s(func() error=0x840fe0)}
W0427 18:42:08.311869    3123 manager.go:1176] Failed to process watch event {EventType:0 Name:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7b1e277fe04ab5a7cc183178f1390653.slice/crio-c7f48c13e416593348a8af9ce329d167c097094a590e9f656dbadc303e2e88c6.scope WatchSource:0}: Error finding container c7f48c13e416593348a8af9ce329d167c097094a590e9f656dbadc303e2e88c6: Status 404 returned error &{%!s(*http.body=&{0xc000b14e28 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x840ee0) %!s(func() error=0x840fe0)}
W0427 18:42:08.315022    3123 manager.go:1176] Failed to process watch event {EventType:0 Name:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podcc201ef9fcb4aee2702f7bc84f26bc75.slice/crio-d11d1443aff23756853b381315dd081924d6ed95505a829f89adbe164e934258.scope WatchSource:0}: Error finding container d11d1443aff23756853b381315dd081924d6ed95505a829f89adbe164e934258: Status 404 returned error &{%!s(*http.body=&{0xc000b15170 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x840ee0) %!s(func() error=0x840fe0)}
I0427 18:42:08.342375    3123 cpu_manager.go:213] "Starting CPU manager" policy="none"
I0427 18:42:08.342452    3123 cpu_manager.go:214] "Reconciling" reconcilePeriod="10s"
I0427 18:42:08.342531    3123 state_mem.go:36] "Initialized new in-memory state store"
I0427 18:42:08.343022    3123 state_mem.go:88] "Updated default CPUSet" cpuSet=""
I0427 18:42:08.343083    3123 state_mem.go:96] "Updated CPUSet assignments" assignments=map[]
I0427 18:42:08.343108    3123 policy_none.go:49] "None policy: Start"
I0427 18:42:08.346001    3123 memory_manager.go:168] "Starting memorymanager" policy="None"
I0427 18:42:08.346094    3123 state_mem.go:35] "Initializing new in-memory state store"
I0427 18:42:08.346469    3123 state_mem.go:75] "Updated machine memory state"
W0427 18:42:08.346560    3123 fs.go:599] Unable to get btrfs mountpoint IDs: stat failed on /dev/rbd1 with error: no such file or directory
E0427 18:42:08.347846    3123 container_manager_linux.go:457] "Updating kernel flag failed (Hint: enable KubeletInUserNamespace feature flag to ignore the error)" err="open /proc/sys/vm/overcommit_memory: permission denied" flag="vm/overcommit_memory"
E0427 18:42:08.348059    3123 container_manager_linux.go:457] "Updating kernel flag failed (Hint: enable KubeletInUserNamespace feature flag to ignore the error)" err="open /proc/sys/kernel/panic: permission denied" flag="kernel/panic"
E0427 18:42:08.348177    3123 container_manager_linux.go:457] "Updating kernel flag failed (Hint: enable KubeletInUserNamespace feature flag to ignore the error)" err="open /proc/sys/kernel/panic_on_oops: permission denied" flag="kernel/panic_on_oops"
E0427 18:42:08.348312    3123 kubelet.go:1431] "Failed to start ContainerManager" err="[open /proc/sys/vm/overcommit_memory: permission denied, open /proc/sys/kernel/panic: permission denied, open /proc/sys/kernel/panic_on_oops: permission denied]"

Also, I would point out that the line number in the kubelet.go link points at the source code that reports the error rather than the code that caused it, which isn't all that helpful:

https://github.com/kubernetes/kubernetes/blob/v1.23.5/pkg/kubelet/kubelet.go#L1431

but I guess in case you want to see some error-reporting code, why not; that's sort of the Go equivalent of the old object-oriented practice of reporting the outer exception but not the inner one. I was hoping to see how the file was being opened (read-only vs. read-write), but I don't know where to find that, so that's on you.

root@mmx-sv-1-1:~/k8s/kubeadm# kubelet --version
Kubernetes v1.23.5

Trying to initialize kubelet with:

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: "unix:///var/run/crio/crio.sock"
# localAPIEndpoint is a top-level InitConfiguration field, not a child of nodeRegistration
localAPIEndpoint:
  advertiseAddress: "100.68.0.1"
  bindPort: 6443
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletInUserNamespace: true
cgroupDriver: "systemd"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: "100.64.16.0/20"
  serviceSubnet: "100.64.32.0/20"
  dnsDomain: "mmx-sv-1-1.insecurity.cloud"
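
Applied with something along these lines (the config file name is an assumption):

kubeadm init --config kubeadm-init.yaml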

containerd toml:

disabled_plugins = []

[plugins."io.containerd.grpc.v1.cri"]
  restrict_oom_score_adj = true

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "fuse-overlayfs"

BenTheElder commented 2 years ago

Hi, we do not have the bandwidth to actively support / develop / test the LXD environment. To my knowledge Kubernetes does not either.

Most of our users run docker or podman in VMs, or on their developer machines.


it's an LXD container running under shared tenancy, and it would be silly for any tenant to just arbitrarily turn that on for the whole system.

You should probably not try to run kubernetes/kind under shared tenancy unless that tenancy is done by way of VMs.

Also, I would point out that the line number in the kubelet.go link points at the source code that reports the error rather than the code that caused it, which isn't all that helpful:

https://github.com/kubernetes/kubernetes/blob/v1.23.5/pkg/kubelet/kubelet.go#L1431

but I guess in case you want to see some error-reporting code, why not; that's sort of the Go equivalent of the old object-oriented practice of reporting the outer exception but not the inner one. I was hoping to see how the file was being opened (read-only vs. read-write), but I don't know where to find that, so that's on you.

Kubelet behavior is not something we own; we do our best to make KIND meet kubelet's expectations. Go's error-management style is certainly out of scope for this project.

The /dev/kmsg problem is a separate issue entirely, but I got past it by simply linking /dev/kmsg to /dev/console; I only mention it because I'm not sure why you linked the two. Yes, both are characteristic of what you can expect to run into trying to run k8s in an LXD container. It depends on how the LXD container is set up, but I think the implication is almost always shared tenancy; I can't really imagine why one would do it otherwise, and that's definitely the case for Njal.la.

Yes, the /dev/kmsg symlink is a bad hack; we should probably drop it or just create an empty file there.

ghost commented 2 years ago

It would be nice if that were more apparent from the start; I wouldn't have wasted my time. And yeah, I'm not really a fan of the idea of an "LXD VPS" either, but it's the way Njal.la does things, and to be fair it's probably what you would consider a modern OpenVZ VPS. So I know what you mean, but it's what I've got, and nothing between then and now really said it simply won't work; quite the opposite, given the number of feature gates and options that would seem to indicate it's possible. You guys might consider making that behavior a little more definitive, though, like an option that you have to override to try to run it on LXD.

Yeah, I know it's nice to run on a KVM VM if you have one; I have a Kubic VM running.

BenTheElder commented 2 years ago

We've accepted LXD-related fixes in the past and have known it to work; this is the first we've heard otherwise in a long time. But we don't particularly have the resources to follow up ourselves.

I would say Kubernetes in general hasn't made statements about LXD for the same reasons.

The project is not monolithic; the KIND maintainers do not own kubelet's behavior, so we can't directly change those requirements. The host kernel etc. must be Kubernetes-compatible, which is generally the case on Linux, but it's possible for it not to be.