pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

tkctl support for containerd based K8s setup #3552

Open rachitmanit opened 3 years ago

rachitmanit commented 3 years ago

Feature Request

The tkctl tool does not work in a K8s + containerd setup.

tkctl debug basic-pd-0 --image=pingcap/tidb-debug:latest --launcher-image=pingcap/debug-launcher:latest --docker-socketl=/var/run/containerd/containerd.sock

Error: F1203 14:51:23.617165 74503 helpers.go:114] error: pod ran to completion

This functionality will be increasingly needed going forward, given the K8s Dockershim Deprecation.

DanielZhangQD commented 3 years ago

@rachitmanit The tkctl tool has not been maintained for a long time. Maybe you can try the equivalent kubectl function here.

rachitmanit commented 3 years ago

@DanielZhangQD Thanks for suggesting an alternative. I ran into an error using it.

Setup: I set up K8s using Kind. (I enabled the feature gate EphemeralContainers: true, following: https://kind.sigs.k8s.io/docs/user/quick-start/#enable-feature-gates-in-your-cluster)

I found that kubectl debug currently uses the capabilities of https://github.com/aylei/kubectl-debug, since by default the debug command is unknown to kubectl at this version:
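
For readers reproducing this setup, the feature gate can be enabled through a Kind cluster config file. This is a minimal sketch based on the Kind quick-start page linked above. The file name kind-config.yaml is an arbitrary choice, not something taken from this thread, and the cluster creation step is left commented out because it needs kind and a container runtime on the host.

```shell
# Write a Kind cluster config that turns on the EphemeralContainers feature gate.
# Assumption: the file name kind-config.yaml is illustrative only.
cat > kind-config.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  EphemeralContainers: true
EOF

# Create the cluster from that config (uncomment on a host with kind installed):
# kind create cluster --config kind-config.yaml
```

The gate matters on clusters where ephemeral containers are still alpha, such as the v1.19 server reported in this comment.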

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.1", GitCommit:"206bcadf021e76c27513500ca24182692aabd17e", GitTreeState:"clean", BuildDate:"2020-09-14T07:30:52Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}

kubectl cluster-info

Kubernetes master is running at https://127.0.0.1:40253
KubeDNS is running at https://127.0.0.1:40253/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

kubectl debug basic-pd-0 --image=localhost:5000/busybox:1.26.2 -n tidb-cluster -a --agent-image localhost:5000/aylei/debug-agent

Agent Pod info: [Name:debug-agent-pod-05003aff-3b92-11eb-8477-02010a224f62, Namespace:default, Image:localhost:5000/aylei/debug-agent, HostPort:10027, ContainerPort:10027]
Waiting for pod debug-agent-pod-05003aff-3b92-11eb-8477-02010a224f62 to run...
Start deleting agent pod basic-pd-0
error execute remote, unable to upgrade connection: Failed to construct RuntimeManager.  Error- failed to dial "/run/containerd/containerd.sock"- context deadline exceeded
error: unable to upgrade connection: Failed to construct RuntimeManager.  Error- failed to dial "/run/containerd/containerd.sock"- context deadline exceeded

Can you help me figure out what the issue could be?

rachitmanit commented 3 years ago

Additional logs from the pod:

kubectl -n default logs -f debug-agent-pod-92ea2e04-3b9f-11eb-987b-02010a224f62

+ /usr/bin/nsenter -m/proc/1/ns/mnt -- fusermount -u /var/lib/lxc/lxcfs
+ true
+ /usr/bin/nsenter -m/proc/1/ns/mnt -- '[' -L /etc/mtab ]
nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted
+ sed -i '/^lxcfs \/var\/lib\/lxc\/lxcfs fuse.lxcfs/d' /etc/mtab
+ /usr/bin/nsenter -m/proc/1/ns/mnt -- mkdir -p /var/lib/lxc/lxcfs
nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted
+ LXCFS_USR=/usr/bin/lxcfs
+ LXCFS=/usr/local/bin/lxcfs
+ /usr/bin/nsenter -m/proc/1/ns/mnt -- '[' -f /usr/bin/lxcfs ]
nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted
+ grep -q io.containerd.runtime.v1.linux /proc/0/cmdline
+ exec /usr/bin/nsenter -m/proc/1/ns/mnt -- /usr/local/bin/lxcfs -p /run/lxcfs-1.pid /var/lib/lxc/lxcfs/
nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted
grep: /proc/0/cmdline: No such file or directory
+ /bin/debug-agent
No config file provided.  Using all default values.
2020/12/11 10:56:52 server.go:38: Listening on 0.0.0.0:10027
2020/12/11 10:56:53 server.go:70: receive debug request
2020/12/11 10:57:03 server.go:132: Failed to construct RuntimeManager.  Error: failed to dial "/run/containerd/containerd.sock": context deadline exceeded

rachitmanit commented 3 years ago

Got it. This is an alpha feature in the current version. The command that worked:

kubectl alpha debug -it basic-pd-0 --image=localhost:5000/pingcap/tidb-debug -n tidb-cluster
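
Which form of the command applies depends on the kubectl client version. The sketch below is a hypothetical helper, not part of tkctl or kubectl, that picks the subcommand from the client minor version. It assumes that debug lived under kubectl alpha in 1.18 and 1.19 and became the top-level kubectl debug from 1.20 onward.

```shell
# Hypothetical helper: choose the debug subcommand for a kubectl client minor version.
# Assumption: `kubectl alpha debug` in 1.18 and 1.19, `kubectl debug` from 1.20 on.
debug_subcommand() {
  minor="${1%%[!0-9]*}"   # keep leading digits only, so "19+" becomes "19"
  minor="${minor:-0}"     # treat empty input as version 0
  if [ "$minor" -ge 20 ]; then
    echo "kubectl debug"
  elif [ "$minor" -ge 18 ]; then
    echo "kubectl alpha debug"
  else
    echo "unsupported"
  fi
}

# Example with the client minor version reported earlier in this thread (19):
debug_subcommand 19
```

With the v1.19.3 client shown above, this selects the alpha form, which matches the command that worked here.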

DanielZhangQD commented 3 years ago

@rachitmanit Would you like to submit a PR to implement this feature?