I tried your YAML file 1.yaml and it has the same problem. I think the command might be the problem. I will try another YAML file without a command to confirm it.
Using commands did work in my tests. As far as I remember, all files in /evaluation worked fine. It might be tty: true
or the security context that causes the migration to fail. Have you checked the kubelet logs on both machines?
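For reference, here is a hypothetical container spec showing the fields I mean (pod name, image, and command are placeholders, not your actual 1.yaml):

apiVersion: v1
kind: Pod
metadata:
  name: example-pod            # placeholder name
spec:
  containers:
    - name: count
      image: busybox           # placeholder image
      command: ["sh", "-c", "i=0; while true; do echo $i; i=$((i+1)); sleep 1; done"]  # an explicit command, like in your file
      tty: true                # suspect 1: try removing this
      securityContext:         # suspect 2: try removing this block
        privileged: true       # illustrative value only

Removing tty and the securityContext one at a time should show which of them breaks the restore on the destination node.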
I did not check the logs yet, but I think I figured it out. I will test it and come back. Thank you.
I tested simple.yaml from /evaluation, but got this:
error: unable to recognize "simple.yaml": no matches for kind "MigratingPod" in version "podmig.schrej.net/v1"
I guess I need to install something to use MigratingPod, right?
I already checked the kubelet logs on both machines.
To me they look normal, but it still doesn't work.
This is with the 1.yaml file; 2.yaml had no problem on my side before, but now 2.yaml has the same problem too.
The source pod's logs are just the counter output: 1 2 3 4 .. 16
Here is the kubelet log entry for the destination pod:
failed to try resolving symlinks in path "/var/log/pods/default_simple-migration-28_4212e1bd-24a0-4c6f-9c0e-054d3561fef1/count/6.log": lstat /var/log/pods/default_simple-migration-28_4212e1bd-24a0-4c6f-9c0e-054d3561fef1/count/6.log: no such file or directory
Worker 1
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Wed 2021-05-26 14:41:44 UTC; 56min ago
Docs: https://kubernetes.io/docs/home/
Main PID: 5878 (kubelet)
Tasks: 14 (limit: 1140)
CGroup: /system.slice/kubelet.service
└─5878 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/e
May 26 15:37:09 w1 kubelet[5878]: I0526 15:37:09.533550 5878 kuberuntime_manager.go:841] Should we migrate?Pe
May 26 15:37:15 w1 kubelet[5878]: I0526 15:37:15.663664 5878 kuberuntime_manager.go:841] Should we migrate?Pe
May 26 15:37:16 w1 kubelet[5878]: I0526 15:37:16.668478 5878 kuberuntime_manager.go:841] Should we migrate?Pe
May 26 15:37:22 w1 kubelet[5878]: I0526 15:37:22.458500 5878 kuberuntime_manager.go:841] Should we migrate?Ru
May 26 15:37:40 w1 kubelet[5878]: I0526 15:37:40.961927 5878 kuberuntime_manager.go:841] Should we migrate?Ru
May 26 15:37:41 w1 kubelet[5878]: I0526 15:37:41.696242 5878 kubelet.go:1505] Checkpoint the firstime running
May 26 15:37:41 w1 kubelet[5878]: E0526 15:37:41.696913 5878 remote_runtime.go:289] CheckpointContainer "8390
May 26 15:37:41 w1 kubelet[5878]: I0526 15:37:41.697614 5878 kuberuntime_manager.go:841] Should we migrate?Ru
May 26 15:37:48 w1 kubelet[5878]: I0526 15:37:48.216159 5878 kubelet.go:1505] Checkpoint the firstime running
May 26 15:37:48 w1 kubelet[5878]: E0526 15:37:48.217212 5878 remote_runtime.go:289] CheckpointContainer "8390
Worker 2
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Wed 2021-05-26 14:41:43 UTC; 56min ago
Docs: https://kubernetes.io/docs/home/
Main PID: 5993 (kubelet)
Tasks: 13 (limit: 1140)
CGroup: /system.slice/kubelet.service
└─5993 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfi
May 26 15:37:41 w2 kubelet[5993]: W0526 15:37:41.058713 5993 watcher.go:87] Error while processing event
May 26 15:37:41 w2 kubelet[5993]: W0526 15:37:41.060322 5993 watcher.go:87] Error while processing event
May 26 15:37:41 w2 kubelet[5993]: W0526 15:37:41.060554 5993 watcher.go:87] Error while processing event
May 26 15:37:41 w2 kubelet[5993]: W0526 15:37:41.060765 5993 watcher.go:87] Error while processing event
May 26 15:37:41 w2 kubelet[5993]: I0526 15:37:41.565270 5993 kuberuntime_manager.go:841] Should we migrat
May 26 15:37:47 w2 kubelet[5993]: E0526 15:37:47.309993 5993 remote_runtime.go:306] RestoreContainer "b3a
May 26 15:37:47 w2 kubelet[5993]: I0526 15:37:47.915065 5993 topology_manager.go:219] [topologymanager] R
May 26 15:37:47 w2 kubelet[5993]: I0526 15:37:47.915884 5993 kuberuntime_manager.go:841] Should we migrat
May 26 15:37:49 w2 kubelet[5993]: E0526 15:37:49.964690 5993 remote_runtime.go:306] RestoreContainer "2b3
May 26 15:37:50 w2 kubelet[5993]: I0526 15:37:50.615102 5993 kuberuntime_manager.go:841] Should we migrat
But on the controller side everything looks fine:
make run
2021-05-26T15:25:27.781Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": ":8081"}
2021-05-26T15:25:27.782Z INFO setup starting manager
2021-05-26T15:25:27.883Z INFO controller-runtime.manager starting metrics server{"path": "/metrics"}
2021-05-26T15:25:27.883Z INFO controller Starting EventSource {"reconcilerGroup": "podmig.dcn.ssu.ac.kr", "reconcilerKind": "Podmigration", "controller": "podmigration", "source": "kind source: /, Kind="}
2021-05-26T15:25:27.984Z INFO controller Starting Controller {"reconcilerGroup": "podmig.dcn.ssu.ac.kr", "reconcilerKind": "Podmigration", "controller": "podmigration"}
2021-05-26T15:25:27.984Z INFO controller Starting workers {"reconcilerGroup": "podmig.dcn.ssu.ac.kr", "reconcilerKind": "Podmigration", "controller": "podmigration", "worker count": 1}
2021-05-26T15:37:40.947Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "print test": {"sourcePod":"simple","destHost":"w2","selector":{"matchLabels":{"podmig":"dcn"}},"template":{"metadata":{"creationTimestamp":null},"spec":{"containers":[]}},"action":"live-migration"}}
2021-05-26T15:37:40.949Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "annotations ": ""}
2021-05-26T15:37:40.949Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "number of existing pod ": 0}
2021-05-26T15:37:40.949Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "desired pod ": {"namespace": "default", "name": ""}}
2021-05-26T15:37:40.949Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "number of desired pod ": 0}
2021-05-26T15:37:40.950Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "number of actual running pod ": 0}
2021-05-26T15:37:40.974Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "Live-migration": "Step 1 - Check source pod is exist or not - completed"}
2021-05-26T15:37:40.974Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "sourcePod ok ": {"apiVersion": "v1", "kind": "Pod", "namespace": "default", "name": "simple"}}
2021-05-26T15:37:40.974Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "sourcePod status ": "Running"}
2021-05-26T15:37:40.981Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "Live-migration": "Step 2 - checkpoint source Pod - completed"}
2021-05-26T15:37:40.981Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "live-migration pod": "count"}
2021-05-26T15:37:40.981Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "Live-migration": "checkpointPath/var/lib/kubelet/migration/kkk/simple"}
2021-05-26T15:37:40.981Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "Live-migration": "Step 3 - Wait until checkpoint info are created - completed"}
2021-05-26T15:37:40.988Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "Live-migration": "Step 4 - Restore destPod from sourcePod's checkpointed info - completed"}
2021-05-26T15:37:48.210Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "Live-migration": "Step 4.1 - Check whether if newPod is Running or not - completedsimple-migration-28Running"}
2021-05-26T15:37:48.210Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "Live-migration": "Step 4.1 - Check whether if newPod is Running or not - completed"}
2021-05-26T15:37:48.216Z INFO controllers.Podmigration {"podmigration": "default/simple-migration-controller-18", "Live-migration": "Step 6 - Delete the source pod - completed"}
2021-05-26T15:37:48.216Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "podmig.dcn.ssu.ac.kr", "reconcilerKind": "Podmigration", "controller": "podmigration", "name": "simple-migration-controller-18", "namespace": "default"}
go run ./api-server/cmd/main.go
2021-05-26T15:25:43.176Z INFO podmigration-cp.run starting api-server manager
2021-05-26T15:25:43.177Z INFO api-server Starting api-server {"interface": "0.0.0.0", "port": ":5000"}
&{simple-migration-controller-18 w2 0 &LabelSelector{MatchLabels:map[string]string{podmig: dcn,},MatchExpressions:[]LabelSelectorRequirement{},} live-migration simple {{ 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []} {[] [] [] [] <nil> <nil> map[] <nil> false false false <nil> nil [] nil [] [] <nil> nil [] <nil> <nil> <nil> map[] [] <nil> }} <nil>}
simple
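For clarity, the Podmigration object the controller received above corresponds roughly to this manifest (reconstructed from the controller log; the /v1 apiVersion suffix is an assumption, since the log only shows the group):

apiVersion: podmig.dcn.ssu.ac.kr/v1   # version suffix assumed
kind: Podmigration
metadata:
  name: simple-migration-controller-18
spec:
  action: live-migration
  sourcePod: simple
  destHost: w2
  selector:
    matchLabels:
      podmig: dcn
  template:
    metadata: {}
    spec:
      containers: []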
I found the problem now. Thank you.
Oh sorry, I already had an answer half typed but got distracted. Glad that you've solved it in the meantime!
Well, actually I have another problem. After fixing this, I got:
May 28 06:17:29 w2 kubelet[7436]: E0528 06:17:29.576763 7436 remote_runtime.go:306] RestoreContainer "090010a838376a329cfe2668559c46ab1d2a64306108a75f50daec136ea7efe0" from runtime service failed: rpc error: code = Unknown desc = failed to restore container: failed to start containerd task "090010a838376a329cfe2668559c46ab1d2a64306108a75f50daec136ea7efe0": OCI runtime restore failed: open /var/lib/kubelet/migration/ooo/video/vlc/descriptors.json: no such file or directory: unknown
May 28 06:17:30 w2 kubelet[7436]: I0528 06:17:30.452018 7436 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 6b2d7355cd9271dec30c75d75b344edf519f11c04d6da69f4e15142daeaac79b
May 28 06:17:30 w2 kubelet[7436]: I0528 06:17:30.452769 7436 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 090010a838376a329cfe2668559c46ab1d2a64306108a75f50daec136ea7efe0
May 28 06:17:30 w2 kubelet[7436]: I0528 06:17:30.453081 7436 kuberuntime_manager.go:841] Should we migrate?Runningtrue
Also, I talked with Tuong. He recommends using
$ kubectl checkpoint simple /var/lib/kubelet/migration/xxx
to checkpoint, and I got this:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x122f8da]
goroutine 1 [running]:
k8s.io/client-go/kubernetes.NewForConfig(0x0, 0x0, 0x14f5141, 0x58)
/home/ubuntu/kubernetes/staging/src/k8s.io/client-go/kubernetes/clientset.go:371 +0x3a
main.(*MigrateArgs).Run(0xc000361230, 0xc00035ea00, 0xc000355020)
/home/ubuntu/podmigration-operator/kubectl-plugin/checkpoint-command/checkpoint_command.go:88 +0x73
main.NewPluginCmd.func1(0xc00035ea00, 0xc000355020, 0x2, 0x2)
/home/ubuntu/podmigration-operator/kubectl-plugin/checkpoint-command/checkpoint_command.go:61 +0xd3
github.com/spf13/cobra.(*Command).execute(0xc00035ea00, 0xc000114160, 0x2, 0x2, 0xc00035ea00, 0xc000114160)
/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc00035ea00, 0x0, 0xffffffff, 0xc000102058)
/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x30b
github.com/spf13/cobra.(*Command).Execute(...)
/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
main.main()
/home/ubuntu/podmigration-operator/kubectl-plugin/checkpoint-command/checkpoint_command.go:130 +0x2a
I don't know what I should fix now.
May 28 14:49:35 w1 kubelet[5195]: E0528 14:49:35.089879 5195 remote_runtime.go:289] CheckpointContainer "b0541936954521367fdcd022b54e9e44e2350469daf549616841bbf2263173c5" from runtime service failed: rpc error: code = Unknown desc = failed to checkpoint container: /usr/local/bin/runc did not terminate sucessfully: criu failed: type NOTIFY errno 0 path= /run/containerd/io.containerd.runtime.v1.linux/k8s.io/b0541936954521367fdcd022b54e9e44e2350469daf549616841bbf2263173c5/criu-dump.log: unknown
Seems like a problem with CRIU.
I don't know what to say. I think Ubuntu still has a problem with CRIU, and I don't know why you guys can use it. I tested your work on Debian 10 and it works without any problems.
Hi again,
I still have an issue with my YAML file. I can test pod migration with your YAML file and it works without any problem, but my own YAML file has a problem.
One thing I noticed is that your YAML file doesn't have a "command", but I'm not sure whether that is why mine doesn't work; it could be this or something else.
I also understood that if I want to use volumes in the YAML file I need to configure something, but I'm not using volumes and it still doesn't work anyway.
The destination pod has the status "CrashLoopBackOff" after migrating. Any idea about this?