kubernetes / enhancements

Enhancements tracking repo for Kubernetes

Forensic Container Checkpointing #2008

Open adrianreber opened 3 years ago

adrianreber commented 3 years ago

Enhancement Description

adrianreber commented 3 years ago

/sig node

kikisdeliveryservice commented 3 years ago

Discussion Link: N/A (or... at multiple conferences over the last years when presenting CRIU and container migration, there was always the question of when we will see container migration in Kubernetes)

Responsible SIGs: maybe node

We recommend actively socializing your KEP with the appropriate SIG to gain visibility and consensus, and also to help with scheduling. Also, as you are not sure which SIG will sponsor this, reaching out to the SIGs to get clarity on that will be helpful in moving your KEP forward.

kikisdeliveryservice commented 3 years ago

Hi @adrianreber

Any updates on whether this will be included in 1.20?

Enhancements Freeze is October 6th and by that time we require:

- The KEP must be merged in an implementable state
- The KEP must have test plans
- The KEP must have graduation criteria
- The KEP must have an issue in the milestone

Best, Kirsten

adrianreber commented 3 years ago

Hello @kikisdeliveryservice

Any updates on whether this will be included in 1.20?

Sorry, but how would I decide this? There has not been a lot of feedback on the corresponding KEP, which makes it really difficult for me to answer that question. On the other hand, maybe the missing feedback is a sign that it will take some more time. So probably this will not be included in 1.20.

kikisdeliveryservice commented 3 years ago

Normally the SIG would give a clear signal that it would be included. That would be by reviewing the KEP, agreeing to the milestone proposals in the KEP, etc. I'd encourage you to keep in touch with them and start the 1.21 conversation early if this does not end up getting reviewed/merged properly by October 6th.

Best, Kirsten

adrianreber commented 3 years ago

@kikisdeliveryservice Thanks for the guidance. Will do.

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community. /close

k8s-ci-robot commented 3 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/enhancements/issues/2008#issuecomment-786124982):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

adrianreber commented 3 years ago

/reopen /remove-lifecycle rotten

k8s-ci-robot commented 3 years ago

@adrianreber: Reopened this issue.

In response to [this](https://github.com/kubernetes/enhancements/issues/2008#issuecomment-786135980):

> /reopen
> /remove-lifecycle rotten

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

adrianreber commented 3 years ago

/remove-lifecycle stale

Still working on it.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

adrianreber commented 2 years ago

/remove-lifecycle rotten

adrianreber commented 2 years ago

/remove-lifecycle stale

Priyankasaggu11929 commented 2 years ago

Hello @adrianreber 👋, 1.24 Enhancements team here.

Just checking in as we approach enhancements freeze at 18:00 PT on Thursday, Feb 3rd, 2022. This enhancement is targeting stage alpha for 1.24; is this correct?

Here’s where this enhancement currently stands:

Looks like for this one, we would just need to update the following:

At the moment, the status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages. Thank you!

adrianreber commented 2 years ago

@Priyankasaggu11929 Thanks for the KEP feedback. I tried to update the KEP to address the open issues you listed.

Priyankasaggu11929 commented 2 years ago

@adrianreber, thanks so much for quickly updating the PR. 🚀

rhockenbury commented 2 years ago

With #1990 merged, I've updated this enhancement to tracked for the 1.24 cycle. All set for enhancements freeze. Thanks!

nate-double-u commented 2 years ago

Hi @adrianreber :wave: 1.24 Docs lead here.

This enhancement is marked as Needs Docs for the 1.24 release.

Please follow the steps detailed in the documentation to open a PR against the dev-1.24 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday, March 31st, 2022 @ 18:00 PDT.

Also, if needed take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

Thanks!

adrianreber commented 2 years ago

@nate-double-u documentation PR available at https://github.com/kubernetes/website/pull/31753

valaparthvi commented 2 years ago

Hi @adrianreber :wave: 1.24 Release Comms team here.

We have an opt-in process for the feature blog delivery. If you would like to publish a feature blog for this issue in this cycle, then please opt in on this tracking sheet.

The deadline for submissions and the feature blog freeze is scheduled for 01:00 UTC Wednesday 23rd March 2022 / 18:00 PDT Tuesday 22nd March 2022. Other important dates for delivery and review are listed here: https://github.com/kubernetes/sig-release/tree/master/releases/release-1.24#timeline.

For reference, here is the blog for 1.23.

Please feel free to reach out any time to me or on the #release-comms channel with questions or comments.

Thanks!

rhockenbury commented 2 years ago

Hi @adrianreber

I'm checking in as we approach 1.24 code freeze at 01:00 UTC Wednesday 30th March 2022.

Please ensure the following items are completed:

For this KEP, it looks like just k/k#104907 needs to be merged. Are there any other PRs that you think we should be tracking that would be subject to the 1.24 code freeze?

Let me know if you have any questions.

adrianreber commented 2 years ago

@rhockenbury There are no other PRs that need to be tracked.

rhockenbury commented 2 years ago

Friendly reminder to try to merge k/k#104907 before code freeze at 01:00 UTC Wednesday 30th March 2022.

adrianreber commented 2 years ago

KEP update PR for 1.25 https://github.com/kubernetes/enhancements/pull/3264

parul5sahoo commented 2 years ago

Hello @adrianreber👋, 1.25 Enhancements team here.

Just checking in as we approach enhancements freeze at 18:00 PST on Thursday, June 16, 2022.

For note, this enhancement is targeting stage alpha for 1.25 (correct me if otherwise).

Here's where this enhancement currently stands:

Looks like for this one, we would just need to update the following:

For note, the status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

adrianreber commented 2 years ago

@parul5sahoo see #3406 for the updated test plan

parul5sahoo commented 2 years ago

Hello @adrianreber 👋, 1.25 Enhancements team here.

Just checking in as we approach enhancements freeze at 18:00 PST on Thursday, June 23, 2022.

For note, this enhancement is targeting stage alpha for 1.25 (correct me if otherwise).

Here’s where this enhancement currently stands:

With all the KEP requirements in place, this enhancement is all good for the upcoming enhancements freeze once that PR gets merged. 🚀

For note, the status of this enhancement is marked as at risk and will be marked as tracked as soon as the PR gets merged. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

parul5sahoo commented 2 years ago

Hello @adrianreber, the KEP is marked tracked and is ready for Enhancements freeze :rocket:

adrianreber commented 1 year ago

Opened docs PR at https://github.com/kubernetes/website/pull/34940

rhockenbury commented 1 year ago

👋 Hey @adrianreber,

Enhancements team checking in as we approach 1.25 code freeze at 01:00 UTC on Wednesday, 3rd August 2022.

Please ensure the following items are completed by code freeze:

- [ ] All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
- [x] All PRs are fully merged by the code freeze deadline.

Looks like there is one merged PR in k/k. Let me know if I missed any other PRs that need to be tracked.

As always, we are here to help should questions come up. Thanks!!

adrianreber commented 1 year ago

@rhockenbury https://github.com/kubernetes/kubernetes/pull/104907 is the only PR and it is merged.

sftim commented 1 year ago

I think it'd be useful if the kubelet annotated Pods with the timestamp of their last checkpoint.

If we add native support for restores, we could additionally manage a Pod annotation with the timestamp of the last restore, and perhaps with some details of the checkpoint data that was restored.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

adrianreber commented 1 year ago

/remove-lifecycle stale

adrianreber commented 1 year ago

I think it'd be useful if the kubelet annotated Pods with the timestamp of their last checkpoint.

If we add native support for restores, we could additionally manage a Pod annotation with the timestamp of the last restore, and perhaps with some details of the checkpoint data that was restored.

Whether a container was restored, and when it was checkpointed, is now tracked at least by CRI-O at the container level: https://github.com/cri-o/cri-o/pull/6464

scarlet25151 commented 1 year ago

Hi @adrianreber, I've recently seen your great work at FOSDEM. Here I have two questions I would like to discuss:

1. Since for now we can only checkpoint after the pod has been scheduled to one exact node, and we must know where the pod's container is, is there any design where we can call the kube-apiserver to do the checkpoint?

For this, I think there could be an approach where we just add an annotation like checkpoint.kubernetes.io/checkpoint-container=<container_name> and let the kubelet handle the checkpoint automatically.

2. I've noticed that in cri-o you've implemented the interface RestoreContainer(context.Context, *Container, string, string) error, however in containerd I saw the createContainer interface was leveraged to handle the restore situation. If my understanding is correct, is there any reason it was implemented this way? Or is there any plan to have a symmetric Restore interface in containerd?

I'd appreciate your answer.

adrianreber commented 1 year ago

Hi @adrianreber, I've recently seen your great work at FOSDEM

Thanks.

Here I have two questions I would like to discuss:

1. Since for now we can only checkpoint after the pod has been scheduled to one exact node, and we must know where the pod's container is, is there any design where we can call the kube-apiserver to do the checkpoint?

One of the main reasons to make it a kubelet-only API endpoint in the beginning is that we wanted to be careful, as checkpointing is something new in Kubernetes. One of the problems is that there is now the possibility to have all memory pages, including potentially sensitive information, on disk. It can only be accessed by root, but the checkpoint can be moved to some other location and the sensitive information could leak. One thing we are currently exploring is to encrypt the checkpoint to avoid this. We are still looking at what is the best way to expose this at the apiserver level.
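For reference, this is roughly what using that kubelet-only endpoint looks like in the current alpha; the pod/container names and certificate paths below are placeholders, and the `ContainerCheckpoint` feature gate has to be enabled on the kubelet:

```bash
# Checkpointing is only exposed on the kubelet API (port 10250), not on the apiserver.
# Pod/container names and certificate paths are placeholders for this sketch.
curl -X POST "https://localhost:10250/checkpoint/default/counters/counter" \
  --insecure \
  --cert /var/run/kubernetes/client-admin.crt \
  --key /var/run/kubernetes/client-admin.key

# The resulting checkpoint archive, including all memory pages, is written below
# the kubelet root directory and is only readable by root:
ls -l /var/lib/kubelet/checkpoints/
```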

For this, I think there could be an approach where we just add an annotation like checkpoint.kubernetes.io/checkpoint-container=<container_name> and let the kubelet handle the checkpoint automatically.

I do not really understand what you are suggesting.

2. I've noticed that in cri-o you've implemented the interface `RestoreContainer(context.Context, *Container, string, string) error`, however in containerd I saw the createContainer interface was leveraged to handle the restore situation. If my understanding is correct, is there any reason it was implemented this way? Or is there any plan to have a symmetric `Restore` interface in containerd?

Not sure at this point. The current PR to expose the CRI checkpoint changes in containerd (https://github.com/containerd/containerd/pull/6965) has been open for almost 10 months and there has not been much feedback. One of the problems is that the checkpoint archive format is not standardized, and although there is a proposal (https://github.com/opencontainers/image-spec/issues/962) there is not much happening there.

If you want to checkpoint containers from Kubernetes, CRI-O is currently the best CRI implementation.

Jeffwan commented 1 year ago

@adrianreber

Thanks for the details. We are very interested in this story and will definitely help with feature testing in containerd and contribute if there's a chance. One quick question on the restore process: it seems that in 1.25 it's not directly implemented in Kubernetes. We hope the pod can be created from the restore, which leverages capabilities from the containerd layer. What are the known issues or bottlenecks from your perspective?

adrianreber commented 1 year ago

One quick question on the restore process: it seems that in 1.25 it's not directly implemented in Kubernetes.

If you look at https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ you can see how it is possible to restore containers in Kubernetes by adding the checkpoint archive to an OCI image. This way you can tell Kubernetes to create a container from that checkpoint image and the resulting container will be a restore.
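Condensed from that blog post, the flow looks roughly like this; the checkpoint archive path, image names, and registry are placeholders, and the annotation name is the one CRI-O looks for according to the post:

```bash
# Wrap the checkpoint archive in an OCI image (paths, names, and registry are placeholders).
newcontainer=$(buildah from scratch)
buildah add "$newcontainer" /var/lib/kubelet/checkpoints/checkpoint-counters_default-counter-<timestamp>.tar /
buildah config --annotation=io.kubernetes.cri-o.annotations.checkpoint.name=counter "$newcontainer"
buildah commit "$newcontainer" checkpoint-image:latest
buildah rm "$newcontainer"
buildah push localhost/checkpoint-image:latest container-image-registry.example/user/checkpoint-image:latest

# Restoring is then just creating a Pod that uses the checkpoint image;
# CRI-O detects the annotation and restores instead of starting a fresh container.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: example-restore
spec:
  containers:
  - name: counter
    image: container-image-registry.example/user/checkpoint-image:latest
EOF
```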

We hope the pod can be created from the restore, which leverages capabilities from the containerd layer.

Not sure what you mean here.

Jeffwan commented 1 year ago

We hope the pod can be created from the restore, which leverages capabilities from the containerd layer. Not sure what you mean here.

I can add more details. The way in https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ is an implicit way: Kubernetes actually doesn't know about the magic and relies on the underlying container runtime to detect the image spec.

The other user journey could be an explicit way: the kubelet can perceive the snapshot and eventually invoke some restore path through the CRI. It leaves the flexibility at the Kubernetes layer to do lots of things, for example, scheduling to a node that already has the original image so it only needs to apply the diff to get started. Have you evaluated the explicit way in your original design?

```yaml
apiVersion: v1
kind: Pod
metadata:
  namePrefix: example-
...
  annotations:
    app.kubernetes.io/snapshot-image: xxxxxx
...
```

adrianreber commented 1 year ago

We hope the pod can be created from the restore, which leverages capabilities from the containerd layer. Not sure what you mean here.

I can add more details. The way in https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ is an implicit way: Kubernetes actually doesn't know about the magic and relies on the underlying container runtime to detect the image spec.

The other user journey could be an explicit way: the kubelet can perceive the snapshot and eventually invoke some restore path through the CRI. It leaves the flexibility at the Kubernetes layer to do lots of things, for example, scheduling to a node that already has the original image so it only needs to apply the diff to get started.

I see no difference between the two ways you described. The checkpoint OCI image is only the checkpoint data and nothing else. The base image, the image the container is based on, is not part of it. As implemented in CRI-O, the base image will be pulled from the registry if missing. So I see no difference based on what you are describing. The automatic early pulling of the base image would not be possible, that is correct.

The other reason to do it implicitly is that adding additional interfaces to the CRI takes a lot of time, and as it was possible to solve it without an additional CRI call, it seemed the easier solution.

If we were talking about checkpointing and restoring pods, I think it would be necessary to have an explicit interface in the CRI. For containers I do not think it is necessary.

Have you evaluated the explicit way in your original design?

It feels like I have implemented almost everything to test it out initially :wink:

Jiaxuan-C commented 1 year ago

Hi @adrianreber! I am interested in your checkpoint/restore in Kubernetes project, so I recently attempted to use this feature, but encountered some issues. Could you please help me? It's very important to me.

Description: I followed your demo video https://fosdem.org/2023/schedule/event/container_kubernetes_criu/ and the official documentation https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ to perform checkpoint and restore operations on your container "quay.io/adrianreber/counter". Initially, everything appeared to be working fine, but when I tried to restore the counter container on another node in my k8s cluster using a YAML file, the restored container would enter an error state within 1 second of entering the running state.

What I did: I attempted to debug the kubelet and discovered that after the Pod was restored, the old Pod's cgroup slice (such as "kubepods-besteffort-pod969bc448_d138_4131_ad8d_344d1cb78b40.slice") was created in the "/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice" directory. However, the Pod associated with this cgroup was not running on the destination node, so the kubelet would delete the directory, causing the restored container process to exit and resulting in a Pod Error.

My question is: why did this issue occur, and could it be a version compatibility issue?

My versions:

- Ubuntu 22.04
- kubelet 1.26.0
- cri-o 1.26.3
- criu 3.17.1 (https://build.opensuse.org/project/show/devel:tools:criu)

adrianreber commented 1 year ago

@Qiubabimuniuniu are you using cgroup v1 or v2 on your systems? There might still be a bug in CRI-O when using checkpoint/restore on cgroup v2 systems.
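(For anyone debugging the same thing, a quick way to check which cgroup version a node is running:)

```bash
# Prints "cgroup2fs" on a cgroup v2 (unified hierarchy) node and "tmpfs" on cgroup v1.
stat -fc %T /sys/fs/cgroup/
```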

Jiaxuan-C commented 1 year ago

@adrianreber I'm using cgroup v2. Thank you very much!!

I will try switching to use cgroup v1 and see if the problem can be resolved. Recently, I have been attempting to modify the kubelet and containerd source code to support "checkpoint/restore in Kubernetes". However, I encountered the same issue after completing the development. Additionally, while trying your project, I found that when using cri-o, I also encountered the same problem when performing checkpoint and restore operations. This problem has been bothering me for a long time. Thank you very much for your solution.

adrianreber commented 1 year ago

Thank you very much for your solution.

It is not really a solution. I should fix CRI-O to correctly work with checkpoint and restore on cgroup v2 systems.