[Project ended] rkt is a pod-native container engine for Linux. It is composable, secure, and built on standards.
Apache License 2.0
Garbage collection during the pod execution #2933

Open tmrts opened 8 years ago

tmrts commented 8 years ago

With our move towards exposing application-level operations and dynamic pods (see #2375, #2867, #2932), we should consider whether we need to modify our garbage collection model so that it runs during pod execution.

Introducing app-level operations means that we allow users to run pets instead of cattle, that is, long-running pods. When applications are ejected or removed from such a pod, the uncollected garbage may become a problem over time.

cc @euank @yifan-gu @coreos/rkt-maintainers

yifan-gu commented 8 years ago

So for rktnetes, I am not sure we really want to redesign the gc, as the kubelet will enforce its gc policy, and all it needs is the RemoveContainer interface.

I think we can keep today's gc for cleaning up at the pod level?

tmrts commented 8 years ago

As previously discussed in #2932, gc by entrypoints seems to be enough for our purposes.

After our discussions, AFAICT we don't need any changes to garbage collection semantics in rkt.

sgotti commented 8 years ago

@tmrts @yifan-gu Just trying to imagine how the workflow will be (considering that rkt is not based on a central daemon model):

I was under the impression that extending the current rkt pod lifecycle to single apps would help handle the same kinds of problems that the pod lifecycle already handles at the pod level.

Additionally, not doing this will tie per-app handling to an external model enforced by the k8s interface, and this won't work when doing per-app management with "just" rkt.

yifan-gu commented 8 years ago

@sgotti Good question, especially when an app is in a crashloop but the pod is still running... I guess we will need k8s to handle this, as it is the one that creates the crashloop.

In other cases, we can let rkt gc remove them when it removes the whole pod.

sgotti commented 8 years ago

@yifan-gu that case covers an application crashing, but what will happen when rkt has some problems preparing/starting/stopping the app? My impression is that using a pod-like lifecycle (with the assumption that a failed app cannot be restarted but needs to be trashed and recreated) will ease this, and is probably required by the rkt model.

Perhaps I'm missing something, and whether this is really needed will be discovered when implementing/testing #2932.

euank commented 8 years ago

Perhaps we should strongly encourage (and default to?) having a data directory specific to Kubernetes. This would allow the rktnetes code to nuke any pod it doesn't recognize, versus the current state of things where we have to carefully classify "owned" vs "not owned" pods and avoid nuking ones we don't know about.

sgotti commented 8 years ago

@euank I think that this issue should also cover how to handle garbage for pods not managed by k8s.

Instead, I see this as related to #3029 (my understanding was that a missing pod manifest can't help distinguish a k8s-managed pod from other pods). I'm not sure how to handle upgrades if the new version changes the data directory, since it will leave old k8s pods unmanaged.