Garbage collection during the pod execution

tmrts commented 8 years ago

With our move towards exposing application-level operations and dynamic pods (see #2375 #2867 #2932), we should consider whether we need to modify our garbage collection model so that it also runs during pod execution.

Introducing app-level operations means that we allow users to run pets instead of cattle, i.e. long-running pods. When applications are ejected/removed from such a pod, garbage left uncollected for a while might become problematic.

cc @euank @yifan-gu @coreos/rkt-maintainers

yifan-gu commented 8 years ago

So for rktnetes, I am not sure we really want to redesign the gc, as the kubelet will enforce its gc policy, and all it needs is the RemoveContainer interface.

I think we can keep today's gc for cleaning up at pod levels?

tmrts commented 8 years ago

As previously discussed in #2932, gc by entrypoints seems to be enough for our purposes.

After our discussions AFAICT we don't need any changes to garbage collection semantics in rkt.

sgotti commented 8 years ago

@tmrts @yifan-gu Just trying to imagine how the workflow will be (considering that rkt is not based on a central daemon model):

Who will handle the app level "prepare" (render image) phase?
Who will clean up things if the prepare phase (render image) or the run phase (mount overlayfs etc.) fails at any point? Since the rkt process could just exit/die, who will remove the stale data?
Who will handle the app level "gc" (umount overlayfs etc.) phase?
- If the idea is to do this in "removecontainer", failure cases still need to be handled (the rkt process exits with an error or dies): who will clean things up? How do we handle problems when the "cleanup" itself fails?

I was under the impression that extending the current rkt pod lifecycle to single apps as well would help in handling all of these problems, like the ones already handled at the pod level by the pod lifecycle.

Additionally, not doing this will tie the per-app handling to an external model enforced by the k8s interface, and this won't work when doing per-app management with "just" rkt.
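
To make the failure case concrete, here is a minimal sketch of what a pod-like lifecycle applied to single apps could look like without a central daemon. Everything in it is an assumption for illustration: the state names, the `sweep` function and the `owner_alive` callback are hypothetical and are not part of rkt. The idea is only that a later invocation can find apps whose owning process died mid-phase and trash them, rather than trying to restart them.

```python
from enum import Enum


class AppState(Enum):
    # Hypothetical per-app states, loosely modeled on the idea of a pod lifecycle.
    PREPARING = "preparing"
    PREPARED = "prepared"
    RUNNING = "running"
    EXITED = "exited"
    GARBAGE = "garbage"


def sweep(apps, owner_alive):
    """Mark stale or exited apps as garbage and return their names.

    apps: dict mapping app name -> AppState (stands in for on-disk state).
    owner_alive: callable(name) -> bool, True if the process that was
        preparing/running the app still exists (hypothetical check).
    """
    collected = []
    for name, state in apps.items():
        # An app stuck in a transient state whose owner died is stale.
        stale = state in (AppState.PREPARING, AppState.RUNNING) and not owner_alive(name)
        if stale or state is AppState.EXITED:
            # Trash rather than restart: the app must be recreated.
            apps[name] = AppState.GARBAGE
            collected.append(name)
    return collected
```

Any rkt invocation (or the caller, e.g. the kubelet) could run such a sweep, which is what would keep the model independent of the k8s interface.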

yifan-gu commented 8 years ago

@sgotti Good question, especially when an app is in a crashloop but the pod is still running... I guess we will need k8s to handle this, as it is the one that creates the crashloop.

In other cases, we can let rkt gc remove them when it removes the whole pod.

sgotti commented 8 years ago

@yifan-gu that case covers an application crashing, but what will happen when rkt has some problems preparing/starting/stopping the app? My impression is that using a pod-like lifecycle (with the assumption that an app, once failed, cannot be restarted but needs to be trashed and recreated) will ease this and is probably required by the rkt model.

Perhaps I'm missing something, and whether this is really needed will be discovered when implementing/testing #2932.

euank commented 8 years ago

Perhaps we should strongly encourage (and default to?) having a data directory specific to Kubernetes. This would allow the rktnetes code to nuke any pod it can't recognize, versus the current state of things where we have to carefully classify "owned" vs "not owned" pods and avoid nuking ones we don't know about.
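
The difference between the two approaches can be sketched as follows. This is only an illustration under stated assumptions: the `Pod` type, the `owned_by_k8s` field and `pods_to_remove` are hypothetical names, not rktnetes or kubelet code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pod:
    uuid: str
    # True/False when ownership is known; None when it cannot be determined
    # (hypothetical field, for illustration only).
    owned_by_k8s: Optional[bool]


def pods_to_remove(on_disk, known_uuids, dedicated_dir):
    """Return UUIDs of pods the Kubernetes side may garbage-collect.

    on_disk: pods found in the data directory.
    known_uuids: pods the caller still tracks and wants to keep.
    dedicated_dir: True if the data directory is used only by Kubernetes.
    """
    doomed = []
    for pod in on_disk:
        if pod.uuid in known_uuids:
            continue  # still tracked, keep it
        if dedicated_dir:
            # Everything in a dedicated directory is ours: safe to nuke.
            doomed.append(pod.uuid)
        elif pod.owned_by_k8s is True:
            # Shared directory: only remove pods positively classified as owned.
            doomed.append(pod.uuid)
    return doomed
```

With a shared directory, pods whose ownership is unknown are never removed, so they can leak; with a dedicated directory, no classification is needed.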

sgotti commented 8 years ago

@euank I think that this issue should also cover how to handle garbage for pods not managed by k8s.

Instead, I see this as related to #3029 (my understanding was that a missing pod manifest can't help distinguish a k8s-managed pod from other pods). I'm not sure how to handle upgrades if the new version changes the datadir, since it'll leave old k8s pods unmanaged.

rkt / rkt

Garbage collection during the pod execution #2933