hyperhq / hyperd

HyperContainer Daemon
http://www.hypercontainer.io
Apache License 2.0

Decommission pod resources synchronously in Remove() method #736

Closed: spearl closed this 6 years ago

spearl commented 6 years ago

fixes #733

This will deallocate everything physically belonging to a pod right after those same things are deleted from the hyper DB.

Jimmy-Xu commented 6 years ago

Can one of the admins verify this patch?

gnawux commented 6 years ago

ok to test, hykins

teawater commented 6 years ago

Hi @spearl ,

I think the reason you hit this issue is that, after XPod.Remove times out in https://github.com/hyperhq/hyperd/blob/60a4dd858f9a5db29e2878d67276134a93648505/daemon/pod/decommission.go#L61 , it sets the pod's status to S_POD_NONE in https://github.com/hyperhq/hyperd/blob/60a4dd858f9a5db29e2878d67276134a93648505/daemon/pod/decommission.go#L71 . XPod.cleanup then gives up cleaning the pod, because the pod's status is S_POD_NONE, in https://github.com/hyperhq/hyperd/blob/60a4dd858f9a5db29e2878d67276134a93648505/daemon/pod/decommission.go#L551

Even if the current patch lets XPod.Remove call decommissionResources to release some resources, there is still another function, p.removeSandboxFromDB(), that needs to be called.

So what about updating https://github.com/hyperhq/hyperd/blob/60a4dd858f9a5db29e2878d67276134a93648505/daemon/pod/decommission.go#L551 so that XPod.cleanup can handle S_POD_NONE instead of just giving up?

Thanks, Hui

spearl commented 6 years ago

Hi @teawater , thanks for responding!

p.removeSandboxFromDB() is actually already called synchronously inside p.removeFromDB() here: https://github.com/hyperhq/hyperd/blob/60a4dd858f9a5db29e2878d67276134a93648505/daemon/pod/persist.go#L243 So the only missing call is the one to p.decommissionResources().

If we remove the check for S_POD_NONE, do we still need to check for S_POD_STOPPED? Would we still need to change the status to S_POD_STOPPING in the else branch? What purpose does this check serve? The comment explaining it above is cut off mid-sentence.

Also, what is the point of this check? https://github.com/hyperhq/hyperd/blob/60a4dd858f9a5db29e2878d67276134a93648505/daemon/pod/decommission.go#L574 The status will never be S_POD_NONE at that point, since it always gets changed to S_POD_STOPPING above. That means the status would be changed from S_POD_NONE to S_POD_STOPPED if we simply removed the check above.

teawater commented 6 years ago

Hi @spearl

I would like to handle this issue in XPod.cleanup rather than XPod.Remove, because XPod.cleanup is designed to clean up resource usage when the VM quits. If the VM hasn't quit, just releasing the resources in XPod.Remove is not a good idea.

About the code in XPod.cleanup: I think you can remove the check for S_POD_NONE in https://github.com/hyperhq/hyperd/blob/60a4dd858f9a5db29e2878d67276134a93648505/daemon/pod/decommission.go#L551 and update the following part to change p.status to S_POD_STOPPING only if p.status is not S_POD_NONE: https://github.com/hyperhq/hyperd/blob/60a4dd858f9a5db29e2878d67276134a93648505/daemon/pod/decommission.go#L555 Then this part becomes usable: https://github.com/hyperhq/hyperd/blob/60a4dd858f9a5db29e2878d67276134a93648505/daemon/pod/decommission.go#L574

Thanks, Hui

spearl commented 6 years ago

Sounds good. I just made those changes and took out the sync call in the Remove() method.