Open abhishekrai opened 10 years ago
You're right and we want to make creation failure handling more robust. I'm working on using UniqueDestroyPtr in CgroupResourceHandlerFactory::Create too. However there is still problem of failures in CreateResourceHandler methods overridden by different resource handlers which requires more work and probably won't be fixed very soon. On the other hand it should be much more rare for creation to fail at that stage.
Consider the following sequence of events:
The "/test" container being created above spans CPU and memory resource controllers. Here, lmctfy was successfuly in creating the state under freezer and cpu controllers, but had an error in creating memory controller state. Ignore the cause for the failure for the purpose of this issue.
Here's what the cgroup file system looks like at this point:
So, there is a leftover directory for test/abhishek under memory controller, but not under others. This confuses subsequent commands. For example, destroy throws the following error:
Trying to re-run the create command throws up a different error this time:
To summarize, manual intervention is necessary to recover from this state, which is unfortunate. I realize the manual intervention may be necessary in some other cases as well if say the application unexpectedly crashes in an intermediate state. Either we should fix it, or provide a "cleanup" operation which, given a container, cleans up all its state, including any such leftover state. The "destroy" operation is unable to do this as it only acts if the container's node under "freezer" exists.
All of this weirdness is because of the leftover state of memory controller after the failure of the first create operation. Deleting this directory brings us back to the original create error again:
From the code for the "create" operation, it looks like lmctfy is smart about cleaning up some state. In particular, ContainerApiImpl::Create() uses UniqueDestroyPtr<> for resource handlers to ensure that if some resource handlers were successfully created, before one failed, state of the previously initialized resource handlers gets cleaned up.
But, within the context of the same resource handler, this is not true. In the following code, perhaps we could inject a Destroy() call if the Update() call fails?