google / lmctfy

lmctfy is the open source version of Google’s container stack, which provides Linux application containers.

Failure to create container can leave behind state requiring manual intervention #37

Open abhishekrai opened 10 years ago

abhishekrai commented 10 years ago

Consider the following sequence of events:


$ lmctfy create /test/abhishek 'cpu { limit: 1000 } memory { limit: -1 }'
Command exited with error message: 14: Failed to write "-1" to file "/sys/fs/cgroup/memory/test/abhishek/memory.memsw.limit_in_bytes" for hierarchy 7

The "/test/abhishek" container being created above spans the CPU and memory resource controllers. Here, lmctfy succeeded in creating state under the freezer and cpu controllers, but failed while creating the memory controller state. Ignore the cause of the failure for the purposes of this issue.

Here's what the cgroup file system looks like at this point:


$ find /sys/fs/cgroup/ -name abhishek
/sys/fs/cgroup/memory/test/abhishek

So, there is a leftover directory for test/abhishek under the memory controller, but not under the others. This confuses subsequent commands. For example, destroy throws the following error:


$ lmctfy destroy -f /test/abhishek
Command exited with error message: 5: Can't get non-existent container "/test/abhishek"

Re-running the create command throws a different error this time:


$ lmctfy create /test/abhishek ' cpu { limit: 1000 } memory { limit: 1000 }'
Command exited with error message: 6: Expected cgroup "/sys/fs/cgroup/memory/test/abhishek" to not exist.

To summarize, manual intervention is necessary to recover from this state, which is unfortunate. I realize manual intervention may also be necessary in other cases, say if the application unexpectedly crashes in an intermediate state. Either we should fix this, or provide a "cleanup" operation which, given a container, cleans up all of its state, including any leftover state like the above. The "destroy" operation cannot do this, because it only acts if the container's node under "freezer" exists.

All of this weirdness is caused by the leftover memory controller state from the failed first create. Deleting that directory brings us back to the original create error:


$ rmdir /sys/fs/cgroup/memory/test/abhishek
$ lmctfy create /test/abhishek ' cpu { limit: 1000 } memory { limit: 1000 }'
Command exited with error message: 14: Failed to write "-1" to file "/sys/fs/cgroup/memory/test/abhishek/memory.memsw.limit_in_bytes" for hierarchy 7
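The rmdir above only handles the one hierarchy that leaked. Until a "cleanup" operation exists, a generic sweep over all hierarchies could do the same manual recovery. This is only a sketch, not an lmctfy command; the cleanup_container helper and its root argument are hypothetical:

```shell
# Hypothetical helper: remove leftover, empty cgroup directories for a
# container name under a given cgroup root (e.g. /sys/fs/cgroup).
cleanup_container() {
  root="$1"; name="$2"
  # -depth visits children before parents; rmdir only succeeds on empty
  # directories, so cgroups that still hold tasks or subgroups are left
  # untouched (errors are suppressed).
  find "$root" -depth -type d -name "$name" -exec rmdir {} \; 2>/dev/null
  return 0
}
```

Because rmdir refuses to remove non-empty directories, this sweep is safe to run against a live cgroup tree: it can only delete the abandoned, empty nodes that a failed create left behind.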

From the code for the "create" operation, it looks like lmctfy is smart about cleaning up some state. In particular, ContainerApiImpl::Create() wraps resource handlers in UniqueDestroyPtr<> to ensure that if some resource handlers were created successfully before one failed, the previously initialized handlers get cleaned up.

But within a single resource handler this is not true. In the following code, perhaps we could inject a Destroy() call if the Update() call fails?


StatusOr<ResourceHandler *> CgroupResourceHandlerFactory::Create(
    const string &container_name,
    const ContainerSpec &spec) {
  // Create the ResourceHandler for the container.
  StatusOr<ResourceHandler *> statusor_handler =
      CreateResourceHandler(container_name, spec);
  if (!statusor_handler.ok()) {
    return statusor_handler.status();
  }
  unique_ptr<ResourceHandler> handler(statusor_handler.ValueOrDie());

  // Run the create action before applying the update.
  RETURN_IF_ERROR(handler->CreateResource(spec));

  // Prepare the container by doing a replace update.
  Status status = handler->Update(spec, Container::UPDATE_REPLACE);
  if (!status.ok()) {
    return status;
  }

  return handler.release();
}
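The suggested fix can be sketched in miniature. MockHandler and the bool return values below are stand-ins for lmctfy's ResourceHandler and util::Status, not the real API; the point is only the injected Destroy() call on the Update() failure path:

```cpp
// Stand-in for a resource handler whose CreateResource() leaves cgroup
// state behind (e.g. a directory under /sys/fs/cgroup).
struct MockHandler {
  bool created = false;      // tracks whether cgroup state exists
  bool fail_update = false;  // knob to force the failure path
  bool CreateResource() { created = true; return true; }
  bool Update() { return !fail_update; }
  void Destroy() { created = false; }  // tears down CreateResource() state
};

// Mirrors the shape of CgroupResourceHandlerFactory::Create(): returns
// true on success; on Update() failure it now destroys the partly
// created state before reporting the error.
bool CreateWithCleanup(MockHandler &handler) {
  if (!handler.CreateResource()) return false;
  if (!handler.Update()) {
    handler.Destroy();  // the injected cleanup call
    return false;
  }
  return true;
}
```

With this shape, a failed create leaves no leftover state, so a subsequent create or destroy sees a clean slate.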
kyurtsever commented 10 years ago

You're right, and we want to make creation-failure handling more robust. I'm working on using UniqueDestroyPtr in CgroupResourceHandlerFactory::Create too. However, there is still the problem of failures inside the CreateResourceHandler methods overridden by the different resource handlers, which requires more work and probably won't be fixed very soon. On the other hand, creation should fail much more rarely at that stage.
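The UniqueDestroyPtr idea mentioned above can be approximated with std::unique_ptr and a custom deleter. Handler and its counter below are illustrative stand-ins, not lmctfy's real types: while the pointer owns a handler, destruction runs Destroy() (which would remove the cgroup directories) before freeing it, and release() transfers ownership out and skips the cleanup:

```cpp
#include <memory>

struct Handler {
  static int destroy_calls;              // observable for the sketch
  void Destroy() { ++destroy_calls; }    // would rmdir the cgroup state
};
int Handler::destroy_calls = 0;

// Deleter that tears down the handler's external state before freeing it.
struct DestroyDeleter {
  void operator()(Handler *h) const {
    h->Destroy();
    delete h;
  }
};
using UniqueDestroyPtr = std::unique_ptr<Handler, DestroyDeleter>;
```

On any early-return error path the pointer goes out of scope and the partially created state is destroyed; only the final success path calls release() to hand the handler to the caller.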