google / lmctfy

lmctfy is the open source version of Google’s container stack, which provides Linux application containers.
Other
3.41k stars 237 forks source link

Using memory controller when swap accounting is disabled #38

Open abhishekrai opened 10 years ago

abhishekrai commented 10 years ago

We are seeing that on some kernels, when swap accounting is not enabled, lmctfy is unable to create any containers having memory limit. For example:


$ lmctfy create /test/abhishek ' cpu { limit: 1000 } memory { limit: -1 }'
Command exited with error message: 14: Failed to write "-1" to file "/sys/fs/cgroup/memory/test/abhishek/memory.memsw.limit_in_bytes" for hierarchy 7

$ echo -1 >/sys/fs/cgroup/memory/test/abhishek/memory.memsw.limit_in_bytes 
bash: echo: write error: Operation not supported

$ uname -a
Linux abhishek-samsung 3.8.0-44-generic #66~precise1-Ubuntu SMP Tue Jul 15 04:01:04 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

At ThoughtSpot, we are seeing this on all 3.8 kernels that we have in our dev environment. However, this works fine on 3.5 and 3.11 kernels which are the other two common kernels we have in our dev environment.

The root cause for this "operation not supported" seems to come from this code from mm/memcontrol.c which is returning EOPNOTSUPP.


5017 static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
5018                             const char *buffer)
5019 {
5020         struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
5021         enum res_type type;
5022         int name;
5023         unsigned long long val;
5024         int ret;
5025 
5026         type = MEMFILE_TYPE(cft->private);
5027         name = MEMFILE_ATTR(cft->private);
5028 
5029         if (!do_swap_account && type == _MEMSWAP)
5030                 return -EOPNOTSUPP;

Restarting the kernel with swap accounting enabled fixes the problem. In our environment, we always try to place a limit on memory, even if -1 in some cases, and not being able to do that is a limitation. I was curious to see how docker handles this and from the docs it looks like docker also recommends swap accounting to be enabled, but requires it only if user wants swap accounting.

We can work around this issue by enabling swap accounting on all machines, but filing it to see if there is a better approach. I noticed some comments in the code about "feature detection", and in some sense, this is a problem of feature detection where we have an invalid expectation about the kernel features.

One possible fix for this particular issue is to make the following change in MemoryResourceHandler::Update(). Does this look reasonable?


  // Set the swap limit if it was specified. The default is -1 if it was not
  // specified during a replace.
  // TODO(zohaib): swap_limit must be greater than or equal to the limit. We
  // need to check that this is true.
  if (memory_spec.has_swap_limit()) {
     RETURN_IF_ERROR(
         memory_controller_->SetSwapLimit(Bytes(memory_spec.swap_limit())));
   } else if (policy == Container::UPDATE_REPLACE) {
-    Status status = memory_controller_->SetSwapLimit(Bytes(-1));
-    // This may not be supported in all kernels so don't fail if it is not
-    // supported and not specified.
-    if (!status.ok() && status.error_code() != ::util::error::NOT_FOUND) {
-      return status;
+    // When swap limit is not specified, set it only when swap accounting is
+    // enabled.  Since swap accounting cannot be modified without rebooting the
+    // kernel, it's OK to not update the swap limit when swap accounting is
+    // disabled regardless of the previous value.
+    Status swap_accounting_enabled = memory_controller_->GetSwapLimit();
+    if (swap_accounting_enabled.ok()) {
+      Status status = memory_controller_->SetSwapLimit(Bytes(-1));
+      // This may not be supported in all kernels so don't fail if it is not
+      // supported and not specified.
+      if (!status.ok() && status.error_code() != ::util::error::NOT_FOUND) {
+        return status;
+      }
     }
   }
kyurtsever commented 10 years ago

It would be more idiomatic to gContain code to ignore EOPNOTSUPP same as we ignore NOT_FOUND but unfortunately EOPNOTSUPP is swallowed at lower level. I'll take a look how much work is required to pipe that information to higher level. This problem makes me wonder why kernel exposes this file at all if swap accounting is disabled?

abhishekrai commented 10 years ago

Thanks. Yeah, the kernel behavior is puzzling. And unfortunate too, cause only some versions have this problem as I mentioned above, forcing the need for a generic workaround in lmctfy.

jb11211 commented 9 years ago

I just hit the same issue. gentle bump?