nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

VM-like masking of total memory and CPUs at `/proc` #391

Open felipecrs opened 3 years ago

felipecrs commented 3 years ago

I'm not sure whether this is out of scope, but I'm opening this issue for discussion.

When we run:

$ docker run --rm --memory 500Mi ubuntu free -h
              total        used        free      shared  buff/cache   available
Mem:          7.7Gi       2.6Gi       186Mi       4.1Gi       4.9Gi       839Mi
Swap:         2.0Gi       1.0Gi       1.0Gi

It shows the host machine's total memory despite the --memory constraint. The same happens with sysbox-runc. This is expected behavior.
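This happens because the --memory limit is enforced through the container's memory cgroup, while /proc/meminfo (which free reads) still describes the host. Below is a minimal Go sketch of how a workload can find its real limit today. It assumes cgroup v2 with a private cgroup namespace, so that the container's own limit appears at /sys/fs/cgroup/memory.max. The helper names are illustrative only and are not part of Sysbox.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMemTotal returns MemTotal in bytes from the text of /proc/meminfo,
// which reports sizes in kB (kibibytes), e.g. "MemTotal:  8038520 kB".
func parseMemTotal(meminfo string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			kib, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}
			return kib * 1024, nil
		}
	}
	return 0, fmt.Errorf("MemTotal not found")
}

// parseMemoryMax parses cgroup v2 memory.max, which holds either "max"
// (no limit) or a byte count. ok is false when no limit is set.
func parseMemoryMax(s string) (limit uint64, ok bool, err error) {
	s = strings.TrimSpace(s)
	if s == "max" {
		return 0, false, nil
	}
	v, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false, err
	}
	return v, true, nil
}

// effectiveMemory returns what a VM-like view would call "total memory":
// the smaller of the host's MemTotal and the container's cgroup limit.
func effectiveMemory(memTotal, limit uint64, limited bool) uint64 {
	if limited && limit < memTotal {
		return limit
	}
	return memTotal
}

func main() {
	meminfo, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	total, err := parseMemTotal(string(meminfo))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	var limit uint64
	limited := false
	// Assumption: cgroup v2 at /sys/fs/cgroup. On cgroup v1 this file does not
	// exist, and the sketch simply falls back to the host's MemTotal.
	if raw, readErr := os.ReadFile("/sys/fs/cgroup/memory.max"); readErr == nil {
		if l, ok, parseErr := parseMemoryMax(string(raw)); parseErr == nil {
			limit, limited = l, ok
		}
	}
	fmt.Printf("host MemTotal:   %d MiB\n", total>>20)
	fmt.Printf("effective total: %d MiB\n", effectiveMemory(total, limit, limited)>>20)
}
```

With the docker run --memory 500Mi example above, memory.max should contain 524288000, so the sketch would report 500 MiB rather than the host's 7.7Gi. VM-like masking would make this kind of per-application workaround unnecessary.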

The thing is: since Sysbox aims to turn containers into VM-like environments, would it be possible for Sysbox to make the memory limit appear as the container's total memory, just as it would in a normal VM?

With --memory today, if my containers exceed the limit, they get killed. Ideally, I would like the containers not to see the host's total memory, so that they can manage the memory actually available to them however they want without being OOM-killed. This is a very desirable feature in a CI/CD build pool, where I assign a fixed amount of resources to each build but have no control over what happens inside those builds.

The same rationale and use case apply to CPUs as well.
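For CPUs, a limit is a CFS quota rather than a core count. On cgroup v2 it appears in cpu.max as "<quota> <period>" in microseconds; for example, docker run --cpus 1.5 yields 150000 100000. A VM-like view has to turn that into a whole number of CPUs. The Go sketch below rounds the quota up and caps it at the CPUs the process may run on. The rounding choice and the helper name effectiveCPUs are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// effectiveCPUs converts cgroup v2 cpu.max ("<quota> <period>" or "max <period>")
// into a whole number of CPUs, rounding a fractional quota up and never
// exceeding hostCPUs.
func effectiveCPUs(cpuMax string, hostCPUs int) (int, error) {
	fields := strings.Fields(cpuMax)
	if len(fields) != 2 {
		return 0, fmt.Errorf("unexpected cpu.max format: %q", cpuMax)
	}
	if fields[0] == "max" {
		return hostCPUs, nil // no CFS quota: all usable host CPUs
	}
	quota, err := strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return 0, err
	}
	period, err := strconv.ParseInt(fields[1], 10, 64)
	if err != nil {
		return 0, err
	}
	if quota <= 0 || period <= 0 {
		return 0, fmt.Errorf("invalid cpu.max values: %q", cpuMax)
	}
	n := int((quota + period - 1) / period) // ceil(quota / period)
	if n > hostCPUs {
		n = hostCPUs
	}
	return n, nil
}

func main() {
	// On Linux, runtime.NumCPU honors the process's CPU affinity mask (cpuset).
	host := runtime.NumCPU()
	raw, err := os.ReadFile("/sys/fs/cgroup/cpu.max") // assumes cgroup v2
	if err != nil {
		fmt.Println("cpu.max not readable; CPUs:", host)
		return
	}
	n, err := effectiveCPUs(string(raw), host)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("effective CPUs:", n)
}
```

Rounding up means a 1.5-CPU quota is presented as 2 CPUs, so a workload that sizes its thread pool from the CPU count may still be throttled. Rounding down is the alternative trade-off.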

ctalledo commented 3 years ago

Hi @felipecrs, yes this is very much a feature we have in mind for Sysbox. Not just for memory, but also for CPUs.

The key is to have Sysbox "virtualize" the /proc/meminfo and /proc/cpuinfo resources exposed inside the containers, according to the corresponding cgroup limits assigned to the container. While Sysbox has the underlying infrastructure to do this already, we've not had the work cycles yet to implement this feature.

Also, it's not clear to us if this will be a Sysbox Enterprise only feature, or if it will go into Sysbox Community Edition too. It's one of those things we must carefully think about to create a balance between community benefit vs. sustainable business.

felipecrs commented 3 years ago

I'm very happy to know that this feature is considered!

And yes, I should have included CPUs in the original description, as the rationale and use case apply to both.

ctalledo commented 3 years ago

FYI, we are hoping to get to this feature before the year is over.

felipecrs commented 2 years ago

Out of curiosity, I found https://github.com/fabiokung/procg. I'm sharing it here in case it's useful as a reference for the implementation.

ctalledo commented 2 years ago

Thanks @felipecrs; I'll take a look, as several users are now asking for /proc/cpuinfo and /proc/meminfo emulation in Sysbox containers. We were hoping to get to this by the end of last year, but it looks like it will be closer to the first half of this year.

felipecrs commented 2 years ago

For completeness: in a K8s environment, I was able to achieve this result by using lxcfs together with the kubelet's CPU Management Policy set to static, because for some reason lxcfs doesn't mask CPUs by itself (it does work for memory).
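One plausible explanation for why the static policy is needed: lxcfs appears to derive the CPUs it reports from the container's cpuset, whereas an ordinary Kubernetes CPU limit is enforced as a CFS quota and leaves the cpuset alone. The static CPU Manager policy is what gives a container its own cpuset. According to the Kubernetes documentation, only containers in Guaranteed QoS pods with integer CPU requests receive exclusive CPUs. The Go sketch below models that eligibility rule using a simplified, hypothetical ContainerResources type, not the Kubernetes API. It ignores init containers and other details.

```go
package main

import "fmt"

// ContainerResources is a simplified, hypothetical model of one container's
// resource settings (not the Kubernetes API type). Zero means "not set".
type ContainerResources struct {
	CPURequestMilli, CPULimitMilli int64 // CPU in millicores (1000 = 1 CPU)
	MemRequestBytes, MemLimitBytes int64
}

// requests returns the effective requests: Kubernetes defaults a request to
// the limit when only the limit is set.
func (c ContainerResources) requests() (cpu, mem int64) {
	cpu, mem = c.CPURequestMilli, c.MemRequestBytes
	if cpu == 0 {
		cpu = c.CPULimitMilli
	}
	if mem == 0 {
		mem = c.MemLimitBytes
	}
	return cpu, mem
}

// isGuaranteed reports whether a pod made of these containers falls into the
// Guaranteed QoS class: every container sets CPU and memory limits, and its
// requests equal those limits.
func isGuaranteed(pod []ContainerResources) bool {
	if len(pod) == 0 {
		return false
	}
	for _, c := range pod {
		if c.CPULimitMilli == 0 || c.MemLimitBytes == 0 {
			return false
		}
		cpu, mem := c.requests()
		if cpu != c.CPULimitMilli || mem != c.MemLimitBytes {
			return false
		}
	}
	return true
}

// exclusiveCPUs returns how many CPUs the static policy would pin to container c:
// only containers in Guaranteed pods with an integer CPU request get exclusive
// CPUs; everything else runs in the shared pool (reported here as 0).
func exclusiveCPUs(pod []ContainerResources, c ContainerResources) int64 {
	cpu, _ := c.requests()
	if !isGuaranteed(pod) || cpu%1000 != 0 {
		return 0
	}
	return cpu / 1000
}

func main() {
	whole := ContainerResources{CPULimitMilli: 2000, MemLimitBytes: 512 << 20}
	frac := ContainerResources{CPULimitMilli: 1500, MemLimitBytes: 512 << 20}
	fmt.Println(exclusiveCPUs([]ContainerResources{whole}, whole)) // 2
	fmt.Println(exclusiveCPUs([]ContainerResources{frac}, frac))   // 0: shared pool
}
```

So a Guaranteed pod with a 1500m CPU limit would stay in the shared pool even with the static policy enabled. lxcfs would then presumably still report the whole shared pool to it rather than 1.5 CPUs.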


lxcfs is easy to set up using a Helm chart, but changing the CPU Management Policy in the kubelet configuration can be a challenge, depending on how your cluster is provisioned.

Not to mention that the policy is configured per node, so it applies to every pod on that node with no per-pod customization.


So, it would still be very nice to have this feature in Sysbox itself. This would streamline the whole process of achieving VM-like containers.

rodnymolina commented 2 years ago

Thanks @felipecrs, that all makes sense. And yes, a Sysbox based approach should provide more flexibility by allowing the user to specify per-pod resources, ideally by honoring the cgroup constraints defined by the user.

felipecrs commented 7 months ago

I noticed this in the latest release notes:

  • Fix sysbox emulation of /proc and /sys in containers for kernels 6.5+

Is there any chance this is already implemented?

ctalledo commented 7 months ago

I noticed this in the latest release notes:

  • Fix sysbox emulation of /proc and /sys in containers for kernels 6.5+

Hi @felipecrs, no, I don't believe so. The fix described above addresses a problem where, starting with kernel 6.5, Sysbox's emulation of /proc and /sys inside a container was completely broken by a kernel change. Unfortunately, it does not address this issue. Thanks.

felipecrs commented 7 months ago

Got it. Thank you!