Control CPU share allocation proportional to memory requirement with cgroup

LedgeDash commented 5 years ago

Currently, when a function's CPU field is set to 1, it gets one hyperthread. In other words, on a 20-core machine with hyperthreading, the total number of VMs is capped at 40.

We need to instead control CPU share based on memory requirement. According to Lambda:

Lambda allocates CPU power linearly in proportion to the amount of memory configured. At 1,792 MB, a function has the equivalent of 1 full vCPU (one vCPU-second of credits per second). https://docs.aws.amazon.com/lambda/latest/dg/resource-model.html

The CloudLab machine I have has 164GB of memory and 20 cores (40 hyperthreads). At most, it can host 54 VMs of 3008MB memory. This is not an insignificant number of VMs, so I think we could just replicate exactly what Lambda does.
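Lambda's linear rule is easy to state in code. Below is a minimal Python sketch (the function names are hypothetical, not from firecracker-tools) that computes the vCPU-equivalent for a given memory size and the total demand of many VMs. With the numbers above, 54 VMs of 3008MB would ask for roughly 90 vCPU-equivalents, more than twice the 40 hyperthreads available, so Lambda-style per-VM guarantees can only hold while not every VM is busy at once.

```python
from fractions import Fraction

# From the Lambda resource-model docs: 1,792 MB of memory == 1 full vCPU.
LAMBDA_MB_PER_VCPU = 1792


def lambda_vcpu_equivalent(mem_mb: int) -> Fraction:
    """vCPU-equivalents Lambda's linear model grants a function with mem_mb of memory."""
    return Fraction(mem_mb, LAMBDA_MB_PER_VCPU)


def total_vcpu_demand(num_vms: int, mem_mb: int) -> Fraction:
    """Total vCPU-equivalents needed to give num_vms VMs their full Lambda-style share."""
    return num_vms * lambda_vcpu_equivalent(mem_mb)


if __name__ == "__main__":
    print(lambda_vcpu_equivalent(1792))        # 1
    print(lambda_vcpu_equivalent(3008))        # 47/28 (about 1.68 vCPUs)
    print(float(total_vcpu_demand(54, 3008)))  # about 90.6 vCPU-equivalents
```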

alevy commented 5 years ago

There are larger CloudLab machines as well... I think particularly in the Clemson cluster. If we need it.

LedgeDash commented 5 years ago

@alevy would we need it do you think?

alevy commented 5 years ago

Depends on the numbers. Generally more is probably better in this case, but only if we have the workload to support it. Let's see how the first set of results play out.

LedgeDash commented 5 years ago

I'm trying to implement this, but I'm not sure how to set the CPU share for each VM process and the number of vCPUs for each VM. If I understand correctly, there are two separate variables that we can set: 1. the CPU share of the VM process, controlled through cgroups; 2. vcpu_count in the VmConfig struct. The two seem to be independent of each other (?).
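The two knobs can indeed be set independently. A minimal sketch, assuming a cgroup v1 cpu controller with a per-VM cgroup directory that has already been created, and Firecracker's HTTP API, where PUT /machine-config takes vcpu_count and mem_size_mib (some Firecracker versions may require additional fields). The helper names and directory layout are hypothetical, and writing real cgroup files needs root.

```python
import json
from pathlib import Path


def machine_config_body(vcpu_count: int, mem_size_mib: int) -> str:
    """Knob 2: JSON body for Firecracker's PUT /machine-config request."""
    return json.dumps({"vcpu_count": vcpu_count, "mem_size_mib": mem_size_mib})


def place_in_cpu_cgroup(cgroup_dir: Path, pid: int, shares: int) -> None:
    """Knob 1: give the VM process a relative CPU weight via cgroup v1.

    cgroup_dir is assumed to be an existing directory such as
    /sys/fs/cgroup/cpu/<vm-name> (hypothetical layout).
    """
    # Relative weight for every task in this cgroup.
    (cgroup_dir / "cpu.shares").write_text(f"{shares}\n")
    # Move the whole process (all of its threads) into the cgroup.
    (cgroup_dir / "cgroup.procs").write_text(f"{pid}\n")
```

Writing the PID to cgroup.procs moves the whole Firecracker process, including its vCPU threads, into the cgroup, so the share applies to all of them regardless of vcpu_count.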

My question is: for a machine with 164GB of memory and 40 cores, how should we set the cgroup CPU share so that a 1792MB VM gets the equivalent of 1 full vCPU (one vCPU-second of credits per second)? Does this mean its CPU share should be 1/40, i.e., one hyperthread? If so, what happens when more than 40 VMs of 1792MB are running, since there's enough memory for that?

alevy commented 5 years ago

There are two choices we make:

How many threads (or Firecracker VCPUs) we give each VM, which limits the amount of parallelism it has.
What share of the CPU does each VM get relative to other VMs

Neither of these is really related to the size of the machine, though the size of the machine does affect how many VMs we can run concurrently while still providing the CPU shares we're guaranteeing.

So first, high level math.

Assume a 164GB machine with 40 cores (80 hyperthreads). To make the math work out better, and also to reserve some memory for the rest of the system, let's actually use 160GB (which divides evenly both among 80 hyperthreads and into 128MB units, and leaves us with a respectable 4GB for non-workload tasks, like the filesystem cache).

160GB / 80 hyperthreads = 2GB per hyperthread, so a VM with 2GB should get a full hyperthread's worth of CPU.

Note that this is also true for an 80GB machine with 40 hyperthreads. The important part is the ratio between memory and CPU on the machine, not the absolute values.

Other VM configurations should scale proportionally, so a 128MB VM should get 1/16th the CPU share of a 2GB VM (and 1/8th that of a 1GB VM).
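The ratio argument can be checked with a few lines of Python. This is a sketch using fractions.Fraction to keep results exact; usable_mem_mb is the memory left after the reserve (160GB in the example), and the function names are hypothetical.

```python
from fractions import Fraction

GB = 1024  # MB per GB


def mem_per_hyperthread_mb(usable_mem_mb: int, hyperthreads: int) -> int:
    """Memory size that 'buys' one full hyperthread on this machine."""
    return usable_mem_mb // hyperthreads


def hyperthread_fraction(vm_mem_mb: int, usable_mem_mb: int, hyperthreads: int) -> Fraction:
    """Fraction of one hyperthread a VM should get: vm_mem / (usable_mem / hyperthreads)."""
    return Fraction(vm_mem_mb * hyperthreads, usable_mem_mb)


if __name__ == "__main__":
    print(mem_per_hyperthread_mb(160 * GB, 80))      # 2048
    print(hyperthread_fraction(128, 160 * GB, 80))   # 1/16
    print(hyperthread_fraction(1024, 160 * GB, 80))  # 1/2
```

Because only usable_mem_mb / hyperthreads enters the formula, the 80GB / 40-hyperthread machine yields the same numbers.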

Now for settings:

The number of Firecracker VCPUs is a step function of memory size.

Any VM <= 2GB should get 1 VCPU, > 2GB but <= 4GB should get 2 VCPUs, > 4GB but <= 6GB 3 VCPUs, etc. (If we're only going up to ~3GB, then obviously we only need to worry about 1 vs. 2 VCPUs.)
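The VCPU step function is a ceiling division. A minimal sketch, assuming the 2GB-per-hyperthread ratio of the example machine as the step size (the function and the mem_per_vcpu_mb parameter are hypothetical names):

```python
def vcpu_count(mem_mb: int, mem_per_vcpu_mb: int = 2048) -> int:
    """One Firecracker VCPU per started 2GB of memory: ceil(mem_mb / mem_per_vcpu_mb)."""
    if mem_mb <= 0:
        raise ValueError("memory size must be positive")
    return -(-mem_mb // mem_per_vcpu_mb)  # ceiling division on integers


if __name__ == "__main__":
    for mem in (128, 2048, 2049, 3008, 4096, 6144):
        print(mem, vcpu_count(mem))
```

On a machine with a different memory-to-CPU ratio, only mem_per_vcpu_mb changes.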

Share of the CPU is a weight relative to other processes (cgroup v1 cpu.shares defaults to 1024; cgroup v2 cpu.weight ranges from 1 to 10000 with a default of 100), so it's sufficient to use fixed terms based on the smallest unit:

a 128MB VM gets 1 cpu share
a 256MB VM gets 2 cpu shares
...
a 2GB VM gets 16 cpu shares
...
a 6GB VM gets 48 cpu shares
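The share table is memory divided by the 128MB unit. A minimal sketch (hypothetical names) that rejects sizes which are not whole multiples of the unit. Note that Lambda's 3008MB maximum is 23.5 times 128MB, so it does not fit this table as-is.

```python
UNIT_MB = 128  # smallest VM size; it gets 1 CPU share


def cpu_shares(mem_mb: int) -> int:
    """Relative CPU weight: one share per 128MB of VM memory."""
    if mem_mb <= 0 or mem_mb % UNIT_MB:
        raise ValueError(f"{mem_mb}MB is not a positive multiple of {UNIT_MB}MB")
    return mem_mb // UNIT_MB


if __name__ == "__main__":
    for mem in (128, 256, 2048, 6144):
        print(mem, cpu_shares(mem))
```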

If we want to support intermediate memory sizes, we can't use fractions, but we can just scale all the values up, since they just need to be relative.
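One way to scale all the values up for intermediate sizes is to use the greatest common divisor of the configured sizes as the unit, then multiply by one common factor. This is a sketch under two assumptions: the full set of VM sizes is known up front, and the shares end up in cgroup v1 cpu.shares, where the kernel raises any value below 2 to 2 (so a 128MB VM at 1 share would otherwise weigh the same as a 256MB VM). The function name is hypothetical.

```python
from functools import reduce
from math import gcd

CGROUP_V1_MIN_SHARES = 2  # cgroup v1 raises cpu.shares values below 2 to 2


def shares_for_sizes(sizes_mb: list[int]) -> dict[int, int]:
    """Integer CPU shares proportional to memory for an arbitrary set of VM sizes.

    The gcd of all sizes is the unit, so every size maps to an integer; then
    everything is multiplied by one factor so the smallest VM still gets at
    least CGROUP_V1_MIN_SHARES. Ratios between VMs are preserved exactly.
    """
    unit = reduce(gcd, sizes_mb)
    base = {size: size // unit for size in sizes_mb}
    smallest = min(base.values())
    scale = -(-CGROUP_V1_MIN_SHARES // smallest)  # ceiling division
    return {size: share * scale for size, share in base.items()}


if __name__ == "__main__":
    print(shares_for_sizes([128, 256, 2048, 6144]))
    print(shares_for_sizes([128, 3008]))
```

Since cgroup weights only compare sibling groups, the absolute scale matters if the VM cgroups share a parent with other workloads sitting at the cgroup v1 default of 1024; placing all VM cgroups under one parent keeps the comparison among VMs only.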

LedgeDash commented 5 years ago

Just to make sure I fully understand, a few follow up questions:

When Lambda says:

At 1,792 MB, a function has the equivalent of 1 full vCPU (one vCPU-second of credits per second).

I take "1 full vCPU" to mean one hyperthread's worth of CPU share.

This means that on a 20-core (40-hyperthread) machine, a 1792MB VM should get 1/40 of the CPU share through cgroups. Is this understanding correct?

Initially I was thinking about using the ratio of total_memory to num_of_hyperthreads to decide what size VM should get a full hyperthread's share of CPU (which, if I understand correctly, is what you described in the comment above). However, this would mean that we won't follow exactly what Lambda does, which is giving a 1792MB VM a full hyperthread. Would this be a problem? I can't think of anything, but I'm also inexperienced in predicting what reviewers might say.
So if I understand correctly, the algorithm would go something like this:
1. total_memory/total_number_of_hyperthreads = VM_size_that_would_get_one_full_hyperthread
2. CPU_share_of_128MB_VM = 128MB / (VM_size_that_would_get_one_full_hyperthread × total_number_of_hyperthreads). Using your example of 160GB with 80 hyperthreads: 160GB / 80 hyperthreads = 2GB, so a 2GB VM should get a full hyperthread's worth of CPU, and a 128MB VM would get 128MB / (2048MB × 80) = 1/1280 of the CPU.
3. (Continuing the example from the previous step) Since our VM sizes are multiples of 128MB, we can just use multiples of 1/1280 of the CPU.
4. And VM vCPU count is set exactly as you described.
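Steps 1 to 3 can be written out with exact fractions. A sketch (the function name is hypothetical) reproducing the 1/1280 figure for the 160GB, 80-hyperthread example:

```python
from fractions import Fraction


def share_of_machine(vm_mem_mb: int, total_mem_mb: int, hyperthreads: int) -> Fraction:
    """Fraction of the whole machine's CPU a VM gets, following steps 1 to 3."""
    # Step 1: memory size that earns one full hyperthread.
    mem_per_ht = Fraction(total_mem_mb, hyperthreads)
    # Step 2: divide the VM size by (that size x number of hyperthreads).
    return Fraction(vm_mem_mb) / (mem_per_ht * hyperthreads)


if __name__ == "__main__":
    total = 160 * 1024
    print(share_of_machine(128, total, 80))   # 1/1280
    print(share_of_machine(2048, total, 80))  # 1/80, i.e. one hyperthread
```

Because mem_per_ht × hyperthreads is just the total memory, step 2 reduces to vm_mem / total_mem; the hyperthread count drops out of the share itself, and step 3 follows because the share is linear in vm_mem.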

alevy commented 5 years ago

Correct
The spirit is the same: I'm assuming that the machines Lambda runs on have a different ratio of memory to CPU (or of available memory to available CPU, if the memory and CPU are also used for non-Lambda tasks). Note that we're not very far off. Lambda's ratio suggests machines along the lines of 80 hyperthreads and 140GB of memory.
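The 140GB estimate follows from multiplying Lambda's 1,792MB-per-vCPU figure by the hyperthread count. A small sketch (the function name is hypothetical):

```python
LAMBDA_MB_PER_VCPU = 1792  # Lambda: 1,792 MB == 1 full vCPU


def lambda_ratio_memory_gb(hyperthreads: int) -> float:
    """Memory at which Lambda's ratio would hand out every hyperthread exactly once."""
    return hyperthreads * LAMBDA_MB_PER_VCPU / 1024


if __name__ == "__main__":
    print(lambda_ratio_memory_gb(80))  # 140.0
```

Compared with the 2048MB per hyperthread of the 160GB / 80-hyperthread setup, Lambda's 1792MB is 12.5% less memory per hyperthread.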
...
1. Yes
2. Technically yes, but that's not really a coherent way of expressing CPU share to Linux's cgroups subsystem. You express it as a relative number (cgroup v1 cpu.shares, default 1024; cgroup v2 cpu.weight, 1 to 10000, default 100). If two processes have the same number, they get an equal CPU share (i.e., half each). If three have the same number, they get an equal CPU share (i.e., a third each). So, literally what I described is what we should do: a 128MB VM gets a CPU "share" of 1, and a 2GB VM gets a CPU share of 16.
3. ^
4. Yes
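Because shares are only relative, the CPU fraction each VM gets under full contention is its share divided by the sum of all shares. A sketch (hypothetical function name) that reproduces the half-each and third-each cases, plus the 128MB-versus-2GB case. It ignores the VCPU cap: a 1-VCPU VM cannot use more than one hyperthread no matter its share.

```python
from fractions import Fraction


def cpu_fractions(shares: dict[str, int]) -> dict[str, Fraction]:
    """CPU fraction each group gets when all of them are runnable and competing.

    Each group gets share / sum(shares). When some groups are idle, the
    scheduler hands their unused time to the others, so these are lower bounds
    under contention rather than caps.
    """
    total = sum(shares.values())
    return {name: Fraction(s, total) for name, s in shares.items()}


if __name__ == "__main__":
    print(cpu_fractions({"a": 16, "b": 16}))
    print(cpu_fractions({"vm128": 1, "vm2g": 16}))
```

Multiplying every share by the same constant leaves the fractions unchanged, which is why scaling all values up is safe.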

princeton-sns / firecracker-tools #25