pwncollege / dojo

Infrastructure powering the pwn.college dojo
https://pwn.college
BSD 2-Clause "Simplified" License

User Container Metrics - Understanding Scaling #154

Open ConnorNelson opened 1 year ago

ConnorNelson commented 1 year ago

Measurements done using docker stats.

Just running a pwncollege-challenge container uses ~4MiB of memory. This low number makes sense: it's just a single bash process. In this state, neither VS Code nor the desktop is accessible, of course, because their processes aren't running.

After container start, starting VS Code brings us up to ~72MiB, and then after a few moments settles down to ~58MiB.

After container start, starting the desktop brings us up to ~174MiB.

After container start, starting both (aka /opt/pwn.college/docker-entrypoint.sh) brings us up to ~231MiB, and then after a few moments settles down to ~222MiB.
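To separate each service's cost from the bare container, the settled figures above can be subtracted from the baseline. A small sketch using only the numbers measured above (the constant and function names are illustrative):

```python
# Settled memory figures from the docker stats measurements above, in MiB.
BASELINE = 4    # bare container: a single bash process
VSCODE = 58     # VS Code only, after settling
DESKTOP = 174   # desktop only
BOTH = 222      # VS Code and desktop together, after settling


def overhead(measured: int, baseline: int = BASELINE) -> int:
    """Memory attributable to a service beyond the bare container."""
    return measured - baseline


vscode_cost = overhead(VSCODE)
desktop_cost = overhead(DESKTOP)
both_cost = overhead(BOTH)

print(f"VS Code: ~{vscode_cost} MiB")
print(f"desktop: ~{desktop_cost} MiB")
print(f"both:    ~{both_cost} MiB (sum of parts: ~{vscode_cost + desktop_cost} MiB)")
```

Running both services costs roughly what they cost separately (~218MiB against ~224MiB), and the desktop accounts for about three quarters of that.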


Looking at our user containers at a random point in time (Saturday @ ~4:00pm MST, 1 day before a 365 final deadline), we have:

import subprocess
import statistics

output = subprocess.check_output('docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep user_', shell=True)

cpus = []
mems = []
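# MemUsage carries a binary unit suffix; convert everything to MiB.
UNITS = {"KiB": 1 / 1024, "MiB": 1, "GiB": 1024}

for line in output.strip().decode().splitlines():
    # Columns: name, CPU %, memory used, "/", memory limit.
    user, cpu, mem, *_ = line.split()
    cpu = float(cpu[:-1]) / 100  # strip "%" and express as vcpus
    mem = float(mem[:-3]) * UNITS[mem[-3:]]  # strip the unit and scale to MiB
    cpus.append(cpu)
    mems.append(mem)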

print("### cpu")
print(f"- count: {len(cpus)}")
print(f"- total: {sum(cpus):.2f} vcpu")
print(f"- mean: {statistics.mean(cpus):.2f} vcpu")
print(f"- median: {statistics.median(cpus):.2f} vcpu")
print(f"- min: {min(cpus):.2f} vcpu")
print(f"- max: {max(cpus):.2f} vcpu")
print()
print("### memory")
print(f"- count: {len(mems)}")
print(f"- total: {sum(mems):.2f} MiB")
print(f"- mean: {statistics.mean(mems):.2f} MiB")
print(f"- median: {statistics.median(mems):.2f} MiB")
print(f"- min: {min(mems):.2f} MiB")
print(f"- max: {max(mems):.2f} MiB")

cpu

memory


User containers are constrained to 4 vcpus and 4GiB of memory.

Our current server has 40 vcpus and 251.87 GiB of memory. Of course, we use the server for more than user containers (web service, database, cache, etc.), but ignoring this for now, this means we are using:
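As a sketch of how such a utilization figure is computed, the snippet below expresses a load as a fraction of the server. It uses the server size and container limits stated above and the ~222MiB idle footprint measured earlier. The function name is hypothetical, and other services on the host are ignored, as in the text.

```python
SERVER_VCPU = 40
SERVER_MEM_MIB = 251.87 * 1024  # 251.87 GiB expressed in MiB


def utilization(vcpu: float, mem_mib: float) -> tuple[float, float]:
    """Return (cpu, memory) use as fractions of the whole server."""
    return vcpu / SERVER_VCPU, mem_mib / SERVER_MEM_MIB


# One container running at its limits: 4 vcpus and 4 GiB.
cpu_frac, mem_frac = utilization(4, 4 * 1024)
print(f"at limits: {cpu_frac:.1%} cpu, {mem_frac:.2%} memory")

# One idle container with VS Code and the desktop started (~222 MiB).
_, idle_mem_frac = utilization(0, 222)
print(f"idle:      {idle_mem_frac:.3%} memory")
```

Passing the script's `sum(cpus)` and `sum(mems)` to the same function gives the corresponding figures for all user containers together.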

ConnorNelson commented 1 year ago

An obvious method for reducing memory consumption is to defer starting VS Code and the desktop until a user wants to use them. For the sake of the user experience, we probably want to be smarter than this. We can keep track (in the user's home directory?) of the last time they accessed the dojo through VS Code / desktop, and automatically start that service on container start if they have used it in the last week (or if it was their most recent method of interacting with the dojo, depending on how easy it is for us to collect that level of information).
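A minimal sketch of the "used in the last week" heuristic, assuming one marker file per service whose modification time records the last use. The state directory, the service names, and both function names are hypothetical. This is not how the dojo implements it.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional

ONE_WEEK = 7 * 24 * 60 * 60  # seconds


def record_use(state_dir: Path, service: str) -> None:
    """Mark `service` (e.g. "vscode" or "desktop") as used right now."""
    state_dir.mkdir(parents=True, exist_ok=True)
    (state_dir / service).touch()  # creates the marker or refreshes its mtime


def should_autostart(state_dir: Path, service: str,
                     now: Optional[float] = None) -> bool:
    """Start the service at container start only if it was used in the last week."""
    marker = state_dir / service
    if not marker.exists():
        return False  # never used: wait until the user asks for it
    now = time.time() if now is None else now
    return now - marker.stat().st_mtime <= ONE_WEEK


# Demo in a throwaway directory standing in for the user's home.
with tempfile.TemporaryDirectory() as home:
    state = Path(home) / ".dojo-last-used"  # hypothetical location
    print(should_autostart(state, "vscode"))  # no marker yet
    record_use(state, "vscode")
    print(should_autostart(state, "vscode"))  # marker was just touched
```

Storing the marker in the home directory means it would survive container restarts, which is what the autostart decision needs.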

We already do a similar thing for the virtual machine, only starting it automatically if the challenge has a .ko.
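That rule amounts to a file check at container start. A rough sketch, assuming the challenge files sit in a single directory passed in by the caller (the function name and directory layout are assumptions, not the dojo's actual code):

```python
import tempfile
from pathlib import Path


def has_kernel_module(challenge_dir: Path) -> bool:
    """True if any .ko file exists directly inside the challenge directory."""
    return any(challenge_dir.glob("*.ko"))


# Demo with a temporary directory standing in for the challenge directory.
with tempfile.TemporaryDirectory() as tmp:
    challenge = Path(tmp)
    print(has_kernel_module(challenge))  # empty directory
    (challenge / "example.ko").touch()   # hypothetical module file
    print(has_kernel_module(challenge))  # now contains a .ko
```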

spencerpogo commented 2 months ago

What is left to do on this after #391?