losfair / blueboat

All-in-one, multi-tenant serverless JavaScript runtime.
https://blueboat.io
Apache License 2.0

MemoryWatermark tune operation interferes with all apps. #96

Open lmxia opened 2 years ago

lmxia commented 2 years ago

What happened:

The sandbox is not fully isolated.

What you expected to happen:

If app A consumes a massive amount of memory, so that available memory falls below the Critical MemoryWatermark, the runtime tunes the smr param "MAX_WORKERS_PER_APP". That is the current logic. But the tune operation also scales in the running app B, even though the scale-in was triggered by app A.

That does not sound like good isolation behavior: apps interfere with each other.
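The interference described above can be illustrated with a toy model. This is not Blueboat's code: the function name, the cap values, and the byte thresholds are all hypothetical, and the only behavior taken from the report is that a single global worker cap is lowered once available memory drops below the Critical watermark.

```python
def workers_allowed(wanted_by_app, available_bytes, critical_bytes,
                    normal_cap, degraded_cap):
    """Toy model of a single global per-app worker cap (hypothetical names).

    When available memory is below the critical watermark, the same lowered
    cap applies to every app, regardless of which app used the memory.
    """
    cap = degraded_cap if available_bytes < critical_bytes else normal_cap
    return {app: min(wanted, cap) for app, wanted in wanted_by_app.items()}


# App A has exhausted memory; app B is well behaved but is scaled in as well.
wanted = {"A": 8, "B": 4}
print(workers_allowed(wanted, available_bytes=100, critical_bytes=200,
                      normal_cap=8, degraded_cap=1))
```

In this model app B drops from 4 workers to 1 even though only app A was responsible for the memory pressure, which is the isolation gap the report points at.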

losfair commented 2 years ago

Memory watermark changes are shipped to a central system log through Kafka. The control plane is expected to monitor the log for High memory watermarks, and route new requests away from the affected instances, so most of the time the Critical watermark will not be triggered. If for some reason memory usage continues to grow, we rely on a few best-effort defenses to keep the system running in a degraded state:

But indeed this is a bug in performance isolation. Currently Blueboat does not have a hard per-worker memory limit, so it is possible to trigger the Critical watermark from a single worker pretty quickly, before the control plane has time to respond.

The solution would be to add a per-worker resident set size (RSS) limit.

losfair commented 2 years ago

Would you be interested in working on this? :)

lmxia commented 2 years ago

Yes, sure, I would like to do that.

losfair commented 2 years ago

Happy to review your PR!

losfair commented 2 years ago

There are several approaches to implementing a per-process RSS limit:

  1. cgroup: Put each worker into its own memory cgroup.

Pro: The limit is accurate. Con: May not play well with a sandboxed environment (seccomp/dropped privileges)

  2. rlimit: Use RLIMIT_AS to limit the address space (VSZ) of each worker process.

Pro: Simple and plays well with OS-level sandboxing. Con: Prevents us from enabling V8 virtual memory tricks for optimizing WebAssembly memory accesses.
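Both approaches can be sketched in a few lines. This is a minimal illustration in Python rather than Blueboat's Rust, and it assumes Linux. The function names are hypothetical, and `cgroup_dir` is assumed to be an already-created per-worker cgroup v2 directory that the caller may write to. `memory.max` and `cgroup.procs` are the standard cgroup v2 interface files.

```python
import resource
import subprocess
import sys
from pathlib import Path


def confine_to_cgroup(cgroup_dir, pid, limit_bytes):
    """Approach 1: cap a worker through its own cgroup v2 directory.

    cgroup_dir is a hypothetical, pre-created per-worker cgroup, e.g. a
    subdirectory of /sys/fs/cgroup. Writing there usually needs privileges
    that a sandboxed runtime may already have dropped.
    """
    cgroup = Path(cgroup_dir)
    (cgroup / "memory.max").write_text(f"{limit_bytes}\n")  # hard memory limit
    (cgroup / "cgroup.procs").write_text(f"{pid}\n")         # move the worker in


def spawn_with_as_limit(argv, limit_bytes):
    """Approach 2: start a worker whose address space is capped by RLIMIT_AS."""
    def apply_limit():
        # Runs in the child between fork() and exec(), so only the worker
        # is capped. The limit counts virtual address space, not RSS.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(argv, preexec_fn=apply_limit,
                          capture_output=True, text=True)


if __name__ == "__main__":
    gib = 1024 ** 3
    # A worker limited to 1 GiB of address space tries to allocate 2 GiB.
    result = spawn_with_as_limit(
        [sys.executable, "-c", "bytearray(2 * 1024**3)"], gib)
    print(result.returncode, result.stderr.strip().splitlines()[-1:])
```

The rlimit path needs no extra privileges, which matches the "plays well with OS-level sandboxing" point. Because it caps virtual size, any large address-space reservation counts against the limit even if the pages are never touched, which is why it conflicts with the V8 virtual memory tricks mentioned above.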

On the API side, the memory limit should be passed to the runtime as a field in Metadata.
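As a rough sketch of that API shape, the limit could be an optional field that falls back to "no limit" when absent. The field names `version` and `memory_limit_bytes` are hypothetical and not taken from Blueboat's actual Metadata definition, and Python is used here only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metadata:
    # Hypothetical subset of the per-app metadata; names are illustrative.
    version: str
    memory_limit_bytes: Optional[int] = None  # None means no per-worker limit


def parse_metadata(raw: str) -> Metadata:
    """Parse metadata JSON and validate the optional memory limit."""
    doc = json.loads(raw)
    limit = doc.get("memory_limit_bytes")
    if limit is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
            raise ValueError("memory_limit_bytes must be a positive integer")
    return Metadata(version=doc["version"], memory_limit_bytes=limit)


if __name__ == "__main__":
    print(parse_metadata('{"version": "v1", "memory_limit_bytes": 268435456}'))
    print(parse_metadata('{"version": "v1"}'))
```

Keeping the field optional means existing metadata without a limit would continue to parse unchanged.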