Closed. EronWright closed this 2 weeks ago.
Attention: Patch coverage is 75.86207%, with 14 lines in your changes missing coverage. Please review.

Project coverage is 49.57%. Comparing base (6fbcd4c) to head (2fb59f1). Report is 1 commit behind head on v2.
:umbrella: View full report in Codecov by Sentry.
@blampe here's the article that convinced me to give the system a hint that we're trying to stay within the 'requests'. https://weaviate.io/blog/gomemlimit-a-game-changer-for-high-memory-applications
Right, rephrasing my earlier comment: I don't think the agent falls into this high-memory category. It can run under 100MiB and handles one request at a time -- its heap should be pretty quiet :) Child processes will eat most of our memory, which is why this felt premature to me, but again it doesn't really matter.
Proposed changes
Implements good defaults for the workspace resource, using a "burstable" approach. Since a workspace pod's utilization is bursty, with low resource usage while idle and high resource usage during deployment ops, the pod requests a small amount of resources (64Mi memory, 100m CPU) so that it can idle cheaply. A deployment op is able to use much more memory, up to all available memory on the host.
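The burstable defaults described above would look roughly like the following pod-spec fragment (illustrative; the exact field placement in the Workspace spec may differ):

```yaml
resources:
  requests:
    memory: 64Mi   # enough to idle
    cpu: 100m
  # No limits by default: a deployment op may burst up to all
  # available memory/CPU on the node.
```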
Users may customize the resources (e.g. to apply different requests and/or limits). For large or complex Pulumi apps, it may make sense to reserve more memory and/or use https://github.com/pulumi/pulumi-kubernetes-operator/issues/694.
The agent takes some pains to stay within the requested amount, using a programmatic form of the GOMEMLIMIT environment variable. The agent detects the requested amount via the Downward API. We don't set GOMEMLIMIT itself, both to avoid propagating it to sub-processes and because the detected value is in the Kubernetes 'quantity' format.

Separately, it was observed that zombie processes weren't being cleaned up, which was leading to resource exhaustion. Fixed by using tini as the entrypoint process (PID 1), so that orphaned children are reaped.
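The zombie-reaping fix amounts to running tini as PID 1 in the workspace image. A Dockerfile sketch (base image and binary path are illustrative, not the PR's actual image):

```dockerfile
FROM alpine:3.20
RUN apk add --no-cache tini
COPY pulumi-agent /usr/local/bin/pulumi-agent
# tini runs as PID 1 and reaps zombie children left behind by
# Pulumi sub-processes.
ENTRYPOINT ["/sbin/tini", "--", "/usr/local/bin/pulumi-agent"]
```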
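The programmatic GOMEMLIMIT approach can be sketched as below. This is a minimal illustration, not the PR's actual code: `MEMORY_REQUEST` is a hypothetical env var name assumed to be populated via the Downward API, and `parseQuantityBytes` is a deliberately simplified stand-in for a full quantity parser such as `resource.Quantity` from k8s.io/apimachinery.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// parseQuantityBytes converts a Kubernetes memory quantity such as
// "64Mi" or "128M" into bytes. Minimal sketch: the real agent would
// likely use resource.Quantity, which handles the full grammar.
func parseQuantityBytes(q string) (int64, bool) {
	suffixes := []struct {
		s    string
		mult int64
	}{
		{"Gi", 1 << 30}, {"Mi", 1 << 20}, {"Ki", 1 << 10},
		{"G", 1_000_000_000}, {"M", 1_000_000}, {"k", 1_000},
		{"", 1}, // bare integer = bytes
	}
	for _, sf := range suffixes {
		if strings.HasSuffix(q, sf.s) {
			n, err := strconv.ParseInt(strings.TrimSuffix(q, sf.s), 10, 64)
			if err != nil {
				return 0, false
			}
			return n * sf.mult, true
		}
	}
	return 0, false
}

func main() {
	// Assumed: MEMORY_REQUEST is injected via the Downward API
	// (resourceFieldRef: requests.memory).
	if q := os.Getenv("MEMORY_REQUEST"); q != "" {
		if bytes, ok := parseQuantityBytes(q); ok {
			// Leave some headroom below the request, per the usual
			// GOMEMLIMIT guidance. Setting the limit in-process means
			// no GOMEMLIMIT env var leaks to sub-processes.
			limit := bytes * 9 / 10
			debug.SetMemoryLimit(limit)
			fmt.Printf("soft memory limit set to %d bytes\n", limit)
		}
	}
}
```

Because `debug.SetMemoryLimit` only affects the current Go runtime, the Pulumi CLI and other child processes remain free to use the rest of the node's memory, matching the burstable design.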
Related issues (optional)
Closes #698