Set some default resource requests on the workspace pod

EronWright commented 2 weeks ago

Proposed changes

Implements good defaults for the workspace resource, using a "burstable" approach. Since a workspace pod's utilization is bursty - with low resource usage during idle times and with high resource usage during deployment ops - the pod requests a small amount of resources (64mb, 100m) to be able to idle. A deployment op is able to use much more memory - all available memory on the host.

Users may customize the resources (e.g. to apply different requests and/or limits). For large/complex Pulumi apps, it might make sense to reserve more memory and/or use https://github.com/pulumi/pulumi-kubernetes-operator/issues/694.

The agent takes some pains to stay within the requested amount, using a programmatic form of the GOMEMLIMIT environment variable. The agent detects the requested amount via the Downward API. We don't use GOMEMLIMIT to avoid propagating it to sub-processes, and because the format is a Kubernetes 'quantity'.

It was observed that zombies weren't being cleaned up, and this was leading to resource exhaustion. Fixed by using tini as the entrypoint process (PID 1).

Related issues (optional)

Closes #698

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 75.86207% with 14 lines in your changes missing coverage. Please review.

Project coverage is 49.57%. Comparing base (6fbcd4c) to head (2fb59f1). Report is 1 commits behind head on v2.

Files with missing lines	Patch %	Lines
agent/cmd/serve.go	0.00%	5 Missing :warning:
...nternal/webhook/auto/v1alpha1/workspace_webhook.go	78.94%	2 Missing and 2 partials :warning:
operator/cmd/main.go	0.00%	3 Missing :warning:
...r/internal/controller/auto/workspace_controller.go	93.54%	1 Missing and 1 partial :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## v2 #707 +/- ## ========================================== + Coverage 49.37% 49.57% +0.19% ========================================== Files 26 27 +1 Lines 2882 2911 +29 ========================================== + Hits 1423 1443 +20 - Misses 1284 1290 +6 - Partials 175 178 +3 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

EronWright commented 2 weeks ago

@blampe here's the article that convinced me to give the system a hint that we're trying to stay within the 'requests'. https://weaviate.io/blog/gomemlimit-a-game-changer-for-high-memory-applications

blampe commented 2 weeks ago

@blampe here's the article that convinced me to give the system a hint that we're trying to stay within the 'requests'. https://weaviate.io/blog/gomemlimit-a-game-changer-for-high-memory-applications

Right, rephrasing my earlier comment I don't think the agent falls into this high-memory category. It can run under 100MiB and handles one request at a time -- its heap should be pretty quiet :) Child processes will eat most of our memory, hence why it felt premature to me, but again it doesn't really matter.

pulumi / pulumi-kubernetes-operator