blampe closed this issue 2 weeks ago
Baseline stats for random-yaml with a 1-minute resync interval.
Zombie processes do accumulate in the workspace pod, given a per-minute resync:
```
pulumi@random-yaml-workspace-0:/$ ps auxwww
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
pulumi 1 0.0 0.3 1248856 14268 ? Ssl 16:07 0:01 /share/agent serve --workspace /share/workspace --skip-install
pulumi 46 0.0 0.0 0 0 ? Z 16:07 0:00 [pulumi-language] <defunct>
pulumi 75 0.0 0.0 0 0 ? Z 16:07 0:00 [pulumi-language] <defunct>
pulumi 236 0.0 0.0 0 0 ? Z 16:07 0:00 [pulumi-language] <defunct>
pulumi 256 0.0 0.0 0 0 ? Z 16:07 0:00 [pulumi-resource] <defunct>
pulumi 271 0.0 0.0 0 0 ? Z 16:07 0:00 [pulumi-resource] <defunct>
pulumi 400 0.0 0.0 0 0 ? Z 16:08 0:00 [pulumi-language] <defunct>
pulumi 415 0.0 0.0 0 0 ? Z 16:08 0:00 [pulumi-resource] <defunct>
pulumi 431 0.0 0.0 0 0 ? Z 16:08 0:00 [pulumi-resource] <defunct>
pulumi 563 0.0 0.0 0 0 ? Z 16:09 0:00 [pulumi-language] <defunct>
pulumi 579 0.0 0.0 0 0 ? Z 16:09 0:00 [pulumi-resource] <defunct>
pulumi 594 0.0 0.0 0 0 ? Z 16:09 0:00 [pulumi-resource] <defunct>
pulumi 724 0.0 0.0 0 0 ? Z 16:10 0:00 [pulumi-language] <defunct>
pulumi 739 0.0 0.0 0 0 ? Z 16:10 0:00 [pulumi-resource] <defunct>
pulumi 753 0.0 0.0 0 0 ? Z 16:10 0:00 [pulumi-resource] <defunct>
pulumi 886 0.0 0.0 0 0 ? Z 16:11 0:00 [pulumi-language] <defunct>
pulumi 901 0.0 0.0 0 0 ? Z 16:11 0:00 [pulumi-resource] <defunct>
pulumi 917 0.0 0.0 0 0 ? Z 16:11 0:00 [pulumi-resource] <defunct>
pulumi 1044 0.0 0.0 0 0 ? Z 16:12 0:00 [pulumi-language] <defunct>
pulumi 1059 0.0 0.0 0 0 ? Z 16:12 0:00 [pulumi-resource] <defunct>
pulumi 1075 0.0 0.0 0 0 ? Z 16:12 0:00 [pulumi-resource] <defunct>
pulumi 1205 0.0 0.0 0 0 ? Z 16:13 0:00 [pulumi-language] <defunct>
pulumi 1220 0.0 0.0 0 0 ? Z 16:13 0:00 [pulumi-resource] <defunct>
pulumi 1236 0.0 0.0 0 0 ? Z 16:13 0:00 [pulumi-resource] <defunct>
pulumi 1368 0.0 0.0 0 0 ? Z 16:14 0:00 [pulumi-language] <defunct>
pulumi 1383 0.0 0.0 0 0 ? Z 16:14 0:00 [pulumi-resource] <defunct>
...
```
Likely related to https://github.com/pulumi/pulumi/issues/17361
These measurements were made after the "zombie" process issue was fixed.
After another hour of periodic execution:
And another:
A case of failed updates causing a lot more interactions with the workspace:
With all fixes:
The manager already has limits set -- it currently has Guaranteed QoS.
Related to #694 and probably a prerequisite -- set a small resource request to give the workspace pod Burstable QoS.
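As a sketch of what that would look like (names and values here are illustrative, not the operator's defaults): Kubernetes assigns Burstable QoS when at least one container has a request that is lower than its limit, so the workspace pod reserves little but can still burst during updates.

```yaml
# Illustrative container resources for the workspace pod.
# requests < limits => Burstable QoS (Guaranteed requires requests == limits).
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: "1"
    memory: 512Mi
```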
Additional considerations: should we also apply a Go soft memory limit (GOMEMLIMIT, or SetMemoryLimit in code?).