jterry75 closed this pull request 4 years ago
Should we default the gcs limit to 0 (no limit) instead? I'm not sure what the limit accomplishes, since if it's hit the whole pod goes down anyway, and the pod is currently hard to troubleshoot in that state. We could still leave in the ability to configure a limit via the command line.
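For reference, a minimal sketch of what "0 means no limit, but still configurable from the command line" could look like. The flag name and cgroup path below are assumptions for illustration, not the actual opengcs options:

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"path/filepath"
)

// Hypothetical flag: 0 (the default) means no limit; any positive value is
// applied to the gcs memory cgroup in bytes.
var gcsMemLimit = flag.Uint64("gcs-mem-limit-bytes", 0, "memory limit for the gcs cgroup (0 = unlimited)")

// applyGCSMemoryLimit writes the limit to the cgroup v1 memory controller.
// A zero limit is a no-op, leaving memory.limit_in_bytes at its default.
func applyGCSMemoryLimit(cgroupPath string, limitBytes uint64) error {
	if limitBytes == 0 {
		return nil
	}
	limitFile := filepath.Join(cgroupPath, "memory.limit_in_bytes")
	return os.WriteFile(limitFile, []byte(fmt.Sprintf("%d", limitBytes)), 0644)
}

func main() {
	flag.Parse()
	// "/sys/fs/cgroup/memory/gcs" is an illustrative path, not necessarily
	// the cgroup the gcs daemon actually creates for itself.
	if err := applyGCSMemoryLimit("/sys/fs/cgroup/memory/gcs", *gcsMemLimit); err != nil {
		fmt.Fprintln(os.Stderr, "failed to apply gcs memory limit:", err)
		os.Exit(1)
	}
}
```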
Wouldn't we at least know which container misbehaved? GCS or workload?
I think as a best practice we should keep the gcs cgroup and use its limit. I agree that it will crash if it hits this limit, but it would be nice to know. There is no reason we should be seeing 50 MB of usage, and if we keep seeing crashes we'll know this is not the real issue. Without the limit there is no way to know if a gcs is running at 200 MB and stealing from workload memory.
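To make the attribution point concrete, here is a rough sketch of how the gcs cgroup's memory counters could be read to tell whether the gcs itself is over budget. The cgroup path and the ~50 MB expectation are illustrative assumptions, not what the daemon actually does:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// readCgroupValue reads a single integer counter from a cgroup v1 control file.
func readCgroupValue(cgroupPath, file string) (uint64, error) {
	data, err := os.ReadFile(filepath.Join(cgroupPath, file))
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
}

func main() {
	// Illustrative path; the real cgroup name used for the gcs may differ.
	const gcsCgroup = "/sys/fs/cgroup/memory/gcs"

	usage, err := readCgroupValue(gcsCgroup, "memory.usage_in_bytes")
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading gcs memory usage:", err)
		os.Exit(1)
	}
	peak, err := readCgroupValue(gcsCgroup, "memory.max_usage_in_bytes")
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading gcs peak memory usage:", err)
		os.Exit(1)
	}
	// If the peak is far above the expected ~50 MB, the gcs itself is the
	// likely culprit rather than the workload containers.
	fmt.Printf("gcs memory: current=%d MiB peak=%d MiB\n", usage>>20, peak>>20)
}
```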
DO NOT MERGE: doing some studies to try to find the right numbers here.
Closing this since we took another fix for this issue: https://github.com/microsoft/opengcs/pull/372
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>