Open ramapalani opened 4 years ago
Are there any other logs? Any more information from running docker plugin inspect?
Also what's different between the pre-prod and prod environments ?
They are the same, except that prod gets more traffic.
In that case, have you tried allotting more resources to check whether the system is running out of them?
Ashutosh, you were right: both CPU and memory (RAM) usage in the pod spiked well above the requested amounts.
Resources in pod spec:
resources:
requests:
memory: 4G
cpu: 1
Actual consumption:
We can raise the resources, but not to 25G, and memory keeps increasing. I suspect a memory leak or a similar issue. Could you help me debug this?
Thanks, Rama
Actual consumption screenshot
Memory usage typically depends on the size of the data and policy that you load into OPA. This page provides more details on resource utilization. Do you have an estimate of these values ?
This is the policy. It evaluates only one field.
package docker.authz

default allow = false

allow {
    not input.Body.HostConfig.Privileged
}
I don't control the data; docker sends the input data to the OPA plugin.
Here is a sample of the input data with Body set to null.
time="2020-09-05T19:40:15Z" level=error msg="2020/09/05 19:40:15 {\"config_hash\":\"f418bd1c862c2178ff5c93054aa8c8adae2ddae3aa90a68e4011c07d396839d4\",\"decision_id\":\"78c32ebd-a216-4ea1-a971-acbc879df361\",\"input\":{\"AuthMethod\":\"\",\"Body\":null,\"Headers\":{\"Accept-Encoding\":\"gzip\",\"Connection\":\"close\",\"User-Agent\":\"go-dockerclient\"},\"Method\":\"GET\",\"Path\":\"/images/sha256:xxxxxxcc040e350e848dd39bf1cabc09653adb7ede6f050cbd16a7503de6/json\",\"User\":\"\"},\"labels\":{\"app\":\"opa-docker-authz\",\"id\":\"b6b53359-69d3-45e8-acbf-b7258ea848cf\",\"opa_version\":\"v0.18.0\",\"plugin_version\":\"0.7\"},\"result\":true,\"timestamp\":\"2020-09-05T19:40:15.136801273Z\"}" plugin=e680e3fff81e36d08a68f15256251be43a41a9a090f37f1c353f8d5fb95465a8
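The logged decision above has "result":true with a null Body, which is consistent with how Rego's `not` treats an undefined reference: `input.Body.HostConfig.Privileged` is undefined when Body is null, so the rule body succeeds. A minimal sketch for checking this locally follows. It assumes an `opa` binary on PATH that still accepts this pre-1.0 Rego syntax; the trimmed input and its Path value are hypothetical, modeled on the log line.

```shell
#!/bin/sh
# Sketch: evaluate the policy from this thread against a request whose Body is null.
workdir=$(mktemp -d)

# The policy as posted in the thread.
cat > "${workdir}/authz.rego" <<'EOF'
package docker.authz

default allow = false

allow {
    not input.Body.HostConfig.Privileged
}
EOF

# Hypothetical, trimmed-down input modeled on the logged GET request.
cat > "${workdir}/input.json" <<'EOF'
{"Body": null, "Method": "GET", "Path": "/images/example/json"}
EOF

# Assumption: `opa` is installed and accepts pre-1.0 Rego syntax.
if command -v opa >/dev/null 2>&1; then
    opa eval --data "${workdir}/authz.rego" --input "${workdir}/input.json" \
        'data.docker.authz.allow' \
        || echo "opa eval failed (possibly a Rego syntax version mismatch)"
else
    echo "opa not found; skipping evaluation"
fi
```

Swapping the input for one with `{"Body": {"HostConfig": {"Privileged": true}}}` exercises the deny path.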
When Body is not null, the data is around 6 KB.
In 60 minutes, the OPA docker plugin processed around 2000 requests.
Is there a way for me to control the size of the data?
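A rough upper bound on the input volume follows from the two figures above. This sketch assumes every request carries the full 6 KB body, which overstates the load because some requests have a null Body; the variable names are illustrative only.

```shell
#!/bin/sh
# Back-of-the-envelope bound on input data volume, using the figures from this thread.
requests_per_hour=2000   # "around 2000 requests" in 60 minutes
body_kb=6                # "around 6 KB" when Body is not null

total_kb=$((requests_per_hour * body_kb))
echo "Upper bound on input data per hour: ${total_kb} KB (about $((total_kb / 1000)) MB)"
# prints: Upper bound on input data per hour: 12000 KB (about 12 MB)
```

Even if every input were retained in memory, that is about 12 MB per hour, orders of magnitude below the 25G mentioned earlier.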
@ashutosh-narkar Can you suggest a way to reduce the data or another way to avoid this 'huge' memory consumption by OPA docker plugin?
The data seems pretty small. Have you recorded OPA's memory usage over time? And how much memory have you allocated so far?
The resource request is 4GB, but actual usage went up to 25GB, and then the connection to the socket was lost. We had to start docker DIND without the OPA plugin to get it working again.
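To record memory usage over time, as asked above, one option is to sample the plugin process's resident set size from /proc. This is a minimal sketch, assuming a Linux /proc filesystem from which the plugin process is visible; the helper names `rss_kb` and `sample_rss`, the output path /tmp/opa-rss.log, and the 60-second interval are illustrative and not part of the plugin.

```shell
#!/bin/sh
# Hypothetical helper: print the resident set size (VmRSS, in kB) found in a
# /proc/<pid>/status-style file passed as the first argument.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Hypothetical helper: append one "<UTC timestamp> <rss in kB>" sample
# for the given PID to the given log file.
sample_rss() {
    pid=$1
    out=$2
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $(rss_kb "/proc/${pid}/status")" >> "${out}"
}

# Example loop (commented out so that sourcing this file does not block):
# pid=$(pgrep -f opa-docker-authz | head -n 1)
# while true; do sample_rss "${pid}" /tmp/opa-rss.log; sleep 60; done
```

A steadily rising series in the resulting file would support the memory-leak suspicion; a flat one would point elsewhere.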
@ramapalani Can you provide an example of how to reproduce the issue ? Any scripts that you have to simulate the traffic etc. would be helpful.
I'll try to reproduce this in our pre-prod environment and share it with you.
@ashutosh-narkar I'm trying to reproduce this in the pre-prod env. As part of this effort, I'm checking every second whether the socket is open, using a simple shell script. I also collect the open files and running processes on the failed instance.
Although I can't reproduce the issue exactly as it occurs in prod, I see that the OPA socket often isn't listening. Here is one instance of the failure. The next check usually works fine, but failures do happen frequently.
Test script
#!/bin/sh
# Probe the OPA plugin socket once per second; on failure, dump diagnostics.

# Install socat if it is missing (Alpine-based image).
if ! command -v socat >/dev/null 2>&1; then apk add socat; fi

testsocket() {
    # Locate the OPA plugin's Unix socket.
    socket=$(find /run/docker/plugins/ -name "*.sock" | grep opa)
    # Connect to the socket without sending any data.
    socat -u OPEN:/dev/null "UNIX-CONNECT:${socket}"
    EXIT_CODE=$?
    if [ "${EXIT_CODE}" -eq 0 ]; then
        echo "$(date): Connection to Socket successful"
    else
        echo "$(date): Connection to Socket FAILED"
        echo "Open files"
        lsof | grep opa
        echo "Running processes"
        ps -ef
    fi
}

output_file=/tmp/testsocket.log

# Record the plugin list and configuration once, at the top of the log.
set -x
docker -H unix:///var/run/dind/docker.sock plugin ls | tee "${output_file}"
docker -H unix:///var/run/dind/docker.sock plugin inspect openpolicyagent/opa-docker-authz-v2:0.7 | tee -a "${output_file}"
set +x

while true
do
    testsocket | tee -a "${output_file}"
    sleep 1
done
Failure
Tue Sep 22 21:28:33 UTC 2020: Connection to Socket successful
Tue Sep 22 21:28:34 UTC 2020: Connection to Socket successful
2020/09/22 21:28:35 socat[28132] E exiting on signal 11
Tue Sep 22 21:28:35 UTC 2020: Connection to Socket FAILED
Open files
219 /opa-docker-authz /dev/null
219 /opa-docker-authz pipe:[208415330]
219 /opa-docker-authz pipe:[208415331]
219 /opa-docker-authz anon_inode:[eventpoll]
219 /opa-docker-authz pipe:[208410360]
219 /opa-docker-authz pipe:[208410360]
219 /opa-docker-authz socket:[208410361]
219 /opa-docker-authz socket:[208482551]
Running processes
PID USER TIME COMMAND
1 root 13:45 dockerd --storage-driver=overlay2 -H unix:///var/run/dind/docker.sock
24 root 0:10 containerd --config /var/run/docker/containerd/containerd.toml --log-level info
201 root 0:00 containerd-shim -namespace plugins.moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/plugins.moby/2946790b93416011fcf7eed801b307afbea481a8d3992b6a538e91ede4bf96e8 -address /var/run/docker/containerd/containerd.sock -containerd-binary /usr/local/bin/containerd -runtime-root /run/docker/plugins/runtime-root
219 root 0:11 /opa-docker-authz -policy-file /opa/policies/authz.rego
5276 root 0:00 sh
9215 root 0:00 sh
19177 root 0:00 sh
21420 root 0:00 {test-socket.sh} /bin/sh ./test-socket.sh
22230 root 0:00 tail -f /tmp/testsocket.log
28127 root 0:00 {test-socket.sh} /bin/sh ./test-socket.sh
28128 root 0:00 tee -a /tmp/testsocket.log
28136 root 0:00 ps -ef
Tue Sep 22 21:28:36 UTC 2020: Connection to Socket successful
Tue Sep 22 21:28:37 UTC 2020: Connection to Socket successful
Full log file is attached: testsocket.log
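To quantify how often the probe fails, the log written by the test script can be tallied. A minimal sketch, assuming the log contains the exact "Connection to Socket successful" and "Connection to Socket FAILED" messages that the script prints; `count_results` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count successful and failed probes in a log file
# produced by the test script in this thread.
count_results() {
    log=$1
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    ok=$(grep -c 'Connection to Socket successful' "${log}" || true)
    failed=$(grep -c 'Connection to Socket FAILED' "${log}" || true)
    echo "successful=${ok} failed=${failed}"
}

# Example (commented out; the path is the one used by the test script):
# count_results /tmp/testsocket.log
```

Comparing the failure count with the total number of probes gives a failure rate that can be tracked across runs.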
Hmm, you're getting a segmentation fault. What system are you running this on?
We run a docker DIND (docker-in-docker) container as a Kubernetes DaemonSet. The image is docker:18.09.5-dind. The OPA docker plugin is installed into this instance of docker.
@ashutosh-narkar I couldn't reproduce this issue in the pre-prod environment, but we hit it consistently in the production environment (with higher traffic) after a short period.
So I created a custom plugin to prevent privileged container creation, and that works well.
That's great ! Is that custom plugin using OPA ?
No, I created a new docker authorization plugin, entirely separate from OPA.
I'm trying to run the OPA docker plugin as part of a DIND (docker-in-docker) DaemonSet. I followed the steps in this tutorial: https://www.openpolicyagent.org/docs/latest/docker-authorization/#goals
The only rule in the Rego file prevents privileged containers. This works as expected in a pre-prod environment. When we run it in the prod env, it works as expected for about an hour; after that, the OPA plugin is not reachable. The docker logs have messages like these:
time="2020-09-06T19:08:06.723350267Z" level=warning msg="Unable to connect to plugin: /run/docker/plugins/e680e3fff81e36d08a68f15256251be43a41a9a090f37f1c353f8d5fb95465a8/opa-docker-authz.sock/AuthZPlugin.AuthZReq: Post http://%2Frun%2Fdocker%2Fplugins%2Fe680e3fff81e36d08a68f15256251be43a41a9a090f37f1c353f8d5fb95465a8%2Fopa-docker-authz.sock/AuthZPlugin.AuthZReq: dial unix /run/docker/plugins/e680e3fff81e36d08a68f15256251be43a41a9a090f37f1c353f8d5fb95465a8/opa-docker-authz.sock: connect: connection refused, retrying in 1s"
time="2020-09-06T19:08:21.759791345Z" level=error msg="Handler for POST /v1.39/images/create returned error: plugin openpolicyagent/opa-docker-authz-v2:0.7 failed with error: Post http://%2Frun%2Fdocker%2Fplugins%2Fe680e3fff81e36d08a68f15256251be43a41a9a090f37f1c353f8d5fb95465a8%2Fopa-docker-authz.sock/AuthZPlugin.AuthZReq: dial unix /run/docker/plugins/e680e3fff81e36d08a68f15256251be43a41a9a090f37f1c353f8d5fb95465a8/opa-docker-authz.sock: connect: connection refused"
Daemonset definition:
authz.rego/