open-policy-agent / opa-docker-authz

A policy-enabled authorization plugin for Docker.
Apache License 2.0

Unable to connect opa-docker-authz.sock #51

Open ramapalani opened 4 years ago

ramapalani commented 4 years ago

I'm trying to run the OPA Docker plugin as part of a DIND (docker-in-docker) DaemonSet. I followed the steps in this tutorial: https://www.openpolicyagent.org/docs/latest/docker-authorization/#goals

The only rule in the Rego file prevents privileged containers. This works as expected in a pre-prod environment. When we run it in the prod environment, it works as expected for about an hour; after that, the OPA plugin is not reachable. The Docker logs contain messages like these:

time="2020-09-06T19:08:06.723350267Z" level=warning msg="Unable to connect to plugin: /run/docker/plugins/e680e3fff81e36d08a68f15256251be43a41a9a090f37f1c353f8d5fb95465a8/opa-docker-authz.sock/AuthZPlugin.AuthZReq: Post http://%2Frun%2Fdocker%2Fplugins%2Fe680e3fff81e36d08a68f15256251be43a41a9a090f37f1c353f8d5fb95465a8%2Fopa-docker-authz.sock/AuthZPlugin.AuthZReq: dial unix /run/docker/plugins/e680e3fff81e36d08a68f15256251be43a41a9a090f37f1c353f8d5fb95465a8/opa-docker-authz.sock: connect: connection refused, retrying in 1s"

time="2020-09-06T19:08:21.759791345Z" level=error msg="Handler for POST /v1.39/images/create returned error: plugin openpolicyagent/opa-docker-authz-v2:0.7 failed with error: Post http://%2Frun%2Fdocker%2Fplugins%2Fe680e3fff81e36d08a68f15256251be43a41a9a090f37f1c353f8d5fb95465a8%2Fopa-docker-authz.sock/AuthZPlugin.AuthZReq: dial unix /run/docker/plugins/e680e3fff81e36d08a68f15256251be43a41a9a090f37f1c353f8d5fb95465a8/opa-docker-authz.sock: connect: connection refused"

Daemonset definition:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dind-daemonset
spec:
...
  template:
    spec:
      containers:
      - name: dind
        image: docker:18.09.5-dind
        command: ['sh', '-c', 'if [ -d /var/run/dind/docker.sock ]; then rm -rf /var/run/dind/docker.sock;fi && /usr/local/bin/dockerd-entrypoint.sh dockerd --storage-driver=overlay2 -H unix:///var/run/dind/docker.sock']
        lifecycle:
          postStart:
            exec:
              command: ["/bin/sh", "-c", "mkdir -p /etc/docker/policies && cp /etc/docker/opa-policy/authz.rego /etc/docker/policies && docker -H unix:///var/run/dind/docker.sock plugin install --grant-all-permissions openpolicyagent/opa-docker-authz-v2:0.7 opa-args=\"-policy-file /opa/policies/authz.rego\" && echo '{ \"authorization-plugins\": [\"openpolicyagent/opa-docker-authz-v2:0.7\"] }' > /etc/docker/daemon.json && kill -HUP $(pidof dockerd)"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: varlibdocker
          mountPath: /var/lib/docker
        - name: rundind
          mountPath: /var/run/dind
        - name: opa-policy
          mountPath: /etc/docker/opa-policy
...
      volumes:
      - name: varlibdocker
        emptyDir: {}
      - name: opa-policy
        configMap: 
          name: docker-opa-policy
      - name: rundind
        hostPath:
          path: /var/run/dind/

authz.rego:

apiVersion: v1
kind: ConfigMap
metadata:
  name: docker-opa-policy
data:
  authz.rego: |-
    package docker.authz

    default allow = false

    allow {
        not input.Body.HostConfig.Privileged
    }

ashutosh-narkar commented 4 years ago

Are there any other logs? Is there any more information from running docker plugin inspect?
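
For reference, a minimal sketch of how that information could be gathered against the DIND socket. The endpoint and plugin name in the usage note are the ones from the DaemonSet definition above; the helper name plugin_report is hypothetical, and the sketch assumes the .Enabled and .Settings.Args fields of the JSON that docker plugin inspect prints.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker or OPA): print whether a plugin is
# enabled and which arguments it was configured with.
#   $1 = Docker endpoint, e.g. a unix:// socket
#   $2 = plugin name
plugin_report() {
    docker -H "$1" plugin inspect \
        -f 'enabled={{.Enabled}} args={{.Settings.Args}}' "$2"
}
```

It could be called as plugin_report unix:///var/run/dind/docker.sock openpolicyagent/opa-docker-authz-v2:0.7. Dropping the -f option prints the full JSON instead.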

ashutosh-narkar commented 4 years ago

Also, what's different between the pre-prod and prod environments?

ramapalani commented 4 years ago

They are the same, except that there is more traffic in prod.

ashutosh-narkar commented 4 years ago

In that case, have you tried allotting more resources, to check whether the system is being exhausted?

ramapalani commented 4 years ago

Ashutosh, you were right: both CPU and memory (RAM) usage in the pod spiked way above the requested amounts.

Resources in pod spec:

    resources:
      requests:
        memory: 4G
        cpu: 1

Actual consumption: [screenshot]

We can increase the resources to a higher level, but not to 25G, and memory keeps increasing. I suspect a memory leak or some other issue. Could you help me debug this?

Thanks, Rama

ramapalani commented 4 years ago

Actual consumption screenshot: [image]

ashutosh-narkar commented 4 years ago

Memory usage typically depends on the size of the data and policy that you load into OPA. This page provides more details on resource utilization. Do you have an estimate of these values?

ramapalani commented 4 years ago

This is the policy; it evaluates only one field.

    package docker.authz

    default allow = false

    allow {
        not input.Body.HostConfig.Privileged
    }

I don't control the data; Docker sends the input data to the OPA plugin.

Here is a sample input with Body set to null.

time="2020-09-05T19:40:15Z" level=error msg="2020/09/05 19:40:15 {\"config_hash\":\"f418bd1c862c2178ff5c93054aa8c8adae2ddae3aa90a68e4011c07d396839d4\",\"decision_id\":\"78c32ebd-a216-4ea1-a971-acbc879df361\",\"input\":{\"AuthMethod\":\"\",\"Body\":null,\"Headers\":{\"Accept-Encoding\":\"gzip\",\"Connection\":\"close\",\"User-Agent\":\"go-dockerclient\"},\"Method\":\"GET\",\"Path\":\"/images/sha256:xxxxxxcc040e350e848dd39bf1cabc09653adb7ede6f050cbd16a7503de6/json\",\"User\":\"\"},\"labels\":{\"app\":\"opa-docker-authz\",\"id\":\"b6b53359-69d3-45e8-acbf-b7258ea848cf\",\"opa_version\":\"v0.18.0\",\"plugin_version\":\"0.7\"},\"result\":true,\"timestamp\":\"2020-09-05T19:40:15.136801273Z\"}" plugin=e680e3fff81e36d08a68f15256251be43a41a9a090f37f1c353f8d5fb95465a8

When Body is not null, the data is around 6 KB.

In 60 minutes, the OPA Docker plugin processed around 2,000 requests.
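
As a rough upper bound on the input volume these figures imply, here is a back-of-the-envelope sketch. It assumes, as a worst case, that every one of the roughly 2,000 requests carried a body of about 6 KB; the variable names are illustrative only.

```shell
#!/bin/sh
# Worst-case estimate using the approximate figures quoted in this comment.
requests_per_hour=2000   # requests the plugin processed in 60 minutes
body_kb=6                # assumed body size for every request

total_kb=$((requests_per_hour * body_kb))
echo "about ${total_kb} kB of input per hour"
```

That works out to roughly 12 MB of input per hour, which is small next to the 4G memory request.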

Is there a way for me to control the size of the data?

ramapalani commented 4 years ago

@ashutosh-narkar Can you suggest a way to reduce the data or another way to avoid this 'huge' memory consumption by OPA docker plugin?

ashutosh-narkar commented 4 years ago

The data seems pretty small. Have you documented OPA's memory usage over time? And how much memory have you allocated so far?
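
One way to document memory usage over time is to sample the plugin process's resident set size from /proc. The following is a minimal sketch, assuming a Linux /proc filesystem and a shell from which the plugin process is visible; the function names rss_kb and sample_rss are hypothetical and not part of OPA or Docker.

```shell
#!/bin/sh
# Sketch: sample the resident memory (VmRSS) of a process over time.
# Assumes Linux /proc; rss_kb and sample_rss are hypothetical helper names.

# Print the resident set size of PID $1 in kB; fail if it cannot be read.
rss_kb() {
    awk '/^VmRSS:/ { print $2; found = 1 } END { exit !found }' \
        "/proc/$1/status" 2>/dev/null
}

# Print "UTC-timestamp rss_kb" for PID $1, $2 times, $3 seconds apart.
sample_rss() {
    pid=$1
    count=$2
    interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        rss=$(rss_kb "$pid") || { echo "cannot read memory of $pid" >&2; return 1; }
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) ${rss}"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
}
```

For example, sample_rss PID 60 60 would print one sample per minute for an hour, where PID is the process ID of opa-docker-authz. Redirecting the output to a file gives a simple time series to attach to the issue.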

ramapalani commented 4 years ago

The resource request is 4GB, but the actual usage went up to 25GB, and then the connection to the socket was lost. So we had to start docker-DIND without the OPA plugin to get it working again.

ashutosh-narkar commented 4 years ago

@ramapalani Can you provide an example of how to reproduce the issue? Any scripts that you have to simulate the traffic, etc., would be helpful.

ramapalani commented 4 years ago

I'll try to reproduce this in our pre-prod environment and share it with you.

ramapalani commented 4 years ago

@ashutosh-narkar I'm trying to reproduce this in the pre-prod env. As part of this effort, I check every second whether the socket is open, using a simple shell script. When a check fails, I also collect the open files and the running processes.

Though I can't reproduce the issue exactly as it occurs in the prod env, I see that the OPA socket is often not listening. Here is one instance of the failure. Many times the next check works fine, but failures do happen frequently.

Test script

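#!/bin/sh
# Probe the OPA plugin socket once per second; on failure, log open files and processes.

# Install socat if it is missing (apk is the Alpine package manager).
if ! which socat ; then apk add socat; fi

testsocket()
{
    # The plugin ID in the path is not fixed, so locate the socket by name.
    socket=$(find /run/docker/plugins/ -name "*.sock" | grep opa)
    # Open and immediately close a connection to the socket.
    socat -u OPEN:/dev/null "UNIX-CONNECT:${socket}"
    EXIT_CODE=$?
    if [ "${EXIT_CODE}" -eq 0 ];
    then
        echo "$(date): Connection to Socket successful"
    else
        echo "$(date): Connection to Socket FAILED"
        echo "Open files"
        lsof | grep opa
        echo "Running processes"
        ps -ef
    fi
}

output_file=/tmp/testsocket.log
# Record the plugin state once before starting the probe loop.
set -x
docker -H unix:///var/run/dind/docker.sock plugin ls | tee "${output_file}"
docker -H unix:///var/run/dind/docker.sock plugin inspect openpolicyagent/opa-docker-authz-v2:0.7 | tee -a "${output_file}"
set +x
while true
do
    testsocket | tee -a "${output_file}"
    sleep 1
done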

Failure

Tue Sep 22 21:28:33 UTC 2020: Connection to Socket successful
Tue Sep 22 21:28:34 UTC 2020: Connection to Socket successful
2020/09/22 21:28:35 socat[28132] E exiting on signal 11
Tue Sep 22 21:28:35 UTC 2020: Connection to Socket FAILED
Open files
219 /opa-docker-authz   /dev/null
219 /opa-docker-authz   pipe:[208415330]
219 /opa-docker-authz   pipe:[208415331]
219 /opa-docker-authz   anon_inode:[eventpoll]
219 /opa-docker-authz   pipe:[208410360]
219 /opa-docker-authz   pipe:[208410360]
219 /opa-docker-authz   socket:[208410361]
219 /opa-docker-authz   socket:[208482551]
Running processes
PID   USER     TIME  COMMAND
    1 root     13:45 dockerd --storage-driver=overlay2 -H unix:///var/run/dind/docker.sock
   24 root      0:10 containerd --config /var/run/docker/containerd/containerd.toml --log-level info
  201 root      0:00 containerd-shim -namespace plugins.moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/plugins.moby/2946790b93416011fcf7eed801b307afbea481a8d3992b6a538e91ede4bf96e8 -address /var/run/docker/containerd/containerd.sock -containerd-binary /usr/local/bin/containerd -runtime-root /run/docker/plugins/runtime-root
  219 root      0:11 /opa-docker-authz -policy-file /opa/policies/authz.rego
 5276 root      0:00 sh
 9215 root      0:00 sh
19177 root      0:00 sh
21420 root      0:00 {test-socket.sh} /bin/sh ./test-socket.sh
22230 root      0:00 tail -f /tmp/testsocket.log
28127 root      0:00 {test-socket.sh} /bin/sh ./test-socket.sh
28128 root      0:00 tee -a /tmp/testsocket.log
28136 root      0:00 ps -ef
Tue Sep 22 21:28:36 UTC 2020: Connection to Socket successful
Tue Sep 22 21:28:37 UTC 2020: Connection to Socket successful

Full log file is attached: testsocket.log

ashutosh-narkar commented 4 years ago

Hmm, you're getting a segmentation fault. What system are you running this on?
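
For reference, signal 11 on Linux is SIGSEGV, which is why the log line "socat[28132] E exiting on signal 11" points to a segmentation fault; note that the process named in that line is socat. A quick way to look up a signal number from a POSIX shell:

```shell
#!/bin/sh
# Print the name of signal number 11 (kill -l accepts a signal number).
kill -l 11
```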

ramapalani commented 4 years ago

We run a Docker DIND (docker-in-docker) container as a Kubernetes DaemonSet. The image is docker:18.09.5-dind. The OPA Docker plugin is installed into this instance of Docker.

ramapalani commented 4 years ago

@ashutosh-narkar I couldn't reproduce this issue in the pre-prod environment, but we encounter it consistently in the production environment (with higher traffic) after a short period.

So I created a custom plugin to prevent privileged container creation, and that works well.

ashutosh-narkar commented 4 years ago

That's great! Is that custom plugin using OPA?

ramapalani commented 4 years ago

No, I created a fresh Docker authorization plugin, totally separate from OPA.