hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Nomad connect functionality not working with SELinux enabled #7290

Open apollo13 opened 4 years ago

apollo13 commented 4 years ago

Nomad version

Nomad v0.10.4 (f750636ca68e17dcd2445c1ab9c5a34f9ac69345)

Operating system and Environment details

Fedora 31, with Docker 18.09.8

Issue

The Envoy health check in Consul stays red, and /var/log/audit/audit.log contains denials:

type=AVC msg=audit(1583672022.178:2020): avc:  denied  { write } for  pid=70868 comm="envoy" name="consul_grpc.sock" dev="tmpfs" ino=676989 scontext=system_u:system_r:container_t:s0:c121,c146 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=sock_file permissive=0

After setenforce 0, the health check turns green.

Reproduction steps

Run nomad agent -dev and consul agent -dev, then deploy the job file below.

Job file (if appropriate)

job "example" {
    datacenters = ["dc1"]
    type = "service"
    update { max_parallel = 1 }
    group "http1" {
        network {
            mode = "bridge"
            port "http" { to=80 }
        }
        service {
            port = "http"
            name = "http1"

            connect {
                sidecar_service {}
            }
        }
        task "http1" {
            driver = "docker"
            config { image = "nginx" }
        }
    }
    group "http2" {
        network {
            mode = "bridge"
            port "http" { to=80 }
        }
        service {
            port = "http"
            name = "http2"

            connect {
                sidecar_service {
                    proxy {
                        upstreams {
                            destination_name = "http1"
                            local_bind_port  = 8080
                        }
                    }
                }
            }
        }
        task "http2" {
            driver = "docker"
            config { image = "nginx" }
        }
    }
}

Consul logs have:

    2020-03-08T14:02:56.962+0100 [WARN]  agent: Check socket connection failed: check=service:_nomad-task-6d05e4c5-b5d8-2941-c6a4-dc9bb1e675c6-group-http2-http2-http-sidecar-proxy:1 error="dial tcp 127.0.0.1:30124: connect: connection refused"
    2020-03-08T14:02:56.963+0100 [WARN]  agent: Check is now critical: check=service:_nomad-task-6d05e4c5-b5d8-2941-c6a4-dc9bb1e675c6-group-http2-http2-http-sidecar-proxy:1
    2020-03-08T14:03:00.781+0100 [WARN]  agent: Check socket connection failed: check=service:_nomad-task-a6ae689e-6b3c-206d-5b58-0562248a595c-group-http1-http1-http-sidecar-proxy:1 error="dial tcp 127.0.0.1:20423: connect: connection refused"
    2020-03-08T14:03:00.781+0100 [WARN]  agent: Check is now critical: check=service:_nomad-task-a6ae689e-6b3c-206d-5b58-0562248a595c-group-http1-http1-http-sidecar-proxy:1

Unfortunately, the other logs do not contain anything interesting.

apollo13 commented 4 years ago

This went away after setting:

plugin "docker" {
    config {
        volumes {
             enabled      = true
             selinuxlabel = "z"
        }
    }
}

in the Nomad config.

There still seems to be an SELinux issue, because now I get:

type=AVC msg=audit(1583680590.931:2730): avc:  denied  { connectto } for  pid=88248 comm="envoy" path="/opt/nomad/alloc/016ddeb3-3253-e6aa-7795-1b8dea187224/alloc/tmp/consul_grpc.sock" scontext=system_u:system_r:container_t:s0:c764,c777 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=1
type=AVC msg=audit(1583680590.974:2731): avc:  denied  { connectto } for  pid=88250 comm="envoy" path="/opt/nomad/alloc/cb3eb8f7-f82c-ece8-d3a8-5fc310771358/alloc/tmp/consul_grpc.sock" scontext=system_u:system_r:container_t:s0:c179,c881 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=1

apollo13 commented 4 years ago

The blog post https://danwalsh.livejournal.com/81143.html has a good explanation of why this is not working. The best fix here is probably to pass --security-opt label=disable to the Envoy container. Would this be a possibility?

apollo13 commented 4 years ago

I was able to manually fix the sidecars via:

        sidecar_task { config { security_opt = ["label=disable"] } }

in the connect stanza :)
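For context, here is a minimal sketch of where that override sits, using the http2 service block from the job file above. Only the sidecar_task block is new; everything else is copied from the original job. Note that label=disable turns off SELinux label confinement for the sidecar container, so this is a workaround and not a policy fix.

```hcl
service {
  port = "http"
  name = "http2"

  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "http1"
          local_bind_port  = 8080
        }
      }
    }

    # Workaround: start the Envoy sidecar without SELinux label confinement,
    # the docker driver equivalent of `--security-opt label=disable`.
    sidecar_task {
      config {
        security_opt = ["label=disable"]
      }
    }
  }
}
```

The same sidecar_task block would need to be added to the http1 service as well, since each group gets its own sidecar.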

tgross commented 4 years ago

@shoenig I'm not sure if this is something we could explore improving in the documentation / guide?

apollo13 commented 4 years ago

@tgross For what it's worth, even a simple "beware: does not work well with the default SELinux rules" would probably go far. I guess the main question is: is active SELinux a supported mode of operation for Nomad? The same question could probably be asked for AppArmor.

If the answer is yes, then the next question becomes: to what extent do you want to support it?

stale[bot] commented 4 years ago

Hey there

Since this issue hasn't had any activity in a while - we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this.

Thanks!

apollo13 commented 4 years ago

bump

greut commented 4 years ago

selinuxlabel is troublesome; see https://github.com/hashicorp/nomad/pull/7094

ahjohannessen commented 2 years ago

I am having similar issues with Fedora CoreOS 35.x. The envoy container is having permission issues:

chown: changing ownership of '/dev/stderr': Permission denied

Is there a way to configure Nomad clients to use the following as the default?

    connect {
      sidecar_task {
        config {
          security_opt = ["label=disable"]
        }
      }
    }

E.g. set it once for Nomad clients on Fedora CoreOS instead of per job. I could tweak /etc/sysconfig/selinux, but I would rather not deal with that.

ahjohannessen commented 2 years ago

I suppose something that allows me to override the default Envoy task:

meta.connect.sidecar_task_security_opt = ["label=disable"]

similarly to how you allow users to adjust:

meta.connect.sidecar_image = "envoyproxy/envoy:1.21.1"
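For reference, a sketch of how the existing image override is set today, assuming it goes in the meta block of the client agent configuration (the key name is the one quoted above without the meta. prefix). The sidecar_task_security_opt key is only a proposal in this comment, not an existing option, so it is left out.

```hcl
client {
  enabled = true

  meta {
    # Existing knob: overrides the default Envoy image used for Connect sidecars.
    "connect.sidecar_image" = "envoyproxy/envoy:1.21.1"
  }
}
```

A per-client security_opt default would presumably follow the same pattern if it were added.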

I guess a better way of dealing with this would be to have the Nomad agent detect whether SELinux is present and make the necessary adjustments for Envoy to work. However, I would appreciate the ability to set the label via meta in the client config.