hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Nomad only pulls images supported by the defined helper #9302

Open isereb opened 3 years ago

isereb commented 3 years ago

Nomad version

Output from nomad version:
Nomad v0.12.7 (6147cb578794cb2d0c35d68fe1791728a09bb081)

Operating system and Environment details

cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Issue

I have Nomad running on an on-prem machine, using ECR as the container registry. I set up a credentials helper so it would pull images from ECR, and everything worked for a while. Then I realized that once the locally cached envoyproxy/envoy image was gone, the sidecar for any service wouldn't start, failing with: Failed to find docker auth for repo "envoyproxy/envoy": docker-credential-ecr-login with input "envoyproxy/envoy" failed with stderr: exit status 1. Clearly, for some reason, it tries to authenticate against docker.io with the ecr-login credential helper.
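The behavior described above matches how Docker-style config.json credential lookup works in general: a per-registry "credHelpers" entry wins, and "credsStore" is the fallback helper for every other registry, including Docker Hub. The sketch below is an illustration of that lookup logic, not Nomad's actual code:

```python
# Sketch (not Nomad's implementation) of Docker-style credential helper
# resolution from config.json: "credHelpers" maps specific registries to
# helpers, while "credsStore" is the default for all other registries.
DOCKER_CONFIG = {
    "credHelpers": {
        "059899322608.dkr.ecr.us-east-1.amazonaws.com": "ecr-login",
    },
    "credsStore": "ecr-login",
}

def registry_of(image):
    """Return the registry host for an image reference. References whose
    first path component has no dot, colon, or 'localhost' (e.g.
    'envoyproxy/envoy') default to Docker Hub."""
    first = image.split("/", 1)[0]
    if "." in first or ":" in first or first == "localhost":
        return first
    return "index.docker.io"

def helper_for(image, config=DOCKER_CONFIG):
    """Per-registry credHelpers entry wins; credsStore is the fallback."""
    reg = registry_of(image)
    return config.get("credHelpers", {}).get(reg, config.get("credsStore"))

print(helper_for("059899322608.dkr.ecr.us-east-1.amazonaws.com/image-name:tag"))
# -> ecr-login, as intended for ECR
print(helper_for("envoyproxy/envoy:v1.11.2"))
# -> ecr-login as well, because credsStore applies to Docker Hub too
```

With the config.json shown in the reproduction steps, the public sidecar image resolves to ecr-login via the credsStore fallback, which would explain the failed auth against docker.io.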

Reproduction steps

nomad.hcl

data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"

server {
  enabled = true
  bootstrap_expect = 1
}

client {
  enabled = true
  servers = ["127.0.0.1:4646"]
  host_volume "postgres" {
    path      = "/opt/postgres/data"
    read_only = false
  }
  options = {
    "docker.auth.config" = "/etc/docker/config.json"
    "docker.auth.helper" = "ecr-login"
  }
}

consul {
  address             = "127.0.0.1:8500"
  server_service_name = "nomad"
  client_service_name = "nomad-client"
  auto_advertise      = true
  server_auto_join    = true
  client_auto_join    = true
}

config.json

{
    "credHelpers": {
        "059899322608.dkr.ecr.us-east-1.amazonaws.com": "ecr-login"
    },
    "credsStore": "ecr-login"
}
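For contrast, a config.json that scopes ecr-login to only the ECR registry would drop the global credsStore fallback, so lookups for Docker Hub images fall through to anonymous pulls. This is shown as an illustration of the difference, not as a confirmed fix for Nomad's behavior:

```json
{
    "credHelpers": {
        "059899322608.dkr.ecr.us-east-1.amazonaws.com": "ecr-login"
    }
}
```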

Job file (if appropriate)

job "my_job" {

  datacenters = ["dc1"]
  type = "service"

  update {
    max_parallel = 1
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    progress_deadline = "10m"
    auto_revert = false
    canary = 0
  }

  migrate {
    max_parallel = 1
    health_check = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }

  group "postgres" {

    network {
      mode = "bridge"
    }

    volume "postgres" {
      type      = "host"
      read_only = false
      source    = "postgres"
    }

    restart {
      attempts = 3
      interval = "15m"
      delay = "15s"
      mode = "delay"
    }

    reschedule {
      attempts = 15
      interval = "1h"
      delay = "30s"
      delay_function = "exponential"
      max_delay = "120s"
      unlimited = false
    }

    service {
      name = "postgres"
      tags = ["infra", "db"]
      port = 5432

      connect {
        sidecar_service {}
      }

      check {
        type = "script"
        command = "pg_isready"
        interval = "60s"
        timeout = "5s"
        task = "postgres"

        check_restart {
          limit = 3
          grace = "60s"
          ignore_warnings = false
        }
      }
    }

    task "postgres" {

      driver = "docker"

      volume_mount {
        volume      = "postgres"
        destination = "/var/lib/postgresql/data"
        read_only   = false
      }

      config {
        image = "postgres:12"
      }

      env {
        POSTGRES_DB = ""
        POSTGRES_USER = ""
        POSTGRES_PASSWORD = ""
      }

      resources {
        cpu = 500
        memory = 1024
        network {
          mbits = 10
        }
      }
    }
  }

  group "my_task_group" {

    network {
      mode = "bridge"
      port "my_service_healthcheck" {
        to = -1
      }
    }

    restart {
      attempts = 18
      interval = "5m"
      delay = "15s"
      mode = "delay"
    }

    reschedule {
      attempts = 15
      interval = "1h"
      delay = "30s"
      delay_function = "exponential"
      max_delay = "120s"
      unlimited = false
    }

    service {
      name = "my_service"
      tags = ["microservice"]
      port = 8030

      check {
        name = "my-service-health"
        type = "http"
        port = "my_service_healthcheck"
        path = "/v1/health"
        interval = "10s"
        timeout = "5s"

        check_restart {
          limit = 3
          grace = "60s"
          ignore_warnings = false
        }
      }

      connect {
        sidecar_service {
          proxy {
            expose {
              path {
                path = "/v1/health"
                protocol = "http"
                local_path_port = 8030
                listener_port = "my_service_healthcheck"
              }
            }
            upstreams {
              destination_name = "postgres"
              local_bind_port = 5432
            }
          }
        }
      }
    }

    task "my_task" {

      driver = "docker"

      config {
        image = "____ACCOUNT_ID____.dkr.ecr.____REGION____.amazonaws.com/image-name:tag"
        force_pull = true
      }

      env {
        DS_DB_NAME = ""
        DS_DB_USER = ""
        DS_DB_PASSWORD = ""
      }

      resources {
        cpu = 500
        memory = 512
        network {
          mbits = 10
        }
      }
    }
  }
}

Nomad Client/Server logs (if appropriate)

Nothing that would be related to this issue...

isereb commented 3 years ago

An interesting thing to notice is that those images can be pulled manually with docker pull envoyproxy/envoy:v1.11.2, after which this error won't appear anymore.

shoenig commented 3 years ago

Hi @isereb, sorry for the slow reply. I suspect this is just a matter of not having configured the sidecar image to point at the image in your private ECR. This can currently be done in one of two ways: 1) set meta.connect.sidecar_image in the Nomad client config (docs)
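Option 1 above would look roughly like this in the client config; the registry path is a placeholder mirroring the ones in the job file, not a real image location:

```hcl
client {
  enabled = true
  meta {
    # Placeholder path; point this at wherever the Envoy image is mirrored.
    "connect.sidecar_image" = "____ACCOUNT_ID____.dkr.ecr.____REGION____.amazonaws.com/envoyproxy/envoy:v1.11.2"
  }
}
```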

isereb commented 3 years ago

I do not want to put those images on ECR. I want Nomad to be able to pull both ECR images and docker.io images at the same time. I think this Stack Overflow question will shed more light.