hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Problem with ECR #2233

Closed rokka-n closed 7 years ago

rokka-n commented 7 years ago

Nomad version

0.5.2

Operating system and Environment details

linux ubuntu

Issue

auth fails for docker images stored in ecr

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

/var/log/upstart/docker.log

time="2017-01-23T23:25:55.625133337Z" level=debug msg="Calling POST /images/create?fromImage=xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com%2Fhello-world&tag=latest"
time="2017-01-23T23:25:55.637676581Z" level=debug msg="hostDir: /etc/docker/certs.d/xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com"
time="2017-01-23T23:25:55.639529204Z" level=debug msg="hostDir: /etc/docker/certs.d/xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com"
time="2017-01-23T23:25:55.639761681Z" level=debug msg="Trying to pull xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/hello-world from https://xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com v2"
time="2017-01-23T23:25:55.648736306Z" level=error msg="Attempting next endpoint for pull after error: Get https://xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/v2/hello-world/manifests/latest: no basic auth credentials"
time="2017-01-23T23:25:55.648935771Z" level=debug msg="Trying to pull xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/hello-world from https://xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com v1"
time="2017-01-23T23:25:55.649600844Z" level=debug msg="hostDir: /etc/docker/certs.d/xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com"
time="2017-01-23T23:25:55.649840065Z" level=debug msg="attempting v1 ping for registry endpoint https://xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/v1/"
time="2017-01-23T23:25:55.659110984Z" level=debug msg="Error unmarshalling the _ping PingResult: invalid character 'N' looking for beginning of value"
time="2017-01-23T23:25:55.659293885Z" level=debug msg="PingResult.Version: \"\""
time="2017-01-23T23:25:55.659457903Z" level=debug msg="Registry standalone header: ''"
time="2017-01-23T23:25:55.659650740Z" level=debug msg="PingResult.Standalone: true"
time="2017-01-23T23:25:55.659827095Z" level=debug msg="Endpoint https://xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/v1/ is eligible for private registry. Enabling decorator."
time="2017-01-23T23:25:55.660043320Z" level=debug msg="[registry] Calling GET https://xxxxxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/v1/repositories/hello-world/images"
time="2017-01-23T23:25:55.668620146Z" level=error msg="Not continuing with pull after error: unauthorized: authentication required"
time="2017-01-23T23:25:55.909662064Z" level=debug msg="Calling GET /version"

Job file (if appropriate)

job "helloworld-v1" {
  region      = "us"
  datacenters = ["us-west-2"]
  type = "service"
  priority    = 50

  constraint {
    attribute = "${meta.machine_function}"
    value     = "nomad-client" #
  }

  update {
    stagger = "5s"
    max_parallel = 2
  }

  group "hello-group" {
    count = 6

    task "hello-task" {
      driver = "docker"
      config {
        image = "https://xxxxxx.dkr.ecr.us-west-2.amazonaws.com/hello-world"
        port_map {
          http = 80
        }
      }
      resources {
        cpu = 20
        memory = 20
        network {
          mbits = 1
          port "http" {}
        }
      }
    }
  }
}

dadgar commented 7 years ago

You should either specify auth using the auth block https://www.nomadproject.io/docs/drivers/docker.html#authentication or put it in a file and configure Nomad to read from that file: https://www.nomadproject.io/docs/drivers/docker.html#docker_auth_config

rokka-n commented 7 years ago

Hi Alex,

We went with option 2, and that's where it fails.

The first option works just fine, but we're trying to avoid it since ECR credentials are ephemeral and will cause confusion for developers when they expire.

/etc/nomad.d/client.hcl

client {
  enabled    = true
  node_class = "Linux"

  client_max_port = 15000

  options {
    "docker.auth.config" = "/root/.docker/config.json"
    "docker.cleanup.image"   = "0"
    "driver.raw_exec.enable" = "1"
  }

  meta {
    region       = "us"
    machine_type = "m3.medium"
    machine_function = "nomad-client"
  }
}

rokka-n commented 7 years ago

858

rokka-n commented 7 years ago

logs of nomad client, I don't see the line "Failed to find docker auth with key" in docker.go

    2017/01/24 04:13:48.576442 [INFO] client: Restarting task "hello-task" for alloc "f64724b3-b493-e180-fb94-d01ca7c5d2a3" in 18.534507544s
    2017/01/24 04:14:00.978182 [INFO] Failed to find docker auth with key https://xxxxxxxx.dkr.ecr.us-west-2.amazonaws.com

rokka-n commented 7 years ago

Oooh, it is a bug: if the repository is named xxxxxx.dkr.ecr.us-west-2.amazonaws.com/blah-project/blah-image (compared to simply xxxxx.dkr.ecr.us-west-2.amazonaws.com/blah-image), then everything works as expected.

Splitting strings fails somewhere in that docker.go :)

rokka-n commented 7 years ago

Actually, let me take the last statement back. It is some sort of problem between docker, ecr and nomad.

Nomad somehow can't authenticate with ECR when the credentials are stored in .docker/config.json as follows:

{
        "auths": {
                "xxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com": {
                        "auth": "xxx

But docker pull works fine with this formatting.

However, Nomad works fine too if the address includes https, e.g.:

{
        "auths": {
                "https://xxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com": {
                        "auth": "xxx

How could such a simple thing as HTTP basic auth get out of control and become such a mess?!

Btw, AWS folks got frustrated with this and wrote a helper: https://github.com/awslabs/amazon-ecr-credential-helper
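
For reference, the `auth` value in config.json entries like the ones quoted above is, in Docker's config file format, the base64 encoding of `username:password`. Below is a minimal Python sketch of building such an entry. The helper name `build_auths_entry`, the account ID, and the password are hypothetical placeholders; `AWS` is the username that `aws ecr get-login` emits for ECR.

```python
import base64
import json


def build_auths_entry(registry: str, username: str, password: str) -> dict:
    """Return a config.json-style dict with one `auths` entry (hypothetical helper)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"auths": {registry: {"auth": token}}}


# Placeholder registry and password, for illustration only.
config = build_auths_entry(
    "https://123456789012.dkr.ecr.us-west-2.amazonaws.com", "AWS", "example-token"
)
print(json.dumps(config, indent=2))
```

Whether the key is written with or without the `https://` prefix is exactly the detail this thread turns on, so the sketch leaves the registry string untouched.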

dadgar commented 7 years ago

@rokka-n Can you try on Nomad 0.5.3 and report back? We have updated the way we parse auth blocks in the file to be in line with how Docker itself does it.
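
As an illustration of the key mismatch discussed in this thread (this is not Nomad's or Docker's actual code), here is a minimal Python sketch of a lookup that treats `https://host` and `host` keys as the same registry. The helper names `normalize_registry` and `find_auth` are hypothetical.

```python
from typing import Optional


def normalize_registry(key: str) -> str:
    """Strip an http(s):// scheme and any path so only the lowercased host remains."""
    for scheme in ("https://", "http://"):
        if key.startswith(scheme):
            key = key[len(scheme):]
            break
    return key.split("/", 1)[0].lower()


def find_auth(auths: dict, registry: str) -> Optional[dict]:
    """Return the first auths entry whose key names the same host as `registry`."""
    wanted = normalize_registry(registry)
    for key, entry in auths.items():
        if normalize_registry(key) == wanted:
            return entry
    return None


auths = {"xxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com": {"auth": "placeholder"}}
# Both spellings resolve to the same entry.
print(find_auth(auths, "https://xxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com"))
print(find_auth(auths, "xxxxxxxxx.dkr.ecr.us-west-2.amazonaws.com/hello-world"))
```

Normalizing both the stored key and the requested registry before comparing means the config file can use either spelling.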

rokka-n commented 7 years ago

@dadgar Confirming, 0.5.3 eliminated it. Thank you!

dadgar commented 7 years ago

Sweet!
