hashicorp / terraform-provider-docker

As part of our introduction to self-service publishing in the Terraform Registry, this copy of the provider has been archived, and ownership has been transferred to active maintainers in the community. Please see the new location on the Terraform Registry: https://registry.terraform.io/providers/kreuzwerker/docker/latest

forced container replacement with 2.7.0 #242

Closed stephennancekivell closed 4 years ago

stephennancekivell commented 4 years ago

Hi, thanks for the project.

I'm having a problem where repeated terraform apply runs cause the same container to be replaced for no reason. This does not happen with docker provider 2.6.0.

Terraform Version

terraform -v
Terraform v0.12.13
+ provider.docker v2.7.0

Affected Resource(s)

Terraform Configuration Files

provider "docker" {
  version = "~> 2.7.0"
}

resource "docker_container" "nginx" {
  image      = "nginx"
  name       = "nginx"
  restart    = "on-failure"
  must_run   = "true"
}

Debug Output

https://gist.github.com/stephennancekivell/fb5a3113daa6d47a1350cc25721f386f

Expected Behavior

A terraform apply (or plan) should show no difference after the change has been applied.

Actual Behavior

terraform apply (and plan) always shows that the container needs to be replaced, as shown below.

Steps to Reproduce

  1. Define a docker_container resource as in the configuration above.
  2. Run terraform apply a first time; it should successfully create the container.
  3. Run terraform apply again without changing anything; it will report that it wants to replace the container, which it should not (see the command sketch below).
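
For reference, a minimal command sequence that reproduces this, assuming the configuration above is saved in the working directory:

terraform init
terraform apply    # first run: creates the container as expected
terraform plan     # second run: unexpectedly reports 1 to add, 0 to change, 1 to destroy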

The linked console output shows the diff. Note that it includes lots of settings I am not explicitly setting; maybe they are default values or something. Also, the image ID is shown instead of the name, and there are some env vars I'm not setting. https://gist.github.com/stephennancekivell/39eeabe0ef085cbcb7a06816628c7994

2020/02/18 15:58:03 [DEBUG] command: asking for input: "Do you want to perform these actions?"
  # docker_container.nginx must be replaced
-/+ resource "docker_container" "nginx" {
        attach            = false
      + bridge            = (known after apply)
      ~ command           = [
          - "nginx",
          - "-g",
          - "daemon off;",
        ] -> (known after apply)
      + container_logs    = (known after apply)
      - cpu_shares        = 0 -> null
      - dns               = [] -> null
      - dns_opts          = [] -> null
      - dns_search        = [] -> null
      ~ entrypoint        = [] -> (known after apply)
      ~ env               = [
          - "NGINX_VERSION=1.17.8",
          - "NJS_VERSION=0.3.8",
          - "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
          - "PKG_RELEASE=1~buster",
        ] -> (known after apply)
      + exit_code         = (known after apply)
      ~ gateway           = "172.17.0.1" -> (known after apply)
      - group_add         = [] -> null
      ~ hostname          = "08d38a7c3056" -> (known after apply)
      ~ id                = "08d38a7c3056e96dcfb69e32eab9c9d0c330a61b7b6def2775f24fa3fcbff967" -> (known after apply)
      ~ image             = "sha256:2073e0bcb60ee98548d313ead5eacbfe16d9054f8800a32bedd859922a99a6e1" -> "nginx" # forces replacement
      ~ ip_address        = "172.17.0.2" -> (known after apply)
      ~ ip_prefix_length  = 16 -> (known after apply)
      ~ ipc_mode          = "private" -> (known after apply)
      - links             = [] -> null
        log_driver        = "json-file"
      - log_opts          = {} -> null
        logs              = false
      - max_retry_count   = 0 -> null
      - memory            = 0 -> null
      - memory_swap       = 0 -> null
        must_run          = true
        name              = "nginx"
      ~ network_data      = [
          - {
              - gateway          = "172.17.0.1"
              - ip_address       = "172.17.0.2"
              - ip_prefix_length = 16
              - network_name     = "bridge"
            },
        ] -> (known after apply)
      - network_mode      = "default" -> null
      - privileged        = false -> null
      - publish_all_ports = false -> null
        read_only         = false
        restart           = "on-failure"
        rm                = false
      ~ shm_size          = 64 -> (known after apply)
        start             = true
      - sysctls           = {} -> null
      - tmpfs             = {} -> null

      - labels {
          - label = "maintainer" -> null
          - value = "NGINX Docker Maintainers <docker-maint@nginx.com>" -> null
        }
      + labels {
          + label = (known after apply)
          + value = (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Fenkiou commented 4 years ago

I had the issue with docker_service in the 2.6 version.

But the issue was a good thing for me because it forced me to use an explicit version of the image. I was using myimage:latest, and using latest isn't a good thing; we should stick to version numbers.

Specify the exact image version you want to use, and Terraform won't try to recreate the container, except if the image is different from the one you're actually using.
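
For illustration, a minimal sketch of that suggestion (the image name and tag here are made up, and later comments note that pinning alone did not avoid the diff for them):

resource "docker_container" "app" {
  image    = "myimage:1.2.3" # explicit tag instead of :latest
  name     = "app"
  must_run = "true"
}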

cvandesande commented 4 years ago

Not sure if I'm doing something wrong, but I was using latest, then I applied using an exact version. Planning still wants to recreate the container (the images haven't changed), but I am using a private registry.

      ~ hostname          = "664a9702e9e7" -> (known after apply)
      ~ id                = "664a9702e9e74454e68c8d750896b9aaa1930f0699eb9e64c542c4d65859a093" -> (known after apply)
      ~ image             = "sha256:6f7a6c626c16aab8f6c80a2362ba019f1518fba185110577fafa1e0573ff1252" -> "registry.opendmz.com/yul-syncthing:1.0" # forces replacement

stephennancekivell commented 4 years ago

Hi guys, I have found an improvement.

I can confirm that setting the version in the image name of docker_container doesn't fix it. But if we provide the image to docker_container as the output of docker_image, then it doesn't force replacement.

This does seem to be a behaviour change in the provider version. It's odd that setting docker_container.image to something other than the docker_image output ever worked, if that's what it needs to be in order to be consistent.

This is actually documented in the provider docs.

image - (Required, string) The ID of the image to back this container. The easiest way to get this value is to use the docker_image resource as is shown in the example above.

This is the config that produces consistent results.

provider "docker" {
  version = "~> 2.7.0"
}

resource "docker_image" "nginx_img" {
    name = "nginx"
}

resource "docker_container" "nginx" {
  image      = "${docker_image.nginx_img.latest}"
  name       = "nginx"
  restart    = "on-failure"
  must_run   = "true"
}

Note you don't even have to set the docker image version.
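
If you do want to pin a version while keeping the docker_image output, something like this should also work (the tag here is just an example):

resource "docker_image" "nginx_img" {
  # pin a tag on the docker_image resource instead of on the container
  name = "nginx:1.17.8"
}

resource "docker_container" "nginx" {
  image      = "${docker_image.nginx_img.latest}"
  name       = "nginx"
  restart    = "on-failure"
  must_run   = "true"
}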

cvandesande commented 4 years ago

You're right, that fixed the replacement based on the image sha. For some reason my container is still getting replaced, however this time due to the use of environment variables?

  # docker_container.puff_registry must be replaced
-/+ resource "docker_container" "puff_registry" {
        attach            = false
      + bridge            = (known after apply)
      ~ command           = [
          - "/etc/docker/registry/config.yml",
        ] -> (known after apply)
      + container_logs    = (known after apply)
      - cpu_shares        = 0 -> null
      - dns               = [] -> null
      - dns_opts          = [] -> null
      - dns_search        = [] -> null
      ~ entrypoint        = [
          - "/entrypoint.sh",
        ] -> (known after apply)
      ~ env               = [ # forces replacement
          - "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "REGISTRY_AUTH=htpasswd",
            "REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd",
            "REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm",

stephennancekivell commented 4 years ago

Thanks @cvandesande. I just realised that as well.

For my elasticsearch servers both working_dir and env are forcing replacement.

I wonder if it's the same underlying problem: attributes that resolve to the same value are treated as different.

slikhanov commented 4 years ago

I'm getting the same container rebuild issue with env. The problem is that I only override certain env variables in Terraform, but after creation apparently all env variables make it into the state file, which makes it not equal to the .tf file. This is definitely a regression in 2.7. I rolled back to 2.6 of the docker provider and this issue went away.
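
One way to see the image-defined variables that end up on the running container (and hence in state) is docker inspect, e.g. for the nginx container from the original report:

docker inspect --format '{{json .Config.Env}}' nginx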

dannmartens commented 4 years ago

Same issue here, I upgraded to 2.7 in order to make use of the new source attribute in docker_container.upload and plan now wants to replace everything, all the time.

 ~ env               = [ # forces replacement
...
      ~ image             = "sha256:33f524107b92220fb3c9c6f5f071a6f5541b62441d0c1eccea0703fdf46e5af1" -> "<redacted>" # forces replacement
...
      - user              = "vcap" -> null # forces replacement
      - working_dir       = "/home/vcap" -> null # forces replacement

mavidser commented 4 years ago

For me, the additional label blocks, like MAINTAINER, are forcing the replacement in 2.7.0.

dubo-dubon-duponey commented 4 years ago

Confirming this is true for labels, environment variables and workdir.

As far as I can tell, the provider compares existing labels on the running container to the provided list of labels. Of course, if the image is also defining labels, it sees a diff...

        labels {
            label = "co.elastic.logs/module"
            value = "coredns"
        }
      - labels { # forces replacement
          - label = "org.opencontainers.image.authors" -> null
          - value = "Dubo Dubon Duponey" -> null
        }

(here, the first label is provided in terraform, the second one is defined by the image itself)

The same applies to "ENV":

      ~ env               = [ # forces replacement
          - "DEBIAN_FRONTEND=noninteractive",
            "DNS_PORT=1053",

(the second env is defined in terraform, the first is defined in the image)

Apparently, also, if a working dir is defined in the image, and not in terraform, this will trigger a replacement as well.

      - working_dir       = "/" -> null # forces replacement

@mavogel this used to work fine - right now it means that basically all containers are going to be recreated every time, making this quite unusable...

stephenliberty commented 4 years ago

Following this as I just noticed it as well and it affects our deployments. I thought it was strange that suddenly the environmental variables that were set by the image were causing issues. Interestingly this does not happen in the test binary that I had created on 1/20. I'm wondering if https://github.com/terraform-providers/terraform-provider-docker/commit/33c17570c98d27e000abb8dc2c438babbacb67b4 is related?

mkrakowitzer commented 4 years ago

Any suggestions on a potential fix for this issue or is reverting to 2.6.0 my only option?

pannon commented 4 years ago

We are experiencing similar issues after switching to v2.7.0. A number of properties are triggering a replacement.

Example:

      - dns               = [
          - "8.8.8.8",
          - "8.8.4.4",
        ] -> null # forces replacement

      - domainname        = "local" -> null # forces replacement

      ~ env               = [
          - "CATALINA_HOME=/usr/local/tomcat",
          - "CONSUL_CONFIG_DIR=/consul/config",
        ] -> (known after apply) # forces replacement

      - network_mode      = "bridge" -> null # forces replacement

      - working_dir       = "/usr/local/tomcat" -> null # forces replacement

      - labels { # forces replacement
          - label = "description" -> null
          - value = "app" -> null
        }

These variables seem to be picked up from the Docker image and for some reason override the ones in the saved terraform state.

In our case, these are the problematic attributes; adding them to ignore_changes produces clean runs with no changes triggered:

ignore_changes        = [env, dns, dns_search, domainname, network_mode, working_dir, labels, cpu_shares, memory, memory_swap]
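
For clarity, that list goes in a lifecycle block on the affected resource; a minimal sketch (the resource and image names here are illustrative):

resource "docker_container" "app" {
  image = "${docker_image.app.latest}"
  name  = "app"

  lifecycle {
    # work around the spurious diffs reported in this issue
    ignore_changes = [env, dns, dns_search, domainname, network_mode, working_dir, labels, cpu_shares, memory, memory_swap]
  }
}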

Not a solution, just something to be aware of.

Perhaps related to this PR https://github.com/terraform-providers/terraform-provider-docker/pull/234

mavogel commented 4 years ago

Sorry for this unintended BC break: we'll probably revert the changes of #234 and do them in Milestone 3.0.

brandonros commented 4 years ago

What is the last known good version to roll back to that does not have this issue?

edit:

provider "docker" {
  version = "2.6.0"
}

that is a confirmed fix for me

mate4st commented 4 years ago

I am experiencing the same issue with docker_network when using ipam_config. Not sure if this is related or even a bug, though. Tested with versions 2.6 and 2.7.

resource "docker_network" "test_net" {
  name = "test_net"
  ipv6 = true
  ipam_config {
    subnet = "1adf:8a37:824f:47d1::2:0/112"
  }
}

mkrakowitzer commented 4 years ago

@mavogel do you know what the timeframe is to have these changes reverted? I need to decide whether I should roll back to 2.6.0 and revert all the changes we made to make our codebase compatible with 2.7.0.

pannon commented 4 years ago

This issue is definitely up there in the "worst provider bugs" category in Terraform. I would have expected a somewhat faster reaction from Hashicorp given the seriousness.

This issue has been open since the 18th of Feb and was also raised via Hashicorp support under ticket number 26274, where demo code was supplied to reproduce the error conditions; zero response since (8th of April).

Essentially this bug undermines the main selling point of Terraform: safe infrastructure changes via plan/apply. There are no guarantees now that a plan will actually result in a successful final state after apply.

Rolling back to 2.6.0 is not really an option, as in some cases there are syntax changes, for example: labels.

Docker is used pretty extensively; this deserves a bit more attention in general, imho.

suzuki-shunsuke commented 4 years ago

Hi, I'm the author of #234. Sorry for the inconvenience. I have created a pull request, #269, to fix the problem with some of the attributes.

mavogel commented 4 years ago

Sorry for all the inconvenience and my absence from replying. I fully understand the comments of @pannon and I'll try to be faster in the future. This provider is still community maintained, and I'd also be happy to have more people helping and becoming contributors.

pannon commented 4 years ago

@mavogel no harm done, and thanks for all the hard work on this plugin (and to everyone else involved). I initially assumed Hashicorp support covers these sorts of cases with providers, and their support people didn't indicate otherwise for weeks after the support ticket was raised... As for future contributions, I will look into that, as we are starting to use this provider extensively in combination with our infra and we have some specifics which are on the "nice to have" list.

suzuki-shunsuke commented 4 years ago

I have created an additional pull request to follow up #269: https://github.com/terraform-providers/terraform-provider-docker/pull/272

https://github.com/terraform-providers/terraform-provider-docker/pull/269#issuecomment-640697737