hashicorp / terraform-provider-docker

As part of our introduction to self-service publishing in the Terraform Registry, this copy of the provider has been archived, and ownership has been transferred to active maintainers in the community. Please see the new location on the Terraform Registry: https://registry.terraform.io/providers/kreuzwerker/docker/latest
Mozilla Public License 2.0

better Docker Swarm support #29

Closed sandys closed 6 years ago

sandys commented 6 years ago

hi, if we compare the Kubernetes provider (https://www.terraform.io/docs/providers/kubernetes/index.html) with the Docker provider (https://www.terraform.io/docs/providers/docker/index.html), we see that the Docker provider supports no Swarm features.

In fact, most people who work with Swarm are forced to write it "inline". Is there any chance we can get support for primitives like swarm init, join, and service create/delete/update? This would be super awesome. If we were able to do something like https://github.com/docker/docker.github.io/blob/master/swarm/configure-tls.md using Terraform, that would be mindblowing.

There are a lot of us using Docker Swarm in production, and this is one of the reasons why we are delaying adopting Terraform. If these primitives get supported in Terraform, a lot of us would use Terraform straight away and give up our docker-compose.yml files.
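For illustration only, here is the kind of first-class usage I mean (a purely hypothetical sketch; none of these resource names exist in the provider today):

# hypothetical: a swarm initialized and a service managed entirely from Terraform
resource "docker_swarm" "cluster" {
  advertise_addr = "10.0.0.1"
}

resource "docker_service" "web" {
  name     = "web"
  image    = "nginx"
  replicas = 3
}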

mavogel commented 6 years ago

hi @sandys I am currently working on implementing exactly your wish 😃 For a month now, to be honest.

Take a look at the swarm-approach2 branch of this repo.

Current state

ATM I expect that a swarm is already initialized with docker swarm init and that the daemon is already secured with TLS certificates. I do this manually with the remote-exec provisioner. It is already possible to:

Things to do for release

Nice to have

IMHO we then have to implement the compose features (spinning up a network, linking containers) here in Terraform. Or am I completely wrong, or do you see a better way?

I will take a look at the Kubernetes provider and see how they solved things like setting up a cluster there. Let's see what I can adapt. BTW my first approach was to implement features like swarm init and join in the provider, but then I had to add multiple instances to the docker provider, and handling all that inside the provider was a pain.

sandys commented 6 years ago

@mavogel this is awesome! Can I make one request: the whole aspect of docker swarm init and join is the single most important thing that only something like Terraform can do, because it is already aware of the multiple nodes/machines being orchestrated. In fact, I would argue that update for a swarm is not needed at all in the beginning. You can destroy and recreate.

Docker Compose is a single-machine stack. Is there a particular reason you want to support it? Because Swarm does the same thing and more. As long as you have the docker stack commands working, everyone will be happy. NOTE: I don't understand what you mean by supporting "compose". Are you talking about docker-compose, or the docker-compose.yml used by docker stack?

Again I would like to reiterate: swarm init and join are the most important pieces in this. Writing multiple instances is not an issue!

mavogel commented 6 years ago

Handling swarm init and join is possible in Terraform, but it is not really nice. The devil is in the details. Let me explain why I chose to expect the swarm to be already initialized:

Swarm

Approach 1

The docker provider expects one docker host, and all docker commands are executed on this host:

# default aka the bootstrap node which initializes the swarm
provider "docker" {
  host      = "tcp://<docker-daemon-ip>:2376/"
}

Now imagine I want multiple docker hosts, which is the scenario in a swarm: then I need multiple docker providers. Read here how Terraform handles this with provider aliases.

provider "docker" {
  alias = "worker_1"
  host  = "tcp://<docker-daemon-ip-worker-1>:2376/"
}
provider "docker" {
  alias = "manager_1"
  host  = "tcp://<docker-daemon-ip-manager-2>:2376/"
}
# and so on...
# now ref to each alias 
resource "docker_swarm_node" "bootstrap_node" {
  is_bootstrap = true # which defaults to false
}
resource "docker_swarm_node" "worker_1" {
  provider = "docker.worker_1"
  token    = "${docker_swarm_init.bootstrap_node.tokenworker}"
}
resource "docker_swarm_node" "manager_1" {
  provider = "docker.manager_1"
  token    = "${docker_swarm_init.bootstrap_node.tokenmanager}"
}
# if I later want to ramp up a service with 3 replicas, I can make
# It just uses the daemon of the bootstrap node which distributes the replicas to the swarm
resource "docker_service" "service" {
  name     = "my-service"
  image    = "nginx"
  replicas = "3"
}

Then several questions came up:

Approach 2

So I tried it with multiple docker hosts from the start, whose number can change:

provider "docker" {
  # checks if all possible docker hosts are pingable
  hosts = ["${formatlist("tcp://%s:2376/", var.external_ips)}"]
}

But it got way too complex internally... see here.

Approach 3

So I decided to handle the creation of the swarm outside of the provider, with Terraform's remote-exec provisioner:

$ sudo scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/${data.terraform_remote_state.iam.key_pair_id} ${var.user}@${aws_instance.swarm-bootstrap-manager.private_ip}:~/tokenmanager .
$ sudo docker swarm join --token $(cat ~/tokenmanager) ${aws_instance.swarm-bootstrap-manager.private_ip}:2377
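Wrapped in Terraform 0.11 syntax, that could look roughly like the following. This is only a sketch: null_resource, aws_instance.swarm-manager, var.private_key_path, and var.user are assumed names, not taken verbatim from my code.

# sketch: join each additional manager via the remote-exec provisioner
resource "null_resource" "swarm_join_manager" {
  count = "${var.managers - 1}" # the bootstrap manager already ran 'docker swarm init'

  connection {
    type        = "ssh"
    user        = "${var.user}"
    host        = "${element(aws_instance.swarm-manager.*.private_ip, count.index)}"
    private_key = "${file(var.private_key_path)}"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/${data.terraform_remote_state.iam.key_pair_id} ${var.user}@${aws_instance.swarm-bootstrap-manager.private_ip}:~/tokenmanager .",
      "sudo docker swarm join --token $(cat ~/tokenmanager) ${aws_instance.swarm-bootstrap-manager.private_ip}:2377",
    ]
  }
}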

In my case, with the swarm on AWS and terraform.tfvars files, I can easily add and remove nodes from the swarm by changing the number of instances in one place:

managers = "3"
workers = "5"

These values drive the count variable of each aws_instance, as sketched below. BTW @catsby, could you give me your opinion about this as well? I'd be really curious what you think about Approach 1. Does it make sense to additionally implement this functionality?
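A minimal sketch of that wiring (the AMI variable and instance type are placeholders):

variable "managers" {}
variable "workers" {}
variable "ami_id" {}

resource "aws_instance" "swarm-manager" {
  count         = "${var.managers}"
  ami           = "${var.ami_id}" # placeholder
  instance_type = "t2.medium"     # placeholder
}

resource "aws_instance" "swarm-worker" {
  count         = "${var.workers}"
  ami           = "${var.ami_id}" # placeholder
  instance_type = "t2.medium"     # placeholder
}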

Compose

Regarding docker-compose: I don't fully understand how it works, but IMHO compose combines basic docker commands. Imagine your docker-compose.yml file looks as follows:

backend: 
  image: redis:3 
  restart: always

frontend: 
  build: commander 
  links: 
    - backend:redis  
  ports: 
    - 8081:8081 
  environment: 
    - VAR1=value 
  restart: always

In Terraform you'd implement these steps manually by packing both containers into a common network, where the backend would be available as redis in the frontend container. So basically you would be reimplementing the features of compose. This is probably why the underlying go-dockerclient does not implement compose/stack features.
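A rough sketch of that reimplementation with the provider's existing docker_network and docker_container resources. The attribute set is illustrative, and note that Terraform cannot run the build: step, so the frontend image is assumed to be pre-built as 'commander':

resource "docker_network" "app" {
  name = "app-network"
}

resource "docker_container" "backend" {
  name     = "redis" # reachable as 'redis' on the shared network
  image    = "redis:3"
  restart  = "always"
  networks = ["${docker_network.app.name}"]
}

resource "docker_container" "frontend" {
  name     = "frontend"
  image    = "commander" # assumption: pre-built image; compose would build it
  restart  = "always"
  networks = ["${docker_network.app.name}"]
  env      = ["VAR1=value"]

  ports {
    internal = 8081
    external = 8081
  }
}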

Update: moby#32781 will implement compose/stack functionality on the daemon side.

Summary

Let's wait for Clint's opinion and I'll add tests in the meantime.

sandys commented 6 years ago

@mavogel thank you for the detailed reply.

So Approach 3 is orthogonal to 1 and 2, since it would work regardless. So I would say go ahead with the functionality of 3; it does not depend on whether you decide between 1 and 2. I hope you agree with that?

Approach 2 will probably break existing users.

Approach 1 seems the safest. However, I think instead of tags you might need to create new provider types. This will allow you to have a list of masters (in the case of a multi-master swarm) and a list of workers. (P.S. please do account for a worker and a master being on the SAME node. We use this for development.)

mavogel commented 6 years ago

Merged PR #40. Once the provider is released, please test whether it fits your needs. I'll keep you updated once the release has happened. There is some infra/CI stuff left to do...

sandys commented 6 years ago

This is super cool. Thanks !!!


FortuneLenovo commented 6 years ago

Thanks for getting this merged!!! I have built terraform-provider-docker locally and tried to use the docker_service resource.

resource "docker_service" "service" {
  task_spec = [ "nginx" ]
  name     = "my-service"
  image    = "nginx"
  replicas = "3"
}

Using the above resource I was getting the errors below in plan:

Error: docker_service.service: "task_spec": required field is not set
Error: docker_service.service: : invalid or unknown key: image
Error: docker_service.service: : invalid or unknown key: replicas

Then I changed it to a very simple one:

resource "docker_service" "service" {
  task_spec = [ "nginx" ]
  name     = "my-service"
}

Now I am getting the error below:

Error: docker_service.service: task_spec.0: expected object, got string

Can you please help me find a consolidated user guide for this new docker_service resource?

Thanks a lot,

mavogel commented 6 years ago

@FortuneLenovo no worries. The documentation on the Terraform website will be updated once the provider is released. Until then, take a look at the tests for how to configure a docker service. Especially the full configuration can be helpful, where I added all the possible configuration values: https://github.com/terraform-providers/terraform-provider-docker/blob/master/docker/resource_docker_service_test.go#L127
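For a quick start, here is a minimal sketch of the structure those errors point to, based on the linked test file: image lives under task_spec/container_spec, and replicas under mode/replicated in the new schema.

resource "docker_service" "service" {
  name = "my-service"

  task_spec {
    container_spec {
      image = "nginx"
    }
  }

  mode {
    replicated {
      replicas = 3
    }
  }
}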

HTH :)

FortuneLenovo commented 6 years ago

In the below code under the resource docker_service

restart_policy {
  condition    = "on-failure"
  delay        = "3s"
  max_attempts = 4
  window       = "10s"
}

If the condition value is changed to other values like "always", "unless-stopped", or "no", it throws an error. How do I update it?

thanks.

mavogel commented 6 years ago

@FortuneLenovo according to the Docker API 1.32, which is what is currently implemented, the only valid values for condition are none, on-failure, or any.

Can you provide me with a more detailed error? How did the value get changed to always?

FortuneLenovo commented 6 years ago

thank you, that answers my question actually.

FortuneLenovo commented 6 years ago

Hi,

Now I am able to successfully create a docker_service, also with replicas and a restart condition. Next I am looking at docker_secret and docker_config, to create these and use them in my docker_service.

After going through docker_service_test.go (and the same for config) I learned how to create and use them. When I try a very simple string in data, it works; but what I am actually looking for is how to pass a file to a config or secret (which is fairly easy to do manually, without Terraform).

Below is my code, with a simple string in the data section:

resource "docker_secret" user-pass {
  name = "user"
  data = "pass"
}

resource "docker_config" site-conf {
  name = "site-conf"
  data = "site"
}

Once I tried either a long data string or passing a file, I got the error below:

Error: docker_config.site-conf: "data" is not base64 decodeable

thanks in advance !!!

mavogel commented 6 years ago

Hi @FortuneLenovo ,

Docker needs the data of configs and secrets to be in base64 format. You could use Terraform's base64encode interpolation function:

resource "docker_config" site-conf {
  name = "site-conf"
  data = "${base64encode("site")}"
}
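For your actual use case of passing a file, the same pattern should work together with the file() interpolation function (site.conf is a placeholder path):

resource "docker_config" "site-conf-from-file" {
  name = "site-conf"
  data = "${base64encode(file("${path.module}/site.conf"))}" # placeholder file path
}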

HTH

FortuneLenovo commented 6 years ago

Thank you for the help on the secret key; it worked. When we configure a restart policy in Docker swarm mode, it is not reflected in docker inspect, meaning it is not attached to the docker container, while when we run a container individually with a restart policy it does show up in inspect. So basically we want a working restart policy even in case the docker swarm master is detached, and containers should heal themselves. Is it possible to achieve this?

Thanks in advance!!!

mavogel commented 6 years ago

@FortuneLenovo glad to hear that it helped :) I assume you are referring to the containers of docker services in swarm mode, right? Because then the restart policy is attached to the service and not to each container.

Well, I made a little POC and inspected the services. Note that I used the migration branch of https://github.com/terraform-providers/terraform-provider-docker/pull/70 to build the binary:

prepare

$ go build -o terraform-provider-docker_v1.0.0
# move it to the local plugins dir because it is not released yet to https://releases.hashicorp.com/
# adapt the directory '~/.terraform.d/plugins/darwin_amd64' accordingly
$ mv terraform-provider-docker_v1.0.0 ~/.terraform.d/plugins/darwin_amd64

start via docker cli

$ docker service create --name redis --restart-condition=on-failure --restart-delay=3s --restart-max-attempts=4 --restart-window=10s redis:3.0.6

start via terraform

main.tf:

provider "docker" {
  version = "~> 1.0.0"
}

resource "docker_service" "foo" {
  name = "redis-terraform"

  task_spec {
    container_spec {
      image = "redis:3.0.6"
    }

    restart_policy {
      condition = "on-failure"
      delay = "3s"
      max_attempts = 4
      window = "10s"
    }
  }
}
$ terraform init
$ terraform apply

inspect

$ docker service inspect redis
$ docker service inspect redis-terraform

gives me

[
    {
        "ID": "zw9m7qykyv7pjketotmdvogwo",
        "Version": {
            "Index": 3945
        },
        "CreatedAt": "2018-06-06T07:30:32.4534158Z",
        "UpdatedAt": "2018-06-06T07:30:32.4534158Z",
        "Spec": {
            "Name": "redis-terraform",
            "Labels": {},
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "redis:3.0.6",
                    "StopGracePeriod": 0,
                    "Healthcheck": {},
                    "DNSConfig": {},
                    "Isolation": "default"
                },
                "Resources": {},
                "RestartPolicy": {
                    "Condition": "on-failure",
                    "Delay": 3000000000,
                    "MaxAttempts": 4,
                    "Window": 10000000000
                },
                "Placement": {},
                "ForceUpdate": 0,
                "Runtime": "container"
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "RollbackConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "Endpoint": {
            "Spec": {}
        }
    }
]

So I could not reproduce your error. Or did I get your question wrong, and my assumption was off? Manu

FortuneLenovo commented 6 years ago

Thanks for the quick response. I understood the concept that if we are running in swarm mode, the restart_policy applies to the service, not to each container. So I just want to know what will happen to failed containers if, by chance, the swarm master node disconnects from the worker nodes and it was the only master node.

In the master-disconnected scenario, will the worker nodes try to restart a failed container according to the restart_policy?

mavogel commented 6 years ago

@FortuneLenovo this now goes deep into Docker. Regarding your question: workers should apply the restart policy even if the swarm has no leader and/or has lost quorum. Here are more docs about this topic. HTH

Crapworks commented 6 years ago

@mavogel do you have any idea when this is going to be fully released, approximately?

mavogel commented 6 years ago

@Crapworks I hope by the end of next week... Still waiting for a review of #70, but everyone seems to be busy due to HashiDays... I'll see them there and ask f2f :)

Crapworks commented 6 years ago

@mavogel Cool! Thanks for the heads up! This is awesome work and I'm looking forward to using it in my next project!

kristerr commented 6 years ago

@mavogel there is a little error in mapTypeMapValsToStringSlice which causes empty strings to be passed to the docker service env: make([]string, len(typeMap)) pre-allocates len(typeMap) empty strings, and the subsequent appends add the real entries after them. Fix below:

diff --git a/docker/resource_docker_container_funcs.go b/docker/resource_docker_container_funcs.go
index 0ee3690..a9069ef 100644
--- a/docker/resource_docker_container_funcs.go
+++ b/docker/resource_docker_container_funcs.go
@@ -387,7 +387,7 @@ func mapTypeMapValsToString(typeMap map[string]interface{}) map[string]string {


 // mapTypeMapValsToStringSlice maps a map to a slice with '=': e.g. foo = "bar" -> 'foo=bar'
 func mapTypeMapValsToStringSlice(typeMap map[string]interface{}) []string {
-       mapped := make([]string, len(typeMap))
+       mapped := make([]string, 0)
        for k, v := range typeMap {
                mapped = append(mapped, k+"="+v.(string))
        }

mavogel commented 6 years ago

@kristerr thank you for pointing out this bug. It will be addressed in #51, with tests, for the next minor release.

mavogel commented 6 years ago

Version 1.0.0 got released: https://github.com/terraform-providers/terraform-provider-docker/issues/29#issuecomment-400296076 :) Please try it out and give me feedback/issues/bugs. Happy to fix your stuff, and also happy if it just works :)

FortuneLenovo commented 6 years ago

I have already checked with the latest version of the code; it behaves the same. Waiting for you to test at your end, please.

Thanks,

hatched-DavidMichon commented 6 years ago

I also tested it (https://github.com/terraform-providers/terraform-provider-docker/issues/29#issuecomment-400296076) just now, and it continues to add empty "" entries in env.