segmentio / stack

A set of Terraform modules for configuring production infrastructure with AWS
https://open.segment.com
MIT License

ecs_cluster my bootcmd not executed in instance #95

Closed: nmarcetic closed this issue 7 years ago

nmarcetic commented 7 years ago

I have an issue with bootcmd: the user data from https://github.com/segmentio/stack/blob/master/ecs-cluster/files/cloud-config.yml.tpl is not applied to my instance. I checked the user data via the AWS console and all the parameters defined by ecs_cluster are there, but when I SSH into the instance and check /etc/ecs/ecs.config, none of that data is present. Here is the output:

DOCKER_HOST=unix:///var/run/docker.sock
ECS_LOGLEVEL=warn
ECS_LOGFILE=/ecs-agent.log
ECS_CHECKPOINT=true
ECS_DATADIR=/data
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1h
ECS_AVAILABLE_LOGGING_DRIVERS=["journald"]
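
For comparison, after a successful boot the bootcmd from that template should have appended the cluster name and the Docker registry auth below those lines, roughly like this (an illustrative sketch based on my settings, not the exact template output):

ECS_CLUSTER=mainflux
ECS_ENGINE_AUTH_TYPE=docker
ECS_ENGINE_AUTH_DATA={"https://index.docker.io/v1/":{"username":"...","password":"...","email":"..."}}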

This causes a lot of problems: the instance can't join my custom cluster, can't pull images from Docker Hub, etc. I would really appreciate any suggestions, thanks!

achille-roussel commented 7 years ago

Would you have a bit more info on your configuration? Maybe show us the Terraform sources that you used?

You're the first one reporting this problem so I'd guess it's either due to something that recently changed or something specific to the way you used the modules.

nmarcetic commented 7 years ago

@achille-roussel Sorry for the delay. Unfortunately I can't share the whole configuration, it's under NDA (I hope you understand). But the part that causes the issue is really simple: I put together a shorter project that reproduces my problem, and it is essentially 1:1 with your example project https://github.com/segmentio/pingdummy referenced in the documentation, so there is no need to share it ;) Btw pingdummy also has a few issues, I will post them on the pingdummy repo (just quick fixes). It's a great starting point, thank you for that.

So basically: I built the AMIs with Packer following https://github.com/segmentio/stack/tree/master/tools (all good here), all my instances use this image, and I work in a single region (us-west). Then I init just a few modules, like you do in the pingdummy main.tf:

module "stack" {
  source      = "github.com/nmarcetic/stack"
  name        = "mainflux"
  environment = "staging"
  key_name    = "bastion-ssh",
  region = "${var.aws_default_region}"
  ecs_max_size =  "${var.ecs_cluster_max_size}"
  ecs_docker_auth_type = "docker"
  ecs_docker_auth_data = "${file("dockerhub-auth.json")}"
  availability_zones= ["us-west-2a", "us-west-2b", "us-west-2c"]
}
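
dockerhub-auth.json follows the ECS agent's auth data format for the "docker" auth type, e.g. (placeholder credentials):

{
  "https://index.docker.io/v1/": {
    "username": "my-dockerhub-user",
    "password": "...",
    "email": "user@example.com"
  }
}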

module "domain" {
  source = "github.com/nmarcetic/stack//dns"
  name   = "demo-stage.com"
}

module "core-service" {
  source             = "github.com/nmarcetic/stack//web-service"
  image              = "mainflux/mainflux-core"
  port               = 9000
  name               = "core-service"
  // Terraform does not support creating TLS, so we do it manually via AWS console and just  copy the SSL ID
  ssl_certificate_id = ""

  environment      = "${module.stack.environment}"
  cluster          = "${module.stack.cluster}"
  iam_role         = "${module.stack.iam_role}"
  security_groups  = "${module.stack.external_elb}"
  subnet_ids       = "${join(",",module.stack.external_subnets)}"
  log_bucket       = "${module.stack.log_bucket_id}"
  internal_zone_id = "${module.stack.zone_id}"
  external_zone_id = "${module.domain.zone_id}"
 // Adding base env vars in order ot override default image vars
  env_vars = <<EOF
[
  { "name": "AWS_REGION", "value": "${module.stack.region}"},
  { "name": "AWS_ACCESS_KEY_ID", "value": "${module.ses_user.access_key}"},
  { "name": "AWS_SECRET_ACCESS_KEY", "value": "${module.ses_user.secret_key}"},
  ]
EOF
}

resource "aws_route53_record" "root" {
  zone_id = "${module.domain.zone_id}"
  name    = "${module.domain.name}"
  type    = "A"

  alias {
    name                   = "${module.core-service.dns}"
    zone_id                = "${module.core-service.zone_id}"
    evaluate_target_health = false
  }
}

/**
 * Provides an RDS Cluster Resource Instance.
 */
module "db" {
  source             = "github.com/nmarcetic/stack//rds-cluster"
  name               = "mainflux-db"
  database_name      = "mainflux"
  master_username    = "root"
  master_password    = "password"
  environment        = "${module.stack.environment}"
  vpc_id             = "${module.stack.vpc_id}"
  zone_id            = "${module.stack.zone_id}"
  security_groups    = ["${module.stack.ecs_cluster_security_group_id}"]
  subnet_ids         = "${module.stack.internal_subnets}"
  availability_zones = "${module.stack.availability_zones}"
}

module "auth-service" {
 source             = "github.com/nmarcetic/stack//service"
 name               = "updates-service"
 image              = "mainflux/mainflux-auth"
 port               = 9000
 container_port     = 9000
 dns_name           = "auth-service"

 environment      = "${module.stack.environment}"
 //cluster          = "${module.stack.cluster}"
 cluster          =   "${module.stack.cluster}"
 zone_id          = "${module.stack.zone_id}"
 iam_role         = "${module.stack.iam_role}"
 security_groups  = "${module.stack.external_elb}"
 subnet_ids       = "${join(",",module.stack.external_subnets)}"
 log_bucket       = "${module.stack.log_bucket_id}"
 }

/**
 * The module creates an IAM user.
 */
module "ses_user" {
  source = "github.com/nmarcetic/stack//iam-user"
  name   = "ses-user"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ses:*"],
      "Resource":"*"
    }
  ]
}
EOF
}

resource "aws_route53_record" "main" {
  zone_id = "${module.domain.zone_id}"
  name    = "${module.domain.name}"
  type    = "A"

  alias {
    name                   = "${module.core-service.dns}"
    zone_id                = "${module.core-service.zone_id}"
    evaluate_target_health = false
  }
}

/**
 * The bastion host acts as the "jump point" for the rest of the infrastructure.
 * Since most of our instances aren't exposed to the external internet,
 * the bastion acts as the gatekeeper for any direct SSH access.
 */
output "bastion_ip" {
  value = "${module.stack.bastion_ip}"
}

I forked the stack repo and pull all modules from my fork in order to use these changes: https://github.com/nmarcetic/stack/commit/4bdc51ade0a3fa2a8cc80a57771f85c10d3f8aaf (following this thread: https://github.com/segmentio/stack/pull/78). I use Terraform v0.8.4 and the master branch of my segmentio/stack fork, which is the same as the upstream repo. Everything else looks OK in the AWS console, and in each instance's user data I see the contents of https://github.com/segmentio/stack/blob/master/ecs-cluster/files/cloud-config.yml.tpl; it is just not executed on boot. When I SSH to the instance and cat /etc/ecs/ecs.config, none of the user-data settings are there.
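
This is how I checked on the instance itself (the metadata endpoint returns the rendered user data; the second command shows what actually ended up on disk):

# the user data exactly as it was delivered to the instance
curl -s http://169.254.169.254/latest/user-data

# what the ECS agent actually reads on startup
cat /etc/ecs/ecs.config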

One more question: does rds-cluster provision an AWS Aurora cluster? If so, it should be added to the documentation (it's a little confusing). I want a PostgreSQL cluster on RDS; should I use the rds module instead of rds-cluster? Thank you!
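
For reference, what I am after is a plain (non-Aurora) PostgreSQL deployment, i.e. roughly a Multi-AZ aws_db_instance like this (illustrative values only; subnet group and security groups omitted):

resource "aws_db_instance" "postgres" {
  # Illustrative sketch: sizes, versions and credentials are placeholders.
  identifier        = "mainflux-postgres"
  engine            = "postgres"
  engine_version    = "9.6.1"
  instance_class    = "db.t2.medium"
  allocated_storage = 20
  name              = "mainflux"
  username          = "root"
  password          = "password"
  multi_az          = true
}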

andersonkyle commented 7 years ago

@nmarcetic Did you check /var/log/cloud-init-output.log for any errors? You should find any problems with your cloud-init directives there.
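
For example (these are the default cloud-init log locations):

# boot-time stdout/stderr from cloud-init, including bootcmd output
sudo cat /var/log/cloud-init-output.log

# cloud-init's own log; failed or skipped directives show up here
sudo grep -iE 'warn|error|bootcmd' /var/log/cloud-init.log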

nmarcetic commented 7 years ago

@andersonkyle No, there are no errors in cloud-init-output.log; everything looks good (key generation, network setup).

nmarcetic commented 7 years ago

Closing this one; it should be fixed by https://github.com/segmentio/stack/commit/25336516104a20e71124583c919fe3f20fd671f2. Thanks @andersonkyle & @achille-roussel!