iterative / terraform-provider-iterative

☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes
https://registry.terraform.io/providers/iterative/iterative/latest/docs
Apache License 2.0
288 stars 27 forks source link

Task secret management #602

Open 0x2b3bfa0 opened 2 years ago

0x2b3bfa0 commented 2 years ago

The current mechanism used to pass environment variables to task scripts stores the values in plain text as part of the instance's cloud-init startup script.

Instead, it should be using permission_set from #550 and:

This brings the high-level debate of whether automate the whole permissions set and secret manager creation process or leave it to the user as part of a separate lifecycle, similarly to #299.

Prototypes

Amazon Web Services

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    iterative = {
      source = "iterative/iterative"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "iterative" {
}

resource "iterative_task" "example" {
  cloud          = "aws"
  permission_set = aws_iam_instance_profile.example.arn

  environment = {
    SECRET_ID  = aws_secretsmanager_secret.example.arn
    AWS_REGION = data.aws_region.current.name
  }

  script = <<-END
    #!/bin/sh
    sudo apt-get update
    sudo apt-get install awscli --yes
    aws secretsmanager get-secret-value --region $AWS_REGION --secret-id $SECRET_ID --query SecretString --output text
  END
}

resource "aws_secretsmanager_secret" "example" {
  name = "example"
}

resource "aws_secretsmanager_secret_version" "example" {
  secret_id     = aws_secretsmanager_secret.example.id
  secret_string = "v3rys3cret" # don't hardcode this here, use a variable instead
}

resource "aws_iam_instance_profile" "example" {
  name = "example"
  role = aws_iam_role.example.name
}

resource "aws_iam_role" "example" {
  name               = "example"
  assume_role_policy = data.aws_iam_policy_document.example_assume_role.json
}

resource "aws_iam_role_policy" "example" {
  name   = "example"
  role   = aws_iam_role.example.id
  policy = data.aws_iam_policy_document.example.json
}

data "aws_iam_policy_document" "example_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "example" {
  statement {
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [aws_secretsmanager_secret.example.arn]
  }
}

data "aws_region" "current" {
}

[^1]: Semantically equivalent, but insecure by default: securing them is not our responsibility, but the cluster administrators'

0x2b3bfa0 commented 2 years ago

Amazon Web Services

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    iterative = {
      source = "iterative/iterative"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "iterative" {
}

resource "iterative_task" "example" {
  cloud          = "aws"
  permission_set = aws_iam_instance_profile.example.arn

  environment = {
    SECRET_ID  = aws_secretsmanager_secret.example.arn
    AWS_REGION = data.aws_region.current.name
  }

  script = <<-END
    #!/bin/sh
    sudo apt-get update
    sudo apt-get install awscli --yes
    aws secretsmanager get-secret-value --region $AWS_REGION --secret-id $SECRET_ID --query SecretString --output text
  END
}

resource "aws_secretsmanager_secret" "example" {
  name = "example"
}

resource "aws_secretsmanager_secret_version" "example" {
  secret_id     = aws_secretsmanager_secret.example.id
  secret_string = "v3rys3cret" # don't hardcode this here, use a variable instead
}

resource "aws_iam_instance_profile" "example" {
  name = "example"
  role = aws_iam_role.example.name
}

resource "aws_iam_role" "example" {
  name               = "example"
  assume_role_policy = data.aws_iam_policy_document.example_assume_role.json
}

resource "aws_iam_role_policy" "example" {
  name   = "example"
  role   = aws_iam_role.example.id
  policy = data.aws_iam_policy_document.example.json
}

data "aws_iam_policy_document" "example_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "example" {
  statement {
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [aws_secretsmanager_secret.example.arn]
  }
}

data "aws_region" "current" {
}
dacbd commented 2 years ago

I like this idea, and it would be the creating credentials that would need to access the various secret managers?

0x2b3bfa0 commented 2 years ago

If we automate the whole process as part of the task lifecycle, it would manage the following additional resources:

dacbd commented 2 years ago

Going to use aws terminology

I think we discussed that we should not edit, create, or delete roles. I feel that if a user creates permission_set with access to Secrets Manager the connotation is that their code/script is going to call that service.

If they list an arn to SecretsManager in the hcl the connotation is that we query that service.

0x2b3bfa0 commented 2 years ago

I think we discussed that we should not edit, create, or delete roles.

I don't recall discussing this, but we're on the same page: if task created roles, it would need administrator-level permissions, and that's a clear “no” for teams; still, it could be handy for individual practitioners.

0x2b3bfa0 commented 2 years ago

_I feel that if a user creates permissionset with access to Secrets Manager the connotation is that their code/script is going to call that service.

If they list an arn to SecretsManager in the hcl the connotation is that we query that service.

Then, do you propose to add a new secret_storage field taking an ARN or whatever?

dacbd commented 2 years ago

I think that a secrets block could have some value. Where keys get populated as env vars.

secrets {
  SNOWFLAKE_USER = "arn"
  SNOWFLAKE_PASS = "arn"
  GITHUB_BOT_PAT = "arn"
}

However I don't really see this being a highly demanded feature, it would be nice and I'm sure that it could/would get used, but you can also just use the environment to pass the arn and code just load it as well...

I feel that there could be some confusion about who has access to the secrets, ie the creating user's credentials or the tasks assigned credentials. Maybe I am overcomplicating this?

0x2b3bfa0 commented 2 years ago

What if we use the environment argument for everything?

Unencrypted by default

resource "iterative_task" "example" {
  ···
  environment = {
    NAME = "value"
  }
}

Encrypted on demand

resource "iterative_task" "example" {
  ···
  key = "arn:"
  environment = {
    NAME = "value"
  }
}

When users provide a key identifier (e.g. ARN to SecretsManager or KMS), the environment file is (en|de)crypted with that key.

dacbd commented 2 years ago

I like this idea! Now we just need some log masking for them as well 😅

0x2b3bfa0 commented 2 years ago

See also mozilla/sops