Closed joneszc closed 2 weeks ago
@joneszc, there are two PRs which seem to add the same thing, this one and #2752 -- I assume the first one was the original; can you close this one? (or move any relevant changes back to the other PR?)
@joneszc can we close #2752 and #2723 since we have this one?
@dcmcand @viniciusdc
Yes, those two PRs were built on forks of the old develop branch, which is now main.
Thanks for helping determine that the branch was not the cause of the Pytest failures. #2752 and #2723 can be closed.
@viniciusdc I've opened PR#537 to update the docs per your request
Also, following up on your question: re-deploying to set KMS encryption on an existing Nebari EKS cluster that had no encryption set does succeed. However, a subsequent re-deploy to remove the previously set EKS secrets encryption fails, because Terraform attempts to delete and rebuild the EKS cluster but cannot, due to the existing node groups.
Hi @joneszc, thanks for checking that out! I was already expecting it to fail, but I had another thing in mind: they might be connected. Can you post a sanitized output of the terraform error and any error messages you might encounter in the CloudTrail history? I suspect you will find something related to the KMS key in there.
The main reason for this request is to determine whether it would be better to make this an immutable field or, depending on the error, to add manual steps to our docs so users can disable it.
@viniciusdc
Nebari output after a failed attempt to re-deploy in order to remove the EKS cluster's envelope encryption of secrets:
[terraform]: # module.kubernetes.aws_eks_cluster.main must be replaced
[terraform]: -/+ resource "aws_eks_cluster" "main" {
[terraform]: ~ arn = "arn:aws:eks:us-east-1:<account-id>:cluster/nebari-test-dev" -> (known after apply)
[terraform]: ~ certificate_authority = [
[terraform]: - {
[terraform]: - data = "<>"
[terraform]: },
[terraform]: ] -> (known after apply)
[terraform]: + cluster_id = (known after apply)
[terraform]: ~ created_at = "2024-10-28 15:25:47.172 +0000 UTC" -> (known after apply)
[terraform]: - enabled_cluster_log_types = [] -> null
[terraform]: ~ endpoint = "https://0000000000000000000000000.gr7.us-east-1.eks.amazonaws.com" -> (known after apply)
[terraform]: ~ id = "nebari-test-dev" -> (known after apply)
[terraform]: ~ identity = [
[terraform]: - {
[terraform]: - oidc = [
[terraform]: - {
[terraform]: - issuer = "https://oidc.eks.us-east-1.amazonaws.com/id/0000000000000000"
[terraform]: },
[terraform]: ]
[terraform]: },
[terraform]: ] -> (known after apply)
[terraform]: name = "nebari-test-dev"
[terraform]: ~ platform_version = "eks.17" -> (known after apply)
[terraform]: ~ status = "ACTIVE" -> (known after apply)
[terraform]: tags = {
[terraform]: "Environment" = "dev"
[terraform]: "Name" = "nebari-test-dev"
[terraform]: "Owner" = "terraform"
[terraform]: "Project" = "nebari-test"
[terraform]: }
[terraform]: # (3 unchanged attributes hidden)
[terraform]:
[terraform]: - access_config {
[terraform]: - authentication_mode = "CONFIG_MAP" -> null
[terraform]: - bootstrap_cluster_creator_admin_permissions = false -> null
[terraform]: }
[terraform]:
[terraform]: - encryption_config { # forces replacement
[terraform]: - resources = [
[terraform]: - "secrets",
[terraform]: ] -> null
[terraform]:
[terraform]: - provider {
[terraform]: - key_arn = "arn:aws:kms:us-east-1:<account-id>:key/0000000000000000" -> null
[terraform]: }
[terraform]: }
[terraform]:
[terraform]: - kubernetes_network_config {
[terraform]: - ip_family = "ipv4" -> null
[terraform]: - service_ipv4_cidr = "172.20.0.0/16" -> null
[terraform]: }
[terraform]:
[terraform]: ~ vpc_config {
[terraform]: ~ cluster_security_group_id = "sg-xxxxxxxxxxxxxxxxxx" -> (known after apply)
[terraform]: ~ vpc_id = "vpc-xxxxxxxxxxxxxxxx" -> (known after apply)
[terraform]: # (5 unchanged attributes hidden)
[terraform]: }
[terraform]: }
[terraform]:
[terraform]: # module.kubernetes.aws_iam_openid_connect_provider.oidc_provider must be replaced
[terraform]: -/+ resource "aws_iam_openid_connect_provider" "oidc_provider" {
[terraform]: ~ arn = "arn:aws:iam::<account-id>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/0000000000000000000000000" -> (known after apply)
[terraform]: ~ id = "arn:aws:iam::<account-id>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/0000000000000000000000000" -> (known after apply)
[terraform]: tags = {
[terraform]: "Environment" = "dev"
[terraform]: "Name" = "nebari-test-dev-eks-irsa"
[terraform]: "Owner" = "terraform"
[terraform]: "Project" = "nebari-test"
[terraform]: }
[terraform]: ~ thumbprint_list = [
[terraform]: - "9e99a48a9960b14926bb7f3b02e22da2b0ab7280",
[terraform]: - "06b25927c42a721631c1efd9431e648fa62e1e39",
[terraform]: - "d9fe0a65fa00cabf61f5120d373a8135e1461f15",
[terraform]: - "7f3682e963aa03a7bcd67f11b0fedae315af49d4",
[terraform]: ] -> (known after apply)
[terraform]: ~ url = "oidc.eks.us-east-1.amazonaws.com/id/0000000000000000000000000" # forces replacement -> (known after apply) # forces replacement
[terraform]: # (2 unchanged attributes hidden)
[terraform]: }
[terraform]:
[terraform]: # module.kubernetes.aws_iam_policy.cluster_encryption[0] will be destroyed
[terraform]: # (because index [0] is out of range for count)
[terraform]: - resource "aws_iam_policy" "cluster_encryption" {
[terraform]: - arn = "arn:aws:iam::<account-id>:policy/nebari-test-dev-eks-encryption-policy" -> null
[terraform]: - description = "IAM policy for EKS cluster encryption" -> null
[terraform]: - id = "arn:aws:iam::<account-id>:policy/nebari-test-dev-eks-encryption-policy" -> null
[terraform]: - name = "nebari-test-dev-eks-encryption-policy" -> null
[terraform]: - path = "/" -> null
[terraform]: - policy = jsonencode(
[terraform]: {
[terraform]: - Statement = [
[terraform]: - {
[terraform]: - Action = [
[terraform]: - "kms:ListGrants",
[terraform]: - "kms:Encrypt",
[terraform]: - "kms:DescribeKey",
[terraform]: - "kms:Decrypt",
[terraform]: ]
[terraform]: - Effect = "Allow"
[terraform]: - Resource = "arn:aws:kms:us-east-1:<account-id>:key/3zzzzzzzzzzzzz"
[terraform]: },
[terraform]: ]
[terraform]: - Version = "2012-10-17"
[terraform]: }
[terraform]: ) -> null
[terraform]: - policy_id = "ANPARM6PEZIZXIYANUQUT" -> null
[terraform]: - tags = {} -> null
[terraform]: - tags_all = {} -> null
[terraform]: }
[terraform]:
[terraform]: # module.kubernetes.aws_iam_role_policy_attachment.cluster_encryption[0] will be destroyed
[terraform]: # (because index [0] is out of range for count)
[terraform]: - resource "aws_iam_role_policy_attachment" "cluster_encryption" {
[terraform]: - id = "nebari-test-dev-eks-cluster-role-00000000000000000" -> null
[terraform]: - policy_arn = "arn:aws:iam::<account-id>:policy/nebari-test-dev-eks-encryption-policy" -> null
[terraform]: - role = "nebari-test-dev-eks-cluster-role" -> null
[terraform]: }
[terraform]:
[terraform]: Plan: 3 to add, 0 to change, 5 to destroy.
[terraform]:
[terraform]: Changes to Outputs:
[terraform]: ~ cluster_oidc_issuer_url = "https://oidc.eks.us-east-1.amazonaws.com/id/0000000000000000000000000" -> (known after apply)
[terraform]: ~ kubernetes_credentials = (sensitive value)
[terraform]: ~ oidc_provider_arn = "arn:aws:iam::<account-id>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/0000000000000000000000000" -> (known after apply)
[terraform]: local_file.kubeconfig[0]: Destroying... [id=ebb9ba2900716cbac8f3zzzzzzzzzzzzz]
[terraform]: local_file.kubeconfig[0]: Destruction complete after 0s
[terraform]: module.kubernetes.aws_iam_openid_connect_provider.oidc_provider: Destroying... [id=arn:aws:iam::<account-id>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/0000000000000000000000000]
[terraform]: module.kubernetes.aws_iam_openid_connect_provider.oidc_provider: Destruction complete after 0s
[terraform]: module.kubernetes.aws_eks_cluster.main: Destroying... [id=nebari-test-dev]
[terraform]: module.kubernetes.aws_eks_cluster.main: Still destroying... [id=nebari-test-dev, 10s elapsed]
[terraform]: module.kubernetes.aws_eks_cluster.main: Still destroying... [id=nebari-test-dev, 20s elapsed]
[terraform]: module.kubernetes.aws_eks_cluster.main: Still destroying... [id=nebari-test-dev, 30s elapsed]
[terraform]:
[terraform]: Error: deleting EKS Cluster (nebari-test-dev): operation error EKS: DeleteCluster, https response error StatusCode: 409, RequestID: de8f18ba-0abe-42ae-961f-86d8865fbcf3, ResourceInUseException: Cluster has nodegroups attached
[terraform]:
[terraform]:
[terraform]:
Traceback (most recent call last)
/home/ssm-user/nebari_private_test/nebari/src/_nebari/subcommands/deploy.py:92 in deploy
89 msg = "Digital Ocean support is currently being deprecated and will be removed
90 typer.confirm(msg)
91
92 deploy_configuration(
93 config,
94 stages,
95 disable_prompt=disable_prompt,
/home/ssm-user/nebari_private_test/nebari/src/_nebari/deploy.py:55 in deploy_configuration
52 s: hookspecs.NebariStage = stage(
53 output_directory=pathlib.Path.cwd(), config=config
54 )
55 stack.enter_context(s.deploy(stage_outputs, disable_prompt))
56
57 if not disable_checks:
58 s.check(stage_outputs, disable_prompt)
/usr/lib64/python3.11/contextlib.py:505 in enter_context
502 except AttributeError:
503 raise TypeError(f"'{cls.__module__}.{cls.__qualname__}' object does "
504 f"not support the context manager protocol") from None
505 result = _enter(cm)
506 self._push_cm_exit(cm, _exit)
507 return result
508
/usr/lib64/python3.11/contextlib.py:137 in __enter__
134 # they are only needed for recreation, which is not possible anymore
135 del self.args, self.kwds, self.func
136 try:
137 return next(self.gen)
138 except StopIteration:
139 raise RuntimeError("generator didn't yield") from None
140
/home/ssm-user/nebari_private_test/nebari/src/_nebari/stages/infrastructure/__init__.py:961 in deploy
958 def deploy(
959 self, stage_outputs: Dict[str, Dict[str, Any]], disable_prompt: bool = False
960 ):
961 with super().deploy(stage_outputs, disable_prompt):
962 with kubernetes_provider_context(
963 stage_outputs["stages/" + self.name]["kubernetes_credentials"]["value"]
964 ):
/usr/lib64/python3.11/contextlib.py:137 in __enter__
134 # they are only needed for recreation, which is not possible anymore
135 del self.args, self.kwds, self.func
136 try:
137 return next(self.gen)
138 except StopIteration:
139 raise RuntimeError("generator didn't yield") from None
140
/home/ssm-user/nebari_private_test/nebari/src/_nebari/stages/base.py:298 in deploy
295 deploy_config["terraform_import"] = True
296 deploy_config["state_imports"] = state_imports
297
298 self.set_outputs(stage_outputs, terraform.deploy(**deploy_config))
299 self.post_deploy(stage_outputs, disable_prompt)
300 yield
301
/home/ssm-user/nebari_private_test/nebari/src/_nebari/provider/terraform.py:71 in deploy
68 )
69
70 if terraform_apply:
71 apply(directory, var_files=[f.name])
72
73 if terraform_destroy:
74 destroy(directory, var_files=[f.name])
/home/ssm-user/nebari_private_test/nebari/src/_nebari/provider/terraform.py:153 in apply
150 + ["-var-file=" + _ for _ in var_files]
151 )
152 with timer(logger, "terraform apply"):
153 run_terraform_subprocess(command, cwd=directory, prefix="terraform")
154
155
156 def output(directory=None):
/home/ssm-user/nebari_private_test/nebari/src/_nebari/provider/terraform.py:119 in run_terraform_subprocess
116 logger.info(f" terraform at {terraform_path}")
117 exit_code, output = run_subprocess_cmd([terraform_path] + processargs, **kwargs)
118 if exit_code != 0:
119 raise TerraformException("Terraform returned an error")
120 return output
121
122
TerraformException: Terraform returned an error
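The plan above marks module.kubernetes.aws_eks_cluster.main as "must be replaced", and the failure only surfaces once the destroy is already under way. As a rough sketch, a pre-apply guard could inspect the JSON form of the plan (what `terraform show -json <planfile>` emits) and flag any resource whose actions include both a delete and a create. The function name `find_replacements` and the hand-written `sample_plan` are illustrative assumptions, not part of Nebari.

```python
def find_replacements(plan: dict) -> list[str]:
    """Return the addresses of resources the plan would destroy and recreate.

    `plan` is assumed to be the parsed output of `terraform show -json`,
    whose `resource_changes` entries carry a `change.actions` list.
    """
    replaced = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement shows up as both "delete" and "create" (in either order).
        if "delete" in actions and "create" in actions:
            replaced.append(rc["address"])
    return replaced


# Hand-written stand-in for real plan JSON, mirroring the log above.
sample_plan = {
    "resource_changes": [
        {
            "address": "module.kubernetes.aws_eks_cluster.main",
            "change": {"actions": ["delete", "create"]},
        },
        {
            "address": "module.kubernetes.aws_iam_policy.cluster_encryption[0]",
            "change": {"actions": ["delete"]},
        },
    ]
}

replaced = find_replacements(sample_plan)
```

A caller could abort the deploy whenever the returned list contains the EKS cluster address, before Terraform starts destroying anything.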
Additional Error details from CloudTrail:
So @joneszc am I reading that correctly that enabling this option will destroy and replace your cluster? We should probably go ahead and make this field immutable then. We definitely don't want anyone accidentally destroying their deploy. The docs should reflect that this should only be used on fresh deploys too.
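For discussion, a minimal sketch of what an immutability check might look like. It assumes the previously deployed KMS key ARN can be looked up and compared with the one in the new config; `check_kms_arn_unchanged` and its parameters are hypothetical names, not existing Nebari code. Based on the report above, enabling encryption on a cluster that has none is allowed, while removing it is rejected; rejecting a change to a different key is an extra assumption that has not been tested in this thread.

```python
from typing import Optional


def check_kms_arn_unchanged(deployed: Optional[str], requested: Optional[str]) -> None:
    """Raise ValueError if the requested KMS ARN differs from the deployed one.

    `deployed` is the ARN currently set on the cluster (None if unencrypted);
    `requested` is the ARN from the new config (None if omitted).
    """
    if deployed is None:
        # Enabling encryption on an unencrypted cluster reportedly succeeds.
        return
    if requested != deployed:
        # Removing the key forces a cluster replacement per the plan above;
        # treating a changed key the same way is an assumption.
        raise ValueError(
            "EKS secrets encryption cannot be removed or changed once set; "
            f"deployed key is {deployed!r}, requested {requested!r}"
        )
```

Raising before Terraform runs would keep the cluster untouched instead of failing partway through the destroy.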
Reference Issues or PRs
Fixes #2681
Fixes #2746
Modifies PR#2723 (Failing Tests / Pytest)
Modifies PR#2752 (Failing Tests / Pytest)
What does this implement/fix?
Put an x in the boxes that apply
Testing
How to test this PR?
Any other comments?
Allows user to set EKS encryption of secrets by specifying a KMS key ARN in nebari-config.yaml
The KMS key must meet the following conditions: