spiffe / helm-charts-hardened

Apache License 2.0

Feature: Provide native support for `aws_iid` nodeAttestor plugin. #377

Open ranjit-se7en opened 3 months ago

ranjit-se7en commented 3 months ago

We were looking at the Helm chart for deploying spire-server in Kubernetes, but it does not seem to support aws_iid node attestation.

Version

helm list
NAME        NAMESPACE       REVISION    UPDATED                                 STATUS      CHART               APP VERSION
spire       spire-server    1           2024-05-24 11:30:31.89683 +0530 IST     deployed    spire-0.20.0        1.9.4
spire-crds  spire-server    3           2024-05-16 12:20:02.086293 +0530 IST    deployed    spire-crds-0.4.0    0.0.1

We have a special use case that requires running the SPIRE server in Kubernetes and the agents on EC2 instances. I noticed the unsupportedBuiltInPlugins option in the spire-agent chart, which we used to enable aws_iid attestation.

https://github.com/spiffe/helm-charts-hardened/blob/2c5dfa010f4ae0c50c2d5c5f5d5fd75c10e5a021/charts/spire/charts/spire-agent/values.yaml#L293

# NOTE: This is unsupported and only to configure currently supported spire built-in plugins but plugins unsupported by the chart.
# Upgrades wont be tested for anything under this config. If you need this, please let the chart developers know your needs so we
# can prioritize proper support.
## @skip unsupportedBuiltInPlugins
unsupportedBuiltInPlugins:
  keyManager: {}
  nodeAttestor: {}
  svidStore: {}
  workloadAttestor: {}

We used this option and are happy to report that it works. However, there are a few caveats.

With this approach, we noticed that node attestation fails with the errors below.

Agent logs

ERRO[0002] Agent crashed error="failed to receive attestation response: rpc error: code = Internal desc = nodeattestor(aws_iid): failed to describe instance: operation error EC2: DescribeInstances, get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, exceeded maximum number of attempts, 3, request send failed, Get \"http://169.254.169.254/latest/meta-data/iam/security-credentials/\": dial tcp 169.254.169.254:80: i/o timeout"

Server logs

time="2024-05-20T05:46:33Z" level=error msg="Nodeattestor(aws_iid): failed to describe instance: operation error EC2: DescribeInstances, get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, exceeded maximum number of attempts, 3, request send failed, Get \"http://169.254.169.254/latest/meta-data/iam/security-credentials/\": dial tcp 169.254.169.254:80: i/o timeout" authorized_as=nobody authorized_via= caller_addr="XXXXXX:38874" method=AttestAgent node_attestor_type=aws_iid request_id=33b80e8b-ff27-4e8c-98af-69f800e82025 service=agent.v1.Agent subsystem_name=api

We figured out that access to IMDS is disabled by default in the EKS node group as a security measure; see https://aws.github.io/aws-eks-best-practices/security/docs/iam/#when-your-application-needs-access-to-imds-use-imdsv2-and-increase-the-hop-limit-on-ec2-instances-to-2
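For anyone who wants to confirm the same condition from inside a pod, here is a minimal diagnostic sketch, not part of SPIRE or the chart. It assumes the standard IMDSv2 flow (a PUT to `/latest/api/token`, then a GET carrying the token) against the same `security-credentials` path that appears in the logs above. The function names `build_token_request` and `imds_reachable` are made up for illustration. If the instance's hop limit is 1, the token request is expected to time out from a pod on the pod network, which would match the `i/o timeout` we saw.

```python
import urllib.error
import urllib.request

# Link-local IMDS address, as seen in the error logs above.
IMDS_BASE = "http://169.254.169.254"
TOKEN_TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"
TOKEN_HEADER = "X-aws-ec2-metadata-token"


def build_token_request(base_url=IMDS_BASE, ttl_seconds=60):
    """Build the IMDSv2 session-token request (an HTTP PUT)."""
    return urllib.request.Request(
        f"{base_url}/latest/api/token",
        method="PUT",
        headers={TOKEN_TTL_HEADER: str(ttl_seconds)},
    )


def imds_reachable(base_url=IMDS_BASE, timeout=2.0):
    """Return True if a token and the IAM role listing can both be fetched."""
    try:
        # Step 1: obtain a short-lived session token.
        with urllib.request.urlopen(
            build_token_request(base_url), timeout=timeout
        ) as resp:
            token = resp.read().decode()
        # Step 2: query the path the AWS SDK failed on in the logs.
        role_req = urllib.request.Request(
            f"{base_url}/latest/meta-data/iam/security-credentials/",
            headers={TOKEN_HEADER: token},
        )
        with urllib.request.urlopen(role_req, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Timeouts, refused connections and HTTP errors all count as unreachable.
        return False


if __name__ == "__main__":
    print("IMDS reachable:", imds_reachable())
```

Running this once on the node itself and once inside the spire-server pod should show whether the block is specific to the pod network.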

The options available to us are below.

Although we could use host networking, we would like to mesh the spire-server pods with Linkerd, which doesn't work on pods with hostNetwork enabled. Hence, none of these options work for us.

Request: Provide an alternative way to verify instance metadata via the AWS EC2 APIs, which can be used with IRSA in Kubernetes.

kfox1111 commented 2 weeks ago

This depends on: https://github.com/spiffe/spire/issues/5495