Add Amazon Managed Prometheus scraper support

kstevensonnv commented 11 months ago

Is your request related to a new offering from AWS?

Is this functionality available in the AWS provider for Terraform? See CHANGELOG.md, too.

Yes ✅: please list the AWS provider version which introduced this functionality

5.32.0

Is your request related to a problem? Please describe.

N/A.

Describe the solution you'd like.

Implement support for creating and using AMP scrapers.

Describe alternatives you've considered.

N/A

Additional context

Amazon Managed Service for Prometheus launches an agentless collector for Prometheus metrics from Amazon EKS

We are excited to announce Amazon Managed Service for Prometheus collector, a fully-managed agentless collector customers can use to collect Prometheus metrics from their workloads running on Amazon EKS. Customers can now enable the discovery and collection of Prometheus metrics from their Amazon EKS applications and infrastructure through the EKS console or through an API call, without having to self-manage agents.

Customers currently invest days, if not weeks, of effort to monitor, right-size, and operate Prometheus agents. Now with Amazon Managed Service for Prometheus collector, customers can automatically discover and collect Prometheus metrics from their Amazon EKS applications, infrastructure, and the Kubernetes apiserver without having to install any agents in their cluster. The fully-managed collector removes the “undifferentiated heavy lifting” of installing, patching, scaling, and upgrading agents for the discovery and collection of Prometheus metrics. The collector provides customers with a multi-AZ, highly available, reliable, and fully-managed service for collecting Prometheus metrics without any data ever leaving secure VPCs.

To get started, customers can utilize the Amazon Managed Service for Prometheus APIs, SDK, CLI and the Amazon EKS console to create fully-managed collectors. Amazon Managed Service for Prometheus collector is available in all regions where Amazon Managed Service for Prometheus is available. To learn more about Amazon Managed Service for Prometheus collector, visit the user guide or product page.

Resource: aws_prometheus_scraper
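
For illustration, here is a minimal sketch of the resource, loosely following the example in the AWS provider documentation. The cluster name my-cluster, the workspace alias, and the single pod_exporter scrape job are placeholders, not recommendations. The scraper still needs Kubernetes permissions inside the cluster before it can collect anything.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.32.0" # aws_prometheus_scraper was introduced in 5.32.0
    }
  }
}

# Placeholder: an existing EKS cluster named "my-cluster" (assumption).
data "aws_eks_cluster" "this" {
  name = "my-cluster"
}

# AMP workspace that receives the scraped metrics.
resource "aws_prometheus_workspace" "this" {
  alias = "eks-metrics"
}

resource "aws_prometheus_scraper" "this" {
  # Where metrics are collected from: the cluster and the subnets the scraper attaches to.
  source {
    eks {
      cluster_arn = data.aws_eks_cluster.this.arn
      subnet_ids  = data.aws_eks_cluster.this.vpc_config[0].subnet_ids
    }
  }

  # Where metrics are sent.
  destination {
    amp {
      workspace_arn = aws_prometheus_workspace.this.arn
    }
  }

  # Deliberately small Prometheus scrape configuration (placeholder job).
  scrape_configuration = <<-EOT
    global:
      scrape_interval: 30s
    scrape_configs:
      - job_name: pod_exporter
        kubernetes_sd_configs:
          - role: pod
  EOT
}

# IAM role the scraper uses; this is the principal that needs access inside the cluster.
output "scraper_role_arn" {
  value = aws_prometheus_scraper.this.role_arn
}
```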

bryantbiggs commented 10 months ago

After looking into this a bit more, I'm not sure how beneficial it would be to add this to the module (the module is already quite large). There isn't a tight integration between this resource and the EKS module, and there is one caveat listed in the provider docs:

Your source Amazon EKS cluster must be configured to allow the scraper to access metrics. Follow the user guide to set up the appropriate Kubernetes permissions.
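
The Kubernetes side of that caveat is RBAC: a ClusterRole that can read the objects the scraper discovers, bound to the Kubernetes user name that the scraper's IAM role is mapped to. Below is a rough sketch using the Terraform Kubernetes provider. It assumes that provider is already configured against the cluster. The names aps-collector-role and aps-collector-user follow the user guide's naming as I understand it, and the rule list is an approximation; check both against the guide before relying on them.

```hcl
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.10"
    }
  }
}

# Read-only access to the objects the scraper discovers and scrapes.
# Rule list is an assumption modeled on the AMP user guide; verify before use.
resource "kubernetes_cluster_role_v1" "aps_collector" {
  metadata {
    name = "aps-collector-role"
  }

  rule {
    api_groups = [""]
    resources  = ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods", "configmaps"]
    verbs      = ["get", "list", "watch"]
  }

  rule {
    api_groups = ["networking.k8s.io"]
    resources  = ["ingresses", "ingresses/status"]
    verbs      = ["get", "list", "watch"]
  }

  # Allow scraping of non-resource /metrics endpoints (e.g. the API server).
  rule {
    non_resource_urls = ["/metrics"]
    verbs             = ["get"]
  }
}

# Bind the role to the Kubernetes *user* the scraper's IAM role is mapped to.
resource "kubernetes_cluster_role_binding_v1" "aps_collector" {
  metadata {
    name = "aps-collector-user-role-binding"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = kubernetes_cluster_role_v1.aps_collector.metadata[0].name
  }

  subject {
    kind      = "User"
    name      = "aps-collector-user"
    api_group = "rbac.authorization.k8s.io"
  }
}
```

Because this binding targets a user name rather than a group, the identity mapping for the scraper's role may not need any groups at all. That is an inference from this sketch, not something the docs state.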

I'm not saying we won't add this, but for now we're going to wait and evaluate. Cluster access management was recently launched, and support for it is being added in #2858, but we'll see how the AMP team wants to handle scraper authentication (e.g., manually, with users specifying something, or through a service-linked role (SLR), as EMR and Batch do).

andrewbcoyle commented 10 months ago

There isn't a tight integration between this resource and the EKS module, and there is one caveat listed in the provider docs:

Your source Amazon EKS cluster must be configured to allow the scraper to access metrics. Follow the user guide to set up the appropriate Kubernetes permissions.

The trouble I'm running into is that there doesn't appear to be a pure TF way of using AMP's scraper. AWS wants users to configure auth with eksctl, as does the TF resource documentation (which you noted).

Without being able to do this in TF alone, the AMP scraper is basically unusable, and since it only works with EKS, I don't know what other module would be a suitable place to add it.

What I would like to see is a simple argument like configure_amp_scraper = true, along with an argument for the AMP workspace to send the metrics to and one for the scrape configuration (maybe also use_default_scrape_config = true to use a default), with the module also configuring auth.
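
A sketch of what such an interface might look like. None of these inputs exist in the module. configure_amp_scraper, amp_workspace_arn, use_default_scrape_config, and amp_scrape_configuration are hypothetical names for the arguments described above.

```hcl
# Workspace the scraped metrics would be written to.
resource "aws_prometheus_workspace" "this" {
  alias = "eks-metrics"
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  # ... other cluster configuration ...

  # HYPOTHETICAL inputs: they illustrate the proposal and do not exist in the module.
  # Create the scraper and configure its in-cluster auth.
  configure_amp_scraper = true

  # Destination workspace for the scraped metrics.
  amp_workspace_arn = aws_prometheus_workspace.this.arn

  # Use a module-provided default scrape configuration...
  use_default_scrape_config = true

  # ...or supply your own instead.
  # amp_scrape_configuration = file("${path.module}/scrape-config.yaml")
}
```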

bryantbiggs commented 10 months ago

Yes, I totally agree, and I can elaborate a bit more on what's required and what I'm looking for in terms of service support before adding this natively here.

To start, eksctl is not required; it is simply being used in this context to create the IAM identity mapping (mapping the IAM role to Kubernetes RBAC groups/permissions inside the cluster via the aws-auth configmap). You could do the same with the module today:

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  # ... truncated for brevity

  # aws-auth configmap
  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = "<ROLE-ARN>"
      username = "aps-collector-user"
      groups   = ["system:masters"] # I don't know which groups the collector needs to use; the docs seem to be lacking this info
    },
  ]
}
```

With the recent feature launches for EKS Pod Identity and cluster access management, I suspect add-ons like this will get an improved user experience. With cluster access management, it should be possible for add-ons to create all of this in the background, so that the user simply deploys the add-on and provides the IAM role it will use. When that happens, adding the collector should be a matter of adding the collector's Terraform resource, plus the ability to create the appropriate IAM role and policy used by the collector and map that role to the cluster. For now, though, we are removing all resources that interact with the Kubernetes API from this module (i.e., replacing the aws-auth configmap with cluster access entries) because of the numerous issues with that approach, so we wouldn't be able to support the collector at this time.
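
With cluster access management, the mapping half of this could look roughly like the sketch below, which replaces the aws-auth entry with an aws_eks_access_entry. It assumes the cluster's authentication mode includes API, that an access entry accepts the scraper's role ARN as its principal, and that the RBAC binding for aps-collector-user is created separately. That last part still needs the Kubernetes API, which is exactly what the module is moving away from. The variable names are illustrative.

```hcl
variable "cluster_name" {
  description = "Name of the EKS cluster the scraper collects from (illustrative input)"
  type        = string
}

variable "scraper_role_arn" {
  description = "IAM role ARN used by the scraper, i.e. the role_arn attribute of aws_prometheus_scraper (illustrative input)"
  type        = string
}

# Map the scraper's IAM role to the Kubernetes user name that the in-cluster
# RBAC binding refers to, using an access entry instead of the aws-auth configmap.
# Assumption: the access entry API accepts this role as the principal.
resource "aws_eks_access_entry" "aps_collector" {
  cluster_name  = var.cluster_name
  principal_arn = var.scraper_role_arn
  user_name     = "aps-collector-user"
  type          = "STANDARD"
}
```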

andrewbcoyle commented 10 months ago

Thanks for the reply! What I really meant by the eksctl comment is that the role ARN the scraper uses appears to need a manual transformation before it can be mapped. Since the scraper ID is non-deterministic, it isn't possible to perform the transformation from the IAM role ARN to the ARN Kubernetes expects at Terraform run time. Does that make more sense?
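
For context, as I understand it, the role ARN reported for the scraper includes an IAM path (aws-service-role/scraper.aps.amazonaws.com/), while role ARNs in the aws-auth configmap cannot contain a path, so the path has to be stripped. The string manipulation itself can be written in Terraform. The sketch below takes the ARN as an input variable (a hypothetical name) and keeps only the arn:...:role prefix and the role name. The ARN shape shown in the comments is an assumption based on that understanding.

```hcl
variable "scraper_role_arn" {
  description = "role_arn of the scraper, including its IAM path (hypothetical input)"
  type        = string
  # Assumed shape:
  # arn:aws:iam::111122223333:role/aws-service-role/scraper.aps.amazonaws.com/AWSServiceRoleForAmazonPrometheusScraper_abc123
}

locals {
  # Split on "/": the first element is "arn:aws:iam::111122223333:role",
  # the last element is the role name, and anything in between is the path.
  scraper_role_arn_parts = split("/", var.scraper_role_arn)

  # Rebuild the ARN without the path, e.g.
  # arn:aws:iam::111122223333:role/AWSServiceRoleForAmazonPrometheusScraper_abc123
  scraper_role_arn_without_path = "${local.scraper_role_arn_parts[0]}/${local.scraper_role_arn_parts[length(local.scraper_role_arn_parts) - 1]}"
}

output "scraper_role_arn_without_path" {
  value = local.scraper_role_arn_without_path
}
```

If this were wired to the role_arn attribute of aws_prometheus_scraper, the value would only be known after the scraper is created. Anything depending on it, such as an aws_auth_roles entry, would show as "known after apply" during the first plan.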

Overall, it seems this service isn't quite ready for prime time on AWS's side...

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove the stale label or comment, or this issue will be closed in 10 days.

github-actions[bot] commented 7 months ago

This issue was automatically closed because it remained stale for 10 days.

github-actions[bot] commented 6 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

terraform-aws-modules / terraform-aws-eks