kubernetes-sigs / cluster-api-provider-aws

Kubernetes Cluster API Provider AWS provides consistent deployment and day 2 operations of "self-managed" and EKS Kubernetes clusters on AWS.
http://cluster-api-aws.sigs.k8s.io/
Apache License 2.0

Excessive AWS API calls lead to high Amazon GuardDuty cost #4192

Closed wyike closed 8 months ago

wyike commented 1 year ago

/kind bug

What steps did you take and what happened:

Recently, in our testing environment, CloudTrail has recorded an excessive number of events for AWS API calls. As a result, the GuardDuty cost for analyzing those events is very high. From our analysis, the calls appear to come from the CAPA deployment in use.

To confirm, I created a simple cluster with one master and one worker and recorded the API calls CAPA sent during that time using CloudTrail:

| API call | Cluster1 (API calls) | Cluster2 (API calls) |
|---|---|---|
| DescribeSubnets | 20 | 19 |
| DescribeRouteTables | 28 | 26 |
| DescribeVpcAttribute | 28 | 26 |
| DescribeLoadBalancers | 41 | 32 |
| DescribeNetworkInterfaces | 53 | 34 |
| DescribeNatGateways | 54 | 50 |
| DescribeInstances | 64 | 44 |
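A tally like the one above could be produced with a short script. The following is a minimal sketch, not the method actually used for this issue: it assumes the CloudTrail records have already been exported as Python dictionaries, and it assumes the caller can be identified by the IAM user name in `userIdentity.userName`. The sample records are illustrative only.

```python
from collections import Counter


def count_calls(records, user_name):
    """Count CloudTrail events per eventName for a single IAM user."""
    counts = Counter()
    for record in records:
        identity = record.get("userIdentity", {})
        # Skip events issued by other principals.
        if identity.get("userName") == user_name:
            counts[record["eventName"]] += 1
    return counts


# Assumption: CAPA calls are made as this IAM user.
CALLER = "bootstrapper.cluster-api-provider-aws.sigs.k8s.io"

# Illustrative records; real CloudTrail records carry many more fields.
sample_records = [
    {"eventName": "DescribeInstances", "userIdentity": {"userName": CALLER}},
    {"eventName": "DescribeInstances", "userIdentity": {"userName": CALLER}},
    {"eventName": "DescribeSubnets", "userIdentity": {"userName": CALLER}},
    {"eventName": "DescribeInstances", "userIdentity": {"userName": "someone-else"}},
]

calls = count_calls(sample_records, CALLER)
for name, count in calls.most_common():
    print(name, count)
```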

That is too many calls:

For cluster2, the CAPA logs show that the awsmachine is reconciled many times, which I think leads to repeated findInstance and getInstanceENIs calls, i.e. multiple DescribeInstances and DescribeNetworkInterfaces requests:

~ cat capa_log| grep "Looking for instance by id.*default/c101-control-plane-4fg26" | wc -l
      21
~ cat capa_log| grep "Looking for instance by id.*default/c101-md-0-pq7b6" | wc -l
      15
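The same count can be taken for several machines in one pass over the log. This is a rough sketch that mirrors the grep pattern above; the sample log lines are made up for illustration, since the real CAPA log format is not shown here.

```python
import re
from collections import Counter


def count_lookups(lines, machines):
    """Count "Looking for instance by id" log lines per machine name."""
    patterns = {
        machine: re.compile("Looking for instance by id.*" + re.escape(machine))
        for machine in machines
    }
    counts = Counter()
    for line in lines:
        for machine, pattern in patterns.items():
            if pattern.search(line):
                counts[machine] += 1
    return counts


# Hypothetical log lines; only the message text and machine names come from the issue.
sample_log = [
    'msg="Looking for instance by id" awsmachine="default/c101-control-plane-4fg26"',
    'msg="Looking for instance by id" awsmachine="default/c101-md-0-pq7b6"',
    'msg="Looking for instance by id" awsmachine="default/c101-control-plane-4fg26"',
    'msg="Reconciling AWSMachine" awsmachine="default/c101-md-0-pq7b6"',
]

lookups = count_lookups(
    sample_log,
    ["default/c101-control-plane-4fg26", "default/c101-md-0-pq7b6"],
)
print(dict(lookups))
```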

For cluster1, the full data is here: https://docs.google.com/spreadsheets/d/1fBzZ8IqQ203cTv0WQ1cbX7-RiO7eHrKJ4kB5saWWkY4/edit#gid=486402940

Part of the DescribeInstances API calls:

bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:37:05Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:37:01Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:37:01Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:36:21Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:36:21Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:35:28Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:35:28Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:35:17Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:35:14Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:35:04Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:35:04Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:35:02Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:35:02Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:35:01Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:35:00Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:34:35Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:34:33Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:29:03Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:26:17Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:24:53Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:24:12Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:51Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:39Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:39Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:33Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:32Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:31Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:30Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:30Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:29Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:28Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:27Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:26Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:26Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:20Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:09Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:23:08Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:22:16Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:21:34Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:21:12Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:21:07Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:21:06Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:21:04Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:21:03Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:21:02Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:21:01Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:21:00Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:20:59Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:20:58Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:20:56Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:20:56Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:20:55Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:20:51Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:20:48Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:20:42Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:20:22Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:20:02Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:19:42Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:19:33Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:19:27Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:19:22Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:19:17Z ec2.amazonaws.com DescribeInstances  
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:19:12Z ec2.amazonaws.com DescribeInstances  

DescribeInstances is sent very frequently, often several times within a single minute.
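To quantify how tightly the calls in the listing above are clustered, the events can be bucketed per minute. The sketch below assumes whitespace-separated lines in the same layout as the listing (caller, timestamp, event source, event name) and uses only the first seven lines of it as sample input.

```python
from collections import Counter
from datetime import datetime


def calls_per_minute(lines):
    """Bucket events by the minute (HH:MM) of their timestamp."""
    buckets = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        stamp = datetime.strptime(fields[1], "%Y-%m-%dT%H:%M:%SZ")
        buckets[stamp.strftime("%H:%M")] += 1
    return buckets


sample = """\
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:37:05Z ec2.amazonaws.com DescribeInstances
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:37:01Z ec2.amazonaws.com DescribeInstances
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:37:01Z ec2.amazonaws.com DescribeInstances
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:36:21Z ec2.amazonaws.com DescribeInstances
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:36:21Z ec2.amazonaws.com DescribeInstances
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:35:28Z ec2.amazonaws.com DescribeInstances
bootstrapper.cluster-api-provider-aws.sigs.k8s.io   2023-03-31T03:35:28Z ec2.amazonaws.com DescribeInstances
"""

per_minute = calls_per_minute(sample.splitlines())
for minute, count in sorted(per_minute.items()):
    print(minute, count)
```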

This issue is similar to https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/1764. However, that issue focuses on throttling to avoid hitting the API limit. I am opening a new one that aims to reduce the total number of API calls made when creating (or even deleting) a cluster.

This is important to investigate and improve because, in production environments, customers usually enable GuardDuty for security reasons. To avoid unexpected costs for customers, we should reduce the total number of API calls.

Maybe we can start with DescribeInstances and DescribeNetworkInterfaces, then move on to DescribeNatGateways and DescribeLoadBalancers.
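One general way to cut repeated describe calls is to reuse a recent result for a short time instead of querying again on every reconcile. The sketch below illustrates that idea only; it is not how CAPA works or a proposed patch. `TTLCache` and `fake_describe_instance` are hypothetical names, and the fake function stands in for a real DescribeInstances request.

```python
import time


class TTLCache:
    """Cache fetch(key) results and reuse them until ttl_seconds have passed."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (time stored, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no API call
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


api_calls = []


def fake_describe_instance(instance_id):
    # Hypothetical stand-in for a DescribeInstances request.
    api_calls.append(instance_id)
    return {"InstanceId": instance_id, "State": "running"}


# A controllable clock makes the expiry behaviour easy to demonstrate.
now = [0.0]
cache = TTLCache(fake_describe_instance, ttl_seconds=30, clock=lambda: now[0])

cache.get("i-0123")  # first lookup: calls the API
now[0] = 10.0
cache.get("i-0123")  # within the TTL: served from the cache
now[0] = 45.0
cache.get("i-0123")  # TTL expired: calls the API again
print(len(api_calls))
```

The trade-off is staleness: a cached instance state can lag behind the real one by up to the TTL, so any such approach would need to be weighed against how quickly the controller must notice changes.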

What did you expect to happen:

Anything else you would like to add:

Environment:

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

wyike commented 1 year ago

From the CAPA log alone, I can't yet tell why the reconciles are so frequent, for example for awsmachine (which drives up the number of DescribeInstances and DescribeNetworkInterfaces calls). I am still getting familiar with the whole process.

I hope others can share their thoughts on this, thanks! I can also help with the investigation or any follow-up enhancement work.

wyike commented 1 year ago

/assign

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

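This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)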

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 8 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/4192#issuecomment-1951240432):

>The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
>This bot triages issues according to the following rules:
>- After 90d of inactivity, `lifecycle/stale` is applied
>- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
>- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
>You can:
>- Reopen this issue with `/reopen`
>- Mark this issue as fresh with `/remove-lifecycle rotten`
>- Offer to help out with [Issue Triage][1]
>
>Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
>/close not-planned
>
>[1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.