Closed wyike closed 8 months ago
This issue is currently awaiting triage.
If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Currently, judging only from the CAPA logs, I don't know why the reconciles can be so frequent, for example for AWSMachine (which drives so many DescribeInstances and DescribeNetworkInterfaces calls). I am still getting familiar with the whole process.
I hope others can share more opinions on this, thanks! I can also help with the investigation or further enhancement work.
/assign
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/kind bug
What steps did you take and what happened: [A clear and concise description of what the bug is.]
Recently in our testing environment, CloudTrail has recorded an excessive number of events for AWS API calls. As a result, the GuardDuty cost for event analysis is very high. From our analysis, the calls come from the in-use CAPA.
To confirm, I created a simple cluster with one master and one worker, and recorded the API calls CAPA sent during that time via CloudTrail.
There are far too many calls:
For cluster2, from the CAPA logs, the AWSMachine reconciles many times, which I think leads to repeated findInstance and getInstanceENIs calls, i.e. multiple DescribeInstances and DescribeNetworkInterfaces requests.
For cluster1, the full data is here: https://docs.google.com/spreadsheets/d/1fBzZ8IqQ203cTv0WQ1cbX7-RiO7eHrKJ4kB5saWWkY4/edit#gid=486402940
Part of the DescribeInstances API calls:
We can see DescribeInstances is sent very frequently, even multiple times within 1 minute.
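To quantify the pattern from a CloudTrail export like the spreadsheet above, a small helper can tally records per API. This is an illustrative sketch, not part of CAPA; it assumes only the standard CloudTrail `eventName` field:

```python
from collections import Counter

def count_api_calls(events):
    """Count CloudTrail event records per eventName."""
    return Counter(e["eventName"] for e in events)

# Inline sample records standing in for a real CloudTrail export:
sample = [
    {"eventName": "DescribeInstances"},
    {"eventName": "DescribeInstances"},
    {"eventName": "DescribeNetworkInterfaces"},
]
print(count_api_calls(sample).most_common())
# [('DescribeInstances', 2), ('DescribeNetworkInterfaces', 1)]
```

Sorting by count makes it obvious which Describe* calls dominate and are worth optimizing first.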
This issue is similar to https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/1764. However, that issue focuses on throttling to stay under the API rate limit. I am opening a new one aiming to decrease the total number of outstanding API calls when creating (and even deleting) a cluster.
It's important to investigate and improve this, because in production environments customers usually enable GuardDuty for security reasons. To avoid unexpected costs for customers, we should decrease the total number of API calls.
Maybe we can start with DescribeInstances and DescribeNetworkInterfaces, then DescribeNatGateways and DescribeLoadBalancers.
What did you expect to happen:
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
Environment:
- Cluster-api-provider-aws version:
- Kubernetes version: (use kubectl version):
- OS (e.g. from /etc/os-release):