wallix / awless

A Mighty CLI for AWS
http://awless.io/
Apache License 2.0

ThrottlingException: Rate exceeded in `awless list containertasks` #250

Open diogovieira opened 5 years ago

diogovieira commented 5 years ago

Related to #172, I'm getting rate exceeded errors even when I'm filtering for running containertasks in a specific cluster:

awless list containertasks --filter cluster=<cluster> --filter state=running

I have 31 services with 31 running tasks.

I'm using awless 0.1.11

deinspanjer commented 5 years ago

I'd really like to see this one fixed. If there is a known pattern for fixing it that you can point me at, I'd be willing to try to put together a PR for it.

deinspanjer commented 5 years ago

Well, I tried pretty hard to get this fixed, but I ran into too many problems in the guts of awless.

As mentioned in #172, the underlying problem is that awless has places in the code where it creates one goroutine per request it is going to make to AWS (in this case, calls to GetDescribeTask) and lets them all fire off simultaneously, hoping the default retry handler will do the right thing. When there are many hundreds of requests to make, though, they all send their initial request, fail, and are scheduled to retry up to 3 times after a delay, but there are simply too many of them and you hit the overall rate limit.
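For illustration, one common way to avoid the burst is to cap how many requests are in flight at once. The following is only a minimal sketch of that idea, not awless code: `fetchBounded` and `describeFunc` are hypothetical names, and `describeFunc` stands in for whatever single AWS call is being made per ARN.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// describeFunc is a hypothetical stand-in for one AWS request,
// for example a DescribeTaskDefinition call for a single ARN.
type describeFunc func(arn string) (string, error)

// fetchBounded calls describe once per ARN but never lets more than
// limit calls run at the same time. Results and errors are returned
// in the same order as the input ARNs.
func fetchBounded(arns []string, limit int, describe describeFunc) ([]string, []error) {
	if limit < 1 {
		limit = 1 // an unbuffered semaphore would block forever
	}
	results := make([]string, len(arns))
	errs := make([]error, len(arns))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup

	for i, arn := range arns {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit calls are already in flight
		go func(i int, arn string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i], errs[i] = describe(arn)
		}(i, arn)
	}
	wg.Wait()
	return results, errs
}

func main() {
	// Placeholder identifiers, not real ARNs.
	arns := []string{"def-a", "def-b", "def-c", "def-d", "def-e"}

	var mu sync.Mutex
	inFlight, peak := 0, 0
	describe := func(arn string) (string, error) {
		mu.Lock()
		inFlight++
		if inFlight > peak {
			peak = inFlight
		}
		mu.Unlock()

		time.Sleep(10 * time.Millisecond) // simulate network latency

		mu.Lock()
		inFlight--
		mu.Unlock()
		return "described " + arn, nil
	}

	results, _ := fetchBounded(arns, 2, describe)
	fmt.Println(len(results), "results, peak concurrency:", peak)
}
```

With a cap like this, the SDK's retry handler only has to absorb throttling for a handful of concurrent calls rather than for every request at once. The right limit would depend on the account's API rate limits, which this sketch does not try to guess.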

Below is a patch that almost provides a workaround for the issue by not calling ListDescribeTasks but instead just calling DescribeTaskDefinition using the taskDefArn we already have.
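A possible further reduction, which is not part of the patch: tasks started from the same task definition share a task definition ARN, so collecting the distinct ARNs first would avoid repeated DescribeTaskDefinition calls. A rough sketch, assuming the ARNs arrive as `*string` values as they do on `ecs.Task.TaskDefinitionArn`; the helper name `uniqueArns` is hypothetical.

```go
package main

import "fmt"

// uniqueArns returns the distinct, non-nil ARNs in first-seen order.
// It is a hypothetical helper, not an existing awless function.
func uniqueArns(arns []*string) []string {
	seen := make(map[string]struct{}, len(arns))
	out := make([]string, 0, len(arns))
	for _, a := range arns {
		if a == nil {
			continue // skip tasks with no definition ARN set
		}
		if _, dup := seen[*a]; dup {
			continue
		}
		seen[*a] = struct{}{}
		out = append(out, *a)
	}
	return out
}

func main() {
	// Placeholder identifiers standing in for task definition ARNs.
	web, worker := "web:3", "worker:7"
	fmt.Println(uniqueArns([]*string{&web, &worker, &web, nil}))
	// prints: [web:3 worker:7]
}
```

This only helps when several running tasks share a definition. If every service runs a single task from its own definition, the number of calls stays the same.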

If an AWS account had few enough clusters and task definitions, that would probably get by. Our account has far too many of both. What we need is the ability to filter on the cluster name. The code can already do this, because it calls getClusterArns, which can filter by cluster name. However, the cluster parameter would have to be added to the defaults for containertask, and when I try to add it, something goes wrong in the generation of the graph: even though the query finds all the resources, they all get filtered out and it returns 0 results.

There are a lot more properties that it would be useful to filter on, but when I tried to add those to the properties_definitions and ran make generate, I ran into several problems that appear to be related to my using the latest release of Go, which generates different-looking files than the ones that are committed. ::sigh::

Anyway, in the hopes that someone is still active in the project and knows what is what, here is the patch that should be a good start for being able to list containertasks by cluster if you can just figure out the issue with the graph filtering.

Index: console/defaults.go
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- console/defaults.go (revision 44e892b4961fc09abf82f61de2ffeb66a20c82b7)
+++ console/defaults.go (date 1562097041000)
@@ -49,7 +49,7 @@
    cloud.ScalingPolicy:       {properties.Name, properties.Type, properties.ScalingGroupName, properties.AlarmNames, properties.AdjustmentType, properties.ScalingAdjustment},
    cloud.Repository:          {properties.Name, properties.URI, properties.Created, properties.Account, properties.Arn},
    cloud.ContainerCluster:    {properties.Name, properties.State, properties.ActiveServicesCount, properties.PendingTasksCount, properties.RegisteredContainerInstancesCount, properties.RunningTasksCount},
-   cloud.ContainerTask:       {properties.Name, properties.Version, properties.State, properties.ContainersImages, properties.Deployments},
+   cloud.ContainerTask:       {properties.Name, properties.Version, properties.Cluster, properties.State, properties.ContainersImages, properties.Deployments},
    cloud.Container:           {properties.Name, properties.DeploymentName, properties.State, properties.Created, properties.Launched, properties.Stopped, properties.Cluster, properties.ContainerTask},
    cloud.ContainerInstance:   {properties.ID, properties.Instance, properties.Cluster, properties.State, properties.RunningTasksCount, properties.PendingTasksCount, properties.Created, properties.AgentConnected},
    cloud.Certificate:         {properties.Arn, properties.Name},
Index: aws/fetch/manual_fetchers.go
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- aws/fetch/manual_fetchers.go    (revision 44e892b4961fc09abf82f61de2ffeb66a20c82b7)
+++ aws/fetch/manual_fetchers.go    (date 1562095308000)
@@ -134,6 +134,7 @@
    }

    funcs["containertask"] = func(ctx context.Context, cache fetch.Cache) ([]*graph.Resource, interface{}, error) {
+       var err error
        var objects []*ecs.TaskDefinition
        var resources []*graph.Resource

@@ -142,6 +143,15 @@
            return resources, objects, nil
        }

+       var tasks []*ecs.Task
+       if val, e := cache.Get("getTasks", func() (interface{}, error) {
+           return getAllTasks(ctx, cache, conf.APIs.Ecs)
+       }); e != nil {
+           return resources, objects, e
+       } else if v, ok := val.([]*ecs.Task); ok {
+           tasks = v
+       }
+
        type resStruct struct {
            res *ecs.TaskDefinition
            err error
@@ -150,13 +160,7 @@
        var wg sync.WaitGroup
        resc := make(chan resStruct)

-       fetchDefinitionsInput := &ecs.ListTaskDefinitionsInput{}
-       if givenFamilyPrefix, hasFilter := getUserFiltersFromContext(ctx)["name"]; hasFilter {
-           fetchDefinitionsInput.FamilyPrefix = &givenFamilyPrefix
-       }
-
-       err := conf.APIs.Ecs.ListTaskDefinitionsPages(fetchDefinitionsInput, func(out *ecs.ListTaskDefinitionsOutput, lastPage bool) (shouldContinue bool) {
-           for _, arn := range out.TaskDefinitionArns {
+       for _, t := range tasks {
                wg.Add(1)
                go func(taskDefArn *string) {
                    defer wg.Done()
@@ -166,12 +170,7 @@
                        return
                    }
                    resc <- resStruct{res: tasksOut.TaskDefinition}
-               }(arn)
-           }
-           return out.NextToken != nil
-       })
-       if err != nil {
-           return resources, objects, err
+           }(t.TaskDefinitionArn)
        }

        go func() {
@@ -179,15 +178,6 @@
            close(resc)
        }()

-       var tasks []*ecs.Task
-       if val, e := cache.Get("getAllTasks", func() (interface{}, error) {
-           return getAllTasks(ctx, cache, conf.APIs.Ecs)
-       }); e != nil {
-           return resources, objects, e
-       } else if v, ok := val.([]*ecs.Task); ok {
-           tasks = v
-       }
-
        var errors []string

        for res := range resc {
diogovieira commented 4 years ago

> I'd really like to see this one fixed. If there is a known pattern for fixing it that you can point me at, I'd be willing to try to put together a PR for it

Sorry for the long delay in responding; somehow this got past me. From what I recall, this happened every time I ran list containertasks. It was not an intermittent issue.