jorgebastida / awslogs

AWS CloudWatch logs for Humans™

Throttling or slow performance when searching group with many streams #358

Open ajenkinski opened 3 years ago


When performing an awslogs get operation on a log group with many streams, even if I specify the full name of the log stream I want to search, performance is either very slow or, if the group has enough streams, the command fails with a ThrottlingException error.

I've run into this when searching logs from AWS Batch. Batch uses the same log group for all jobs, /aws/batch/job, and puts the output from each job into its own stream. This means the /aws/batch/job log group ends up with a large number of streams if you use Batch a lot. However, this shouldn't be a problem when I already know the log stream I want to search.

For example, if I ran

awslogs get -GS -s 1d /aws/batch/job my-job/default/309e41b6173e4bb98171fb3529a58092

where the last argument is the complete log stream name, I would have expected fast performance, since there is no need to search all log streams. However, the code actually treats the stream name as a regex: it lists every log stream in the group and compares each name against the given pattern. This causes a ThrottlingException for me. In other instances, where the group doesn't have quite so many streams, the problem manifests as just very slow performance.
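To illustrate the client-side matching described above, here is a minimal sketch (the stream names and the helper function are illustrative, not awslogs's actual code): every stream name has to be fetched first, even when the "pattern" is an exact stream name.

```python
import re

def match_streams_by_regex(stream_names, pattern):
    """Client-side filter: every fetched name is tested against the
    regex, even when the pattern is really an exact stream name."""
    regex = re.compile(pattern)
    return [name for name in stream_names if regex.match(name)]

# Illustrative stream names, shaped like the ones AWS Batch creates
streams = [
    "my-job/default/309e41b6173e4bb98171fb3529a58092",
    "other-job/default/0123456789abcdef0123456789abcdef",
]

# An exact name still goes through the regex path, so all streams
# in the group must first be listed via repeated DescribeLogStreams
# calls -- that listing is what gets throttled.
matches = match_streams_by_regex(
    streams, "my-job/default/309e41b6173e4bb98171fb3529a58092"
)
print(matches)
```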

Here is the error I get when I run the above command:

Traceback (most recent call last):
  File "/Users/ajenkins/.local/bin/awslogs", line 8, in <module>
    sys.exit(main())
  File "/Users/ajenkins/.local/pipx/venvs/awslogs/lib/python3.6/site-packages/awslogs/bin.py", line 179, in main
    getattr(logs, options.func)()
  File "/Users/ajenkins/.local/pipx/venvs/awslogs/lib/python3.6/site-packages/awslogs/core.py", line 109, in list_logs
    streams = list(self._get_streams_from_pattern(self.log_group_name, self.log_stream_name))
  File "/Users/ajenkins/.local/pipx/venvs/awslogs/lib/python3.6/site-packages/awslogs/core.py", line 102, in _get_streams_from_pattern
    for stream in self.get_streams(group):
  File "/Users/ajenkins/.local/pipx/venvs/awslogs/lib/python3.6/site-packages/awslogs/core.py", line 261, in get_streams
    for page in paginator.paginate(**kwargs):
  File "/Users/ajenkins/.local/pipx/venvs/awslogs/lib/python3.6/site-packages/botocore/paginate.py", line 255, in __iter__
    response = self._make_request(current_kwargs)
  File "/Users/ajenkins/.local/pipx/venvs/awslogs/lib/python3.6/site-packages/botocore/paginate.py", line 332, in _make_request
    return self._method(**current_kwargs)
  File "/Users/ajenkins/.local/pipx/venvs/awslogs/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/ajenkins/.local/pipx/venvs/awslogs/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the DescribeLogStreams operation (reached max retries: 4): Rate exceeded

I've just created a pull request to fix this. It adds a --stream-prefix option to awslogs get, which tells it to treat the log stream argument as a literal string prefix instead of as a regex. The value can then be passed as the logStreamNamePrefix argument to describe_log_streams, which gives much faster performance, since the filtering is done on the AWS side. This completely fixes the problem for me.
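A rough sketch of the idea behind the fix (build_describe_kwargs is a hypothetical helper, not the code from the pull request; logGroupName and logStreamNamePrefix are real DescribeLogStreams parameters): with the prefix, CloudWatch Logs filters the streams server-side instead of returning every stream for client-side regex matching.

```python
def build_describe_kwargs(log_group, stream_arg, use_prefix):
    """Build kwargs for CloudWatch Logs' describe_log_streams call.
    With use_prefix=True the stream argument is sent as
    logStreamNamePrefix, so AWS filters server-side; otherwise no
    filter is sent and every stream must be matched locally."""
    kwargs = {"logGroupName": log_group}
    if use_prefix and stream_arg is not None:
        kwargs["logStreamNamePrefix"] = stream_arg
    return kwargs

# With --stream-prefix: one narrowly scoped paginated API call
print(build_describe_kwargs(
    "/aws/batch/job",
    "my-job/default/309e41b6173e4bb98171fb3529a58092",
    use_prefix=True,
))
```

These kwargs would be fed to a boto3 describe_log_streams paginator; the server-side prefix filter is what avoids paging through every stream in the group.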