obsrvbl-oss / flowlogs-reader

Command line tool and Python library for working with AWS VPC Flow Logs
Apache License 2.0
136 stars 25 forks source link

No response due to getresponse() got an unexpected keyword argument 'buffering #36

Closed nevosial closed 5 years ago

nevosial commented 5 years ago

The flow_logs_reader hangs while trying to retrieve large result dataset.

File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 377, in _make_request httplib_response = conn.getresponse(buffering=True) TypeError: getresponse() got an unexpected keyword argument 'buffering'

bbayles commented 5 years ago

Is this the full traceback? Could you post the section of your code that is getting this?

nevosial commented 5 years ago

Using the CLI $ flowlogs_reader "my-LogGroup" findip

^CTraceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/bin/flowlogs_reader", line 11, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/flowlogs_reader/__main__.py", line 235, in main
    action_method(reader, *args.action[1:])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/flowlogs_reader/__main__.py", line 69, in action_findip
    for record in reader:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/flowlogs_reader/flowlogs_reader.py", line 193, in __next__
    return next(self.iterator)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/flowlogs_reader/flowlogs_reader.py", line 202, in _reader
    for event in self._read_streams():
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/flowlogs_reader/flowlogs_reader.py", line 248, in _read_streams
    for page in response_iterator:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/botocore/paginate.py", line 255, in __iter__
    response = self._make_request(current_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/botocore/paginate.py", line 332, in _make_request
    return self._method(**current_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/botocore/client.py", line 320, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/botocore/client.py", line 611, in _make_api_call
    operation_model, request_dict)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/botocore/endpoint.py", line 147, in _send_request
    request, operation_model, attempts)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/botocore/endpoint.py", line 179, in _get_response
    http_response = self._send(request)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/botocore/endpoint.py", line 223, in _send
    return self.http_session.send(request)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/botocore/httpsession.py", line 272, in send
    http_response.content
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/botocore/awsrequest.py", line 587, in content
    self._content = bytes().join(self.raw.stream()) or bytes()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/response.py", line 494, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/response.py", line 442, in read
    data = self._fp.read(amt)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 449, in read
    n = self.readinto(b)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 493, in readinto
    n = self.fp.readinto(b)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 1012, in recv_into
    return self.read(nbytes, buffer)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 874, in read
    return self._sslobj.read(len, buffer)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
KeyboardInterrupt
bbayles commented 5 years ago

Did the original exception occur after you pressed Ctrl+C? I expect that this is the actual cause of the error message shown.

Also, can you try using the --start-time filter to limit the results?

nevosial commented 5 years ago

I ran the example using the python library.

start = datetime.datetime(2018,12,3,21,10,0)
end = datetime.datetime(2018,12,3,21,30,0)
flow_logs_reader = FlowLogsReader(log_group_name="myLogGroup",boto_client_kwargs=my_map, start_time=start, end_time=end)
print(len(list(flow_logs_reader)))

I get the same error after waiting for more than 15 mins and then stopping the execution in pycharm.(have attached the full error log)

error_log.txt

bbayles commented 5 years ago

Thanks for the details.

What about a really short time window - like one minute. Can you get results to come back from that?

nevosial commented 5 years ago

Thanks. I did try it for 1 and 2 minute intervals; i get the same error.

start = datetime.datetime(2018,12,3,21,11,0) end = datetime.datetime(2018,12,3,21,12,0)

bbayles commented 5 years ago

Let's take this library out of the loop.

Install the AWS CLI: pip install awscli

Make the same query with it:

$ aws --region="some-region" logs filter-log-events --log-group-name="some-group" --start-time="1544040000" --end-time="1544040060"

How long does that take?

nevosial commented 5 years ago

Thanks. Without the lib the response is quick. Above command gave an output within 5 seconds.

bbayles commented 5 years ago

OK, that's interesting. Could you tell me what versions of botocore and boto3 you are using? You can see that this library is using the very same API calls.

nevosial commented 5 years ago

Yes i see that but I'm not sure if i'm missing something here,

boto3 - 1.9.59 botocore - 1.12.59

from flowlogs_reader import FlowLogsReader
import datetime, time
import boto3

client = boto3.client('logs')
my_map = {'aws_access_key_id':'mykey','aws_secret_access_key':'myaccess', 'region_name':'my-region-1'}
start = datetime.datetime(2018,12,3,21,10,0)
end = datetime.datetime(2018,12,3,21,11,0)

flow_logs_reader = FlowLogsReader(log_group_name="myLogGroup", boto_client_kwargs=my_map, start_time=start, end_time=end, boto_client=client)
print(list(flow_logs_reader))
bbayles commented 5 years ago

Hmm, could you delete the client = boto3.client('logs') bit? As you can see here, boto_client_kwargs are not read if a client is supplied. So your key and such aren't being read.

bbayles commented 5 years ago

Also: when you're pasting in code snippets, please use ``` instead of ` to make sure the lines are rendered properly.

nevosial commented 5 years ago

Yes, Thank you. removing the client variable doesn't help. i see the same error. I tried updating my my_map = {'service_name':'logs', 'aws_access_key_id':'mykey','aws_secret_access_key':'myaccess', 'region_name':'my-region-1'}

this update doesn't help as well.

bbayles commented 5 years ago

Let's take this library out of the loop again:

Can you run something like this?

import boto3

session = boto3.Session(
    aws_access_key_id='...',
    aws_secret_access_key='...',
    aws_session_token='...',
)
client = session.client('logs', region_name='...')
resp = client.filter_log_events(logGroupName='...'):
print(len(resp['events']))

My suspicion is that your boto3 client isn't actually being initialized with the right information.

nevosial commented 5 years ago

If i do not use the library, i can access and get back the desired result. As said before as well as trying the above code snippet gives me the correct result.

    client = boto3.client('logs')
    resp = client.filter_log_events(logGroupName="my_group", limit=10)
    print(resp['events'])

Also gives me an output.

bbayles commented 5 years ago

There must be an answer here. The library uses the get_paginator instead of calling filter_log_events directly (here).

So the next test would be to do:

paginator = client.get_paginator('filter_log_events')
response_iterator = paginator.paginate(
    logGroupName='my_group',
    interleaved=True,
)
for page in response_iterator:
    for event in page['events']:
        print(event)
        assert False

If that doesn't work, try leaving the interleaved bit out?

nevosial commented 5 years ago

The get_paginator approach (above) works too with and without the interleaved bit !

bbayles commented 5 years ago

We're missing something, I'm sure.

Start with the client that is working manually. Now do:

# Ensure the standard client is working
standard_flow = client.filter_log_events(logGroupName='flowlog_group')['events'][0]
print(standard_flow)

# Attach the standard client to the library and retrieve a flow
reader = FlowLogsReader('flowlog_group')
reader.boto_client = client 
library_flow = next(iter(reader)).to_dict()
print(library_flow)

This attaches the working boto client to the reader. Then the reader does exactly what we just did.

What's that give?

nevosial commented 5 years ago

The above snippet also works and provides the output. I added the start_time and end_time filter to the above reader; which works fine as well.

bbayles commented 5 years ago

Well that's a relief.

This should do the same thing:

# Ensure the standard client is working
standard_flow = client.filter_log_events(logGroupName='flowlog_group')['events'][0]
print(standard_flow)

# Attach the standard client to the library and retrieve a flow
reader = FlowLogsReader('flowlog_group', boto_client=client)
library_flow = next(iter(reader)).to_dict()
print(library_flow)

Does it? If not, could you do print(id(client)) and print(id(reader.boto_client))?

nevosial commented 5 years ago

Actually, this is what i tried previously along with the start_time and end_time. This works and I see that the client id in both cases are the same.

bbayles commented 5 years ago

OK, so when used as a library everything is working as expected?

Does the CLI tool, without the findip, produce output?

$ flowlogs_reader --region="region-name" "flowlog_group"
nevosial commented 5 years ago

The CLI tool without findip param works fine.

Not sure if i understand right, the library doesn't work as expected. The following code still doesn't work (here i'm setting the session explicitly),

import boto3

session = boto3.Session(
    aws_access_key_id='...',
    aws_secret_access_key='...',
    aws_session_token='...',
)
client = session.client('logs', region_name='my-region-1')
start = datetime.datetime(2018,12,3,21,10,0)
end = datetime.datetime(2018,12,3,21,11,0)
flow_logs_reader = FlowLogsReader(log_group_name, boto_client=client, start_time=start, end_time=end)
print(len(list(flow_logs_reader)))

The error is the same as seen in the error_log.txt

bbayles commented 5 years ago

If you do what we said above, that works, right?

reader = FlowLogsReader('flowlog_group', boto_client=client)
library_flow = next(iter(reader)).to_dict()
print(library_flow)

That is retrieving one flow.

That's a lot different from print(len(list(flow_logs_reader))), which is retrieving every flow. If you change it to:

for flow in flow_logs_reader:
    print(flow)

then that probably prints a ton of stuff and takes a long time, right?

bbayles commented 5 years ago

For those finding this issue from Google: the No response due to getresponse() got an unexpected keyword argument 'buffering' thing is actually being caused by another exception.

In this case it's a KeyboardInterrupt or a connection timeout, but for other code it may be something else. Look to what's calling urllib3!


For others having an issue pulling in large amounts of flow log data from AWS CloudTrail in a reasonable amount of time, (a) consider using the newer S3 destination (which you can read from with this library), or (b) consider setting up a Kinesis subscription to your CWL group and using this library.