googleapis / google-cloud-python

Google Cloud Client Library for Python
https://googleapis.github.io/google-cloud-python/
Apache License 2.0
4.85k stars 1.53k forks source link

httplib2.Http is not thread-safe #1214

Closed pcostell closed 7 years ago

pcostell commented 9 years ago

This is from a StackOverflow post. I've done some debugging from the Datastore side but I don't think these requests ever make it into the Datastore part of the stack. I'd appreciate if someone look at this from the gcloud-python team. Note that this issue was originally reported via gcd-discuss@google.com In July.

I have a Python Django application running on a Google Compute instance. It is using gcloudoem to interface from Django to Google Datastore. gcloudoem uses the same underlying code to communicate with Datastore as gcloud-python 0.5.x

At what seems to be completely random times, I will get SSL errors happening when trying to talk to Datastore. There is no pattern in where in my application code these happen. It's just during a random call to Datastore. Here are the two flavours of errors:

ERROR:django.request:Internal Server Error: /complete/google-oauth2/
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 111, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/cache.py", line 52, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py", line 57, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/social/apps/django_app/utils.py", line 51, in wrapper
    return func(request, backend, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/social/apps/django_app/views.py", line 28, in complete
    redirect_name=REDIRECT_FIELD_NAME, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/social/actions.py", line 43, in do_complete
    user = backend.complete(user=user, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/social/backends/base.py", line 41, in complete
    return self.auth_complete(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/social/utils.py", line 229, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/social/backends/oauth.py", line 387, in auth_complete
    *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/social/utils.py", line 229, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/social/backends/oauth.py", line 396, in do_auth
    return self.strategy.authenticate(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/social/strategies/django_strategy.py", line 96, in authenticate
    return authenticate(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/auth/__init__.py", line 60, in authenticate
    user = backend.authenticate(**credentials)
  File "/usr/local/lib/python2.7/dist-packages/social/backends/base.py", line 82, in authenticate
    return self.pipeline(pipeline, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/social/backends/base.py", line 85, in pipeline
    out = self.run_pipeline(pipeline, pipeline_index, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/social/backends/base.py", line 112, in run_pipeline
    result = func(*args, **out) or {}
  File "/usr/local/lib/python2.7/dist-packages/social/pipeline/social_auth.py", line 20, in social_user
    social = backend.strategy.storage.user.get_social_auth(provider, uid)
  File "./social_gc/storage.py", line 105, in get_social_auth
    return cls.objects.get(provider=provider, uid=uid)
  File "/usr/local/lib/python2.7/dist-packages/gcloudoem/queryset/__init__.py", line 162, in get
    num = len(clone)
  File "/usr/local/lib/python2.7/dist-packages/gcloudoem/queryset/__init__.py", line 126, in __len__
    self._fetch_all()
  File "/usr/local/lib/python2.7/dist-packages/gcloudoem/queryset/__init__.py", line 370, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/query.py", line 480, in __iter__
    self.next_page()
  File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/query.py", line 452, in next_page
    transaction_id=transaction and transaction.id,
  File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/connection.py", line 249, in run_query
    response = self._rpc('runQuery', request, datastore_pb.RunQueryResponse)
  File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/connection.py", line 159, in _rpc
    data=request_pb.SerializeToString()
  File "/usr/local/lib/python2.7/dist-packages/gcloudoem/datastore/connection.py", line 134, in _request
    body=data
  File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 589, in new_request
    redirections, connection_type)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1609, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1351, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1307, in _conn_request
    response = conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1127, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 453, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 409, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 480, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/lib/python2.7/ssl.py", line 734, in recv
    return self.read(buflen)
  File "/usr/lib/python2.7/ssl.py", line 621, in read
    v = self._sslobj.read(len or 1024)
SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1752)

Unfortunately, for the second, I don't have a full stacktrace handy:

[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:1752)

These errors don't happen when I am using the GCD tool. Does anyone have any idea what is happening here? Is this some sort of networking problem?

pcostell commented 9 years ago

Also -- this is using gcloudoem, which most (all?) of the datastore code was taken from gcloud-python. If you do think this is only an issue with gcloudoem and not gcloud-python, let's close this issue and reopen it for the gcloudoem owner.

jgeewax commented 9 years ago

Unfortunately, even if we did find and fix a bug in gcloud-python, gcloud-datastore-oem / gcloudoem copied, pasted and modified the code, so our changes won't fix anything for this particular situation.

I think it's worth a quick shot at trying to reproduce on our side, but we still need to open a bug at gcloud-datastore-oem.

pcostell commented 9 years ago

True -- but based on the type of error I'd expect it to also be an error in gcloud-python (unless we've already fixed it). If we are fairly confident it isn't, we can just open a bug against gcloud-datastore-oem and forget about it(may be worthwhile to do this anyways! -- the bug, not the forgetting)

dhermes commented 9 years ago

Can you provide code to repro? Like, loop 5,000 times and then it happens probably?

rstuart85 commented 9 years ago

I originally reported this problem and I can provide a little more context.

The problem happens quite regularly when I am developing code from my local machine using the production datastore. However, if I use the gcd tool locally instead of actual datastore, it never happens. It happens less regularly in production from a compute instance. It might also be worth noting that the code is run in a docker container.

I've tried installing pyOpenSSL to see if the problem is rectified but it appears to have no affect.

I will write some code using only gcloud-python to see if I can reproduce the problem. I will set it running locally first because I'm slightly concerned about the potential cost it could accumulate on gcloud. I'll report back with results.

rstuart85 commented 9 years ago

Here is the code I am running. I will report back when something goes wrong.

dhermes commented 9 years ago

Thanks @rstuart85! It looks like a very simple script :+1:

rstuart85 commented 9 years ago

The script has been running for a couple of days with no errors. So there appears to be something related to my setup that is causing the SSL errors. I'll keep investigating.

rstuart85 commented 9 years ago

Another update. It isn't just Datastore that gets the problem. Here is a stacktrace from trying to access a bucket using gcloud-python.

ERROR:django.request:Internal Server Error: /shared_image/ded1e2fa794e425b866ffdcf5ffbccda/
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 111, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/http.py", line 41, in inner
    return func(request, *args, **kwargs)
  File "./research_portal/views.py", line 623, in shared
    blob = gcloud_utils.get_object(shared_id, bucket=storage.get_bucket(settings.SHARED_BUCKET))
  File "./research_portal/gcloud_utils.py", line 31, in get_object
    return bucket.get_blob(name)
  File "/usr/local/lib/python2.7/dist-packages/gcloud/storage/bucket.py", line 190, in get_blob
    path=blob.path)
  File "/usr/local/lib/python2.7/dist-packages/gcloud/storage/connection.py", line 229, in api_request
    method=method, url=url, data=data, content_type=content_type)
  File "/usr/local/lib/python2.7/dist-packages/gcloud/storage/connection.py", line 141, in _make_request
    return self._do_request(method, url, headers, data)
  File "/usr/local/lib/python2.7/dist-packages/gcloud/storage/connection.py", line 166, in _do_request
    body=data)
  File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 589, in new_request
    redirections, connection_type)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1609, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1351, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1307, in _conn_request
    response = conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1127, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 453, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 409, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 480, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/lib/python2.7/ssl.py", line 734, in recv
    return self.read(buflen)
  File "/usr/lib/python2.7/ssl.py", line 621, in read
    v = self._sslobj.read(len or 1024)
SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1752)

And here is the code:

def get_object(name, bucket=None):
    retries = 0
    while True:
        try:
            if not bucket:
                bucket = storage.get_bucket(settings.UPLOADS_BUCKET)
            return bucket.get_blob(name)
        except InternalServerError as e:  # Retry when Google has an error
            logger.exception("Got a Google Storage error trying to upload a file.")
            if retries < 3:
                sleep = random.random() * 2
                time.sleep(sleep)
                retries += 1
                continue
            raise e

As you can see, I've tried to sidestep the errors using retires.

rstuart85 commented 9 years ago

Just incase it is worth while, here is the Dockerfile I use to build the environment this runs from.

FROM ubuntu:15.04
MAINTAINER Ryan Stuart <ryan@kapiche.com>

# Install packages
RUN apt-get update && apt-get install -y \
                                build-essential \ 
                                git \
                                libffi-dev \
                                libjpeg-dev \
                                libsqlite3-dev \
                                libssl-dev \
                                libyaml-dev \
                                libxml2-dev \
                                libxslt1-dev \
                                python \
                                python-dev \
                                python-setuptools \
                                sqlite3 \
                                zlib1g-dev

# Setup ssh acces to private repos
RUN mkdir /root/.ssh/
ADD id_rsa /root/.ssh/id_rsa
RUN chmod 500 /root/.ssh/id_rsa && echo "    IdentityFile ~/.ssh/id_rsa" >> /etc/ssh/ssh_config
RUN touch /root/.ssh/known_hosts && ssh-keyscan github.com >> /root/.ssh/known_hosts

# Setup the application
RUN easy_install -U pip
RUN pip install apsw==3.8.7.3.post1
RUN mkdir /research_portal
ADD . /research_portal
WORKDIR /research_portal
RUN pip install -r requirements.txt
RUN python -m nltk.downloader punkt
ENV PYTHONUNBUFFERED=1 \
    STATIC_URL='https://storage.googleapis.com/static.kapiche.com/research/b0fa/'

And the requirements.txt:

caterpillar>=1.0.0.dev7
-e git+ssh://git@github.com/Kapiche/caterpillar-influence.git@master#egg=caterpillar-influence
-e git+ssh://git@github.com/Kapiche/caterpillar-topics.git@master#egg=caterpillar-topics
celery~=3.1.17
Django~=1.7.9
django-debug-toolbar~=1.3.2
django-gravatar2
gcloud~=0.4.2
gcloudoem>=0.1.0rc10
goose-extractor~=1.0.25
markdown~=2.6.2
opbeat
pdfminer==20140328
pycrypto~=2.6.1
pyopenssl~=0.15.1
python-social-auth~=0.2.7
requests~=2.6.0
stripe~=1.22.2
wsgiref==0.1.2
xlrd==0.9.3
uwsgi
dhermes commented 9 years ago

It's worrisome. I'm curious if the error is

Maybe we should call in @jcgregorio to see if he has encountered this anywhere else with httplib2

dhermes commented 9 years ago

I'm curious if turning off stdin buffering (PYTHONUNBUFFERED=1) makes a difference. I highly doubt it, but it's interesting to see.

Have you tried catching the ssl.SSLError exception and entering a debugger (I prefer ipdb over pdb but either will work).

rstuart85 commented 9 years ago

No, but I'll give that a try. It happens both inside and outside of docker (on the production server which uses docker and on my local laptop which doesn't), so I doubt it is docker related. My local machine is OSX and the server is created using docker-machine.

dhermes commented 9 years ago

Very good to know! Local laptop is running which OS?

rstuart85 commented 9 years ago

OSX

dhermes commented 9 years ago

Nice. And the docker instance is running on OS X or on a linux VM somewhere?

rstuart85 commented 9 years ago

Linux VM on Google Compute created using docker-machine.

dhermes commented 9 years ago

Good. This (almost) certainly means the issue is in the libraries.

rstuart85 commented 9 years ago

I've seen issues related to SSL with httplib2, but they are almost always related to httplib2 using its own bundled certificate store. I can't find anything related to these issues.

dhermes commented 9 years ago

"these issues" meaning the "bundled certificate store" or the "SSL: WRONG_VERSION_NUMBER"

rstuart85 commented 9 years ago

The two exceptions I get which are "SSL: WRONG_VERSION_NUMBER" and "SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC"

dhermes commented 9 years ago

Did you ever find a stracktrace for DECRYPTION_FAILED_OR_BAD_RECORD_MAC?

Some places where the error occurs (they don't really illuminate): https://github.com/python/cpython/blob/2.7/Lib/ssl.py#L734 https://github.com/python/cpython/blob/2.7/Lib/ssl.py#L621 https://github.com/python/cpython/blob/2.7/Modules/_ssl.c#L1752 https://hg.python.org/cpython/file/v2.7.10/Lib/ssl.py#l621 https://hg.python.org/cpython/file/v2.7.10/Modules/_ssl.c

What versions of oauth2client, httplib2 and gcloud-python do you have? From what I can tell, you have version 0.4.3 of our library, which is very old.


Alternate theory: this error occurs because you are finding a Google server which doesn't speak the right SSL protocol.

From an openssl mailing list:

Versions in client/server SSL records do not agree. Probably your client sends SSL2 ... and server is configured only for SSL3/TLS1. In this situation server does not accept SSL2 ... is being manifested by "wrong version number" error. To resolve this error you may disable SSL2 on client or enable SSL2 handshake on server. tcpdump output from wrong session handshake may be helpful too.

UPDATE: Some sections of _ssl.c may be relevant here. UPDATE 2: It looks like other people have experienced this error using a Google API. (SSLv3 is deprecated).

dhermes commented 9 years ago

You could use Wireshark to capture the traffic and make a Capture Filter host www.googleapis.com and then see if SSL3 is used. I did it for some light traffic on my Ubuntu 14.04 laptop and only saw TLSv1.2.


It seems that Python ssl.py has disabled SSLv3:

    # SSLv3 has problematic security and is only required for really old
    # clients such as IE6 on Windows XP
    context.options |= OP_NO_SSLv3

Double-check that file on your systems to make sure SSLv3 isn't the problem?

rstuart85 commented 9 years ago

I haven't had another stacktrace yet for DECRYPTION_FAILED_OR_BAD_RECORD_MAC (all the recent ones have been WRONG_VERSION_NUMBER) but I know they come from _ssl.c (1752).

The post from the mailing list is interesting but the only people that could elaborate on that would be Google Engineers I assume? Do all your servers speak SSL2?

rstuart85 commented 9 years ago

I will check SSLv3 support.

dhermes commented 9 years ago

After POODLE, SSLv1, SSLv2 and SSLv3 are deprecated (SSLv1 isn't even implemented). SSLv23 is actually the default in ssl.py. TLSv1, TLSv1_1 and TLSv1_2 are the other supported protocol variants. The line _ssl.c (1752) is just parroting back an error from the system SSL libraries so it doesn't tell us much.

I checked on my machine with Python 2.7.6 (default on Ubuntu 14.04) and SSLv3 is still enabled.

dhermes commented 9 years ago

@rstuart85 Were you able to check your python -V and the contents of ssl.py? Also, were you able to reproduce the error with a debugger (pdb or ipdb)?

I may also try to reproduce on my machine and monitor with Wireshark to find out what was happening.

rstuart85 commented 9 years ago

@krisrogers is getting this problem regularly from his development machine so he is going to debug it and report back here.

dhermes commented 9 years ago

Great thanks. I've been running an infinite loop with just getting an object from a bucket and having no error (not even a 500) after

Iteration: 21463, 3368.84s

UPDATE: I killed the script after no failures in

Iteration: 34801, 5414.76s

This was on my local machine and it was connected to the web via my apartment WiFi.

rstuart85 commented 9 years ago

I'm pretty sure it is a client side issue because it happens a lot more in development rather than production. Either way, @krisrogers should be able to give some more info shortly.

dhermes commented 9 years ago

Good to know. Thanks.

rstuart85 commented 8 years ago

We can now no longer reproduce this on our development environment. But this is a screenshot from our monitoring tool. All bar 1 error is related to Datastore/Storage. Not quite sure what to do next.

screen shot 2015-12-10 at 11 49 33 am
rstuart85 commented 8 years ago

Here are some of the other types of errors.

screen shot 2015-12-10 at 11 54 26 am
theacodes commented 8 years ago

Could this be a threading issue? When I used a connection across threads I'd get random SSL errors.

On Wed, Dec 9, 2015, 6:06 PM Ryan Stuart notifications@github.com wrote:

Here are some of the other types of errors. [image: screen shot 2015-12-10 at 11 54 26 am] https://cloud.githubusercontent.com/assets/2389518/11704767/72f2dcf6-9f36-11e5-8e74-53a628b56fd5.png

— Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/gcloud-python/issues/1214#issuecomment-163463428 .

dhermes commented 8 years ago

Ooohh goody I love a new lead! @jonparrott you mean a gcloud-python Connection?

@rstuart85 is that the case?

theacodes commented 8 years ago

Yes, although I can't say for sure if it was the same kind of errors. But if the app this is occurring in is multithreaded (or is hosted by a multithreaded wsgi server) and shares a client/connection that can certainly cause inexplicable errors.

(I'm currently at a team offsite, so I can't really test it out)

On Wed, Dec 9, 2015, 9:18 PM Danny Hermes notifications@github.com wrote:

Ooohh goody I love a new lead! @jonparrott https://github.com/jonparrott you mean a gcloud-python Connection?

@rstuart85 https://github.com/rstuart85 is that the case?

— Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/gcloud-python/issues/1214#issuecomment-163489567 .

rstuart85 commented 8 years ago

Yes, it looks to be threading related. We can now reproduce reliably using our application from multiple browser tabs. @krisrogers is gathering information and posting it here shortly.

rstuart85 commented 8 years ago

Also, we host the application using uwsgi in multithreaded mode. So @jonparrott is on the money there. Glad we are getting somewhere with this!

theacodes commented 8 years ago

Wow. Total shot in the dark there. You could try installing from #1274 and seeing if that fixes the issue? (though that PR is not reviewed or approved yet).

rstuart85 commented 8 years ago

I've re-implemented my test script that shows the issue. The issue appears every time for me.

rstuart85 commented 8 years ago

Using the basis of the solution used by @jonparrott worked great for gcloudoem.

theacodes commented 8 years ago

Awesome, @rstuart85. Glad to know that we both fixed your issue and confirmed that the patch works.

dhermes commented 8 years ago

I love that we Jon found the cause. Since I found a ton of other bugs on the web that experienced this with Google APIs, I still want to create a very simple repro-script (likely won't use gcloud-python or at least won't use much of it).

dhermes commented 8 years ago

I am playing around and not able to force the error with 15 threads using the same httplib2.Http object.

@rstuart85 Can you give a sense of how many threads were active?

(FWIW, I am using a different origin for all 15 threads and that may be mitigating the issue. I am going to try to make a datastore example as suggested by @rstuart85 and then extract the pieces that are not tied to gcloud-python or oauth2client)

rstuart85 commented 8 years ago

@dhermes as I mentioned, it happens every time I run the revised version of my script linked to above. All you need is two active HTTP connections from the same origin talking to the same service at the same time. Hence why I used sleeps in the code. My code used 4 threads.

dhermes commented 8 years ago

Thanks. Sorry for losing the thread of the conversation.

dhermes commented 8 years ago

Yay! I got my first repro!

Exception in thread Thread-2:
...
SSLError: [Errno 1] _ssl.c:1429: error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number

and

Exception in thread Thread-1:
...
SSLError: [Errno 1] _ssl.c:1429: error:1408F081:SSL routines:SSL3_GET_RECORD:block cipher pad is wrong

I'm working on throwing away as much of the extra (e.g. all of gcloud-python) as possible so that we can dive deep on the failure (or more likely, someone from httplib2 can).

rstuart85 commented 8 years ago

Sounds like a plan. I'm pretty gob smacked that more people aren't reporting this problem. I couldn't find much about it out there and I've personally being trying to convince people for a few months that it is an issue.

dhermes commented 8 years ago

The lack of thread-safety in httplib2 is a well-known thing: https://developers.google.com/api-client-library/python/guide/thread_safety

Though we don't really publicly advertise that we use httplib2 underneath.

I have a "broken down" isolated example with httplib2 only: https://gist.github.com/dhermes/aa7b6186ede2a9d0b471

I'm sure someone could use it to find out how the bits are going to the wrong thread and maybe even fix it.

dhermes commented 8 years ago

@rstuart85 RE: "more people reporting" it seems there is chatter:

It looks like other people have experienced this error using a Google API. (SSLv3 is deprecated).