Closed pcostell closed 7 years ago
Also -- this is using gcloudoem, which most (all?) of the datastore code was taken from gcloud-python. If you do think this is only an issue with gcloudoem and not gcloud-python, let's close this issue and reopen it for the gcloudoem owner.
Unfortunately, even if we did find and fix a bug in gcloud-python, gcloud-datastore-oem / gcloudoem copied, pasted and modified the code, so our changes won't fix anything for this particular situation.
I think it's worth a quick shot at trying to reproduce on our side, but we still need to open a bug at gcloud-datastore-oem.
True -- but based on the type of error I'd expect it to also be an error in gcloud-python (unless we've already fixed it). If we are fairly confident it isn't, we can just open a bug against gcloud-datastore-oem and forget about it(may be worthwhile to do this anyways! -- the bug, not the forgetting)
Can you provide code to repro? Like, loop 5,000 times and then it happens probably?
I originally reported this problem and I can provide a little more context.
The problem happens quite regularly when I am developing code from my local machine using the production datastore. However, if I use the gcd tool locally instead of actual datastore, it never happens. It happens less regularly in production from a compute instance. It might also be worth noting that the code is run in a docker container.
I've tried installing pyOpenSSL to see if the problem is rectified but it appears to have no affect.
I will write some code using only gcloud-python to see if I can reproduce the problem. I will set it running locally first because I'm slightly concerned about the potential cost it could accumulate on gcloud. I'll report back with results.
Here is the code I am running. I will report back when something goes wrong.
Thanks @rstuart85! It looks like a very simple script :+1:
The script has been running for a couple of days with no errors. So there appears to be something related to my setup that is causing the SSL errors. I'll keep investigating.
Another update. It isn't just Datastore that gets the problem. Here is a stacktrace from trying to access a bucket using gcloud-python.
ERROR:django.request:Internal Server Error: /shared_image/ded1e2fa794e425b866ffdcf5ffbccda/
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 111, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/http.py", line 41, in inner
return func(request, *args, **kwargs)
File "./research_portal/views.py", line 623, in shared
blob = gcloud_utils.get_object(shared_id, bucket=storage.get_bucket(settings.SHARED_BUCKET))
File "./research_portal/gcloud_utils.py", line 31, in get_object
return bucket.get_blob(name)
File "/usr/local/lib/python2.7/dist-packages/gcloud/storage/bucket.py", line 190, in get_blob
path=blob.path)
File "/usr/local/lib/python2.7/dist-packages/gcloud/storage/connection.py", line 229, in api_request
method=method, url=url, data=data, content_type=content_type)
File "/usr/local/lib/python2.7/dist-packages/gcloud/storage/connection.py", line 141, in _make_request
return self._do_request(method, url, headers, data)
File "/usr/local/lib/python2.7/dist-packages/gcloud/storage/connection.py", line 166, in _do_request
body=data)
File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 589, in new_request
redirections, connection_type)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1609, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1351, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1307, in _conn_request
response = conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1127, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 453, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 409, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib/python2.7/socket.py", line 480, in readline
data = self._sock.recv(self._rbufsize)
File "/usr/lib/python2.7/ssl.py", line 734, in recv
return self.read(buflen)
File "/usr/lib/python2.7/ssl.py", line 621, in read
v = self._sslobj.read(len or 1024)
SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1752)
And here is the code:
def get_object(name, bucket=None):
retries = 0
while True:
try:
if not bucket:
bucket = storage.get_bucket(settings.UPLOADS_BUCKET)
return bucket.get_blob(name)
except InternalServerError as e: # Retry when Google has an error
logger.exception("Got a Google Storage error trying to upload a file.")
if retries < 3:
sleep = random.random() * 2
time.sleep(sleep)
retries += 1
continue
raise e
As you can see, I've tried to sidestep the errors using retires.
Just incase it is worth while, here is the Dockerfile I use to build the environment this runs from.
FROM ubuntu:15.04
MAINTAINER Ryan Stuart <ryan@kapiche.com>
# Install packages
RUN apt-get update && apt-get install -y \
build-essential \
git \
libffi-dev \
libjpeg-dev \
libsqlite3-dev \
libssl-dev \
libyaml-dev \
libxml2-dev \
libxslt1-dev \
python \
python-dev \
python-setuptools \
sqlite3 \
zlib1g-dev
# Setup ssh acces to private repos
RUN mkdir /root/.ssh/
ADD id_rsa /root/.ssh/id_rsa
RUN chmod 500 /root/.ssh/id_rsa && echo " IdentityFile ~/.ssh/id_rsa" >> /etc/ssh/ssh_config
RUN touch /root/.ssh/known_hosts && ssh-keyscan github.com >> /root/.ssh/known_hosts
# Setup the application
RUN easy_install -U pip
RUN pip install apsw==3.8.7.3.post1
RUN mkdir /research_portal
ADD . /research_portal
WORKDIR /research_portal
RUN pip install -r requirements.txt
RUN python -m nltk.downloader punkt
ENV PYTHONUNBUFFERED=1 \
STATIC_URL='https://storage.googleapis.com/static.kapiche.com/research/b0fa/'
And the requirements.txt:
caterpillar>=1.0.0.dev7
-e git+ssh://git@github.com/Kapiche/caterpillar-influence.git@master#egg=caterpillar-influence
-e git+ssh://git@github.com/Kapiche/caterpillar-topics.git@master#egg=caterpillar-topics
celery~=3.1.17
Django~=1.7.9
django-debug-toolbar~=1.3.2
django-gravatar2
gcloud~=0.4.2
gcloudoem>=0.1.0rc10
goose-extractor~=1.0.25
markdown~=2.6.2
opbeat
pdfminer==20140328
pycrypto~=2.6.1
pyopenssl~=0.15.1
python-social-auth~=0.2.7
requests~=2.6.0
stripe~=1.22.2
wsgiref==0.1.2
xlrd==0.9.3
uwsgi
It's worrisome. I'm curious if the error is
ssl
module from the stdlibhttplib2
Maybe we should call in @jcgregorio to see if he has encountered this anywhere else with httplib2
I'm curious if turning off stdin buffering (PYTHONUNBUFFERED=1
) makes a difference. I highly doubt it, but it's interesting to see.
Have you tried catching the ssl.SSLError
exception and entering a debugger (I prefer ipdb
over pdb
but either will work).
No, but I'll give that a try. It happens both inside and outside of docker (on the production server which uses docker and on my local laptop which doesn't), so I doubt it is docker related. My local machine is OSX and the server is created using docker-machine.
Very good to know! Local laptop is running which OS?
OSX
Nice. And the docker instance is running on OS X or on a linux VM somewhere?
Linux VM on Google Compute created using docker-machine.
Good. This (almost) certainly means the issue is in the libraries.
I've seen issues related to SSL with httplib2, but they are almost always related to httplib2 using its own bundled certificate store. I can't find anything related to these issues.
"these issues" meaning the "bundled certificate store" or the "SSL: WRONG_VERSION_NUMBER"
The two exceptions I get which are "SSL: WRONG_VERSION_NUMBER" and "SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC"
Did you ever find a stracktrace for DECRYPTION_FAILED_OR_BAD_RECORD_MAC
?
Some places where the error occurs (they don't really illuminate): https://github.com/python/cpython/blob/2.7/Lib/ssl.py#L734 https://github.com/python/cpython/blob/2.7/Lib/ssl.py#L621 https://github.com/python/cpython/blob/2.7/Modules/_ssl.c#L1752 https://hg.python.org/cpython/file/v2.7.10/Lib/ssl.py#l621 https://hg.python.org/cpython/file/v2.7.10/Modules/_ssl.c
What versions of oauth2client
, httplib2
and gcloud-python
do you have? From what I can tell, you have version 0.4.3
of our library, which is very old.
Alternate theory: this error occurs because you are finding a Google server which doesn't speak the right SSL protocol.
From an openssl mailing list:
Versions in client/server SSL records do not agree. Probably your client sends SSL2 ... and server is configured only for SSL3/TLS1. In this situation server does not accept SSL2 ... is being manifested by "wrong version number" error. To resolve this error you may disable SSL2 on client or enable SSL2 handshake on server. tcpdump output from wrong session handshake may be helpful too.
UPDATE: Some sections of _ssl.c
may be relevant here.
UPDATE 2: It looks like other people have experienced this error using a Google API. (SSLv3 is deprecated).
You could use Wireshark to capture the traffic and make a Capture Filter host www.googleapis.com
and then see if SSL3 is used. I did it for some light traffic on my Ubuntu 14.04 laptop and only saw TLSv1.2.
It seems that Python ssl.py
has disabled SSLv3:
# SSLv3 has problematic security and is only required for really old
# clients such as IE6 on Windows XP
context.options |= OP_NO_SSLv3
Double-check that file on your systems to make sure SSLv3 isn't the problem?
I haven't had another stacktrace yet for DECRYPTION_FAILED_OR_BAD_RECORD_MAC
(all the recent ones have been WRONG_VERSION_NUMBER
) but I know they come from _ssl.c (1752).
The post from the mailing list is interesting but the only people that could elaborate on that would be Google Engineers I assume? Do all your servers speak SSL2?
I will check SSLv3 support.
After POODLE, SSLv1, SSLv2 and SSLv3 are deprecated (SSLv1 isn't even implemented). SSLv23 is actually the default in ssl.py
. TLSv1, TLSv1_1 and TLSv1_2 are the other supported protocol variants. The line _ssl.c (1752)
is just parroting back an error from the system SSL libraries so it doesn't tell us much.
I checked on my machine with Python 2.7.6 (default on Ubuntu 14.04) and SSLv3 is still enabled.
@rstuart85 Were you able to check your python -V
and the contents of ssl.py
? Also, were you able to reproduce the error with a debugger (pdb
or ipdb
)?
I may also try to reproduce on my machine and monitor with Wireshark to find out what was happening.
@krisrogers is getting this problem regularly from his development machine so he is going to debug it and report back here.
Great thanks. I've been running an infinite loop with just getting an object from a bucket and having no error (not even a 500) after
Iteration: 21463, 3368.84s
UPDATE: I killed the script after no failures in
Iteration: 34801, 5414.76s
This was on my local machine and it was connected to the web via my apartment WiFi.
I'm pretty sure it is a client side issue because it happens a lot more in development rather than production. Either way, @krisrogers should be able to give some more info shortly.
Good to know. Thanks.
We can now no longer reproduce this on our development environment. But this is a screenshot from our monitoring tool. All bar 1 error is related to Datastore/Storage. Not quite sure what to do next.
Here are some of the other types of errors.
Could this be a threading issue? When I used a connection across threads I'd get random SSL errors.
On Wed, Dec 9, 2015, 6:06 PM Ryan Stuart notifications@github.com wrote:
Here are some of the other types of errors. [image: screen shot 2015-12-10 at 11 54 26 am] https://cloud.githubusercontent.com/assets/2389518/11704767/72f2dcf6-9f36-11e5-8e74-53a628b56fd5.png
— Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/gcloud-python/issues/1214#issuecomment-163463428 .
Ooohh goody I love a new lead! @jonparrott you mean a gcloud-python
Connection
?
@rstuart85 is that the case?
Yes, although I can't say for sure if it was the same kind of errors. But if the app this is occurring in is multithreaded (or is hosted by a multithreaded wsgi server) and shares a client/connection that can certainly cause inexplicable errors.
(I'm currently at a team offsite, so I can't really test it out)
On Wed, Dec 9, 2015, 9:18 PM Danny Hermes notifications@github.com wrote:
Ooohh goody I love a new lead! @jonparrott https://github.com/jonparrott you mean a gcloud-python Connection?
@rstuart85 https://github.com/rstuart85 is that the case?
— Reply to this email directly or view it on GitHub https://github.com/GoogleCloudPlatform/gcloud-python/issues/1214#issuecomment-163489567 .
Yes, it looks to be threading related. We can now reproduce reliably using our application from multiple browser tabs. @krisrogers is gathering information and posting it here shortly.
Also, we host the application using uwsgi in multithreaded mode. So @jonparrott is on the money there. Glad we are getting somewhere with this!
Wow. Total shot in the dark there. You could try installing from #1274 and seeing if that fixes the issue? (though that PR is not reviewed or approved yet).
I've re-implemented my test script that shows the issue. The issue appears every time for me.
Using the basis of the solution used by @jonparrott worked great for gcloudoem.
Awesome, @rstuart85. Glad to know that we both fixed your issue and confirmed that the patch works.
I love that we Jon found the cause. Since I found a ton of other bugs on the web that experienced this with Google APIs, I still want to create a very simple repro-script (likely won't use gcloud-python
or at least won't use much of it).
I am playing around and not able to force the error with 15 threads using the same httplib2.Http
object.
@rstuart85 Can you give a sense of how many threads were active?
(FWIW, I am using a different origin for all 15 threads and that may be mitigating the issue. I am going to try to make a datastore example as suggested by @rstuart85 and then extract the pieces that are not tied to gcloud-python
or oauth2client
)
@dhermes as I mentioned, it happens every time I run the revised version of my script linked to above. All you need is two active HTTP connections from the same origin talking to the same service at the same time. Hence why I used sleeps in the code. My code used 4 threads.
Thanks. Sorry for losing the thread of the conversation.
Yay! I got my first repro!
Exception in thread Thread-2:
...
SSLError: [Errno 1] _ssl.c:1429: error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number
and
Exception in thread Thread-1:
...
SSLError: [Errno 1] _ssl.c:1429: error:1408F081:SSL routines:SSL3_GET_RECORD:block cipher pad is wrong
I'm working on throwing away as much of the extra (e.g. all of gcloud-python
) as possible so that we can dive deep on the failure (or more likely, someone from httplib2
can).
Sounds like a plan. I'm pretty gob smacked that more people aren't reporting this problem. I couldn't find much about it out there and I've personally being trying to convince people for a few months that it is an issue.
The lack of thread-safety in httplib2
is a well-known thing:
https://developers.google.com/api-client-library/python/guide/thread_safety
Though we don't really publicly advertise that we use httplib2
underneath.
I have a "broken down" isolated example with httplib2
only: https://gist.github.com/dhermes/aa7b6186ede2a9d0b471
I'm sure someone could use it to find out how the bits are going to the wrong thread and maybe even fix it.
@rstuart85 RE: "more people reporting" it seems there is chatter:
It looks like other people have experienced this error using a Google API. (SSLv3 is deprecated).
This is from a StackOverflow post. I've done some debugging from the Datastore side but I don't think these requests ever make it into the Datastore part of the stack. I'd appreciate if someone look at this from the gcloud-python team. Note that this issue was originally reported via gcd-discuss@google.com In July.
I have a Python Django application running on a Google Compute instance. It is using gcloudoem to interface from Django to Google Datastore. gcloudoem uses the same underlying code to communicate with Datastore as gcloud-python 0.5.x
At what seems to be completely random times, I will get SSL errors happening when trying to talk to Datastore. There is no pattern in where in my application code these happen. It's just during a random call to Datastore. Here are the two flavours of errors:
Unfortunately, for the second, I don't have a full stacktrace handy:
These errors don't happen when I am using the GCD tool. Does anyone have any idea what is happening here? Is this some sort of networking problem?