Closed vslavik closed 8 years ago
We have a soft fix for this issue on master. I tried a couple of ways to reproduce this, but unsuccessfully. What exception does your application raise that triggers this issue?
Sorry, didn't consider the possibility this could be affected by exceptions handling configuration (I should have). Here are the relevant bits:
from werkzeug.exceptions import default_exceptions
…
app = Flask(__name__, instance_relative_config=True)
…
class InvalidUsage(Exception):
def __init__(self, message, status=400):
self.status = status
self.message = message
def __str__(self):
return '%d %s' % (self.status, self.message)
@app.errorhandler(InvalidUsage)
def api_error_handler(error):
response = jsonify(status=error.status, message=error.message)
response.status_code = error.status
return response
def json_error_handler(error):
if isinstance(error, HTTPException):
message = error.description
status = error.code
else:
message = str(error)
status = 500
response = jsonify(status=status, message=message)
response.status_code = status
return response
for code in default_exceptions.keys():
app.register_error_handler(code, json_error_handler)
@app.route('/test')
def test():
raise InvalidUsage('Whatever', status=403)
Thanks for the complete test app. I can't reproduce the issue with Python 3.5 and Flask 0.10.1 on OSX.
I have a theory. I didn't mention (which I should have, but I didn't realize it could matter until I actually looked at the instrumentor code just now) that I use uWSGI+gevent. In my defense, you do claim to support gunicorn, so assuming you are greenlet-compatible wasn't that far-fetched…
Anyway: you use a context threading.local
object to store the tags. My hunch is that the .pth
file may be run before uWSGI performs gevent monkey-patching. If true, the code would be using non-patched threading.local
which would certainly explain the behavior. It would also make for another case-in-point for #35.
Let me try what happens if I bootstrap the instrumentor explicitly as discussed in #35.
Let me try what happens if I bootstrap the instrumentor explicitly as discussed in #35.
No such luck, sadly. While this was setup wrong (gvent monkey patching must be done before the .pth
file is evaluated, which is impossible), I'm still getting this error with properly setup system with threading.local
replaced. I'll try to make a reproducible test case.
Here is a full, 100% reproducible, test case that doesn't involve gevent and works with Flask's builtin server — just run this script and go to http://127.0.0.1:5000/testme:
from librato_python_web.instrumentor.bootstrap import init as librato_init
librato_init()
from flask import Flask
from time import sleep
import requests
from requests.exceptions import RequestException, HTTPError
from werkzeug.exceptions import ServiceUnavailable
app = Flask(__name__)
@app.route('/bogus')
def get_test_met_bogs():
sleep(2.0)
return 'ok'
@app.route('/testme')
def get_test_met():
try:
r = requests.get("http://127.0.0.1:5000/bogus", timeout=0.5)
r.raise_for_status()
except HTTPError as e:
raise ServiceUnavailable() from e
except RequestException as e:
raise ServiceUnavailable() from e
return 'ok'
if __name__ == '__main__':
app.run(debug=True)
The key is the interaction between flask and requests instrumentors. More specifically, the latter uses context.add_all_tags()
which is, to use a technical term, buggy as hell and screws things up at this place — instead of calling pop_tags()
in a finally
block, as would be the sensible thing to do, it completely clears the context in case of an exception.
I'm not submitting a PR for this fix, because the same cringe-inducting pattern is repeated in other places in the file as well (including get_tag()
, which not only does this, but eats the exceptions as well!!!) and fixing it will be a bit more involved (IMNSHO both the fail()
and succeed()
functions in this file are fundamentally flawed and must be eradicated).
Also, well, as a customer I had enough of doing free coding for you ;) I'd be happy to review your fix if you want, though.
Thanks, I can reproduce this now! Definitely appreciate the full test program.
Master now has some commits that ought to help. If you get a chance, let us know how that works out.
Thank you!
Fix to address the context stack issue above.
Using the nuclear option I see…
I am not encountering any issues with Gunicorn+Gevent
Well, you didn't before either, even though they were trivially observable (repr(context._globals.context)
would show threading.local
type, not gevent.local
). You're unlikely to hit them without sufficient load to encounter concurrency.
As a precaution, threading is now loaded lazily, upon first access, as opposed to during module loading
This is an unwise kludge. You're working around one specific demonstration of the underlying problem, without addressing (or as much as acknowledging) the actual underlying problem at all. Please reconsider. By doing this, you only hide symptoms of the problem. For this approach to work, you'd have to
That simply won't happen. And gevent patches a lot. You'll never be able to be 100% sure the library is gevent compatible this way. But you can, with very little effort, fix it to be compatible with no maintenance burden in the future:
The established, simplest, most robust and most reliable way of using it is to simply monkey patch as soon as possible, before importing your app's code, doing it as early as possible. The only problem here, and trivially fixable, is that you abuse a Python feature to do a non-standard hack that it wasn't meant for. If you didn't install the magically-importing .pth
file, there would be no gevent compatibility issue whatsoever.
Vaclav, I want to make sure I understand how the .pth file is impacting your situation. You're instrumenting your app from the code, correct? Wouldn't the .pth file be a no-op since LIBRATO_INSTRUMENT_PYTHON isn't set? Your suggestions are appreciated and we'll look at a way to auto-instrument without the .pth file magic. But I wan't to make sure I understand.
(Sorry, I missed the notification about your question in an inbox flood…) I don't use the .pth now, but I did at first and the consequences for gevent weren't immediately obvious.
Wouldn't the .pth file be a no-op since LIBRATO_INSTRUMENT_PYTHON isn't set?
In its current for, yes — almost. Some code is still being imported, including the os
module (which gevent patches) too, but it is or should be — again, in the current state of the code — harmless.
Using Python 3.4, Flask 0.10.1, in what appears to be error path when the application code throws an exception to indicate 403: