librato / librato-python-web

Report key app statistics to the Librato Metrics service
Other
4 stars 4 forks source link

IndexError: pop from empty list in Flask apps #34

Closed vslavik closed 8 years ago

vslavik commented 8 years ago

Using Python 3.4, Flask 0.10.1, in what appears to be error path when the application code throws an exception to indicate 403:

IndexError: pop from empty list
  File "flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "flask/_compat.py", line 33, in reraise
    raise value
  File "flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "librato_python_web/instrumentor/instrument.py", line 338, in wrapper
    return wrapper_function(f)(*args, **keywords)
  File "librato_python_web/instrumentor/web/flask_.py", line 88, in decorator
    context.pop_tag()
  File "librato_python_web/instrumentor/context.py", line 224, in pop_tag
    o = _get_stack().pop()
vaidyg commented 8 years ago

We have a soft fix for this issue on master. I tried a couple of ways to reproduce this, but unsuccessfully. What exception does your application raise that triggers this issue?

vslavik commented 8 years ago

Sorry, didn't consider the possibility this could be affected by exceptions handling configuration (I should have). Here are the relevant bits:

from werkzeug.exceptions import default_exceptions
…
app = Flask(__name__, instance_relative_config=True)
…
class InvalidUsage(Exception):
    def __init__(self, message, status=400):
        self.status = status
        self.message = message
    def __str__(self):
        return '%d %s' % (self.status, self.message)

@app.errorhandler(InvalidUsage)
def api_error_handler(error):
    response = jsonify(status=error.status, message=error.message)
    response.status_code = error.status
    return response

def json_error_handler(error):
    if isinstance(error, HTTPException):
        message = error.description
        status = error.code
    else:
        message = str(error)
        status = 500
    response = jsonify(status=status, message=message)
    response.status_code = status
    return response

for code in default_exceptions.keys():
    app.register_error_handler(code, json_error_handler)

@app.route('/test')
def test():
    raise InvalidUsage('Whatever', status=403)
vaidyg commented 8 years ago

Thanks for the complete test app. I can't reproduce the issue with Python 3.5 and Flask 0.10.1 on OSX.

vslavik commented 8 years ago

I have a theory. I didn't mention (which I should have, but I didn't realize it could matter until I actually looked at the instrumentor code just now) that I use uWSGI+gevent. In my defense, you do claim to support gunicorn, so assuming you are greenlet-compatible wasn't that far-fetched…

Anyway: you use a context threading.local object to store the tags. My hunch is that the .pth file may be run before uWSGI performs gevent monkey-patching. If true, the code would be using non-patched threading.local which would certainly explain the behavior. It would also make for another case-in-point for #35.

Let me try what happens if I bootstrap the instrumentor explicitly as discussed in #35.

vslavik commented 8 years ago

Let me try what happens if I bootstrap the instrumentor explicitly as discussed in #35.

No such luck, sadly. While this was setup wrong (gvent monkey patching must be done before the .pth file is evaluated, which is impossible), I'm still getting this error with properly setup system with threading.local replaced. I'll try to make a reproducible test case.

vslavik commented 8 years ago

Here is a full, 100% reproducible, test case that doesn't involve gevent and works with Flask's builtin server — just run this script and go to http://127.0.0.1:5000/testme:


from librato_python_web.instrumentor.bootstrap import init as librato_init
librato_init()

from flask import Flask
from time import sleep
import requests
from requests.exceptions import RequestException, HTTPError
from werkzeug.exceptions import ServiceUnavailable

app = Flask(__name__)

@app.route('/bogus')
def get_test_met_bogs():
    sleep(2.0)
    return 'ok'

@app.route('/testme')
def get_test_met():
    try:
        r = requests.get("http://127.0.0.1:5000/bogus", timeout=0.5)
        r.raise_for_status()
    except HTTPError as e:
        raise ServiceUnavailable() from e
    except RequestException as e:
        raise ServiceUnavailable() from e
    return 'ok'

if __name__ == '__main__':
    app.run(debug=True)

The key is the interaction between flask and requests instrumentors. More specifically, the latter uses context.add_all_tags() which is, to use a technical term, buggy as hell and screws things up at this place — instead of calling pop_tags() in a finally block, as would be the sensible thing to do, it completely clears the context in case of an exception.

I'm not submitting a PR for this fix, because the same cringe-inducting pattern is repeated in other places in the file as well (including get_tag(), which not only does this, but eats the exceptions as well!!!) and fixing it will be a bit more involved (IMNSHO both the fail() and succeed() functions in this file are fundamentally flawed and must be eradicated).

Also, well, as a customer I had enough of doing free coding for you ;) I'd be happy to review your fix if you want, though.

vaidyg commented 8 years ago

Thanks, I can reproduce this now! Definitely appreciate the full test program.

vaidyg commented 8 years ago

Master now has some commits that ought to help. If you get a chance, let us know how that works out.

Thank you!

vslavik commented 8 years ago

Fix to address the context stack issue above.

Using the nuclear option I see…

I am not encountering any issues with Gunicorn+Gevent

Well, you didn't before either, even though they were trivially observable (repr(context._globals.context) would show threading.local type, not gevent.local). You're unlikely to hit them without sufficient load to encounter concurrency.

As a precaution, threading is now loaded lazily, upon first access, as opposed to during module loading

This is an unwise kludge. You're working around one specific demonstration of the underlying problem, without addressing (or as much as acknowledging) the actual underlying problem at all. Please reconsider. By doing this, you only hide symptoms of the problem. For this approach to work, you'd have to

  1. identify all uses of anything potentially patched by gevent (you won't) and delay-import them all
  2. never accidentally regress and add new code that isn't delayed (you inevitably will; case in point: reintroductions of py3 incompatibilities)

That simply won't happen. And gevent patches a lot. You'll never be able to be 100% sure the library is gevent compatible this way. But you can, with very little effort, fix it to be compatible with no maintenance burden in the future:

The established, simplest, most robust and most reliable way of using it is to simply monkey patch as soon as possible, before importing your app's code, doing it as early as possible. The only problem here, and trivially fixable, is that you abuse a Python feature to do a non-standard hack that it wasn't meant for. If you didn't install the magically-importing .pth file, there would be no gevent compatibility issue whatsoever.

vaidyg commented 8 years ago

Vaclav, I want to make sure I understand how the .pth file is impacting your situation. You're instrumenting your app from the code, correct? Wouldn't the .pth file be a no-op since LIBRATO_INSTRUMENT_PYTHON isn't set? Your suggestions are appreciated and we'll look at a way to auto-instrument without the .pth file magic. But I wan't to make sure I understand.

vslavik commented 8 years ago

(Sorry, I missed the notification about your question in an inbox flood…) I don't use the .pth now, but I did at first and the consequences for gevent weren't immediately obvious.

Wouldn't the .pth file be a no-op since LIBRATO_INSTRUMENT_PYTHON isn't set?

In its current for, yes — almost. Some code is still being imported, including the os module (which gevent patches) too, but it is or should be — again, in the current state of the code — harmless.