miracle2k / django-assets

Django webassets integration.
BSD 2-Clause "Simplified" License
89 stars 79 forks source link

Thread safety #12

Closed max-arnold closed 10 years ago

max-arnold commented 11 years ago

Django-assets is not thread safe. I've spent two days debugging this issue https://github.com/miracle2k/webassets/issues/217 and it looks like cause is somewhere within get_env() function. With the following print statement added to django_assets/env.py I was not able to reproduce the problem anymore:

        try:
            imp.find_module('assets', app_path)
            print "ASSETS FOUND:", app_path
        except ImportError:
            #if options.get('verbosity') > 1:
            #    print "no assets module"
            continue

I use mod_wsgi daemon mode with multiple processes and threads and temporarily solved this issue by using threads=1. I think it needs to be fixed and until then there should be big red warning in the docs about thread safety. Now it exists only in source code (TODO):

def autoload():
    """Find assets by looking for an ``assets`` module within each
    installed application, similar to how, e.g., the admin autodiscover
    process works. This is were this code has been adapted from, too.

    Only runs once.

    TOOD: Not thread-safe!
    TODO: Bring back to status output via callbacks?
    """
glensc commented 11 years ago

also "TOOD: Not thread-safe!" should be probably written as "TODO"

miracle2k commented 11 years ago

So the only way to fix this, that I can see, is to use a lock, to make sure the environment is initialized properly before any thread can go ahead and use it. Any other ideas?

max-arnold commented 11 years ago

Is it truly needs to be global? Maybe it is possible to use thread local data to store environment instead of global variable?

max-arnold commented 11 years ago

Yesterday there was a message on django-developers related to similar issues: https://groups.google.com/forum/?fromgroups#!topic/django-developers/07LHl-JAejU

Also it is worth looking to how django.contrib.admin does module discovery and initialization.

iafan commented 10 years ago

Hi folks, is there any update on the issue? I think this bug provides a way to hang the server quite easily, as I the same scenario twice: someone tries to attack a Django-based server that uses django-assets, by appending some known backdoor URLs to sniff potential vulnerabilities. All such URLs look like a garbage to the server, so it spits 404 errors, and we start to get thousands of "BundleError: 'xxxxx' doesn not exist" errors, after which the server stops responding until httpd is restarted. The server is able to cope with the regular (non-mailicious) load just fine, so my suspect is this specific module being not thread-safe and causing such side effects. Would be great to have this fixed so that this module could be safely used in production.

max-arnold commented 10 years ago

@iafan Igor, this issue should not hang the server - after all Django can easily handle exceptions and will just throw an 500 error. If you use mod_wsgi, you can disable threads and use processes instead as a workaround:

    WSGIDaemonProcess django user=www-data group=www-data processes=4 threads=1 python-path=/var/www/virtualenvs/example.com/lib/python2.7/site-packages home=/var/www/example.com display-name=django
    WSGIProcessGroup django
    WSGIScriptAlias "/" "/var/www/example.com/src/project/wsgi.py"
    <Directory "/var/www/example.com/src/project">
        <Files wsgi.py>
            Order allow,deny
            Allow from all
        </Files>
    </Directory>

If you get 404s, that is probably another case not directly related to this issue. Also I should note that we disabled on-the-fly assets compilation on production servers and do this offline before deployment. Something along these lines:

ASSETS_DEBUG = False
ASSETS_AUTO_BUILD = False
ASSETS_CACHE = False
ASSETS_ROOT = os.path.join(PROJECT_PATH, 'webassets')
ASSETS_MANIFEST = 'json:' + os.path.join(PROJECT_PATH, 'webassets/webassets_manifest.json')

We do ./manage.py assets build locally and then deploy webassets directory together with the source code.

If you need any assistance, feel free to contact me directly via arnold.maxim (at) gmail.com and I'll try to help you if I get some free time. I speak russian :)

jlovison commented 10 years ago

@miracle2k I don't think you need to use a global lock.

What seems to be happening is the following:

  1. The templatetag appends the bundle name to self.files
  2. The render method gets called on the node
  3. The resolve method gets called within the node (and is passed the context)
  4. The resolve_bundle function gets called within the resolve method
  5. If the resolve_bundle function fails because the environment isn't set up, it assumes it was a file and not a bundle
  6. When the render method subsequently tries to get the url for the bundle treated as a file, it fails spectacularly

Why not just add a template tag option to specify it really is a bundle? This would be backwards compatible with the thread unsafe version, as it's an additional option, but if added, when the KeyError exception fires, instead of assuming it's a file, just wait a bit to avoid the race condition, and then check again.

These race conditions seem to be happening very infrequently, so it's likely far better to just set a small wait for those rarer occurrences than to have every single request wait for lock releases.

miracle2k commented 10 years ago

It seems to me that admin.autodiscover() works because the admin site is a global object. That is, the user must call admin.autodiscover() during the module import sequence. When the first incoming request causes urls.py to be loaded to setup the routes, the admin.autodiscover() runs. A second request/thread will wait for that process the complete, because it also needs urls.py to be imported. I.e. the thread-safety of Pythons import mechanism prevents bad things from happening.

In django-assets, the environment is not global, but is created on demand in get_env(), when the modules involved are already fully imported. So with the import-mechanism not being there to do the the locking, get_env() would need to do itself.

As an aside, Django git has moved the admin autodiscovery process into a generic module - we could possibly use it in the future: https://github.com/django/django/commit/6feb75129f42fdac751d0b79a2a005b4c0ad5c2d

@kromem I don't think you should be concerned about the lock. It's only the initial assets.py auto loading process that needs to be protected. There won't be any locks involved once that is done.

miracle2k commented 10 years ago

I believe this should do it.

julen commented 10 years ago

Thanks @miracle2k ! It would be great if you're able to release a new version on pypi with the fix :sunglasses: