tornadoweb / tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
http://www.tornadoweb.org/
Apache License 2.0

Inconsistent behaviour of static file serving #892

Open sonicisthebest opened 11 years ago

sonicisthebest commented 11 years ago

If Tornado (3.1) is not running in debug mode, when a static file is accessed for the first time the included StaticFileHandler will generate an MD5 hash for it and store that hash in perpetuity. The handler will then read the content of the file and serve it up with the corresponding version and ETag. The next time the same client requests the file we get a 304 response and the client uses its own copy from its cache.

Now let's modify the static file on disk. If the original client requests the file again, it still gets a 304 and uses the copy in its browser cache, because the ETag is derived from the hash that Tornado cached earlier.

Consider now what happens if a different client that has not accessed the file before fetches the same static file. Tornado reads the file off the disk again, so that client gets the new content, but because Tornado already has a cached hash for the file, it serves the new content with the old ETag. We now have two different versions of the same file with the same ETag on two different clients.
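
To make the sequence concrete, here is a minimal standalone sketch of the behaviour described. It does not use Tornado's code; `_hash_cache`, `cached_etag`, `serve`, and `demo` are hypothetical names that only mimic a hash cache keyed by path and never invalidated.

```python
import hashlib
import os
import tempfile

# Hypothetical stand-in for a per-process hash cache keyed only by path.
_hash_cache = {}


def cached_etag(path):
    """Return a quoted MD5 ETag, hashing the file only on first access."""
    if path not in _hash_cache:
        with open(path, "rb") as f:
            _hash_cache[path] = hashlib.md5(f.read()).hexdigest()
    return '"%s"' % _hash_cache[path]


def serve(path):
    """Read the body fresh from disk, but take the ETag from the cache."""
    with open(path, "rb") as f:
        body = f.read()
    return body, cached_etag(path)


def demo():
    """Serve a file, modify it on disk, then serve it again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.js")
        with open(path, "wb") as f:
            f.write(b"version one")
        first_body, first_etag = serve(path)   # first client
        with open(path, "wb") as f:
            f.write(b"version two")            # file changes under the process
        second_body, second_etag = serve(path)  # second, fresh client
    return first_body, first_etag, second_body, second_etag
```

In this sketch the two calls to `serve` return different bodies with an identical ETag, which is the inconsistency reported here.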

Some ideas:

sonicisthebest commented 11 years ago

I notice that if ETags are disabled with a custom static file handler, the existing If-Modified-Since handling already deals with this cleanly.
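
For reference, an If-Modified-Since check boils down to comparing the file's modification time with the date the client sends. The following is a rough standard-library sketch, not Tornado's implementation; `is_not_modified` is a hypothetical helper name, and the modification time is assumed to be given in epoch seconds.

```python
import datetime
import email.utils


def is_not_modified(if_modified_since, mtime):
    """Return True when a 304 would be appropriate.

    if_modified_since: raw header value, or None if the header is absent.
    mtime: file modification time in seconds since the epoch.
    """
    if not if_modified_since:
        return False
    try:
        since = email.utils.parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable date: treat the request as unconditional.
        return False
    if since.tzinfo is None:
        since = since.replace(tzinfo=datetime.timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    modified = datetime.datetime.fromtimestamp(int(mtime), datetime.timezone.utc)
    return modified <= since
```

Because the comparison uses the file's current modification time on every request, a changed file is picked up without any cached hash being involved.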

bdarnell commented 11 years ago

This is a good point. For completeness, there's a fourth option: document this behavior and recommend deployment strategies that do not involve changing files out from under a live process (which can cause problems for templates or Python modules, not just static files). However, since this is an easy mistake to make and it has ramifications for external caches, it's probably worth statting the file before returning a cached ETag.

There is a related issue with the version tag included by static_url: if a static_url is generated by one process but the resulting file is served by a second process with a different version of the data, we'll serve the wrong data but still mark it as cacheable forever. The static_url version tag should be checked against the expected version, and if we discover that we're being asked for an inconsistent version, we should give the response a very short expiration.
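
A rough sketch of that version check, independent of Tornado's internals, might look like the following. The function name `cache_control_for` and both max-age constants are assumptions made for illustration; in particular, the 60-second value is an arbitrary choice for "very short".

```python
# Hypothetical values: "forever" for matching versions, brief otherwise.
LONG_MAX_AGE = 86400 * 365 * 10  # about ten years, in seconds
SHORT_MAX_AGE = 60               # arbitrary short expiration, in seconds


def cache_control_for(requested_version, current_version):
    """Pick a Cache-Control value based on the ?v= tag in the request.

    requested_version: version tag taken from the URL, or None if absent.
    current_version: version this process computes for the file on disk.
    """
    if requested_version is None:
        # No version tag: this sketch leaves caching policy to the caller.
        return None
    if requested_version == current_version:
        return "public, max-age=%d" % LONG_MAX_AGE
    # The URL came from a process that saw different data, so the response
    # must not be cached for long.
    return "public, max-age=%d" % SHORT_MAX_AGE
```

For example, `cache_control_for("abc12", "def34")` yields the short-lived policy, so a mismatched response expires quickly instead of being pinned in caches.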

rakslice commented 9 years ago

Looks like it already fetches the modified date of the file every time. It shouldn't be too difficult to add the modified time to the cache entry so that we ignore old entries...
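
As a minimal sketch of that idea (not a patch against Tornado), the cache entry can carry the modification time it was computed from and be ignored when the file's current value differs. `_hash_cache` and `content_hash` are hypothetical names, and also comparing the file size is an extra assumption added here to catch rewrites that land within the same timestamp tick.

```python
import hashlib
import os

# Hypothetical cache: path -> ((mtime_ns, size), hexdigest)
_hash_cache = {}


def content_hash(path):
    """Return the file's MD5 hex digest, rehashing if the file has changed."""
    st = os.stat(path)
    key = (st.st_mtime_ns, st.st_size)
    entry = _hash_cache.get(path)
    if entry is not None and entry[0] == key:
        return entry[1]  # cached entry still matches the file on disk
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    _hash_cache[path] = (key, digest)
    return digest
```

This costs one `os.stat` call per request, which matches the observation that the modified date is already being fetched each time.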