ImageKit creates cache entry for thumb even if fatal error occured and thumb was not actually created

matthewwithanm / django-imagekit

Automated image processing for Django. Currently v4.0

http://django-imagekit.rtfd.org/

BSD 3-Clause "New" or "Revised" License

2.27k stars 275 forks source link

ImageKit creates cache entry for thumb even if fatal error occured and thumb was not actually created #296

Open a1tus opened 10 years ago

a1tus commented 10 years ago

I've had several types of errors. For example, MemoryError with really large photos or simply bad image files that PIL could not even open. But in both cases cache entries were created like nothing happned while according thumb files were not. And thus sequential attempts to use this thumb caused FileNotFound errors (for example when accessing thumb width in template).

That's quite annoying because even if initial error was a single accident, every page that tries to show this particular thumb will raise 500 error until you clear your cache from the stale entry.

If it matters the only related setting that I have is IMAGEKIT_SPEC_CACHEFILE_NAMER = 'imagekit.cachefiles.namers.source_name_dot_hash'.

matthewwithanm commented 10 years ago

Ah yeah. We should definitely fix that. Any chance of getting a failing test that illustrates the issue?

a1tus commented 10 years ago

Sure I can make it but can you refer me to the proper similar test or may be give some advice about what should we get in the end.

matthewwithanm commented 10 years ago

I don't know if there's really a similar test, but off the top of my head, it would work something like this:

Create an ImageCacheFile
Mock its _generate method to throw a MemoryError
Mock its backend's set_state
Call generate() on the ImageCacheFile
Assert that set_state was last called with the state CacheFileState.DOES_NOT_EXIST

We might be able to focus it more so that it's only going through the parts we want to test, but this seems fine. Then we'll have a nice failing test to work against!

a1tus commented 10 years ago

Sorry for delay. Concerning the test - shall I make in my fork and send via pull request or maybe just copy-paste it here?

I've investigated this issue a little more. I'll just describe what happens now for the purpose of clarity if you don't mind.

At the moment we have the following algorithm:

get current thumb state from cache
if cache entry found go to step 4
if NOT found - check thumb existence and set state to CacheFileState.DOES_NOT_EXIST or EXISTS
generate thumb if state is CacheFileState.DOES_NOT_EXIST or do nothing if not 4.1 set state to CacheFileState.GENERATING 4.2 actually generate thumb 4.3 set state to CacheFileState.EXISTS

So the initial problem that I reported is that if we get some error in generating (step 4.2) cache entry with CacheFileState.GENERATING state remains there forever and thus does not give any chance for our thumb to be created. Also it's worth noting that in this situation we can access thumb url (which anyway is 404 but at least page can be loaded) but trying to access width/height triggers 500 error with FileNotFoundError.

On the site that I'm now developing I've encountered another problem (or may be the same). I have a page with lots of quite big photos that are converted to small thumbs so that takes quite a while. And it's now very easy to break the page simply by starting loading it (say, click on a link to this page) and then after a few seconds clicking on the link again (or several times). Absolutely the same result: cache entry GENERATING exits so we don't have any synchronous thumb generation and page (code) just continues to execute until it meets .width which is trying to access unexisting file and causes 500 error. The only difference in that case is that this error will dissappear after the initial page will end generating thumbs... but it's not good anyway. Without using width/height (which is not acceptable in some cases) we won't see 500 errors but we will see broken images.

So I think some design decision is needed (for JustInTime caching strategy).

matthewwithanm commented 10 years ago

Thanks for the detailed writeup! A PR with the test would be awesome.

The first problem (the original one for this issue) should be pretty simple to fix. Basically, step 4.2 just gets wrapped in a try/except and we set the state to DNE in the case of an error. A more robust handling would store the start time in the cache so that it could be expired if the process crashes.

As for the rest…

Also it's worth noting that in this situation we can access thumb url (which anyway is 404 but at least page can be loaded) but trying to access width/height triggers 500 error with FileNotFoundError.

The error is actually expected behavior—not the 404. Like with regular ImageFields, if you try to get the URL of an image that doesn't exist, you should get an error. We should add a test to make sure that this is the case when the state is GENERATING.

If you want to avoid this, you should be checking the truthiness of the file before accessing its URL. There are a few open feature requests to change this behavior or add a "safe" variant—and we may end up doing that—but honestly I'm a little confused by it as it seems that 1) everybody would be used to the behavior from ImageFields and 2) showing a broken URL (like an empty string) does seem like an error.

a1tus commented 10 years ago

What do you think, should we manually list all available exceptions in that except block? Definately this list includes MemoryError and IOError (cannot identify image file). Maybe smth else? Concerning generation time handling. If we go this way maybe we need another option? Like GENERATION_TIMEOUT with some sensible deafult value (e.g. 30 secs). And store unix timestamp for example to keep things simple.

In general I agree that error is better (because we can at least track it). But we have some sort of inconsistency here bacause we can access url but but we can't access other attrs. So we would get broken images on our page if we don't use width/height and 500 otherwise. But in many cases image generation will end until another page loads so I'm not sure we shoud change smth here.

The problem is more in that GENERATING phase. I mean its purpose is to prevent unnecessary parallel thumb creation but in fact we pay for it by introducing errors in edge cases that we can't handle. In my opinion there's nothing bad in processing some thumb twice. To compare: we sometimes consume a little bit more resourses and user waits a little more time OR in the same edge case we definately get page with broken images or even 500 error. Maybe we need some safe strategy that does not use GENERATING phase at all? Also in docs we can describe all this stuff.

matthewwithanm commented 10 years ago

What do you think, should we manually list all available exceptions in that except block? Definately this list includes MemoryError and IOError (cannot identify image file). Maybe smth else?

I don't think so. If any kind of error happens, we need to assume the image wasn't generated. We actually want a try…finally, not a try…except.

Concerning generation time handling. If we go this way maybe we need another option? Like GENERATION_TIMEOUT with some sensible deafult value (e.g. 30 secs). And store unix timestamp for example to keep things simple.

:thumbsup: Yeah, that would be good.

In general I agree that error is better (because we can at least track it). But we have some sort of inconsistency here bacause we can access url but but we can't access other attrs.

I agree that the inconsistency should be resolved, but the way I think we should resolve it is to always throw an error if the file doesn't exist. I've added some (currently failing) tests for this in the dne-errors branch.

One thing I'm not quite sure about yet is how this fits in with async backends and the optimistic caching strategy…the optimistic strategy says "don't bother checking; I know they're there" but async backends means there's definitely a time when they're not…

The problem is more in that GENERATING phase. I mean its purpose is to prevent unnecessary parallel thumb creation but in fact we pay for it by introducing errors in edge cases that we can't handle. In my opinion there's nothing bad in processing some thumb twice.

I think this will be fixed with the changes that I've described: you will always get an error if you try to access url or width for an image in the GENERATING phase (which, after all, is really just a special case of not existing). To make sure that an image exists, you'd just check the truthiness of it:

{% if person.avatar_thumbnail %}
    <img src="{{ person.avatar_thumbnail.url }}" width="{{ person.avatar_thumbnail.width }}" />
{% else %}
    <img src="{% static 'default_avatar.jpg' %}" width="50" />
{% endif %}

a1tus commented 10 years ago

I don't think so. If any kind of error happens, we need to assume the image wasn't generated. We actually want a try…finally, not a try…except.

Yeah, sure. I thought we could hide an exception but we will pass it through actually... never mind :)

I think we should resolve it is to always throw an error

Agree

I think this will be fixed with the changes that I've described

Agree but anyway I see some benefits of having another strategy that i described. Consider: if we have some image model with required ImageField field then in general we can rely on it in our html/design etc. And we do it - we don't check for existence etc. So the same logic can be applied to its thumbs - if source is 100% there so thumb is 100% can be accessed (synchronously). I mean some apps need to rely on that. E.g., if I use my thumb only for applying watermark and I need to show it (with no matter how long will it take to create thumb). So if 2 processes really will start generating the same thumb... well, it won't hurt but will give us the ensurance that on that page this particular thumb will be shown.

matthewwithanm commented 10 years ago

Maybe we can spin that strategy off into another issue since this one already has two things. (Catching errors during generation and raising errors when image doesn't exist.)