matthewwithanm / django-imagekit

Automated image processing for Django. Currently v4.0
http://django-imagekit.rtfd.org/
BSD 3-Clause "New" or "Revised" License
2.26k stars 276 forks source link

Fix image cachefile serializtion #437

Closed ron8mcr closed 6 years ago

ron8mcr commented 6 years ago

Fix pickle serialization for ImageCacheFile

When Celery CachedFileBackend used with filesystem storage (django.core.files.storage.FileSystemStorage), everything works fine. But there are issues with storages.backends.s3boto3.S3Boto3Storage (and it's fix from #391), as well as with django_s3_storage.storage.S3Storage.

I didn't debug it to the real core reason, but seem like pickle serializer used for celery task can't serialize ImageCacheFile.storage. So passing instance of ImageCacheFile was replaced with passing its generator without storage attrs, which serializes without any errors

It seems like a hacky solution, but it works

Exception was:

Traceback (most recent call last):
  ...

  File "/src/django-imagekit/imagekit/cachefiles/__init__.py", line 131, in __bool__
    existence_required.send(sender=self, file=self)
  ...
  File "/libs/utils.py", line 380, in on_existence_required
    file.generate()
  File "/src/django-imagekit/imagekit/cachefiles/__init__.py", line 94, in generate
    self.cachefile_backend.generate(self, force)
  File "/src/django-imagekit/imagekit/cachefiles/backends.py", line 136, in generate
    self.schedule_generation(file, force=force)
  File "/src/django-imagekit/imagekit/cachefiles/backends.py", line 165, in schedule_generation
    _celery_task.delay(self, file.generator, force=force)
  ...
  File "/lib/python3.6/site-packages/kombu/serialization.py", line 221, in dumps
    payload = encoder(data)
  File "/lib/python3.6/site-packages/kombu/serialization.py", line 350, in pickle_dumps
    return dumper(obj, protocol=pickle_protocol)
kombu.exceptions.EncodeError: can't pickle _thread._local objects
vstoykov commented 6 years ago

From the traceback I see that the problem is that _thread._local can't be pickled. This is probably due to this https://github.com/jschneier/django-storages/blob/1.6.5/storages/backends/s3boto3.py#L240

In django_s3_storage is probably due to this https://github.com/etianen/django-s3-storage/blob/0.12.0/django_s3_storage/storage.py#L137-L140 because probably s3_connection (result from boto3.client) is not pickleable.

From one point of view every custom storage need to behave as close to Django builtin storage in order to be used interchangeably, but the reality is that only the basic operations are the same.

At second look in order to provide smaller pickle representations we need to tune __getstate__ metod of objects in ImageKit wich will be pickled, to include only attributes which are important and can't be recreated.

In this case a better approach can be changing ImageCacheFile.__getstate__ (https://github.com/matthewwithanm/django-imagekit/blob/4.0.1/imagekit/cachefiles/__init__.py#L140-L146) to not include the storage and backend if they are the same as these in the settings and can be recreated from there.

What you think about that?

ron8mcr commented 6 years ago

@vstoykov thanks, your solution is much better and this really explains the issue.

But I'm not sure how it should act in case of custom storage (i.e. passed to __init__)

vstoykov commented 6 years ago

Yep that will be a problem and probably separate issues need to be made for django-storages and django-s3-storage in order to pickleable.

Probably we also need to add in documentation that wen non default storage is used (storage passed to the field different than configured one) it's required the storage to be pickleable if celery or rq backend are used (which I hope that is not so common scenario).

Can you try to change the PR (or create a new one) with modified ImageCacheFile.__reduce__ method and see if it really works for your case?

ron8mcr commented 6 years ago

Thanks, now it looks better and working

vstoykov commented 6 years ago

@ron8mcr thank you very much for your work on this issue.

One last thing. Can you try to add a test case that will pickle (see if there are test that will test pickleability) of ImageCacheFile?

ron8mcr commented 6 years ago

@vstoykov, there are few test cases already

vstoykov commented 6 years ago

Yes there are test cases for cache files but there is no test that will ensure that if ImageCacheFile is pickled and then restored will use the same storage (same instance and not similar instance).

The change in this merge request will ensure that if the storage is default storage than there is no need to pickle it, and if there is no storage to restore then use default storage (which is singleton). Then if checked with is before pickle and after restore it should be the same instance. Something like:

assert before.storage is after.storage

Without your change this assertion should fail.

vstoykov commented 6 years ago

@ron8mcr are you still interested this to be merged?

If not I can make tests for it when I have some time available. Still I will be very glad if you can make the tests for this issue?

Thanks in advance.

ron8mcr commented 6 years ago

@vstoykov sorry for late response. Yes, I'm interested in, but have not enough time to finish it. I'll do my best to finish on this weekend

MikeVL commented 6 years ago

This pr closes https://github.com/matthewwithanm/django-imagekit/issues/451

bkvirendra commented 6 years ago

@ron8mcr any updates on this PR? Thank you :)