Open zerolab opened 8 months ago
@zerolab It would be helpful to have an image file that causes this bug.
Yeah.. I have been trying.. https://wagtailcms.slack.com/archives/C81FGJR2S/p1709908265029289
I suspect they are uploading images without an extension, and then there's potentially something with the system that filetype
cannot determine the file so Willow thinks it may be svg-ish
OK, so finally got confirmation. It was images without an extension, so https://github.com/wagtail/Willow/blob/830aa3d386fd2ef2aa48c8032ba7d62bf2e04fc1/willow/image.py#L86-L87 is kicking in
Honestly - does it matter if we have a file that fails?
I thought he mentioned there was a file extension visible in the database. (his examples also show extension)
We should probably fallback on extensions after checking the mimetype; and then try to guess if it's SVG, no? Seems like a generally good improvement to me.
We have a custom model inheriting from AbstractImage
:
class Asset(AbstractImage):
file = models.ImageField(
verbose_name=_('file'),
upload_to=settings.ASSET_UPLOAD_PREFIX,
width_field='width',
height_field='height',
storage=MediaStorage()
)
MediaStorage
here is using django-gcloud-storage
When I upload an image I get the following error:
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0
Which is weird as I am uploading a png or jpeg image.I tried different images on different environments so this issue is not arising from a corrupted image file.
After further investigation and @zerolab and @Nigel2392 think the problem is arising when willow is guessing the wrong file type /extension in https://github.com/wagtail/Willow/blob/830aa3d386fd2ef2aa48c8032ba7d62bf2e04fc1/willow/image.py#L86-L87
The code blame shows that this part of the code was added last year. I am upgrading from Wagtail 4.2 to Wagtail 5, which would explain why it never occurred before.
When I run the following:
import filetype
from wagtail.images import get_image_model
Image = get_image_model()
image = Image.objects.get(file="dev/chair-unsplash.jpg")
with image.open_file() as image_file:
ext = filetype.guess_extension(image_file)
print(f"Image {image.pk} {image.file} has extension {ext}")
I get Image 149277 dev/chair-unsplash.jpg has extension None
My guess is that it is unable to guess the correct file type for the BLOB when opening a file from gcloud storage.
I still get the issue even if the extension
column in DB is jpeg
. The file itself has the extension as well.
Hi, I am doing some more investigation and here's my findings
In [4]: image = Image.objects.get(file="cms-dev/chair-unsplash_TAJSD0d.jpg")
In [5]: image.file
Out[5]: <ImageFieldFile: cms-dev/chair-unsplash_TAJSD0d.jpg>
When I try to get the path I get unimplemented error, which is understandable as it is being read from Gcloud into a temp fle:
In [11]: image.file.path
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[11], line 1
----> 1 image.file.path
File /usr/local/lib/python3.10/site-packages/django/db/models/fields/files.py:59, in FieldFile.path(self)
56 @property
57 def path(self):
58 self._require_file()
---> 59 return self.storage.path(self.name)
File /usr/local/lib/python3.10/site-packages/django/core/files/storage.py:128, in Storage.path(self, name)
122 def path(self, name):
123 """
124 Return a local filesystem path where the file can be retrieved using
125 Python's built-in open() function. Storage systems that can't be
126 accessed using open() should *not* implement this method.
127 """
--> 128 raise NotImplementedError("This backend doesn't support absolute paths.")
NotImplementedError: This backend doesn't support absolute paths.
open_file
method I get the extension is None
:In [21]: with image.open_file() as image_file:
...: ext = filetype.guess_extension(image_file)
...: print(f"Image {image.pk} {image.file} has extension {ext}")
...:
[24/Mar/2024 11:28:35] | urllib3.connectionpool._get_conn:292 | DEBUG | Resetting dropped connection: storage.googleapis.com
[24/Mar/2024 11:28:35] | urllib3.connectionpool._make_request:549 | DEBUG | https://storage.googleapis.com:443 "GET /storage/v1/b/staging/o/cms-dev%2Fchair-unsplash_TAJSD0d.jpg?projection=noAcl&prettyPrint=false HTTP/1.1" 200 839
[24/Mar/2024 11:28:36] | urllib3.connectionpool._make_request:549 | DEBUG | https://storage.googleapis.com:443 "GET /download/storage/v1/b/staging/o/cms-dev%2Fchair-unsplash_TAJSD0d.jpg?generation=1710246298001500&alt=media HTTP/1.1" 200 323686
Image 149277 cms-dev/chair-unsplash_TAJSD0d.jpg has extension None
default_storage.open
method I am getting the extension correctly:In [22]: with default_storage.open(image.file.name, 'rb') as file:
...: extension = filetype.guess_extension(file.read())
...: print(f"Image {image.pk} {image.file} has extension {extension}")
...:
[24/Mar/2024 11:32:10] | urllib3.connectionpool._get_conn:292 | DEBUG | Resetting dropped connection: storage.googleapis.com
[24/Mar/2024 11:32:11] | urllib3.connectionpool._make_request:549 | DEBUG | https://storage.googleapis.com:443 "GET /storage/v1/b/staging/o/cms-dev%2Fchair-unsplash_TAJSD0d.jpg?projection=noAcl&prettyPrint=false HTTP/1.1" 200 839
[24/Mar/2024 11:32:11] | urllib3.connectionpool._make_request:549 | DEBUG | https://storage.googleapis.com:443 "GET /download/storage/v1/b/staging/o/cms-dev%2Fchair-unsplash_TAJSD0d.jpg?generation=1710246298001500&alt=media HTTP/1.1" 200 323686
Image 149277 cms-dev/chair-unsplash_TAJSD0d.jpg has extension jpg
storage = image.file.storage
with storage.open(image.file.name, 'rb') as file:
extension = filetype.guess_extension(file.read())
print(f"Image {image.pk} {image.file} has extension {extension}")
# Image 149277 cms-dev/chair-unsplash_TAJSD0d.jpg has extension jpg
Does that mean that the open_file
method is not reading the file correctly?
Just an update, I was able to solve the issue by making changes to the storage class. The package I was using was downloading the image file from Gcloud and using a SpooledTemporaryFile
to store it temporarily. See the code.
I changed it o use a NamedTemporaryFile
and willow's open_file
method started to read the mime type correctly.
May be someone can add compatibility in Willow for SpooledTemporaryFile
.
But thanks for all your time and help. Really appreciate it.
https://pastebin.com/jNWdCZ2N
Update: the issue seems to be with https://github.com/wagtail/Willow/blob/830aa3d386fd2ef2aa48c8032ba7d62bf2e04fc1/willow/image.py#L82-L99 (and specifically https://github.com/wagtail/Willow/blob/830aa3d386fd2ef2aa48c8032ba7d62bf2e04fc1/willow/image.py#L86-L87) where the user is uploading a PNG (allegedly)