simonw closed 4 years ago
Resizing arguments could also come back from the SQL query, so a table could specify the desired width of an image.
Maybe this would be better than the free-form URL arguments, which could be abused?
Could even make URL parameters available to the SQL query so it could choose to support them or not.
HEIC to JPEG should be configured by a plugin setting of some sort rather than automatically happening to all HEIC files (a weird special case).
Features to support:
?w= and ?h= parameters
?format= parameter
?bw=1 (with room to add more in the future)
Once I've built this I'll be able to use it to provide thumbnails in dogsheep/photos-to-sqlite.
These resize operations should probably run in a thread pool to avoid blocking the main serving thread. Pillow releases the GIL so this should be multi-core friendly.
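A minimal sketch of the thread pool idea, using only the standard library. The pool size, the function names and the `resize_blocking` stand-in are all hypothetical; a real version would do the Pillow open/thumbnail/save work inside that function.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool dedicated to image work; the size is an arbitrary choice.
image_pool = ThreadPoolExecutor(max_workers=4)


def resize_blocking(data: bytes, width: int) -> bytes:
    # Stand-in for the real Pillow work (open, thumbnail, save).
    # It just truncates the payload so the sketch runs without Pillow.
    return data[:width]


async def resize_in_pool(data: bytes, width: int) -> bytes:
    loop = asyncio.get_running_loop()
    # run_in_executor hands the blocking call to a worker thread, so the
    # event loop can keep serving other requests in the meantime.
    return await loop.run_in_executor(image_pool, resize_blocking, data, width)


async def main() -> list:
    # Two "resizes" scheduled concurrently on the pool.
    return await asyncio.gather(
        resize_in_pool(b"x" * 100, 10),
        resize_in_pool(b"y" * 100, 20),
    )


results = asyncio.run(main())
```

Because the event loop only awaits the future, a slow resize delays that one response rather than every request being served.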
Here's rendering code from my hacked-together not-yet-released S3 image proxy:
import io

from starlette.responses import Response
from PIL import Image, ExifTags
import pyheif

# Find the numeric EXIF tag ID for "Orientation"
for ORIENTATION_TAG in ExifTags.TAGS.keys():
    if ExifTags.TAGS[ORIENTATION_TAG] == "Orientation":
        break
...
# Load it into Pillow
if ext == "heic":
    heic = pyheif.read_heif(image_response.content)
    image = Image.frombytes(mode=heic.mode, size=heic.size, data=heic.data)
else:
    image = Image.open(io.BytesIO(image_response.content))
# Remember the source format now: rotate() and convert() return new
# images whose .format is None
source_format = image.format
# Does EXIF tell us to rotate it?
try:
    exif = dict(image._getexif().items())
    if exif[ORIENTATION_TAG] == 3:
        image = image.rotate(180, expand=True)
    elif exif[ORIENTATION_TAG] == 6:
        image = image.rotate(270, expand=True)
    elif exif[ORIENTATION_TAG] == 8:
        image = image.rotate(90, expand=True)
except (AttributeError, KeyError, IndexError):
    pass
# Resize based on ?w= and ?h=, if set
width, height = image.size
w = request.query_params.get("w")
h = request.query_params.get("h")
if w is not None or h is not None:
    if h is None:
        # Set h based on w
        w = int(w)
        h = int((float(height) / width) * w)
    elif w is None:
        # Set w based on h
        h = int(h)
        w = int((float(width) / height) * h)
    w = int(w)
    h = int(h)
    # thumbnail() resizes in place and preserves the aspect ratio
    image.thumbnail((w, h))
# ?bw= converts to black and white
if request.query_params.get("bw"):
    image = image.convert("L")
# ?q= sets the quality - defaults to 75
quality = 75
q = request.query_params.get("q")
if q and q.isdigit() and 1 <= int(q) <= 100:
    quality = int(q)
# Output as JPEG or PNG
output_image = io.BytesIO()
image_type = "JPEG"
kwargs = {"quality": quality}
if source_format == "PNG":
    image_type = "PNG"
    kwargs = {}
image.save(output_image, image_type, **kwargs)
return Response(
    output_image.getvalue(),
    # Match the content type to the format that was actually written
    media_type="image/{}".format(image_type.lower()),
    headers={"cache-control": "s-maxage={}, public".format(365 * 24 * 60 * 60)},
)
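The ?w= / ?h= arithmetic in that code can be pulled out into a pure function, which makes it easy to test without Pillow. This is only a sketch: `target_size` is a hypothetical name, and it mirrors the logic above rather than anything in a released plugin.

```python
from typing import Optional, Tuple


def target_size(
    width: int,
    height: int,
    w: Optional[str] = None,
    h: Optional[str] = None,
) -> Optional[Tuple[int, int]]:
    """Return the (w, h) box to pass to thumbnail(), or None for no resize.

    width/height are the source dimensions; w/h are the raw query string
    values. int() raises ValueError if a value is not numeric.
    """
    if w is None and h is None:
        return None
    if h is None:
        # Only a width was given: derive the height from the aspect ratio
        new_w = int(w)
        new_h = int((float(height) / width) * new_w)
    elif w is None:
        # Only a height was given: derive the width from the aspect ratio
        new_h = int(h)
        new_w = int((float(width) / height) * new_h)
    else:
        new_w, new_h = int(w), int(h)
    return new_w, new_h
```

For a 4000x3000 source, `target_size(4000, 3000, w="800")` gives `(800, 600)`. When both values are supplied the box is passed through unchanged, and `Image.thumbnail()` then fits the image inside it while keeping the aspect ratio.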
Resizing that's specified by columns returned from the SQL query will always be respected - no additional configuration needed. Those columns will be:
resize_width - a width to resize to
resize_height - a height to resize to
output_format - the format to convert to and output, e.g. jpeg or png
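As a rough illustration of how those columns might be consumed, here is a sketch using the standard library sqlite3 module. The table contents, the query and the `transform_from_row` helper are made up for the example; only the three column names come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("create table apple_photos (uuid text primary key, filepath text)")
conn.execute("insert into apple_photos values ('abc', '/photos/abc.heic')")

# Hypothetical query: the extra columns ask for a 200px wide JPEG.
sql = """
select filepath, 200 as resize_width, 'jpeg' as output_format
from apple_photos where uuid = :key
"""
row = conn.execute(sql, {"key": "abc"}).fetchone()

TRANSFORM_COLUMNS = ("resize_width", "resize_height", "output_format")


def transform_from_row(row: sqlite3.Row) -> dict:
    # Keep only the special columns that the query returned with a value
    present = row.keys()
    return {
        name: row[name]
        for name in TRANSFORM_COLUMNS
        if name in present and row[name] is not None
    }


options = transform_from_row(row)
```

Here `options` ends up as `{"resize_width": 200, "output_format": "jpeg"}`; a query that returns none of the special columns yields an empty dict, meaning the file is served untouched.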
Other features will be controlled by metadata settings:
{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select filepath from apple_photos where uuid=:key",
                "enable_query_parameters": true,
                "default_convert": {
                    "heic": "jpeg"
                }
            }
        }
    }
}
So enable_query_parameters turns on the ability to reformat with ?w= and ?h= and ?format= in the querystring parameters. default_convert can be used to default to converting certain formats.
I should default to stripping out EXIF data, because leaking latitude/longitude in GPS tags is a potential privacy violation. To allow EXIF through I can support this option:
"strip_exif": false
I'll open these as sub-tickets.
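To make the interaction between these settings concrete, here is a sketch of how a request might be turned into a plan. `plan_transform` and the shape of the returned dict are hypothetical, and letting ?format= override default_convert is an assumption rather than a decided behaviour.

```python
def plan_transform(config: dict, query_params: dict, detected_format: str) -> dict:
    """Work out which transformations to apply for one request."""
    plan = {
        "width": None,
        "height": None,
        "format": None,
        # EXIF is stripped unless the config explicitly says otherwise
        "strip_exif": config.get("strip_exif", True),
    }
    # default_convert maps a detected source format to an output format
    default_convert = config.get("default_convert", {})
    if detected_format in default_convert:
        plan["format"] = default_convert[detected_format]
    # Query string arguments are ignored unless they have been enabled
    if config.get("enable_query_parameters"):
        for param, key in (("w", "width"), ("h", "height")):
            value = query_params.get(param)
            if value and value.isdigit():
                plan[key] = int(value)
        if query_params.get("format"):
            # Assumption: an explicit ?format= wins over default_convert
            plan["format"] = query_params["format"]
    return plan


config = {
    "sql": "select filepath from apple_photos where uuid=:key",
    "enable_query_parameters": True,
    "default_convert": {"heic": "jpeg"},
}
plan = plan_transform(config, {"w": "200"}, "heic")
```

With that config, an HEIC file requested with ?w=200 produces a plan with width 200, format jpeg and strip_exif left at its default of True.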
I'm going to always detect the image format by inspecting the bytes, rather than assuming the file extension might be correct. Since I only need to know the format if I'm planning on modifying it (which will require reading in the bytes) this seems like the simplest option.
The Python standard library imghdr module can do this for jpeg/gif/png - but it doesn't yet support HEIC.
So I need my own code to detect if a chunk of bytes is HEIC.
https://github.com/strukturag/libheif/issues/83 has some clues, in particular this commit: https://github.com/GNOME/gimp/commit/e4bff4c8016f18195f9a6229f59cbf41740ddb8d
I opened an HEIC file of my own and saw this:
b'\x00\x00\x00 ftypheic\x00\x00\x00\x00mif1miafMiHBheic\x00\x00\r4meta\x00\x00\x00\x00\x00\x00\x00`
/* HEIF is an ISOBMFF format whose "brand" (the value after "ftyp")
 * can be of various values. I added the "mif1" brand as I saw some
 * HEIF files with this value, and it loaded fine (though it may not
 * be valid theoretically, according to libheif developers).
 * See also: https://gitlab.gnome.org/GNOME/gimp/issues/2209
 */
gimp_register_magic_load_handler (LOAD_PROC,
                                  "heif,heic",
                                  "",
                                  "4,string,ftypheic,4,string,ftypheix,"
                                  "4,string,ftyphevc,4,string,ftypheim,"
                                  "4,string,ftypheis,4,string,ftyphevm,"
                                  "4,string,ftyphevs,4,string,ftypmif1");
I think the way to interpret that magic string is that it means look for any of these values at an offset of 4 from the start:
ftypheic, ftypheix, ftyphevc, ftypheim, ftypheis, ftyphevm, ftyphevs, ftypmif1
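Under that reading, the magic string is a flat list of offset,type,value triples. The sketch below parses it that way and tests a chunk of bytes against it. It is only an illustration of the interpretation above: the function names are made up, the comma splitting is naive, and GIMP's real magic syntax may well support more than this.

```python
MAGICS = (
    "4,string,ftypheic,4,string,ftypheix,"
    "4,string,ftyphevc,4,string,ftypheim,"
    "4,string,ftypheis,4,string,ftyphevm,"
    "4,string,ftyphevs,4,string,ftypmif1"
)


def parse_magics(magics: str) -> list:
    """Split "offset,type,value,..." into (offset, type, value) triples."""
    parts = magics.split(",")
    triples = []
    for i in range(0, len(parts) - 2, 3):
        offset, kind, value = parts[i:i + 3]
        triples.append((int(offset), kind, value.encode("ascii")))
    return triples


def matches_magics(data: bytes, magics: str = MAGICS) -> bool:
    """True if any "string" triple matches the bytes at its offset."""
    for offset, kind, value in parse_magics(magics):
        if kind == "string" and data[offset:offset + len(value)] == value:
            return True
    return False


# Start of the HEIC header shown earlier
heic_header = b"\x00\x00\x00 ftypheic\x00\x00\x00\x00mif1miaf"
is_match = matches_magics(heic_header)
```

The first four bytes are skipped because they hold the size of the ftyp box; the brand comparison starts at offset 4.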
Frustratingly I couldn't find documentation for gimp_register_magic_load_handler that explained the magics parameter beyond this: http://oldhome.schmorp.de/marc/pdb/gimp_register_magic_load_handler.html
STRING | magics | comma separated list of magic file information this handler can load (i.e. "0,string,GIF")
Also found a few examples like this in different Objective-C projects on GitHub: https://github.com/lyonxu/LXKit/blob/9f89786bc87457c925a8c2349c90615f02dd6430/LXKit/Category/Foundation/NSData%2BLXImageContentType.m#L42-L52
case 0x00: {
    if (data.length >= 12) {
        //....ftypheic ....ftypheix ....ftyphevc ....ftyphevx
        NSString *testString = [[NSString alloc] initWithData:[data subdataWithRange:NSMakeRange(4, 8)] encoding:NSASCIIStringEncoding];
        if ([testString isEqualToString:@"ftypheic"]
            || [testString isEqualToString:@"ftypheix"]
            || [testString isEqualToString:@"ftyphevc"]
            || [testString isEqualToString:@"ftyphevx"]) {
            return SDImageFormatHEIC;
        }
    }
    break;
}
I'm just going to go with ftypheic, ftypheix, ftyphevc, ftyphevx for the moment.
In [31]: jpeg_bytes = open('/Users/simon/Pictures/Photos Library.photoslibrary/originals/0/0A9EB544-AC29-4D70-BDD2-4DF2E53D6E1A.jpeg', 'rb'
...: ).read(1024)
In [32]: imghdr.what(None, jpeg_bytes)
Out[32]: 'jpeg'
In [33]: heic_bytes = open('/Users/simon/Pictures/Photos Library.photoslibrary/originals/0/0FA832F4-92B2-4234-A1A8-3E7FE373E1F7.heic', 'rb'
...: ).read(1024)
In [34]: imghdr.what(None, jpeg_bytes)
Out[34]: 'jpeg'
In [35]: imghdr.what(None, heic_bytes)
In [36]: type(imghdr.what(None, heic_bytes))
Out[36]: NoneType
So imghdr can tell me if a file is a JPEG/PNG/etc. If it returns None I can check for HEIC by looking at the bytes myself.
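That two-step check could look something like the sketch below. `image_type_for_bytes` is a hypothetical name. Since imghdr was deprecated in Python 3.11 and removed in 3.13, the sketch falls back to a few hard-coded signatures when the module is missing; that fallback is an addition for portability, not part of the plan above.

```python
try:
    import imghdr  # deprecated since Python 3.11, removed in 3.13
except ImportError:
    imghdr = None

# The four brands settled on above, found at offset 4
HEIC_BRANDS = (b"ftypheic", b"ftypheix", b"ftyphevc", b"ftyphevx")

# Minimal fallback signatures, only used when imghdr is unavailable
FALLBACK_SIGNATURES = (
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def image_type_for_bytes(data: bytes):
    """Return "jpeg", "png", "gif", "heic" etc., or None if unrecognised."""
    if imghdr is not None:
        found = imghdr.what(None, data)
    else:
        found = next(
            (name for sig, name in FALLBACK_SIGNATURES if data.startswith(sig)),
            None,
        )
    if found is not None:
        return found
    # Not a format the standard checks know about: look for a HEIC brand
    if data[4:12] in HEIC_BRANDS:
        return "heic"
    return None


heic_header = b"\x00\x00\x00 ftypheic\x00\x00\x00\x00mif1miaf"
detected = image_type_for_bytes(heic_header)
```

Only the first 12 bytes matter for the HEIC check, so reading a small prefix of the file (as in the IPython session above) is enough.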
I can write a test using https://github.com/mathiasbynens/small/blob/master/heif.heif
Closing this in favor of #5 and #6.
Roll in the functionality from https://github.com/simonw/heic-to-jpeg - do this after #2