simonw / datasette-media

Datasette plugin for serving media based on a SQL query
Apache License 2.0
19 stars, 1 fork

Support image resizing and conversion #3

Closed: simonw closed this 4 years ago

simonw commented 4 years ago

Roll in the functionality from https://github.com/simonw/heic-to-jpeg - do this after #2

simonw commented 4 years ago

Resizing arguments could also be supported as coming back from the SQL table - so a table could specify the desired width of an image.

Maybe this would be better than free-form URL arguments, which could be abused?

Could even make URL parameters available to the SQL query so it could choose to support them or not.
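A minimal sketch of that last idea, assuming a hypothetical `run_media_query()` helper and a hypothetical fixed list of pass-through parameters (`w` and `h`). Every known parameter is always bound, as `None` when it is missing from the URL, so the SQL can either use it or ignore it without raising a missing-binding error:

```python
import sqlite3
from typing import Mapping, Optional, Tuple

# Hypothetical: the URL parameters the plugin would pass through to the SQL
KNOWN_PARAMS = ("w", "h")


def run_media_query(
    conn: sqlite3.Connection, sql: str, key: str, query_params: Mapping[str, str]
) -> Optional[Tuple]:
    # Bind every known parameter (None when absent) so the SQL can
    # reference :w or :h safely, or not mention them at all
    params = {name: query_params.get(name) for name in KNOWN_PARAMS}
    params["key"] = key
    return conn.execute(sql, params).fetchone()
```

With this shape the SQL decides the policy. For example, `select filepath, min(coalesce(cast(:w as integer), 800), 800) as resize_width from apple_photos where uuid = :key` would honour `?w=` but cap it at 800 pixels, and fall back to 800 when `?w=` is absent.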

HEIC to JPEG should be configured by a plugin setting of some sort rather than automatically happening to all HEIC files (a weird special case).

simonw commented 4 years ago

Features to support:

simonw commented 4 years ago

Once I've built this I'll be able to use it to provide thumbnails in dogsheep/photos-to-sqlite.

simonw commented 4 years ago

These resize operations should probably run in a thread pool to avoid blocking the main serving thread. Pillow releases the GIL so this should be multi-core friendly.
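A minimal sketch of offloading that work from an asyncio event loop, assuming a hypothetical `resize_image()` that stands in for the blocking Pillow calls. A dedicated `ThreadPoolExecutor` bounds how many resizes run at once; `asyncio.to_thread()` would also work but uses the loop's shared default executor:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# A dedicated pool limits how many resizes run concurrently
RESIZE_POOL = ThreadPoolExecutor(max_workers=4)


def resize_image(data: bytes, width: int) -> bytes:
    # Hypothetical stand-in for the blocking Pillow work (open, thumbnail, save)
    return data[:width]


async def render(data: bytes, width: int) -> bytes:
    loop = asyncio.get_running_loop()
    # The event loop keeps serving other requests while a pool thread resizes
    return await loop.run_in_executor(RESIZE_POOL, resize_image, data, width)
```

Whether this actually uses multiple cores depends on the underlying Pillow operations releasing the GIL, as noted above.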

simonw commented 4 years ago

Here's rendering code from my hacked-together not-yet-released S3 image proxy:

import io

from starlette.responses import Response
from PIL import Image, ExifTags
import pyheif

# Look up the numeric EXIF tag ID for "Orientation"
for ORIENTATION_TAG in ExifTags.TAGS.keys():
    if ExifTags.TAGS[ORIENTATION_TAG] == "Orientation":
        break
    ...
    # Load it into Pillow
    if ext == "heic":
        heic = pyheif.read_heif(image_response.content)
        # Pass the stride: decoded rows can be padded beyond width * channels
        image = Image.frombytes(
            heic.mode, heic.size, heic.data, "raw", heic.mode, heic.stride
        )
    else:
        image = Image.open(io.BytesIO(image_response.content))

    # Record the source format now: rotate() and convert() return new images
    # whose .format is None, so image.format can't be checked at the end
    source_format = image.format

    # Does EXIF tell us to rotate it?
    try:
        exif = dict(image._getexif().items())
        if exif[ORIENTATION_TAG] == 3:
            image = image.rotate(180, expand=True)
        elif exif[ORIENTATION_TAG] == 6:
            image = image.rotate(270, expand=True)
        elif exif[ORIENTATION_TAG] == 8:
            image = image.rotate(90, expand=True)
    except (AttributeError, KeyError, IndexError):
        pass

    # Resize based on ?w= and ?h=, if set
    width, height = image.size
    w = request.query_params.get("w")
    h = request.query_params.get("h")
    if w is not None or h is not None:
        if h is None:
            # Set h based on w
            w = int(w)
            h = int((float(height) / width) * w)
        elif w is None:
            h = int(h)
            # Set w based on h
            w = int((float(width) / height) * h)
        w = int(w)
        h = int(h)
        image.thumbnail((w, h))

    # ?bw= converts to black and white
    if request.query_params.get("bw"):
        image = image.convert("L")

    # ?q= sets the quality - defaults to 75
    quality = 75
    q = request.query_params.get("q")
    if q and q.isdigit() and 1 <= int(q) <= 100:
        quality = int(q)

    # Output as PNG if the source was a PNG, otherwise as JPEG
    output_image = io.BytesIO()
    if source_format == "PNG":
        image_type, media_type, kwargs = "PNG", "image/png", {}
    else:
        image_type, media_type, kwargs = "JPEG", "image/jpeg", {"quality": quality}
        # JPEG has no alpha channel or palette, so convert modes such as RGBA or P
        if image.mode not in ("RGB", "L"):
            image = image.convert("RGB")

    image.save(output_image, image_type, **kwargs)
    return Response(
        output_image.getvalue(),
        media_type=media_type,
        headers={"cache-control": "s-maxage={}, public".format(365 * 24 * 60 * 60)},
    )
simonw commented 4 years ago

Resizing that's specified by columns returned from the SQL query will always be respected - no additional configuration needed. Those columns will be:

Other features will be controlled by metadata settings:

{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select filepath from apple_photos where uuid=:key",
                "enable_query_parameters": true,
                "default_convert": {
                    "heic": "jpeg"
                }
            }
        }
    }
}

So enable_query_parameters turns on the ability to reformat with ?w= and ?h= and ?format= in the querystring parameters. default_convert can be used to default to converting certain formats.
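A minimal sketch of how those two settings could combine, assuming a hypothetical `resolve_options()` helper that takes the detected format, the per-media config block shown above and the request's query string. `default_convert` always applies; `?w=`, `?h=` and `?format=` only count when `enable_query_parameters` is true:

```python
from typing import Any, Dict, Mapping


def resolve_options(
    detected_format: str, config: Mapping[str, Any], query_params: Mapping[str, str]
) -> Dict[str, Any]:
    # default_convert maps a detected format to the format it should be served as
    default_convert = config.get("default_convert", {})
    options: Dict[str, Any] = {
        "format": default_convert.get(detected_format, detected_format)
    }
    # Querystring overrides are ignored unless enable_query_parameters is set
    if config.get("enable_query_parameters"):
        for name in ("w", "h"):
            value = query_params.get(name, "")
            if value.isdecimal():
                options[name] = int(value)
        if query_params.get("format"):
            options["format"] = query_params["format"].lower()
    return options
```

With the `photo` config above, a HEIC with no querystring resolves to `{"format": "jpeg"}`; a real implementation would presumably also restrict `?format=` to formats it can actually write.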

I should default to stripping out EXIF data, because leaking latitude/longitude in GPS tags is a potential privacy violation. To allow EXIF through I can support this option:

    "strip_exif": false
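Re-saving through Pillow is one way to drop EXIF. As an illustration of what stripping means at the byte level, here is a sketch of a different technique: removing the EXIF APP1 segment from a JPEG without re-encoding it. `strip_jpeg_exif()` is hypothetical; it only handles JPEG, leaves XMP (which can also carry GPS data) untouched, and does not cope with unusual marker padding:

```python
import struct


def strip_jpeg_exif(data: bytes) -> bytes:
    # A JPEG is SOI (FF D8) followed by length-prefixed marker segments
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG")
    out = bytearray(b"\xff\xd8")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("malformed JPEG marker")
        marker = data[i + 1]
        # SOS (DA) starts entropy-coded data and EOI (D9) ends the image:
        # copy everything from here on verbatim
        if marker in (0xDA, 0xD9):
            break
        # The 2-byte length counts itself but not the FF xx marker bytes
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        segment = data[i:i + 2 + length]
        # Keep every segment except APP1 (E1) blocks that hold EXIF
        if not (marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"):
            out += segment
        i += 2 + length
    out += data[i:]
    return bytes(out)
```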
simonw commented 4 years ago

I'll open these as sub-tickets.

simonw commented 4 years ago

I'm going to always detect the image format by inspecting the bytes, rather than assuming the file extension might be correct. Since I only need to know the format if I'm planning on modifying it (which will require reading in the bytes) this seems like the simplest option.

The Python standard library imghdr module can do this for jpeg/gif/png - but it doesn't yet support HEIC.

So I need my own code to detect if a chunk of bytes is HEIC.

https://github.com/strukturag/libheif/issues/83 has some clues, in particular this commit: https://github.com/GNOME/gimp/commit/e4bff4c8016f18195f9a6229f59cbf41740ddb8d

I opened an HEIC file of my own and saw this:

b'\x00\x00\x00 ftypheic\x00\x00\x00\x00mif1miafMiHBheic\x00\x00\r4meta\x00\x00\x00\x00\x00\x00\x00'
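Those bytes are an ISO base media file format `ftyp` box: a 4-byte big-endian size (`\x00\x00\x00 ` is 32), the type `ftyp`, a major brand (`heic`), a 4-byte minor version, then 4-byte compatible brands (`mif1`, `miaf`, `MiHB`, `heic`). The next box (`meta`) starts right after byte 32. A minimal sketch of a hypothetical `parse_ftyp()` that decodes that structure:

```python
import struct
from typing import List, Optional, Tuple


def parse_ftyp(data: bytes) -> Optional[Tuple[str, int, List[str]]]:
    # A box starts with a 4-byte big-endian size followed by a 4-byte type
    if len(data) < 16 or data[4:8] != b"ftyp":
        return None
    (box_size,) = struct.unpack(">I", data[0:4])
    # Size 0 ("to end of file") and 1 (64-bit size follows) are not handled
    if box_size < 16 or box_size > len(data):
        return None
    major_brand = data[8:12].decode("latin-1")
    (minor_version,) = struct.unpack(">I", data[12:16])
    # The remainder of the box is a list of 4-byte compatible brands
    compatible = [
        data[i:i + 4].decode("latin-1") for i in range(16, box_size - 3, 4)
    ]
    return major_brand, minor_version, compatible
```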
simonw commented 4 years ago

https://github.com/GNOME/gimp/blob/e4bff4c8016f18195f9a6229f59cbf41740ddb8d/plug-ins/common/file-heif.c#L123-L135

  /* HEIF is an ISOBMFF format whose "brand" (the value after "ftyp")
   * can be of various values. I added the "mif1" brand as I saw some
   * HEIF files with this value, and it loaded fine (though it may not
   * be valid theoretically, according to libheif developers).
   * See also: https://gitlab.gnome.org/GNOME/gimp/issues/2209
   */
  gimp_register_magic_load_handler (LOAD_PROC,
                                    "heif,heic",
                                    "",
                                    "4,string,ftypheic,4,string,ftypheix,"
                                    "4,string,ftyphevc,4,string,ftypheim,"
                                    "4,string,ftypheis,4,string,ftyphevm,"
                                    "4,string,ftyphevs,4,string,ftypmif1");
simonw commented 4 years ago

I think the way to interpret that magic string is that it means look for any of these values at an offset of 4 from the start:

ftypheic, ftypheix, ftyphevc, ftypheim, ftypheis, ftyphevm, ftyphevs, ftypmif1
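Under that interpretation, the magic string is a flat list of `offset,type,value` triples, and a file matches if any `string` rule's value appears at its offset. A minimal sketch with hypothetical `parse_gimp_magics()` and `matches_magic()` helpers; this reflects the reading above, not documented GIMP behaviour:

```python
from typing import List, Tuple


def parse_gimp_magics(magics: str) -> List[Tuple[int, str, str]]:
    # Split "4,string,ftypheic,4,string,ftypheix,..." into triples
    parts = magics.split(",")
    if len(parts) % 3 != 0:
        raise ValueError("expected offset,type,value triples")
    return [
        (int(parts[i]), parts[i + 1], parts[i + 2]) for i in range(0, len(parts), 3)
    ]


def matches_magic(data: bytes, rules: List[Tuple[int, str, str]]) -> bool:
    # Only "string" rules are interpreted; any single match is enough
    for offset, kind, value in rules:
        if kind == "string":
            expected = value.encode("ascii")
            if data[offset:offset + len(expected)] == expected:
                return True
    return False
```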

Frustratingly I couldn't find documentation for gimp_register_magic_load_handler that explained the magics parameter beyond this: http://oldhome.schmorp.de/marc/pdb/gimp_register_magic_load_handler.html

STRING | magics | comma separated list of magic file information this handler can load (i.e. "0,string,GIF")

simonw commented 4 years ago

Also found a few examples like this in different Objective-C projects on GitHub: https://github.com/lyonxu/LXKit/blob/9f89786bc87457c925a8c2349c90615f02dd6430/LXKit/Category/Foundation/NSData%2BLXImageContentType.m#L42-L52

        case 0x00: {
            if (data.length >= 12) {
                //....ftypheic ....ftypheix ....ftyphevc ....ftyphevx
                NSString *testString = [[NSString alloc] initWithData:[data subdataWithRange:NSMakeRange(4, 8)] encoding:NSASCIIStringEncoding];
                if ([testString isEqualToString:@"ftypheic"]
                    || [testString isEqualToString:@"ftypheix"]
                    || [testString isEqualToString:@"ftyphevc"]
                    || [testString isEqualToString:@"ftyphevx"]) {
                    return SDImageFormatHEIC;
                }
            }
            break;
        }
simonw commented 4 years ago

I'm just going to go with ftypheic, ftypheix, ftyphevc, ftyphevx for the moment.
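With those four brands, the check is the same one the Objective-C snippet does: compare the 8 bytes at offset 4. A minimal sketch, with `is_heic()` as a hypothetical name:

```python
# The four signatures chosen above, each expected at byte offset 4
HEIC_SIGNATURES = (b"ftypheic", b"ftypheix", b"ftyphevc", b"ftyphevx")


def is_heic(data: bytes) -> bool:
    # Slicing short input just yields fewer bytes, which never matches
    return data[4:12] in HEIC_SIGNATURES
```

Note that this deliberately rejects files whose brand is `mif1`, which GIMP accepts.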

simonw commented 4 years ago
In [31]: jpeg_bytes = open('/Users/simon/Pictures/Photos Library.photoslibrary/originals/0/0A9EB544-AC29-4D70-BDD2-4DF2E53D6E1A.jpeg', 'rb'
    ...: ).read(1024)

In [32]: imghdr.what(None, jpeg_bytes)
Out[32]: 'jpeg'

In [33]: heic_bytes = open('/Users/simon/Pictures/Photos Library.photoslibrary/originals/0/0FA832F4-92B2-4234-A1A8-3E7FE373E1F7.heic', 'rb'
    ...: ).read(1024)

In [34]: imghdr.what(None, jpeg_bytes)
Out[34]: 'jpeg'

In [35]: imghdr.what(None, heic_bytes)

In [36]: type(imghdr.what(None, heic_bytes))
Out[36]: NoneType

So imghdr can tell me if a file is a JPEG/PNG/etc. If it returns None I can check for HEIC by looking at the bytes myself.
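A minimal sketch of that detection order, assuming a hypothetical `detect_format()` helper. It swaps `imghdr` for hand-written signature checks, because `imghdr` was deprecated in Python 3.11 and removed in 3.13; the PNG and GIF checks match the ones `imghdr` uses, while the JPEG check (any SOI followed by a marker) is a little broader than `imghdr`'s:

```python
from typing import Optional

HEIC_SIGNATURES = (b"ftypheic", b"ftypheix", b"ftyphevc", b"ftyphevx")


def detect_format(data: bytes) -> Optional[str]:
    # Common formats first, mirroring what imghdr would report
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # Where imghdr would return None, fall back to the HEIC brand check
    if data[4:12] in HEIC_SIGNATURES:
        return "heic"
    return None
```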

simonw commented 4 years ago

I can write a test using https://github.com/mathiasbynens/small/blob/master/heif.heif

simonw commented 4 years ago

Closing this in favor of #5 and #6.