saucecontrol / PhotoSauce

MagicScaler high-performance, high-quality image processing pipeline for .NET
http://photosauce.net/
MIT License
609 stars 49 forks

Smart crop #54

Open xMANIGHTx opened 4 years ago

xMANIGHTx commented 4 years ago

It would be nice to have a "smart crop" feature. The main function of this kind of app is generating thumbnails, so the resize/crop functionality is the most important part, I guess. It would be nice to have a smart resize/crop feature that, for example, crops around the area with the most "pixel density" in an image, or uses face recognition when there are people in the picture.

PS. your software and "server" are wonderful

saucecontrol commented 4 years ago

> It would be nice to have a "smart crop" feature

This is another good idea; however, I'm not sure how it fits in the philosophy of this project. MagicScaler goes to great lengths to avoid decoding the entire source image into memory, and any kind of analysis of the image contents necessarily breaks that model.

I tend to see this library as being used to perform many processing operations against a finite library of images. In that model, it usually makes more sense to do expensive analysis like face or object-of-interest detection once and store that metadata with the image library rather than to perform the analysis with each processing operation.

In either model, it would be nice to have a bit more control over auto-cropping. For example, being able to define a focal point for the crop rather than having it anchored to the image center or edges would be valuable. And being able to take face data from another source (e.g. OpenCV or a ML-based web API) and use that to do smart crops would be valuable. Perhaps a utility that takes care of the math to customize the crop before passing it to MagicScaler would fit the bill.
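As a rough illustration of the "utility that takes care of the math" idea, here is a minimal sketch of a focal-point crop calculation. The `FocalCrop` class and its `Around` method are hypothetical names invented for this example and are not part of MagicScaler. The sketch assumes the focal point is given in source-image pixel coordinates and that the goal is the largest crop with the target aspect ratio.

```csharp
using System;
using System.Drawing;

// Hypothetical helper, not part of MagicScaler.
public static class FocalCrop
{
    // Returns the largest rectangle with the target aspect ratio that fits inside
    // the source image, positioned so the focal point is as close to the crop's
    // center as the image bounds allow.
    public static Rectangle Around(Size source, Point focus, Size target)
    {
        if (source.Width <= 0 || source.Height <= 0 || target.Width <= 0 || target.Height <= 0)
            throw new ArgumentException("Dimensions must be positive.");

        // Scale the target box up or down until it touches the source bounds.
        double scale = Math.Min((double)source.Width / target.Width,
                                (double)source.Height / target.Height);
        int w = Math.Min(source.Width, Math.Max(1, (int)Math.Round(target.Width * scale)));
        int h = Math.Min(source.Height, Math.Max(1, (int)Math.Round(target.Height * scale)));

        // Center on the focal point, then clamp so the crop stays inside the image.
        int x = Math.Clamp(focus.X - w / 2, 0, source.Width - w);
        int y = Math.Clamp(focus.Y - h / 2, 0, source.Height - h);
        return new Rectangle(x, y, w, h);
    }
}
```

The resulting rectangle could then be assigned to the `Crop` property of `ProcessImageSettings`, with the focal point coming from whatever source is available (stored metadata, a face detector, or manual tagging).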

I have a sample that does smart face crops using Azure Computer Vision that I should publish in this repo. That's an example where the face detection is expensive -- not just in time but also possibly in Azure fees -- so you'd preferably do it only once per source image rather than once per crop/resize operation.

> PS. your software and "server" are wonderful

Thanks! 😃

xMANIGHTx commented 4 years ago

Hello! You are a very "responsive" developer, which is a great thing! I didn't know that info could be embedded in the "source" images. Being able to read that metadata could be a good thing. But for more typical users, I guess the ability to do this kind of processing would be a great add-on. If facial recognition is too "complex"/heavy, maybe something that finds where there is more "density" could be a bit more lightweight. After all, there is a cache, so this processing would be done only once. As I said, this kind of script is mostly used for thumbnailing, so resizing and cropping are the core of it. Having this flexibility in the cropping algorithm would be a big plus. Of course, a user who wants a more lightweight implementation would simply not use the smart crop feature; nonetheless, I think it's worth investing some resources in developing it. At the moment ImageResizer is the only "professional" solution around, but its prices are out of this world for single developers like me who cannot go down the route of a full Apache 2 licensing model. So there is plenty of space in this market, in my opinion.

xMANIGHTx commented 4 years ago

I found this, in case it's useful: https://imageprocessor.org/imageprocessor/imagefactory/entropycrop/

saucecontrol commented 4 years ago

Yeah, it's not especially difficult to do a basic edge detection based crop, and it's not that expensive computationally if you have all the pixels available to examine.
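For a sense of what a basic edge-based crop involves, here is a minimal sketch. It assumes the whole image is already in memory as an 8-bit grayscale buffer in row-major order, which is exactly the full materialization discussed in this thread. The `EdgeCrop` class and its methods are hypothetical names for illustration. This is a simplified per-row/per-column gradient sum, not the algorithm used by ImageProcessor's EntropyCrop or anything in MagicScaler.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class EdgeCrop
{
    // Sums absolute horizontal and vertical pixel differences per column and per row.
    // The last row and last column are skipped because they have no forward neighbor.
    public static (long[] Cols, long[] Rows) EdgeEnergy(byte[] gray, int width, int height)
    {
        var cols = new long[width];
        var rows = new long[height];
        for (int y = 0; y < height - 1; y++)
        {
            for (int x = 0; x < width - 1; x++)
            {
                int i = y * width + x;
                int e = Math.Abs(gray[i + 1] - gray[i]) + Math.Abs(gray[i + width] - gray[i]);
                cols[x] += e;
                rows[y] += e;
            }
        }
        return (cols, rows);
    }

    // Slides a window of the given length over the energy profile and returns
    // the start offset with the highest total energy (first one wins on ties).
    public static int BestOffset(long[] energy, int window)
    {
        if (window <= 0 || window > energy.Length)
            throw new ArgumentOutOfRangeException(nameof(window));

        long sum = 0;
        for (int i = 0; i < window; i++) sum += energy[i];

        long best = sum;
        int bestAt = 0;
        for (int i = window; i < energy.Length; i++)
        {
            sum += energy[i] - energy[i - window];
            if (sum > best) { best = sum; bestAt = i - window + 1; }
        }
        return bestAt;
    }
}
```

Calling `BestOffset` on the column profile with the crop width, and on the row profile with the crop height, gives the top-left corner of a candidate crop box. Treating the two axes independently keeps the cost linear in the pixel count, at the price of missing detail that only stands out in two dimensions.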

The main issue is philosophical rather than technical. We could materialize the entire image in order to examine it, but MagicScaler specifically avoids doing that unless absolutely necessary. I'm open to the idea of saying it's necessary in order to do an auto-crop (perhaps after resize if the image is being scaled down), but I also try to avoid performance traps, where the user doesn't know that what they're asking is going to kill performance. Other libraries that have less performance focus don't make that distinction, but that's why we have multiple solutions to this problem.

iamcarbon commented 4 years ago

If you're able to get the crop box from another library, you can also pass that directly to MagicScaler to do the final crop.

There's a lot of variables that can be considered when cropping (prioritizing objects, colors, edges, entropy, applying the rule of thirds, etc). It would be hard to pick one specific algorithm that works well for everything.

loudenvier commented 3 weeks ago

> If you're able to get the crop box from another library, you can also pass that directly to MagicScaler to do the final crop.
>
> There's a lot of variables that can be considered when cropping (prioritizing objects, colors, edges, entropy, applying the rule of thirds, etc). It would be hard to pick one specific algorithm that works well for everything.

I also don't think that face detection should be part of this library. It is (normally) a lot more demanding on resources and falls into computer vision territory. That said, I did exactly what @iamcarbon suggested in a current project of mine. I used Emgu.CV (which wraps OpenCV) to find faces in a photo, then cropped around the largest one. Here is the relevant code:

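using System;
using System.IO;
using System.Linq;
using PhotoSauce.MagicScaler;
using Emgu.CV;
using Emgu.CV.CvEnum;

...

// Note: faceDetector and Options are defined elsewhere in the author's project and are
// not shown here. faceDetector is presumably an Emgu.CV CascadeClassifier.
public static ReadOnlySpan<byte> CropFace(byte[] imgBytes, Options o) {
    using var img = new Mat();
    using var gray = new UMat();
    // Decode the source image and convert it to grayscale for detection.
    CvInvoke.Imdecode(imgBytes, ImreadModes.AnyColor, img);
    CvInvoke.CvtColor(img, gray, ColorConversion.Bgr2Gray);
    var faces = faceDetector.DetectMultiScale(gray);
    if (faces.Length == 0)
        return [];
    // Pick the largest face, grow the box by 20% of its size in every direction,
    // and clip it to the image bounds.
    var largest = faces.MaxBy(f => f.Height * f.Width);
    largest.Inflate((int)(largest.Width * 0.2), (int)(largest.Height * 0.2));
    largest.Intersect(new(0, 0, img.Width, img.Height));
    // Assumption: the original post used largestFace without defining it; taking the
    // region of interest from the decoded image is the most likely intent.
    using var largestFace = new Mat(img, largest);
    var imageBytes = CvInvoke.Imencode(".jpg", largestFace);

    // Resize the cropped face with MagicScaler.
    using var dest = new MemoryStream();
    MagicImageProcessor.ProcessImage(imageBytes, dest, new() {
        Width = o.Width ?? 0,
        Height = o.Height ?? 0,
        ResizeMode = CropScaleMode.Contain,
        Anchor = CropAnchor.Center,
        HybridMode = o.ScaleMode,
        MatteColor = o.BackColor,
    });
    return dest.ToArray();
}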

(I have inlined some extension methods of mine and stripped error handling to make it easier to understand the idea)