mono / SkiaSharp.Extended

SkiaSharp is a cross-platform, comprehensive 2D graphics API for all .NET platforms. And, here is where you will find all sorts of extras that you can use with it.
https://mono.github.io/SkiaSharp.Extended
MIT License

[FEATURE] Look at integrating a higher quality resizing #102

Open mattleibow opened 4 years ago

mattleibow commented 4 years ago

By default, SkiaSharp aims for the fastest resizing, not the best. There are quality levels, but they are pretty much just low, medium and high.

There was a discussion on skia-discuss about this: https://groups.google.com/forum/#!topic/skia-discuss/2du6tuE3eds

Seems there is a nice library that can be used to actually allow for a better quality: https://github.com/avaneev/avir

Probably need to look at integration, licensing and alternatives for this kind of thing.

ziriax commented 4 years ago

I already made a .net standard wrapper for it, and added some high speed 8-bit gamma correction code, would it be useful if I published this?

mattleibow commented 4 years ago

Oh, yes! SkiaSharp is very much a library to create graphics to display on screen. One of the main "limitations" (or rather just a non-requirement) is that the saving to disk is pretty much PNG and JPEG. So this would be VERY useful.

ziriax commented 4 years ago

Okay, I'll publish it on github then, if I get the green light from my customer. It is rough around the edges, not yet ready to publish on nuget, but I did get it working on both Windows and Linux, so a good start.

I would love to tweak this library, supporting 16-bit floats to improve memory bandwidth, using ISPC to target any SIMD ISA, using the write combine buffer to avoid cache population, etc, but I'm not sure if I can find the time for this ;-) The AVIR author (Aleksey Vaneev, @avaneev) already did an incredible job providing fast SIMD implementations, so I'm not sure if further optimizations are needed.

ziriax commented 4 years ago

Okay, I got approval, I will publish the wrapper asap.

mattleibow commented 4 years ago

Once you get it out there, the community will help. I'll share links to it as well, because this is actually a pretty common request.

ziriax commented 4 years ago

It seems the author of PhotoSauce already has his pipeline ready for doing this: https://github.com/saucecontrol/PhotoSauce/issues/16#issuecomment-616182080

So no need to wrap AVIR anymore, I guess. PhotoSauce gives many more options, is 100% .NET, and it even uses optimized SIMD intrinsics these days.

ziriax commented 4 years ago

PhotoSauce certainly looks great, but it offers some overlap with Skia. Nevertheless, a very good candidate.

Yesterday I also started writing my own C# high quality downscaler that can target fractional pixel sizes. Having fun with C# SIMD AVX2 intrinsics ;-)

First results with my code, downscaling this 30MP photo in tiny steps using a Lanczos filter:

https://www.youtube.com/watch?v=VTr1ud4dGLY

Best watched at 1080p@60Hz

Of course Youtube's compression creates a lower quality image.

On my old i7 7700K it takes about 80ms to heavily downsample the image (excluding decoding the JPEG), but I haven't further optimized anything

I'm not sure what option would be best for users:

mattleibow commented 4 years ago

Awesome!

In all honesty, having a separate library that is .NET Standard only might be the best. PhotoSauce is nice and all, but it is fairly large and does a lot of things. If you have this tiny library that just resizes something, then maybe a small NuGet package will be great. You can even make the library not actually depend on SkiaSharp but just take some Span or IntPtr. Maybe add an extension method for a few types. Then your logic can be used on ANY bitmap anywhere.

You can fix bugs and release at your own pace without waiting for SkiaSharp or Google.

People either like huge monolithic libraries that do EVERYTHING or lots of tiny ones that just do the one thing they really want them for.

ziriax commented 4 years ago

Thanks!

If this becomes a tiny standalone nuget, then I think it makes more sense to use Intel's ISPC compiler, and make it a native library, with a tiny .NET Standard wrapper. Because all this C# SIMD is ubercool, but

  1. is .NET Core 3 only
  2. you have to manually write optimized code for each SIMD architecture
  3. .NET doesn't handle 16-bit floats, the best format for the linear rendering workflow my customer needs, and the best format according to Google for this...

Intel's ISPC is amazing in that you write your code once in a C-like language, and then it generates AOT optimized code for each architecture that you care about. Then at runtime, it efficiently picks the function matching the runtime CPU.

Gillibald commented 4 years ago

> Intel's ISPC is amazing in that you write your code once in a C-like language, and then it generates AOT optimized code for each architecture that you care about. Then at runtime, it efficiently picks the function matching the runtime CPU.

That sounds great.

ziriax commented 4 years ago

@mattleibow Do you have good references on how to make a NuGet package containing a .NET Standard native wrapper (like SkiaSharp) from scratch? I have never done that before. Would you use CMake, or Cake as you do? Ninja/GN or just Visual Studio 2019's cross-platform native toolchain?

I do have experience making native wrappers for .NET Standard, just not with packaging them for NuGet, targeting all platforms, etc...

ziriax commented 4 years ago

Did some further experiments and contrary to what I've read on the net, the good old System.Numerics.Vectors also gives very good performance on my PC!

So to get started, this can become a really tiny pure .NET NuGet package, or I can just provide a PR for SkiaSharp (that is the easiest for me 😉). The API can be as simple as SKPixmap SKPixmap.ResampleTo(float width, float height, int kernelSize = 2)

Further optimization is always possible afterwards.

PS: As the author of PhotoSauce cleverly pointed out, one can use Skia's JPEG decoder to request an already downscaled-by-power-of-two image, and start the HQ downsampler from that (good old JPEG DCT decoding trick). But that of course doesn't work with PNGs etc, so should be kept outside of the downsampler logic.
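A minimal sketch of how the shrink factor for that decoder trick might be chosen, written in Python for brevity. The function name `pick_jpeg_shrink` and the `max_shrink=8` limit are assumptions for illustration (1/2, 1/4 and 1/8 are the scales JPEG decoders commonly offer); this is not a SkiaSharp API.

```python
import math

def pick_jpeg_shrink(src_w, src_h, dst_w, dst_h, max_shrink=8):
    """Return the largest power-of-two shrink factor (1, 2, 4, ...) that keeps
    the decoded image at least as large as the requested output size.

    Hypothetical helper: max_shrink=8 is an assumed decoder limit.
    """
    shrink = 1
    while shrink * 2 <= max_shrink:
        candidate = shrink * 2
        # Assume the decoder rounds the scaled dimensions up.
        if math.ceil(src_w / candidate) < dst_w or math.ceil(src_h / candidate) < dst_h:
            break
        shrink = candidate
    return shrink
```

The decoder then delivers an image that is `shrink` times smaller in each dimension, and the high quality resampler only has to cover the remaining ratio. For a non-JPEG source the factor is simply 1.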

mattleibow commented 4 years ago

A PR will be good too! Right now our scaling options are pretty much "eh". So, it will be really nice to get this in.

How are you handling different color types and alpha types? There is also the case where the endianness of the CPU mixes up the pixels...

What is the code actually doing? A resize or a resample? Right now, the current APIs for this are a bit messy... 😢 I have started on the path to unifying them with a Resize returning a new object and ScalePixels resizing the pixels into a destination object:

SKBitmap Resize();
void ScalePixels(SKBitmap destination);

(these are probably bad names, but it is what was there in the native code at the start)

With SKPixmap, there is no actual backing memory, so typically you only have ScalePixels for that one - because you need to provide a destination object. But this is obviously not the same for SKBitmap or SKImage - you can just new up a fresh object any day.

ziriax commented 4 years ago

Well, my first version will be really simple: always resample RGBA channels, and use Skia whenever possible. Further optimizations are possible when the alpha channel is known to be opaque, or for gray-scaling images, etc...

So I don't care if the layout is RGBA or BGRA, each channel is treated the same.

And I use Skia to do the linear color space conversion; basically all pixel formats are converted by Skia's SKPixmap.ReadPixels. The tricky part is tweaking how many rows to keep in memory, as a trade-off between the overhead of calling the Skia API and CPU cache usage. In my current experiments I don't use Skia yet for doing this, so it remains to be seen how fast this will be.

You're right, I can't return an SKPixmap, that should be an SKImage or SKBitmap... And the user should be able to specify the pixel format of the output too...

I'm doing resampling. I made this reference image for myself. Basically multiply the input (orange) pixel values with weights from the Lanczos kernel, and add them all up, to get the output (green) pixel. In other words, a large dot product.

[Image: Resampling]

But it is explained in nice detail here: https://entropymine.com/imageworsener/resample
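The weighted sum described above can be sketched for a single row of samples as follows. This is an illustrative Python version, not the code from the PR: the function names are hypothetical, the lobe count `a` is assumed to play the role of `kernelSize`, and the input is assumed to be in linear light already.

```python
import math

def lanczos(x, a=2):
    """Lanczos kernel with `a` lobes: sinc(x) * sinc(x / a) for |x| < a."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def resample_row(src, dst_len, a=2):
    """Resample a 1-D list of samples to dst_len samples."""
    scale = len(src) / dst_len
    # When downscaling, stretch the kernel so it covers every contributing source pixel.
    stretch = max(scale, 1.0)
    support = a * stretch
    out = []
    for i in range(dst_len):
        center = (i + 0.5) * scale  # output pixel centre in source coordinates
        lo = max(0, math.floor(center - support))
        hi = min(len(src) - 1, math.ceil(center + support))
        weights = [lanczos((j + 0.5 - center) / stretch, a) for j in range(lo, hi + 1)]
        total = sum(weights)
        # Normalised dot product of the kernel weights and the source samples.
        acc = sum(w * src[j] for w, j in zip(weights, range(lo, hi + 1)))
        out.append(acc / total)
    return out
```

A 2-D resize applies the same pass to every row and then to every column. Dividing by the sum of the weights keeps flat areas flat, including at the image edges where the kernel is cut off.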

So the bulk of the CPU operations are the large dot products that need to be made between the kernel weights and source pixel values, but the bottleneck is the RAM... In my AVX2 experiments I tried all kinds of bulk operations before writing the output (tried aligned non-temporal writes). But we really need 16-bit floats ("brain floats") to make this really fast IMO. Or the GPU of course, that will make it near real-time ;-)

I'll have a PR ready soon, we can discuss code then.

PS: One thing crossed my mind: deep learning networks use exactly the same algorithm to combine input values with weights, so image resampling is going to become faster and faster since CPU and GPU makers are spending a lot of money on making AI faster :)

mattleibow commented 4 years ago

Looking forward to that PR!

ziriax commented 4 years ago

Oh well, so much for my plan to use Skia for doing the pixel conversions...

My original AVX2 code that starts from a 30MP 8888 bitmap, converts to linear space 32-bit floats, and does a huge convolution to create a 192px thumbnail takes 60ms.

The System.Numerics.Vectors code that does the same, not using any intrinsics, takes 80ms.

So I tried to replace the linear color space conversion with Skia's native code, but... just doing that conversion alone already takes 90ms!!!

I will first debug the native code, because this is simply impossible, unless Skia is not using a lookup table...

Bummer.

Since loading JPEGs, downscaling these, and rendering them is a hot path for most users, this scenario must be made fast IMHO...

entdark commented 4 years ago

Isn't AVX2 limited to x86 and x86_64 while skia targets other ABIs as well?

ziriax commented 4 years ago

Sure, but even a simple for loop with a lookup table is much faster... And Skia should be using SIMD too according to their mailing list feedback. I am going to debug this to figure out the bottleneck.
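For reference, the lookup-table approach mentioned here could look roughly like this. It is a Python sketch using the standard sRGB decoding formula; the names `SRGB8_TO_LINEAR` and `decode_pixels` are made up for illustration and say nothing about how Skia implements its conversion.

```python
def srgb_to_linear(c):
    """Standard sRGB decoding for a channel value in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

# One entry per possible 8-bit channel value, computed once up front.
SRGB8_TO_LINEAR = [srgb_to_linear(v / 255.0) for v in range(256)]

def decode_pixels(pixels):
    """Convert 8-bit (c0, c1, c2, alpha) tuples to linear floats.

    The three colour channels go through the table; alpha is only rescaled.
    Channel order (RGBA or BGRA) does not matter because all three colour
    channels are treated the same way.
    """
    lut = SRGB8_TO_LINEAR
    return [(lut[c0], lut[c1], lut[c2], alpha / 255.0) for c0, c1, c2, alpha in pixels]
```

Since an 8-bit channel has only 256 possible values, the per-pixel work reduces to three table reads and one multiply, with no `pow` call in the loop.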

ziriax commented 4 years ago

Indeed the pipeline uses SIMD, but it is very generic: it first loads the byte4 into a float4, then swaps two channels (because by default, JPEGs are loaded as BGRA and not RGBA, it seems), and then converts the float4 into a linear color float4 using heavy math.

I asked the Skia team for advice: https://groups.google.com/forum/#!topic/skia-discuss/zOWhMHONG98

For now I'm going to ignore this, and get a working PR out first ;-)

PS: Having debugged the pipeline, obviously it feels this image resampler should be something that is native to Skia. Actually, the SkBlurImageFilter is a special case of such a resampler...

ziriax commented 4 years ago

@mattleibow Is the upcoming beta SkiaSharp going to be a .NET Standard 2.1 library, or will it remain a .NET Standard 1.0/2.0 assembly? Because only .NET Standard 2.1 has goodies like Vector4, it seems.

mattleibow commented 4 years ago

I am planning on 2.0, but we can always add 2.1 as another target. Then we can have a "slow" version in 2.0 and the "fast" version in 2.1. This should be fine since we will be able to use 2.1 bits in iOS and Android and netcoreapp3.1.

In fact, we might just be good to install this NuGet: https://www.nuget.org/packages/System.Numerics.Vectors/ That brings things to netstandard2.0. I already am using the System.IO.UnmanagedMemoryStream and System.Memory packages. Just add that package reference to the csproj: https://github.com/mono/SkiaSharp/blob/master/binding/SkiaSharp/SkiaSharp.csproj#L18

ziriax commented 4 years ago

Just lost hours trying to find the reason why my resampled images had color banding...

Turns out that no image viewer besides GIMP can display the HQ PNGs saved by Skia correctly ;-) Even Chrome doesn't display them correctly.

As a rule of thumb, the images should be converted to sRGB 8-bit per channel before saving... Nope, the above doesn't seem to help...

Unfortunately, Skia doesn't support dithering yet, so banding is still visible in some cases for 8-bit color channels, but that is another issue.

ziriax commented 4 years ago

Good news and bad news.

Good news: I'm ready with the PR of the HQ image resampler, pushing it asap

Bad (well) news: Skia m80 already contains a reasonable quality downsampler! It uses mip-maps to downsample using a box filter first, then it uses a high quality filter to resize that mip-map. That gives reasonable results.
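The two-stage idea described here (cheap box-filtered mip levels first, a better filter for the remainder) can be sketched in one dimension as below. This is only an illustration of the general technique in Python, with hypothetical function names; it is not a description of Skia's actual code.

```python
def halve_box(row):
    """Average adjacent pairs: one box-filter mip level.

    For simplicity, an odd trailing sample is kept as-is.
    """
    out = [(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]
    if len(row) % 2:
        out.append(row[-1])
    return out

def prefilter_with_mips(row, dst_len):
    """Halve repeatedly while the next level would still have at least dst_len samples.

    The returned row is less than twice dst_len long, so a higher quality
    filter only needs to handle that last, small resize step.
    """
    while len(row) > 1 and (len(row) + 1) // 2 >= dst_len:
        row = halve_box(row)
    return row
```

The box filter is fast but a poor low-pass filter, which is consistent with the aliasing seen on high-frequency or synthetic images when compared with a single full-width Lanczos pass.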

Of course for images with a lot of high frequencies, or for artificially generated images, Skia can generate bad results, as this video I made shows (left side is my resampler, right side is Skia's)

https://www.youtube.com/watch?v=KNzMj7frzcE

But compared to m68, m80 is already a lot better it seems.

ziriax commented 4 years ago

Well, the difference can be rather large with e.g. the baboon in the samples gallery:

[Image: Capture2]

[Image: Capture1]

The whiskers are much clearer and the overall sharpness of the resampled photo is much higher IMHO (note that on high-dpi screens your browser will unfortunately also scale these images, making them fuzzy).

Pushing the PR now.

runxc1 commented 1 year ago

So is there some sample code that shows a way to get a higher quality Resizing of an image? I'm converting some code from System.Drawing to SkiaSharp that primarily created multiple smaller versions of an uploaded image and have just used SKBitmap.Resize and am finding the quality to be a little lower than what I had before.