nagadomi / nunif

Misc; latest version of waifu2x; 2D video to stereo 3D video conversion
MIT License

[unlimited:waifu2x] AlphaBorderPadding #32

Closed · nagadomi closed this 1 year ago

nagadomi commented 1 year ago

alpha_border_padding (related to the handling of transparent PNGs) is not working correctly in unlimited:waifu2x. The ONNX model output differs between the Python and JavaScript versions, resulting in a jagged border around transparent areas.

Related: https://github.com/nagadomi/waifu2x/issues/197#issuecomment-1531036741 and the several comments that follow.

nagadomi commented 1 year ago

This seems to be fixed already, perhaps due to some change in the ONNX exporter on the PyTorch side, or an update to the hosted ONNX Runtime.

nagadomi commented 1 year ago

I have tried other images, and there are still problems.

nagadomi commented 1 year ago

About AlphaBorderPadding

AlphaBorderPadding pads the RGB values of fully transparent areas (alpha channel = 0). Even fully transparent pixels have RGB values, but since they are generally never displayed to users, those values are unstable: fixed fill values, color-palette leftovers, signatures, uninitialized values, etc. So convolving an image without AlphaBorderPadding may cause artifacts along the alpha border.

Examples:

- input: [Mozilla_Firefox_3.5_logo], rgb: [Mozilla_Firefox_3.5_logo_rgb], padded: [Mozilla_Firefox_3.5_logo_pad]
- input: [gimp-icon], rgb: [gimp-icon_rgb], padded: [gimp-icon_pad]
- input: [blender_icon_256x256], rgb: [blender_logo_rgb], padded: [blender_logo_pad]
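
For illustration, the padding can be sketched in plain JavaScript as an iterative dilation. This is a hypothetical helper, not the actual nunif implementation (which runs as an exported ONNX model); `alphaBorderPad` and the `iterations` knob are illustrative names:

```js
// Dilate the RGB values of visible pixels (alpha > 0) into fully
// transparent neighbors, so convolutions near the alpha border see
// plausible colors instead of garbage. Alpha itself is left untouched.
function alphaBorderPad(rgba, width, height, iterations = 8) {
    const data = new Uint8ClampedArray(rgba);       // work on a copy
    const filled = new Uint8Array(width * height);  // 1 = pixel has valid RGB
    for (let i = 0; i < width * height; ++i) {
        filled[i] = data[i * 4 + 3] > 0 ? 1 : 0;
    }
    for (let it = 0; it < iterations; ++it) {
        const next = filled.slice();
        for (let y = 0; y < height; ++y) {
            for (let x = 0; x < width; ++x) {
                const i = y * width + x;
                if (filled[i]) continue;            // RGB already valid here
                let r = 0, g = 0, b = 0, n = 0;
                for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
                    const nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    const j = ny * width + nx;
                    if (!filled[j]) continue;
                    r += data[j * 4]; g += data[j * 4 + 1]; b += data[j * 4 + 2];
                    n++;
                }
                if (n > 0) {                        // average the valid neighbors
                    data[i * 4] = r / n;
                    data[i * 4 + 1] = g / n;
                    data[i * 4 + 2] = b / n;
                    next[i] = 1;
                }
            }
        }
        filled.set(next);
    }
    return data;
}
```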

nagadomi commented 1 year ago

Copied from https://github.com/nagadomi/nunif/issues/36#issuecomment-1537748115

And there still seems to be a problem. The output for the Blender logo in unlimited:waifu2x is as follows:

[alpha_border_pad_onnx]

There is a strange, strong red line (ignore the reflection padding).

PyTorch version output:

[blender_logo_pad]

Change script.js as follows to show the result of AlphaBorderPadding:

```diff
         // create temporary canvas for tile input
-        image_data = this.to_image_data(x.data, alpha3.data, x.dims[3], x.dims[2]);
+        image_data = this.to_image_data(x.data, null, x.dims[3], x.dims[2]);
         var input_canvas = document.createElement("canvas");
         input_canvas.width = w;
         input_canvas.height = h;
         var input_ctx = input_canvas.getContext("2d", {willReadFrequently: true});
         input_ctx.putImageData(image_data, 0, 0);
+        document.body.appendChild(input_canvas);
         var all_blocks = p.h_blocks * p.w_blocks;

         // tiled rendering
```

nagadomi commented 1 year ago

Additionally, using the Python version of onnxruntime, it shows correctly.

[screenshot]

I changed the file name and offset=8 in https://github.com/nagadomi/nunif/blob/d5ede7b19d57528c4e01c78ca8459a181edcd825/nunif/models/onnx_helper_models.py#L310-L347 and ran it.

LoganDark commented 1 year ago

> Additionally, using the Python version of onnxruntime, it shows correctly.
>
> [screenshot]
>
> I changed the file name and offset=8 in https://github.com/nagadomi/nunif/blob/d5ede7b19d57528c4e01c78ca8459a181edcd825/nunif/models/onnx_helper_models.py#L310-L347 and ran it.

What if you use the CPU execution provider from Python?

nagadomi commented 1 year ago

I tried CPUExecutionProvider and it works correctly.

[screenshot]


This problem can have two major causes.

  1. pixel values of images returned from Canvas
  2. ONNXRuntime execution providers

The hard part of this problem is that I don't know what is happening in the Python -> TorchScript -> ONNX process. The easiest solution is to reimplement AlphaBorderPadding in pure JavaScript.
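
For reference, onnxruntime-web also lets the execution provider be pinned when the session is created, which helps isolate cause 2. A minimal sketch, assuming the `ort` global from onnxruntime-web (the model path is hypothetical):

```js
// Force the CPU (WASM) backend in onnxruntime-web instead of letting it
// pick WebGL, mirroring the CPUExecutionProvider experiment above.
const session = await ort.InferenceSession.create("./models/scale2x.onnx", {
    executionProviders: ["wasm"],
});
```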

LoganDark commented 1 year ago

> The easiest solution is to reimplement AlphaBorderPadding in pure JavaScript.

I actually just finished doing this, and it looks like the problem might be something like canvas using premultiplied alpha internally.

[blender]

[image]

This happens even if I use an image with this RGB:

[blender-bled-opaque]

The image is here:

[blender-bled]

And the bleed output:

[image]

LoganDark commented 1 year ago

[image]

See these artifacts on the RGB input when it is not bled. I am not sure where these come from, but if I import an output (with no model, padding, or bleeding) into an image editor, you can see dark edges indicative of premultiplication:

[image]

which is probably lowering precision. And if I remove the alpha channel in my image editor, you can see it looks pretty premultiplied:

[image]

There are parameters that can be used to combat this; I will run some experiments.

Also see https://stackoverflow.com/questions/23497925/how-can-i-stop-the-alpha-premultiplication-with-canvas-imagedata
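
The loss is easy to reproduce: round-tripping a low-alpha pixel through a 2D canvas does not preserve its RGB values, because the canvas stores them premultiplied at 8-bit precision. A minimal sketch of the effect:

```js
// Write a pixel with very low alpha, read it back, and compare:
// 200 * (10/255) rounds to 8 when premultiplied, which un-premultiplies
// back to ~204, not 200.
const canvas = document.createElement("canvas");
canvas.width = canvas.height = 1;
const ctx = canvas.getContext("2d");

const src = ctx.createImageData(1, 1);
src.data.set([200, 100, 50, 10]);
ctx.putImageData(src, 0, 0);

const dst = ctx.getImageData(0, 0, 1, 1);
console.log([...dst.data]); // e.g. [204, 102, 51, 10]: the RGB changed
```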

LoganDark commented 1 year ago

WebGL can be used to avoid the forced premultiplication without using a PNG decoder. https://stackoverflow.com/a/60564905

I will try to implement this solution and report my results.

LoganDark commented 1 year ago

[image]

It is working. Here is the code that uses WebGL1 (not WebGL2, which returns null in my browser) to read pixel data from an image:

```js
// `bitmap` is the decoded image source (e.g. an ImageBitmap) and
// `width`/`height` are its dimensions. WebGL does not premultiply alpha
// on upload by default (UNPACK_PREMULTIPLY_ALPHA_WEBGL is false), so the
// texel values survive intact.
const gl = new OffscreenCanvas(0, 0).getContext('webgl') // this can be reused arbitrarily many times

// upload the image into a texture
gl.activeTexture(gl.TEXTURE0)
const texture = gl.createTexture()
gl.bindTexture(gl.TEXTURE_2D, texture)
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, bitmap)

// attach the texture to a framebuffer so its pixels can be read back
const framebuffer = gl.createFramebuffer()
gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer)
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0)

// read the raw, non-premultiplied pixels straight into an ImageData
const imageData = new ImageData(width, height)
gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, imageData.data)

gl.deleteTexture(texture)
gl.deleteFramebuffer(framebuffer)
```

nagadomi commented 1 year ago

If putImageData is the cause, then implementing crop(x, y, width, height) for Float32Array without creating a temporary canvas will work fine. However, if the pixel values are already broken at https://github.com/nagadomi/nunif/blob/d5ede7b19d57528c4e01c78ca8459a181edcd825/waifu2x/unlimited_waifu2x/public_html/script.js#L640-L651, then a PNG decoder without Canvas is needed.
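
Such a crop is straightforward to write against the planar Float32Array; a minimal sketch, assuming the CHW layout used for the ONNX input tensors (`cropCHW` is a hypothetical helper, not existing script.js code):

```js
// Crop a (channels, height, width) Float32Array tensor without going
// through a temporary canvas, copying one channel row at a time.
function cropCHW(data, channels, height, width, x, y, cropW, cropH) {
    const out = new Float32Array(channels * cropH * cropW);
    for (let c = 0; c < channels; ++c) {
        for (let row = 0; row < cropH; ++row) {
            const src = c * height * width + (y + row) * width + x;
            const dst = c * cropH * cropW + row * cropW;
            out.set(data.subarray(src, src + cropW), dst);
        }
    }
    return out;
}
```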

LoganDark commented 1 year ago

> If putImageData is the cause, then implementing crop(x, y, width, height) for Float32Array without creating a temporary canvas will work fine. However, if the pixel values are already broken at https://github.com/nagadomi/nunif/blob/d5ede7b19d57528c4e01c78ca8459a181edcd825/waifu2x/unlimited_waifu2x/public_html/script.js#L640-L651, then a PNG decoder without Canvas is needed.

See the reply I just made. WebGL1 works. But you need to be able to process the input data without tripping through a canvas. This is currently only possible within my rewritten codebase, which uses ImageBitmap and ImageData.

ImageBitmaps can be safely manipulated and cropped using `createImageBitmap` as long as you specify the option `{premultiplyAlpha: 'none'}`.
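
For example, cropping a tile without losing precision looks like this (a sketch; the variable names are illustrative):

```js
// Crop a tile out of a decoded bitmap; premultiplyAlpha: 'none' keeps
// the raw RGB values of semi-transparent pixels intact.
const tile = await createImageBitmap(sourceBitmap, tileX, tileY, tileW, tileH, {
    premultiplyAlpha: 'none',
});
```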

For displaying to canvas (e.g. previewing the input image in the src canvas), you can efficiently use the bitmaprenderer context and create a new premultiplied bitmap to display there:

```js
inputCanvas.getContext('bitmaprenderer')!.transferFromImageBitmap(await createImageBitmap(imageBitmap, {premultiplyAlpha: 'premultiply'}))
```

LoganDark commented 1 year ago

By the way, some images, even with perfect pixel reading, will behave horribly with edge bleeding.

Take this image for example, which I have saved to my computer:

[hai]

Edge bleeding results in:

[image]

Because, if you look at the raw RGB data in an image editor, it really does have bad pixels in the data:

[image]

Even though it is not visible whatsoever in the source:

[image]

The solution is, instead of using all pixels with an alpha above 0, to define the boundary to be something like 0.5.

Then the result is much nicer:

[image]

A threshold of 0.1 is perhaps too low:

[image]

A threshold of 0.25:

[image]

0.2:

[image]

Not sure which one I like more, or whether it should be adjustable; maybe I will make it adjustable. I think 0.5 is good, but it might be too aggressive for some images. I am not sure.
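
In the bleeding pass, the change amounts to seeding the dilation only from pixels at or above the threshold, rather than from every pixel with alpha > 0. A sketch (`buildSeedMask` and the `threshold` knob are hypothetical):

```js
// Build the seed mask for edge bleeding: only pixels whose alpha is at
// or above `threshold` (in [0, 1]) contribute RGB values, so junk
// pixels with barely-nonzero alpha no longer leak into the bleed.
function buildSeedMask(rgba, width, height, threshold = 0.5) {
    const cutoff = Math.round(threshold * 255);
    const mask = new Uint8Array(width * height);
    for (let i = 0; i < width * height; ++i) {
        mask[i] = rgba[i * 4 + 3] >= cutoff ? 1 : 0;
    }
    return mask;
}
```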

nagadomi commented 1 year ago

Thank you. I understand the cause of this problem and some solutions.

> The solution is, instead of using all pixels with an alpha above 0, to define the boundary to be something like 0.5.

It may depend on the implementation, but changing the values of pixels other than alpha = 0 will affect the visibility of the image. If it is just for reference, no problem. In a face image dataset I saw, a blur filter was applied to the padded pixels to prevent jaggedness (applying the blur filter to the entire image and then masking it where alpha == 0).
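
A rough sketch of that blur-and-mask idea as a post-processing step (a hypothetical helper, using a simple 3x3 box blur for brevity):

```js
// Blur the entire image, then keep the blurred RGB only where alpha == 0,
// so the padded region is smooth while visible pixels stay untouched.
function blurMaskPad(rgba, width, height) {
    const out = new Uint8ClampedArray(rgba);
    for (let y = 0; y < height; ++y) {
        for (let x = 0; x < width; ++x) {
            const i = (y * width + x) * 4;
            if (rgba[i + 3] !== 0) continue;   // visible pixel: keep as-is
            let r = 0, g = 0, b = 0, n = 0;
            for (let dy = -1; dy <= 1; ++dy) { // 3x3 box blur
                for (let dx = -1; dx <= 1; ++dx) {
                    const nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    const j = (ny * width + nx) * 4;
                    r += rgba[j]; g += rgba[j + 1]; b += rgba[j + 2];
                    n++;
                }
            }
            out[i] = r / n;
            out[i + 1] = g / n;
            out[i + 2] = b / n;                // alpha stays 0
        }
    }
    return out;
}
```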

LoganDark commented 1 year ago

> changing the values of pixels other than alpha = 0 will affect the visibility of the image

This is true, but so will upscaling itself.

Maybe it should be adjustable.

nagadomi commented 1 year ago

Sometimes the quality of the alpha channel is not good enough, so it might be useful to be able to fix it, but that is not waifu2x's problem. What does not work is the case where the alpha channel of the entire image is something like alpha = 0.1.

LoganDark commented 1 year ago

> What does not work is the case where the alpha channel of the entire image is something like alpha = 0.1.

Yes, indeed. User adjustability would be needed in that case.

LoganDark commented 1 year ago

I found a way to avoid the premultiplication on saving too.

Upscaled result:

[haie]

Alpha channel removed (except for transparent black):

[haie2]

(Ignore the missing spots - I do edge bleeding on individual tiles to reduce memory pressure; I never convert the entire image to a tensor.)

Job summary:

```
―――― unlimited:waifu2x job completed ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
· Input: 560x560 (313600px)
· Output: 1120x1120 (1254400px)
· Model: cunet.art
· Denoise: 3
· Scale: 2
· Tile size: 64
· TTA level: 0
· Alpha: true
· Threads: 12

→ run: 8860.85ms
    → run.canvas: 0.80ms
        → run.canvas.getContext: 0.00ms
        → run.canvas.clear: 0.00ms
        → run.canvas.backdrop: 0.79ms
    → run.glReader: 13.96ms
        → run.glReader.input: 5.32ms
        → run.glReader.output: 8.59ms
    → run.tiledRender: 8845.63ms
        → run.tiledRender.collectTiles: 8835.88ms
            → run.tiledRender.collectTiles.calculateTotalPixels: 0.03ms
            → run.tiledRender.collectTiles.newScratch: 0.02ms
            → run.tiledRender.collectTiles.tile: 81 occurrences; min 75.67ms, max 331.64ms, avg 109.08ms, total 8835.56ms
                → run.tiledRender.collectTiles.tile.pick: 81 occurrences; min 0.00ms, max 0.01ms, avg 0.01ms, total 0.54ms
                → run.tiledRender.collectTiles.tile.calculateTileMetrics: 81 occurrences; min 0.00ms, max 0.07ms, avg 0.00ms, total 0.35ms
                → run.tiledRender.collectTiles.tile.reportCallbackStarted: 81 occurrences; min 0.01ms, max 0.23ms, avg 0.02ms, total 1.77ms
                → run.tiledRender.collectTiles.tile.indicator: 81 occurrences; min 0.00ms, max 0.04ms, avg 0.01ms, total 0.71ms
                → run.tiledRender.collectTiles.tile.capture: 81 occurrences; min 1.10ms, max 24.37ms, avg 3.86ms, total 312.46ms
                → run.tiledRender.collectTiles.tile.process: 81 occurrences; min 72.36ms, max 326.75ms, avg 104.85ms, total 8492.89ms
                    → run.tiledRender.collectTiles.tile.process.toRgbAlpha: 81 occurrences; min 0.06ms, max 0.83ms, avg 0.14ms, total 11.48ms
                    → run.tiledRender.collectTiles.tile.process.bleedEdges: 81 occurrences; min 0.04ms, max 4.04ms, avg 0.38ms, total 31.05ms
                    → run.tiledRender.collectTiles.tile.process.stretchAlpha: 81 occurrences; min 0.01ms, max 0.33ms, avg 0.05ms, total 3.76ms
                    → run.tiledRender.collectTiles.tile.process.pad: 81 occurrences; min 0.57ms, max 10.12ms, avg 1.48ms, total 120.03ms
                        → run.tiledRender.collectTiles.tile.process.pad.rgb: 81 occurrences; min 0.32ms, max 9.82ms, avg 1.07ms, total 86.34ms
                        → run.tiledRender.collectTiles.tile.process.pad.alpha: 81 occurrences; min 0.20ms, max 3.49ms, avg 0.40ms, total 32.75ms
                    → run.tiledRender.collectTiles.tile.process.model: 81 occurrences; min 70.40ms, max 306.33ms, avg 102.46ms, total 8299.40ms
                        → run.tiledRender.collectTiles.tile.process.model.batch: 81 occurrences; min 0.03ms, max 0.17ms, avg 0.09ms, total 7.29ms
                        → run.tiledRender.collectTiles.tile.process.model.run: 81 occurrences; min 70.13ms, max 306.07ms, avg 102.21ms, total 8279.30ms
                        → run.tiledRender.collectTiles.tile.process.model.unbatch: 81 occurrences; min 0.05ms, max 0.50ms, avg 0.14ms, total 11.65ms
                    → run.tiledRender.collectTiles.tile.process.rgbToImageData: 81 occurrences; min 0.17ms, max 5.31ms, avg 0.32ms, total 25.86ms
                → run.tiledRender.collectTiles.tile.crop: 81 occurrences; min 0.04ms, max 0.23ms, avg 0.06ms, total 5.03ms
                → run.tiledRender.collectTiles.tile.writePixels: 81 occurrences; min 0.02ms, max 0.10ms, avg 0.07ms, total 5.29ms
                → run.tiledRender.collectTiles.tile.clearRect: 81 occurrences; min 0.00ms, max 0.06ms, avg 0.01ms, total 1.05ms
                → run.tiledRender.collectTiles.tile.drawImage: 81 occurrences; min 0.07ms, max 0.61ms, avg 0.09ms, total 7.44ms
                → run.tiledRender.collectTiles.tile.updateCounters: 81 occurrences; min 0.00ms, max 0.01ms, avg 0.00ms, total 0.10ms
                → run.tiledRender.collectTiles.tile.reportCallbackCompleted: 81 occurrences; min 0.06ms, max 0.21ms, avg 0.07ms, total 5.61ms
        → run.tiledRender.readPixels: 7.74ms
        → run.tiledRender.outputBitmap: 1.99ms
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
```

Here is the result from the official unlimited:waifu2x:

[hai_waifu2x_noise3_scale2x]

[hai_waifu2x_noise3_scale2xe]

This is because it is saving the premultiplied data in the output canvas instead of using a hack to smuggle non-premultiplied data into the canvas. This is actually possible using the bitmaprenderer context: you can upload a non-premultiplied ImageBitmap to the canvas and convert it to a blob, and it will be completely lossless.

I use an offscreen canvas for this because you can't mix multiple contexts in a canvas, but also because uploading the non-premultiplied data has an undesirable visual effect as well. Best only to use it for downloads.
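
A sketch of that lossless save path (assuming `resultBitmap` is a non-premultiplied ImageBitmap; the names are illustrative):

```js
// bitmaprenderer does not re-premultiply the bitmap it displays, and
// convertToBlob encodes whatever the canvas holds, so the PNG round-trip
// is lossless for the RGB of transparent pixels. Note that
// transferFromImageBitmap takes ownership of (consumes) the bitmap.
const off = new OffscreenCanvas(resultBitmap.width, resultBitmap.height);
off.getContext('bitmaprenderer').transferFromImageBitmap(resultBitmap);
const blob = await off.convertToBlob({ type: 'image/png' });
```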

I wonder how many online image processors actually do not take the alpha premultiplication into account; it must be a lot. It's very difficult to avoid. I might be the only person who cares.

nagadomi commented 1 year ago

Fixed by:

- https://github.com/nagadomi/nunif/commit/2bbdef93fab6ed5e42ad84c37f77e41ea4698f7c
- https://github.com/nagadomi/nunif/commit/55ce24be1408da35c9b64aa4c5278d75950a1797
- https://github.com/nagadomi/nunif/commit/8136f645ed4ce905d26ec7662292c703183934ff

Thank you for all your help.

I changed the top input image area from Canvas to HTMLImageElement, and the image is now decoded with WebGL when upscaling is executed. In tiled_render, the tile image is cropped from the tensor without using a temporary Canvas.