ImageStackAlignator

Implementation of Google's Handheld Multi-Frame Super-Resolution algorithm (from Pixel 3 and Pixel 4 camera)

This project aims at implementing the algorithm presented in “Handheld Multi-Frame Super-Resolution” by Wronski et al. from Google Research (https://doi.org/10.1145/3306346.3323024).

The paper contains several errors and inaccuracies, which is why an implementation is not straightforward. One can only wonder why and how this could have passed the peer-review process at ACM; the authors also never replied to my emails asking for clarification. Nevertheless, I managed to get a working implementation that is capable of reproducing the results, so my assumptions and guesses about what the authors actually meant seem to be not too far off. As neither the authors nor ACM (the publisher) seem interested in a corrected version of the paper, I will go through the individual steps and try to explain the reasoning behind my implementation. I also improved things here and there compared to the version in the paper, and I will point out what I changed.

The algorithm is mainly implemented in CUDA, embedded in a framework able to read the RAW files coming from my camera, a PENTAX K3. DNG files are also supported, so many raw files coming from mobile phones should work, and most DNG files converted from other camera manufacturers should work as well, but I couldn’t test every file. The RAW file reading routines are based on DCRAW and the Adobe DNG SDK; just for my own fun, I implemented everything in C# a while ago rather than using a ready-made library. As the GUI is based on WPF, you need a Windows PC with a powerful NVIDIA GPU to play with the code. The main limitation for the GPU is memory: the more you have, the better. With an older GeForce TITAN with 6 GB of RAM, I can align 50 images of 24 megapixels each.

The algorithm / overview

I will describe the individual steps and my modifications of the algorithm based on the following overview figure taken from the original paper:

Overview

And finally, as a little fun fact: even though I had to re-invent the wheel at some important steps, using assumptions about what could possibly be meant, the parameter ranges given in the paper's supplementary material work perfectly. My guesses thus seem to be right. But once more: why do Wronski et al. publish so many errors, given that they obviously know the published version is wrong? One doesn't replace an edge term by $\frac{\lambda_1}{\lambda_2}$ by mistake...
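For readers without the paper at hand: $\lambda_1 \ge \lambda_2$ are the eigenvalues of the 2×2 structure tensor built from the Gaussian-smoothed image gradients, and their ratio enters the edge/anisotropy term of the merge kernels. As a quick reference, this is just the standard closed form for a symmetric 2×2 matrix, not something specific to the paper or this implementation:

```latex
% Structure tensor from smoothed derivatives I_x, I_y
% (angle brackets denote the local Gaussian-weighted average):
T = \begin{pmatrix}
      \langle I_x^2 \rangle   & \langle I_x I_y \rangle \\
      \langle I_x I_y \rangle & \langle I_y^2 \rangle
    \end{pmatrix},
\qquad
\lambda_{1,2} = \frac{T_{11} + T_{22}}{2}
  \pm \sqrt{\left(\frac{T_{11} - T_{22}}{2}\right)^{2} + T_{12}^{2}}
```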

The application: Aligning an image stack

1) Click on “Select images...” to open a few RAW images. Clicking on a file in the list shows a simple debayered preview in the left image pane. Note that image size and exposure time must be identical in all files (this is not verified...)!

   app1

2) The pre-align tab provides the following settings:
   - Sigma LP: the standard deviation of a Gaussian blur filter (in real space).
   - High pass: the cut-off frequency of a high-pass filter in Fourier space, given as a radius (the maximum is thus 0.5, but it is limited to reasonable values here).
   - HP sigma: the standard deviation used to blur the previous cut-off frequency.
   - Clear axis: sets the pixels within the given distance to the main axes to zero in Fourier space. This was for some testing and shouldn’t be used.
   - Green channel only: creates the B/W image using only the green channel instead of a weighted average of RGB. Might help in case of heavy chromatic aberrations.
   - Rot range and incr.: the range (+/- the value) and search increment for the rotational search. Pentax cameras measure an absolute roll angle during acquisition, which is why this search only needs to determine a small offset. For other cameras the values must be chosen larger.

   app21

   Clicking on Test values or Test CC shows the selected image filtered with these parameters, or performs a cross-correlation check in which one image is intentionally shifted by 5 pixels and this shift must be found by cross-correlation. Finally, click on Compute shifts. Browsing through the files now shows them including pre-alignment.

   app2

3) Having done the pre-alignment, we move on to patch tracking. First, we define the patch sizes, scaling factors and maximum allowed shift per frame.
   - Reference chooses the reference image to use. By default this is the one determined in the previous step.
   - Block size is the size of a block used to measure the displacements. The strategy Full ignores the Block size parameter, as the entire series is treated as one big block; but memory is restricted, and you might run out of it when choosing Full for large image stacks. Only on reference is the simple tracking routine as used by Google Research. On reference block groups Block size images around the reference frame, and every frame is compared to each frame in the block. Blocks measures distances only between frames inside a moving block around the current frame; the reference frame is neglected here.
   - Threshold: if the minimum L2 distance plus the threshold is larger than the maximum L2 distance of the same patch (for a different shift), the found shift is set to zero. This filters out flat areas.

   After Track patches, one can inspect the result by clicking on Show tracked patches. By switching between the reference frame and another frame one can verify the correct tracking. The checkbox Include pre-alignment adds the pre-alignment to the shift vector and removes it from the image.

   Note that Max. Shift has to be given as the desired value + 1, as the largest shift in the displacement / cross-correlation map cannot be used due to the sub-pixel interpolation (see the second sketch after this list). For a value of 2 the largest possible shift is thus 1.5 pixels.

   app3

4) Accumulation:
   - Sigma ST: the standard deviation used to blur the derivatives when computing the structure tensor.
   - Sigma LP: the standard deviation of a blur applied before computing the derivatives. (This time no high-pass filter is applied.)
   - Dth, Dtr, kDetail, kDenoise: parameters as described in the paper. In short: values at the lower end of the given ranges for low-noise pictures, at the upper end for noisy pictures.
   - Iteration LK: how many iterations of Lucas-Kanade optical flow to perform for the final precise alignment.
   - Window LK: the window size for Lucas-Kanade.
   - Min. Det. LK: determinant threshold for Lucas-Kanade. If the matrix in Lucas-Kanade has an eigenvalue smaller than this threshold, no shift is applied (see the first sketch after this list).
   - Erode size: size of the erosion kernel for the uncertainty mask.

   After clicking on Prepare accumulation, one can inspect the reconstruction kernel for every pixel by moving the mouse over the image. Clicking on Accumulate then adds the selected images to the final result. This way one can add one image after the other, but also all images at once by selecting them all together; this helps debugging... If Clear results is set, the final image buffer is cleared before every newly added image. Super resolution activates the super-resolution feature, resampling the final image at double the original resolution. When changing this flag, Prepare accumulation must be executed again!

   app4

5) Finally, the last tab takes the result buffer from the step before, applies the chosen tone curve and color settings, and allows saving the result as a 16-bit TIFF image.

   app5
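To make the Lucas-Kanade settings in step 4 concrete, here is a minimal CUDA sketch of a single refinement step on one window. This is hypothetical illustration code, not the project's actual kernel (the function name `lkStep` and the toy data are mine): it builds the 2×2 normal matrix from image gradients, rejects the solve when the smaller eigenvalue falls below a threshold (the role of Min. Det. LK described above), and otherwise solves for the sub-pixel shift.

```cuda
// lk_step.cu - sketch of one Lucas-Kanade step on a single window.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void lkStep(const float* Ix, const float* Iy, const float* It,
                       int n, float minEig, float2* shift)
{
    // Single-thread sketch; a real kernel would reduce over the window
    // in parallel.
    if (threadIdx.x != 0 || blockIdx.x != 0) return;

    float a = 0.f, b = 0.f, c = 0.f; // normal matrix [a b; b c]
    float r0 = 0.f, r1 = 0.f;        // right-hand side
    for (int i = 0; i < n; i++) {
        a  += Ix[i] * Ix[i];
        b  += Ix[i] * Iy[i];
        c  += Iy[i] * Iy[i];
        r0 -= Ix[i] * It[i];
        r1 -= Iy[i] * It[i];
    }
    // Smaller eigenvalue of the symmetric 2x2 matrix:
    float tr  = 0.5f * (a + c);
    float det = a * c - b * b;
    float lamMin = tr - sqrtf(fmaxf(tr * tr - det, 0.f));
    if (lamMin < minEig || det == 0.f) {
        *shift = make_float2(0.f, 0.f);  // ill-conditioned: apply no shift
        return;
    }
    // Cramer's rule for the 2x2 system:
    shift->x = (r0 * c - r1 * b) / det;
    shift->y = (a * r1 - b * r0) / det;
}

int main()
{
    // Toy window: constant x-gradient, alternating y-gradient,
    // constant temporal difference.
    const int n = 16;
    float hIx[n], hIy[n], hIt[n];
    for (int i = 0; i < n; i++) {
        hIx[i] = 1.f;
        hIy[i] = (i % 2) ? 2.f : 0.5f;
        hIt[i] = -0.3f;
    }

    float *dIx, *dIy, *dIt; float2* dShift;
    cudaMalloc(&dIx, n * sizeof(float));
    cudaMalloc(&dIy, n * sizeof(float));
    cudaMalloc(&dIt, n * sizeof(float));
    cudaMalloc(&dShift, sizeof(float2));
    cudaMemcpy(dIx, hIx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dIy, hIy, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dIt, hIt, n * sizeof(float), cudaMemcpyHostToDevice);

    lkStep<<<1, 1>>>(dIx, dIy, dIt, n, 1e-4f, dShift);

    float2 s;
    cudaMemcpy(&s, dShift, sizeof(float2), cudaMemcpyDeviceToHost);
    printf("LK shift: (%f, %f)\n", s.x, s.y);

    cudaFree(dIx); cudaFree(dIy); cudaFree(dIt); cudaFree(dShift);
    return 0;
}
```

Compiled with `nvcc lk_step.cu`, the toy data yields a shift of (0.3, 0); a window with too little gradient structure would instead fail the eigenvalue test and be left untouched, exactly the behaviour the Min. Det. LK setting controls.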

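The Max. Shift note in step 3 can be illustrated in the same way: refining an integer correlation peak to sub-pixel precision with a three-point parabola fit needs both neighbours of the peak, so the outermost entries of the correlation map can never yield a usable shift. The following sketch (again hypothetical code, not taken from the repository; the name `refinePeak` is mine) shows the idea in one dimension:

```cuda
// subpixel_peak.cu - three-point parabola fit around a correlation peak.
#include <cstdio>
#include <cuda_runtime.h>

// Given a 1-D correlation slice c[0..n-1], find the integer peak p and
// refine it: the parabola through c[p-1], c[p], c[p+1] has its vertex at
//   p + 0.5 * (c[p-1] - c[p+1]) / (c[p-1] - 2*c[p] + c[p+1]),
// an offset always within (-0.5, 0.5).
__global__ void refinePeak(const float* c, int n, float* pos)
{
    if (threadIdx.x != 0 || blockIdx.x != 0) return;

    // Find the integer peak, excluding the borders (no neighbour there).
    int p = 1;
    for (int i = 2; i < n - 1; i++)
        if (c[i] > c[p]) p = i;

    float denom = c[p - 1] - 2.f * c[p] + c[p + 1];
    float offs  = (denom != 0.f)
                ? 0.5f * (c[p - 1] - c[p + 1]) / denom : 0.f;
    *pos = (float)p + offs;
}

int main()
{
    // Correlation slice for shifts -2..+2 (n = 5), true peak near +1.2.
    const int n = 5;
    float h[n] = { 0.1f, 0.3f, 0.8f, 1.0f, 0.9f };

    float *dC, *dPos;
    cudaMalloc(&dC, n * sizeof(float));
    cudaMalloc(&dPos, sizeof(float));
    cudaMemcpy(dC, h, n * sizeof(float), cudaMemcpyHostToDevice);

    refinePeak<<<1, 1>>>(dC, n, dPos);

    float pos;
    cudaMemcpy(&pos, dPos, sizeof(float), cudaMemcpyDeviceToHost);
    printf("subpixel peak at shift %f\n", pos - 2.f); // index 0 == shift -2
    cudaFree(dC); cudaFree(dPos);
    return 0;
}
```

With a map covering shifts -2..+2, the refinable peak can lie at ±1 at most, plus a sub-pixel offset of up to ±0.5, which gives exactly the 1.5-pixel limit for Max. Shift = 2 mentioned above.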
Some results

New York scene - 5 frames

Reference frame decoded with Adobe Camera RAW:

newyorkFrame3

Merge of 5 frames using this implementation (the applied tone curves differ a little):

newyorkMerged

Reference frame decoded with Adobe Camera RAW (Crop 1:1):

newyorkFrame3Crop1

Merge of 5 frames using this implementation (Crop 1:1):

newyorkMergedCrop1

Super-Resolution test chart with a Samsung Galaxy S10e (20 frames)

Area Cropped:

superRes

Merge of 20 images in super-resolution mode:

SuperResCrop

Developed DNG (reference frame) using Adobe Camera RAW (resized by a factor of 2 using bicubic interpolation):

SuperResAdobe

Out-of-camera JPEG (resized by a factor of 2 using bicubic interpolation):

samsung

Night sky at Grand Canyon (34 frames with a 10-second exposure each)

grandCanyon