temp-64GTX opened 4 months ago
The export scaling in Gyroflow is already Lanczos4. The preview in Gyroflow is bilinear. Bicubic is also implemented, but there's no option to select it from the UI. I can add the selector, but I feel like there's something else going on in your example than Gyroflow's scaling.
Here is a video comparison. YouTube eats video quality, but the difference is still visible.
Added the selector in 7b48ed3, in Export settings -> Advanced
The selector is great, but nothing has changed :( All three methods give almost the same results, with "stairs-edges" :(
Yeah, that's why I think there's something else going on. Please do more testing:
Short results:
- All of the export codecs (PNG, ProRes, H.265, etc.) at 720p: stairs-edges.
- 4K & FOV 4: stairs-edges.
- 4K & FOV 0.25: smooth edges.
- DaVinci Resolve: crashed three times during export, so I removed it back to the hell it came from. "Use DaVinci," they said. "It's fast and simple," they said.
OK, I tried Fusion Studio. It's more or less working, but I can't figure out how to tell Gyroflow that it must do the downscale, because right now Fusion is doing it.
In the Gyroflow app, set the export size to 720p and save that in the project. Then in Fusion, go to the Gyroflow plugin settings and check "Use plugin RoD for output size".
Yep. It also gives the stairs-edges.
Can you send me the sample file?
I hope the link works. In the video, the effect is very noticeable at the beginning, on the bridge lines and fence lines: https://drive.google.com/file/d/1QzovqTtn9jC0AiDLUIeQSxdAH_ByFXEs/view?usp=sharing
Meanwhile, I've tested the mobile version of Gyroflow, to exclude any possibility that my particular PC hardware is affecting the result. (The result is the same.) I put the original GoPro file and all three output files at the link above.
I have the same problem with GoPro videos. It looks like no filtering is used for scaling. Scaling from 4K to 8K looks good, but 4K to 720p is completely unusable. I have to export at 8K and scale that down with ffmpeg to get decent output from Gyroflow.
8k export and scaled to 720p:
720p export:
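For reference, the ffmpeg step of that two-stage workaround (export at high resolution, then downscale externally) can request a Lanczos kernel explicitly via the swscale `flags` option; the filenames here are placeholders:

```shell
# Downscale an 8K Gyroflow export to 720p with a Lanczos kernel.
# Filenames are placeholders; adjust the video codec to your pipeline.
ffmpeg -i stabilized_8k.mp4 -vf "scale=1280:720:flags=lanczos" -c:a copy stabilized_720p.mp4
```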
I still can't reproduce this issue; the 720p exports look perfectly fine to me. And of course there are filters used when exporting; high-quality Lanczos4 is the default.
Are you using a high enough FOV? Here are examples from the footage provided by @temp-64GTX, with FOV set to 3.
8k export to 720p:
720p export:
OK, I see it. Since our scaling code is ported from OpenCV, we've inherited a scaling algorithm that is not entirely correct:
We should look into changing our scaling code to be based on Pillow's: https://github.dev/zurutech/pillow-resize/blob/main/src/PillowResize/PillowResize.cc They are pretty similar at first glance, so it looks like we need to dig deeper and identify what exactly is different. Not trivial, but also not terrible.
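One detail worth calling out in the pillow-resize source linked above: Pillow widens the filter support by the scale factor when downscaling and normalizes the weights per output pixel. A minimal 1-D sketch of that idea in Rust, with illustrative names that are not the actual pillow-resize API:

```rust
// Sketch of Pillow-style 1-D resampling coefficients (illustrative,
// not the pillow-resize API). Lanczos kernel with support `a`.
fn lanczos(x: f64, a: f64) -> f64 {
    if x == 0.0 { return 1.0; }
    if x.abs() >= a { return 0.0; }
    let px = std::f64::consts::PI * x;
    (px.sin() / px) * ((px / a).sin() / (px / a))
}

/// For each output pixel, return (first input index, normalized weights).
fn precompute_coeffs(in_size: usize, out_size: usize, support: f64) -> Vec<(usize, Vec<f64>)> {
    let scale = in_size as f64 / out_size as f64;
    // Key detail: the filter is widened when downscaling, never narrowed.
    let filter_scale = scale.max(1.0);
    let sup = support * filter_scale;

    (0..out_size).map(|xo| {
        let center = (xo as f64 + 0.5) * scale;
        let xmin = ((center - sup).floor().max(0.0)) as usize;
        let xmax = ((center + sup).ceil().min(in_size as f64)) as usize;
        // Kernel is evaluated in "filter space" (divided by filter_scale).
        let mut w: Vec<f64> = (xmin..xmax)
            .map(|xi| lanczos((xi as f64 + 0.5 - center) / filter_scale, support))
            .collect();
        let sum: f64 = w.iter().sum();
        if sum != 0.0 { for v in w.iter_mut() { *v /= sum; } }
        (xmin, w)
    }).collect()
}
```

For a 4000→720 downscale this produces windows of roughly 33 taps for interior output pixels, versus a fixed 7-tap Lanczos3 window when the support is not widened, which is consistent with the aliasing seen only on downscaled exports.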
/bounty $300
- Start working: comment `/attempt #780` with your implementation plan
- Submit work: create a pull request including `/claim #780` in the PR body to claim the bounty once it's live

Thank you for contributing to gyroflow/gyroflow!
I've made some progress: I re-implemented Pillow's algorithm in Rust in a minimal example, comparing OpenCV's implementation (the code currently in Gyroflow) and Pillow's implementation.
Pillow's resizes images better. However, it's structured to calculate coefficients up front for resizing from input to output. This is a problem because in Gyroflow we need to feed input coordinates to the resampling function, since they are rotated (stabilized), so it's different from simple resizing (where coordinates map in a simple linear way from input to output).
Example project: resizing.zip
There's `pillow::sample_at_output`, which uses precomputed coefficients, but we can't use it because our source image coordinates can't be precomputed (they are calculated in the GPU kernel per pixel). I've implemented `pillow::sample_input_at`, but that one doesn't use any precomputed coefficients and instead evaluates the resampling function right there for every sampled pixel, which will be slow.
I'm reducing the bounty to $150, since most of the work is done; we just need to figure out a way to precompute the coefficients for this use case. The tricky part is that it depends on the scale, which ideally should be `input_size/output_size * fov`, but `input_size/output_size` might be enough, and that would be static for the whole shader.
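The two candidate filter-scale choices can be written out like this (a sketch; the function names are made up for illustration, not Gyroflow API):

```rust
// Two candidate choices for the resampling filter scale.
// Names are illustrative, not Gyroflow's actual code.

fn static_scale(input: f64, output: f64) -> f64 {
    (input / output).max(1.0)           // fixed for the whole shader
}

fn dynamic_scale(input: f64, output: f64, fov: f64) -> f64 {
    (input / output * fov).max(1.0)     // varies per frame with dynamic zoom
}
```

For a 3840→1280 export, `static_scale` gives 3.0, while an FOV of 1.5 would push `dynamic_scale` to 4.5, so a Lanczos3 kernel's half-width would grow from 9 to 13.5 input pixels, and the precomputed coefficient tables would change per frame.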
Not sure if this would work, but what if a few pixels (say four in a square) at the center of the output image were sampled to get the input pixel coordinates (to determine the scaling factor), which would then be used to precompute the coefficients for the whole image? Or is that also too slow? There are some edge cases, though, particularly varying scale when the frame is very tilted, which results in different pixel scales across the image.
I implemented it without precomputed coefficients in order to test, and the rendered video is now much nicer, check out the files there: https://drive.google.com/drive/folders/1brliOo0b4RLHOKbhraUBvIyRtsMIS-uj?usp=sharing
However, this improves things only when downscaling (a massive improvement); for regular videos (mostly slight upscaling, see __GH011230_stabilized.mp4), I don't see any difference, and I've been pixel-peeping pretty hard.
This kind of makes sense, because the main difference between the implementations is that the sampling area is scaled up when resizing down, but never scaled down when resizing up, so the upscaling case should be pretty much equivalent between the current implementation and this one.
This implementation is 2-3x slower to render.
> Not sure if this would work, but what if a few pixels (say four in a square) at the center of the output image are sampled to get the input pixels coordinates (for determining the scaling factor)
This is a good idea, but I think this new implementation only makes sense if we use the fov (which changes per frame because of dynamic zoom), and precomputing coefficients for every frame might be too much memory (plus transferring it to the GPU). It would have to be benchmarked, though.
Well, the stairs are gone. However, the new one is kind of blurry. (Left: After Effects downscale; right: the new file downloaded from your Google Drive.)
Indeed, hmm. Maybe this really needs to be a two-stage process: first stabilize, then resize. Doing it in one step sounds good on paper, but apparently has its limitations.
For reference, here's the WGSL implementation if anyone wants to play with it:
```wgsl
fn bilinear_filter(x_: f32) -> f32 {
    let x = abs(x_);
    if x < 1.0 { return 1.0 - x; } else { return 0.0; }
}

fn hamming_filter(x_: f32) -> f32 {
    var x = abs(x_);
    if x == 0.0 { return 1.0; }
    else if x >= 1.0 { return 0.0; }
    else {
        x = x * 3.14159265359;
        return (sin(x) / x) * (0.54 + 0.46 * cos(x));
    }
}

fn bicubic_filter(x_: f32) -> f32 {
    let x = abs(x_);
    let A: f32 = -0.5;
    if x < 1.0 { return ((A + 2.0) * x - (A + 3.0)) * x * x + 1.0; }
    else if x < 2.0 { return (((x - 5.0) * x + 8.0) * x - 4.0) * A; }
    else { return 0.0; }
}

fn sinc_filter(x: f32) -> f32 {
    if x == 0.0 { return 1.0; }
    else {
        let xx = x * 3.14159265359;
        return sin(xx) / xx;
    }
}

fn lanczos_filter(x: f32) -> f32 {
    if x >= -3.0 && x < 3.0 { return sinc_filter(x) * sinc_filter(x / 3.0); }
    else { return 0.0; }
}

fn sample_input_at2(uv_param: vec2<f32>) -> vec4<f32> {
    let filter_support = 3.0;
    let scale = min(params.fov, 10.0);
    let filter_scale = max(scale, 1.0);
    let support = filter_support * filter_scale;
    let ss = 1.0 / filter_scale;
    var kx = array<f32, 64>();
    var ky = array<f32, 64>();
    let fix_range = bool(flags & 1);
    let bg = params.background * params.max_pixel_value;
    var sum = vec4<f32>(0.0);
    var uv = uv_param;
    if (params.input_rotation != 0.0) {
        uv = rotate_point(uv, params.input_rotation * (3.14159265359 / 180.0), vec2<f32>(f32(params.width) / 2.0, f32(params.height) / 2.0));
    }
    if (bool(flags & 32)) { // Uses source rect
        uv = vec2<f32>(
            map_coord(uv.x, 0.0, f32(params.width), f32(params.source_rect.x), f32(params.source_rect.x + params.source_rect.z)),
            map_coord(uv.y, 0.0, f32(params.height), f32(params.source_rect.y), f32(params.source_rect.y + params.source_rect.w))
        );
    }

    // Horizontal filter coefficients
    let xcenter = uv.x + 0.5 * scale;
    let xmin = i32(floor(max(xcenter - support, 0.0)));
    let xmax = max(i32(ceil(min(xcenter + support, f32(params.width)))) - xmin, 0);
    var xw = 0.0;
    for (var x: i32 = 0; x < xmax; x = x + 1) {
        let f: f32 = (f32(x) + f32(xmin) - xcenter + 0.5) * ss;
        kx[x] = lanczos_filter(f);
        xw += kx[x];
    }
    if (xw != 0.0) { for (var x: i32 = 0; x < xmax; x = x + 1) { kx[x] /= xw; } }

    // Vertical filter coefficients
    let ycenter = uv.y + 0.5 * scale;
    let ymin = i32(floor(max(ycenter - support, 0.0)));
    let ymax = max(i32(ceil(min(ycenter + support, f32(params.height)))) - ymin, 0);
    var yw = 0.0;
    for (var y: i32 = 0; y < ymax; y = y + 1) {
        let f: f32 = (f32(y) + f32(ymin) - ycenter + 0.5) * ss;
        ky[y] = lanczos_filter(f);
        yw += ky[y];
    }
    if (yw != 0.0) { for (var y: i32 = 0; y < ymax; y = y + 1) { ky[y] /= yw; } }

    // Weighted sum over the sampling window
    let sx = xmin;
    let sy = ymin;
    for (var yp: i32 = 0; yp < ymax; yp = yp + 1) {
        if (sy + yp >= params.source_rect.y && sy + yp < params.source_rect.y + params.source_rect.w) {
            var xsum = vec4<f32>(0.0, 0.0, 0.0, 0.0);
            for (var xp: i32 = 0; xp < xmax; xp = xp + 1) {
                var pixel: vec4<f32>;
                if (sx + xp >= params.source_rect.x && sx + xp < params.source_rect.x + params.source_rect.z) {
                    pixel = read_input_at(vec2<i32>(sx + xp, sy + yp));
                    pixel = draw_pixel(pixel, u32(sx + xp), u32(sy + yp), true);
                    if (fix_range) {
                        pixel = remap_colorrange(pixel, bytes_per_pixel == 1);
                    }
                } else {
                    pixel = bg;
                }
                xsum = xsum + (pixel * kx[xp]);
            }
            sum = sum + xsum * ky[yp];
        } else {
            sum = sum + bg * ky[yp];
        }
    }
    return vec4<f32>(
        min(sum.x, params.pixel_value_limit),
        min(sum.y, params.pixel_value_limit),
        min(sum.z, params.pixel_value_limit),
        min(sum.w, params.pixel_value_limit)
    );
}
```
It replaces `sample_input_at`.
I tried implementing the same algorithm used in ImageMagick's distort operator: elliptical weighted average (EWA) with cubic BC filtering. This is just a test to evaluate performance; it doesn't do any real distortion, it only applies a given affine transformation: https://github.com/VladimirP1/gpu-warp. To use this in Gyroflow, we'd have to calculate affine approximations of the transformation at each pixel of the undistorted image, which is not much harder than computing the transformation itself.
A test transformation, 4000x3000 downscaled by 2.2 onto a 1920x1080 canvas, plus 0.1 rad of rotation: https://drive.google.com/drive/folders/1jHVp6L73TESmmYO1VOXEmn1dAzxfqL-H?usp=sharing This exact transformation takes 54 ms on a UHD 630, 20 ms on a GTX 1070, and 128 ms on an i9-9900K with CPU PoCL.
It also seems there are some bugs left in my implementation.
I seem to have fixed most of the bugs and tried some real warping with my code.
This shows that both upscaling (in the left part of the image) and downscaling (in the right) work.
It's just some fisheye-like distortion.
> tried some real warping

Hmm, looks interesting. Would it be possible to implement some custom warping? Like, you know, lens distortion: https://github.com/gyroflow/gyroflow/issues/355
That's not related; it's a different issue.
Progress so far. EWA with cubic BC filtering: https://youtu.be/egGW8EQafhc Current filtering in Gyroflow (Lanczos4): https://youtu.be/KNUOr-IasBg
I am using numeric differentiation for now (basically running the distort transformation three times per output pixel instead of once). It's possible to do it in one step, but that would require adding Jacobian calculation to the lens models.
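The numeric differentiation described above can be sketched like this: evaluate the output-to-input warp three times and take forward differences to get the local affine approximation (the Jacobian), which is what EWA uses to shape its sampling ellipse. `warp` here is any stand-in mapping, not Gyroflow's actual API:

```rust
// Estimate the local 2x2 Jacobian of an output->input warp by forward
// differences: three warp evaluations per output pixel, as described above.
fn jacobian<F>(warp: F, x: f64, y: f64, h: f64) -> [[f64; 2]; 2]
where
    F: Fn(f64, f64) -> (f64, f64),
{
    let (u0, v0) = warp(x, y);       // base sample
    let (ux, vx) = warp(x + h, y);   // step in x
    let (uy, vy) = warp(x, y + h);   // step in y
    [[(ux - u0) / h, (uy - u0) / h],
     [(vx - v0) / h, (vy - v0) / h]]
}
```

For a pure 2.2x scale warp, `(u, v) = (2.2 * x, 2.2 * y)`, the estimate recovers approximately `[[2.2, 0], [0, 2.2]]` at any point; for a real lens model, the Jacobian varies across the image, which is why it has to be re-estimated per pixel.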
Is there an existing feature request for this?
Description
Good evening. Me again.
Is it possible to implement a better scaling quality option? Typical case: I have a 4K video, open it in Gyroflow, apply stabilization, and export a 720p ProRes proxy for fast editing. This 720p video looks very aliased, like in games when you turn AA completely off. For example, this picture, from a 4K video: the upper one was exported from Gyroflow at 720p; the lower one was exported from Gyroflow at native 4K and downscaled in the video editor (its standard scale processing, "bilinear" or something). So we can clearly see the difference. The post (the tall, thin white metal thing) in the center: the upper one has the typical "stairs-edges". The same "stairs-edges" can be seen on the wires and on the vertical lines of the beige building on the right. Meanwhile, in the lower picture all these lines are softer and smoother.
And in a moving video this is much more visible than in a static picture, so I think a better scaling quality option would be great, at least for exporting videos. I guess bilinear would work fine. I also use "Lanczos" in XnView for scaling pictures; it gives good quality.