haruishi43 / equilib

🌎→🗾Equirectangular (360/panoramic) image processing library for Python with minimal dependencies only using Numpy and PyTorch
Apache License 2.0

biased shift of cube2equi and equi2cube #8

Closed: qsh-zh closed this issue 1 year ago

qsh-zh commented 2 years ago

I find that there are shifts if we run cube2equi and equi2cube repeatedly. How can we avoid or reduce such shifts?

```python
from einops import rearrange
from equilib import cube2equi, equi2cube

# xt: (b * 6, c, h, w) batch of cube faces; height, width: size of the
# intermediate equirectangular image
cube_xt = rearrange(xt, "(b n) c h w -> b c h (n w)", n=6)

for _ in range(10):
    # cube -> equirectangular
    rec_equi_xt = cube2equi(
        cubemap=cube_xt,
        cube_format="horizon",
        height=height,
        width=width,
        # mode="bilinear",
        mode="bicubic",
        # mode="nearest",
    )

    # equirectangular -> cube
    cube_xt = equi2cube(
        equi=rec_equi_xt[None],
        rots=[{"roll": 0, "pitch": 0, "yaw": 0}],
        w_face=64,
        cube_format="horizon",
        # mode="bilinear",
        mode="bicubic",
        # mode="nearest",
    )

cube_xt = rearrange(cube_xt, "b c h (n w) -> (b n) c h w", n=6)
# show cube_xt next to the original xt
```

[images: reconstructed cube faces next to the original xt, showing the accumulated shift]

haruishi43 commented 2 years ago

Thanks for submitting an issue.

Yeah, I've been struggling to make the cube2equi -> equi2cube conversion as robust as possible. I've seen minor bias in my tests, but I haven't seen cubemaps distort that much.

Would you mind testing this in your notebook to see if the cubemap transforms the same way it did for me? You can download the horizon-format image I used here.

```python
import os.path as osp

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

import equilib

# load a horizon-format cubemap strip (six faces side by side)
cubestrip = Image.open(osp.join('other_data', 'Cubestrip.jpg'))
cubemap = np.array(cubestrip)

plt.imshow(cubemap)

# equilib expects channels-first (C, H, W) arrays
chw_cubemap = cubemap.transpose(2, 0, 1)

# cube -> equirectangular
equi = equilib.cube2equi(
    cubemap=chw_cubemap,
    cube_format="horizon",
    height=200,
    width=400,
    mode="bilinear",
)
plt.imshow(equi.transpose(1, 2, 0))

# equirectangular -> cube, with no rotation applied
out_cubemap = equilib.equi2cube(
    equi=equi,
    rots={"roll": 0, "pitch": 0, "yaw": 0},
    w_face=100,
    cube_format="horizon",
    mode="bilinear",
)
plt.imshow(out_cubemap.transpose(1, 2, 0))
```

Original Cubemap: [image]

Output Cubemap: [image]

Unfortunately, converting back to a cubemap adds noise and distortion due to grid sampling, interpolation, rescaling, etc., but the rotational bias should be minimal.

BTW, I noticed that bicubic interpolation is buggy in the numpy implementation, so bilinear or nearest would be better to use there. Bicubic interpolation works fine when the inputs are torch.Tensor.
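For reference, here is a minimal sketch of the torch path (my own illustration, reusing chw_cubemap, equilib, and plt from the snippet above):

```python
import torch

# the same cube -> equi conversion through the torch backend, where
# mode="bicubic" is reported to behave correctly
cube_t = torch.from_numpy(chw_cubemap).float() / 255.0  # (C, H, W) in [0, 1]
equi_t = equilib.cube2equi(
    cubemap=cube_t,
    cube_format="horizon",
    height=200,
    width=400,
    mode="bicubic",
)
plt.imshow(equi_t.permute(1, 2, 0).clamp(0, 1).numpy())
```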

haruishi43 commented 2 years ago

Oh, nvm. I misunderstood your question: you looped this 10 times.

When iterating the same transform on the same data many times, the shifts do seem to increase quite a bit.

I will take a look at the internals when I have time, but it seems really hard to make the transform robust, since the transform is lossy and not exactly invertible.
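As a rough way to quantify the drift, something like the following sketch could track the error per round trip (roundtrip_drift is a hypothetical helper, not part of equilib; it assumes a float horizon strip and keeps the face width fixed so shapes stay comparable):

```python
import numpy as np
import equilib

def roundtrip_drift(cubemap, n_iters=10, height=200, width=400):
    """Mean absolute error against the original after each
    cube2equi -> equi2cube round trip.

    cubemap: float (C, H, 6*H) horizon strip.
    """
    w_face = cubemap.shape[1]  # keep the face size fixed across iterations
    current = cubemap
    errors = []
    for _ in range(n_iters):
        equi = equilib.cube2equi(
            cubemap=current, cube_format="horizon",
            height=height, width=width, mode="bilinear",
        )
        current = equilib.equi2cube(
            equi=equi, rots={"roll": 0, "pitch": 0, "yaw": 0},
            w_face=w_face, cube_format="horizon", mode="bilinear",
        )
        errors.append(float(np.abs(current - cubemap).mean()))
    return errors
```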

qsh-zh commented 2 years ago

@haruishi43 Thanks for your reply. I find that a large w_face can reduce distortion compared with a small one. Do you have any materials/notes explaining the math behind cube2equi and equi2cube?
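For instance, with the hypothetical roundtrip_drift sketch from the previous comment, the effect could be checked like this (cube_64 and cube_256 stand for the same scene resampled to face widths 64 and 256; both names are illustrative):

```python
# finer faces should accumulate less round-trip error
print(roundtrip_drift(cube_64)[-1])
print(roundtrip_drift(cube_256)[-1])
```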

haruishi43 commented 2 years ago

@qsh-zh I will document each algorithm in the future. I haven't got around to it yet :/

All algorithms (equi2cube, cube2equi, etc.) essentially do the same thing:

  1. Create a grid of coordinates that maps each pixel of the output array to a sampling location in the input image.
  2. Transform the coordinates if we're given a rotation.
  3. Sample pixels at those locations (grid_sample).

Between cube2equi and equi2cube, it's mainly the grid construction in step 1 that differs, as the sketch below illustrates.
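To make the recipe concrete, here is a toy, hedged sketch of the same three steps applied to the simplest case, rotating an equirectangular image in place with nearest sampling (my own illustration, not equilib's code; equilib's axis conventions and sampling kernels differ):

```python
import numpy as np

def rot_matrix(roll, pitch, yaw):
    # a simple Z-Y-X rotation; equilib's conventions may differ
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def rotate_equi(equi, rots):
    """Rotate an equirectangular image (C, H, W) by resampling."""
    c, h, w = equi.shape
    # 1. grid: map every output pixel to a unit ray direction
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)
    rays = np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)          # (H, W, 3)
    # 2. transform the rays with the rotation
    rays = rays @ rot_matrix(**rots).T
    # 3. project rays back to input pixel coordinates and sample
    lon_in = np.arctan2(rays[..., 1], rays[..., 0])
    lat_in = np.arcsin(np.clip(rays[..., 2], -1.0, 1.0))
    xs = ((lon_in + np.pi) / (2 * np.pi) * w).astype(int) % w
    ys = np.clip(((np.pi / 2 - lat_in) / np.pi * h).astype(int), 0, h - 1)
    return equi[:, ys, xs]
```

For cube2equi and equi2cube, step 1 would instead construct the grid from the cube-face geometry; the rotation and sampling steps follow the same shape.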

Oletus commented 1 year ago

I suspect that the sampling could be offset by half a pixel, though I'm new to the library, so I'd hope someone more familiar with it can review my reasoning.

I looked at the numpy implementation of nearest sampling, and it rounds the sample location to the nearest integer (rint). However, pixel centers are offset by 0.5 from integer pixel coordinates: pixel i spans [i, i + 1), with its center at i + 0.5. So I think the correct rounding for the coordinates would be floor rather than rint.

[diagram: pixel grid with centers at half-integer coordinates]
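A small check of the difference (my own illustration, not library code):

```python
import numpy as np

# with pixel centers at i + 0.5, the coordinate 1.8 lies inside pixel 1
# (which spans [1, 2)), but rint sends it to pixel 2
coords = np.array([0.2, 0.9, 1.8, 2.5])
print(np.rint(coords).astype(int))   # [0 1 2 2] -- centers assumed at integers
print(np.floor(coords).astype(int))  # [0 0 1 2] -- centers at half-integers
```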

Similarly, in bilinear filtering, the code currently reads:

    min_grid = np.floor(grid).astype(np.int64)
    max_grid = min_grid + 1
    d_grid = grid - min_grid

whereas I believe the correct implementation would be:

    min_grid = np.floor(grid - 0.5).astype(np.int64) # 0.5 is subtracted to get to the pixel center
    max_grid = min_grid + 1
    d_grid = grid - (min_grid + 0.5) # min_grid + 0.5 is the pixel center
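A worked check at a single coordinate shows the difference (my own illustration): sampling exactly at a pixel center should put full weight on that pixel, with no blending toward its neighbor.

```python
import numpy as np

grid = np.array([1.5])  # exactly the center of pixel 1

# current code: blends pixels 1 and 2 with weight 0.5 each (half-pixel shift)
min_grid = np.floor(grid).astype(np.int64)              # [1]
d_grid = grid - min_grid                                # [0.5]

# proposed fix: full weight lands on pixel 1, as expected
min_grid_fixed = np.floor(grid - 0.5).astype(np.int64)  # [1]
d_grid_fixed = grid - (min_grid_fixed + 0.5)            # [0.0]
```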
haruishi43 commented 1 year ago

@Oletus Thanks for investigating! Does it remove the biased shift?

Oletus commented 1 year ago

I've tested the pull request and it fixes the biased shift.

However, there's also another issue: when using bilinear or bicubic sampling, the samples may wrap around to the opposite side of the cube face, instead of clamping to the face edge or wrapping into the neighboring face. This can also cause artifacts, and the PR might make the artifacts from wrapping worse.

[image: wrapping artifacts at cube face edges]
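A minimal sketch of the clamping option (my own illustration; correctly wrapping into the neighboring face would additionally need the cubemap adjacency):

```python
import numpy as np

def bilinear_taps_clamped(grid, h, w):
    """grid: (2, out_h, out_w), row 0 = y coords, row 1 = x coords
    (an assumed layout, not necessarily equilib's)."""
    min_grid = np.floor(grid - 0.5).astype(np.int64)
    max_grid = min_grid + 1
    d_grid = grid - (min_grid + 0.5)
    # clamp both taps to the face interior so samples near an edge
    # repeat the edge pixel instead of wrapping to the opposite side
    bounds = np.array([h - 1, w - 1]).reshape(2, 1, 1)
    min_grid = np.clip(min_grid, 0, bounds)
    max_grid = np.clip(max_grid, 0, bounds)
    return min_grid, max_grid, d_grid
```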

haruishi43 commented 1 year ago

closing this! thanks to @Oletus