Support for transforming cubemaps to perspectives

fangchuan commented 2 weeks ago

Hi, really awesome work. I am working on panorama and perspective 3D vision, and we are currently building a large panorama dataset from scratch. We have also tried converting the panoramas to perspective images with these libraries. However, because our panoramas have a relatively small resolution (512x1024), the resulting perspective images are not sharp enough. The cubemap faces, on the other hand, have adequate resolution (at least 512x512), and we found that converting cubemaps directly to perspective views circumvents the resolution problem: we simply apply a homography between each cubemap face and the target subview. I can currently produce 8 perspective subviews from the cubemap faces, spaced evenly in yaw around the vertical axis. However, I want to get more diverse perspectives from the cubemaps, with pitch and roll rotations as well, and I fail to get reasonable results when doing the homography warping. Could you help me find the bug? Alternatively, we could work together to add cubemap-to-perspective conversion to this project. Here is my code snippet:

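    import os
    from typing import List

    import cv2
    import matplotlib.pyplot as plt
    import numpy as np
    from PIL import Image


    def get_K_R(FOV, THETA, PHI, height, width):
        # pinhole intrinsics; FOV is the horizontal field of view in degrees
        f = 0.5 * width / np.tan(0.5 * np.radians(FOV))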
        cx = (width - 1) / 2.0
        cy = (height - 1) / 2.0
        K = np.array([
            [f, 0, cx],
            [0, f, cy],
            [0, 0,  1],
        ], np.float32)

        # yaw (THETA) about the y-axis, then pitch (PHI) about the yawed x-axis;
        # this composes to R = Ry(THETA) @ Rx(PHI). Roll is not supported.
        y_axis = np.array([0.0, 1.0, 0.0], np.float32)
        x_axis = np.array([1.0, 0.0, 0.0], np.float32)
        R1, _ = cv2.Rodrigues(y_axis * np.radians(THETA))
        R2, _ = cv2.Rodrigues(np.dot(R1, x_axis) * np.radians(PHI))
        R = R2 @ R1
        return K, R

    def warp_img(fov: float, theta: float, phi: float, images: np.ndarray, vx: List[float], vy: List[float]) -> np.ndarray:
        """Warp the cubemap faces into one perspective subview using rotation-only homographies.

        Args:
            fov (float): subview FOV, in degrees
            theta (float): subview yaw angle, in degrees
            phi (float): subview pitch angle, in degrees
            images (np.ndarray): cubemap faces stacked as (6, H, W, C), in the order [up, left, front, right, back, down]
            vx (List[float]): yaw angles of the cubemap faces, [-90, 270, 0, 90, 180, -90]
            vy (List[float]): pitch angles of the cubemap faces, [90, 0, 0, 0, 0, -90]

        Returns:
            np.ndarray: subview image
        """
        img_combine = np.zeros_like(images[0])

        min_theta = float('inf')
        min_idx = None
        for i, cube_img in enumerate(images):
            # vx: cubemap images yaws: [-90, 270, 0, 90, 180, -90]
            # vy: cubemap images pitches: [90, 0, 0, 0, 0, -90]
            _theta = vx[i]-theta
            _phi = vy[i]-phi
            print(f'cubemap image {i} theta:{theta:.3f} phi:{phi:.3f}, _theta:{_theta:.3f} _phi:{_phi:.3f}')
            plt.imshow(cube_img)
            plt.title(f'cubemap image {i}')
            plt.show()

            if i == 2 and theta > 270:
                _theta = max(360-theta, _theta)

            # skip the left, front, right and back faces if they are more than 90 degrees away in yaw
            if i in [1, 2, 3, 4] and np.absolute(_theta) > 90:
                continue

            # track the side face closest in yaw to the subview (min_idx is currently unused)
            if 0 < i < 5 and np.absolute(_theta) < min_theta:
                min_theta = np.absolute(_theta)
                min_idx = i

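            im_h, im_w = cube_img.shape[:2]
            # rotation-only homography H = K @ R @ K^-1. The same K (built from the subview FOV)
            # is used for both the face and the subview, which is only exact when fov == 90.
            K, R = get_K_R(fov, _theta, _phi, im_h, im_w)
            homo_matrix = K @ R @ np.linalg.inv(K)
            img_warp1 = cv2.warpPerspective(cube_img, homo_matrix, (im_w, im_h), flags=cv2.INTER_NEAREST)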
            # mask the invalid pixels warped from the up and down faces
            if i == 0:
                img_warp1[im_h//2:] = 0
            elif i == 5:
                img_warp1[:im_h//2] = 0

            # NOTE: with uint8 images, this addition wraps around wherever warped faces overlap
            img_combine += img_warp1
            plt.imshow(img_combine)
            plt.title(f'warped image {i}')
            plt.show()
        return img_combine

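    def prepare_rgb_mvimages(input_dir: str,
                             output_dir: str,
                             camera_id: int,
                             image_width: int = 512,
                             image_height: int = 512,
                             init_degree: float = 0.0,
                             pitch_degree: float = 0.0,
                             num_subviews: int = 8,
                             mvp_fov: float = 90,
                             mvp_interval: float = 45,
                             ) -> List[str]:
        """Prepare multi-view perspective (MVP) RGB images.

        Args:
            input_dir (str): input folder of raw RGB data
            output_dir (str): output folder of MVP data
            camera_id (int): new camera id
            image_width (int): target image width
            image_height (int): target image height (currently unused; outputs are square)
            init_degree (float): yaw of the first subview, in degrees
            pitch_degree (float): pitch shared by all subviews, in degrees
            num_subviews (int): number of subviews
            mvp_fov (float): subview field of view, in degrees
            mvp_interval (float): yaw step between consecutive subviews, in degrees

        Returns:
            List[str]: paths of the saved subview images
        """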
        new_cam_name = str(camera_id)
        # 6 cubemap view images
        save_cube_img_folderpath = os.path.join(output_dir, 'cubemap_rgb')
        os.makedirs(save_cube_img_folderpath, exist_ok=True)

        # num_subviews subview images
        save_subview_img_folderpath = os.path.join(output_dir, 'mvp_rgb')
        os.makedirs(save_subview_img_folderpath, exist_ok=True)

        subview_fov = mvp_fov
        cubemap_img_resolution = image_width
        subview_img_resolution = image_width
        rotation_interval = mvp_interval

        cube_yaw_angle_lst = [-90, 270, 0, 90, 180, -90]
        cube_pitch_angle_lst = [90, 0, 0, 0, 0, -90]

        # use the rendered cubemap images; they are stored in a different order,
        # so we re-arrange them
        rendered_cubimages_map = {
            'up': "rgb_2.jpg",
            'left': "rgb.jpg",
            'front': "rgb_4.jpg",
            'right': "rgb_1.jpg",
            'back': "rgb_5.jpg",
            'down': "rgb_3.jpg",
        }
        # the rendered images are flipped/rotated for an unknown reason, so undo that here
        rendered_cubimages_flip_map = {
            'up': ['ROTATE_270', 'FLIP_TOP_BOTTOM'],
            'left': [],
            'front': [],
            'right': ['FLIP_LEFT_RIGHT'],
            'back': ['FLIP_LEFT_RIGHT'],
            'down': ['ROTATE_270'],
        }

        # 6 cubemap images: up, left, front, right, back, down
        cubemap_img_lst = []
        for i, (face_name, cubeimg_name) in enumerate(rendered_cubimages_map.items()):
            orig_img_path = os.path.join(input_dir, cubeimg_name)
            orig_img = Image.open(orig_img_path)
            for flip_ops in rendered_cubimages_flip_map[face_name]:
                orig_img = orig_img.transpose(getattr(Image, flip_ops))
            subview_img = orig_img.resize((cubemap_img_resolution, cubemap_img_resolution), Image.NEAREST)
            subview_img = np.array(subview_img)    
            cubemap_img_lst.append(subview_img)
            save_path = os.path.join(save_cube_img_folderpath, f'{new_cam_name}_skybox{i}_sami.png')
            Image.fromarray(subview_img).save(save_path)
        cubemap_img_lst = np.array(cubemap_img_lst)
        # warp the subview images
        subview_path_lst = []
        for i in range(num_subviews):
            yaw_degree = (init_degree+rotation_interval*i) % 360
            img = warp_img(
                fov=subview_fov, theta=yaw_degree, phi=pitch_degree, images=cubemap_img_lst, vx=cube_yaw_angle_lst, vy=cube_pitch_angle_lst)
            img = cv2.resize(img, (subview_img_resolution, subview_img_resolution), interpolation=cv2.INTER_NEAREST)
            save_path = os.path.join(save_subview_img_folderpath, '%s_%.3f.png' % (new_cam_name, yaw_degree))
            Image.fromarray(img).save(save_path)
            subview_path_lst.append(save_path)

        return subview_path_lst

I am looking forward to your reply @haruishi43
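A likely culprit in `warp_img` is that the face-to-subview rotation is built by subtracting Euler angles (`vx[i]-theta`, `vy[i]-phi`), which `get_K_R` turns into `Ry(Δyaw) @ Rx(Δpitch)`. Assuming camera-to-world rotations, the exact relative rotation is `R_view.T @ R_face`, and because yaw and pitch rotations do not commute, for a side face (face pitch 0) the two agree only when the yaw difference or the subview pitch is zero. That would be consistent with the front face warping correctly while the left and right faces (yaw off by 90°) do not. Below is a minimal NumPy check under these assumptions (x right, y down, z forward); the helper names are hypothetical, and the sketch also gives the faces their own 90° intrinsics, since `get_K_R` reuses the subview FOV for both.

```python
import numpy as np


def rot_y(deg):
    """Rotation about the y axis (yaw); matches cv2.Rodrigues((0, rad, 0))."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_x(deg):
    """Rotation about the x axis (pitch); matches cv2.Rodrigues((rad, 0, 0))."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def camera_rotation(yaw, pitch):
    """Camera-to-world rotation, composed like get_K_R: Ry(yaw) @ Rx(pitch)."""
    return rot_y(yaw) @ rot_x(pitch)


def intrinsics(fov_deg, size):
    """Square pinhole camera with the given horizontal FOV in degrees."""
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)
    c = (size - 1) / 2.0
    return np.array([[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]])


def face_to_view_homography(face_yaw, face_pitch, view_yaw, view_pitch,
                            view_fov, face_size, view_size):
    """Homography mapping face pixels to subview pixels (for cv2.warpPerspective)."""
    K_face = intrinsics(90.0, face_size)  # every cube face spans 90 degrees
    K_view = intrinsics(view_fov, view_size)
    R_rel = camera_rotation(view_yaw, view_pitch).T @ camera_rotation(face_yaw, face_pitch)
    return K_view @ R_rel @ np.linalg.inv(K_face)


# Compare the exact relative rotation with the Euler-angle difference
# for the subview from the failure case (yaw 0, pitch -26.43).
view_yaw, view_pitch = 0.0, -26.43
for face_yaw in (0.0, 90.0):  # front face, right face
    exact = camera_rotation(view_yaw, view_pitch).T @ camera_rotation(face_yaw, 0.0)
    euler_diff = rot_y(face_yaw - view_yaw) @ rot_x(0.0 - view_pitch)
    print(face_yaw, np.allclose(exact, euler_diff))  # True for 0.0, False for 90.0
```

The returned matrix could be passed to `cv2.warpPerspective` in place of `K@R@np.linalg.inv(K)`; pixels behind the subview would still need masking.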

haruishi43 commented 2 weeks ago

@fangchuan I can definitely see some issues when doing cubemap -> equirectangular -> perspective view when dealing with low resolution cubemaps; loss of data being one. And your idea of doing cubemap -> perspective view mapping seems to be a great way of mitigating the problem. I would like to help, but unfortunately, I don't have too much free time to implement this.

Off the top of my head, one way could be to create a pixel mapping for grid sampling by combining cube2equi and equi2pers in this library into a cube2pers function. I think projecting the cubemap to a high-resolution equirectangular grid and sampling the pixels for the perspective view might be good enough, but we would have to try it out to see.
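The pixel mapping can also be composed in one step, skipping the intermediate equirectangular grid: cast a ray through every output pixel, rotate it by the view orientation, pick the cube face whose axis dominates the ray, and read that face's pixel. Since every output ray starts in front of the camera, nothing from behind the view can leak in. This is a minimal NumPy sketch with nearest-neighbour sampling, not equilib's API: the face order `FACES`, the axes (x right, y down, z forward) and the per-face orientations are assumptions chosen for this example, and `cube2pers_map`/`cube2pers` are hypothetical names.

```python
import numpy as np

# Hypothetical face order for this sketch (not equilib's cube_format).
FACES = ("front", "right", "back", "left", "up", "down")


def rot_y(deg):
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_x(deg):
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def cube2pers_map(face_size, out_size, fov_deg, yaw_deg, pitch_deg):
    """Per output pixel: (face index, row, col) to sample. Axes: x right, y down, z forward."""
    f = 0.5 * out_size / np.tan(np.radians(fov_deg) / 2)
    c = (out_size - 1) / 2.0
    v, u = np.mgrid[0:out_size, 0:out_size].astype(np.float64)
    rays = np.stack([(u - c) / f, (v - c) / f, np.ones_like(u)], axis=-1)
    rays = rays @ (rot_y(yaw_deg) @ rot_x(pitch_deg)).T  # camera -> world
    x, y, z = rays[..., 0], rays[..., 1], rays[..., 2]
    ax, ay, az = np.abs(x), np.abs(y), np.abs(z)

    # The face is chosen by the dominant axis; earlier conditions win ties.
    conds = [
        (az >= ax) & (az >= ay) & (z > 0),  # front (+z)
        (ax >= ay) & (ax >= az) & (x > 0),  # right (+x)
        (az >= ax) & (az >= ay) & (z < 0),  # back (-z)
        (ax >= ay) & (ax >= az) & (x < 0),  # left (-x)
        (ay >= ax) & (ay >= az) & (y < 0),  # up (-y, since y points down)
        (ay >= ax) & (ay >= az) & (y > 0),  # down (+y)
    ]
    with np.errstate(divide="ignore", invalid="ignore"):
        # in-face coordinates in [-1, 1], pointing right and down on each face
        fu = np.select(conds, [x / az, -z / ax, -x / az, z / ax, x / ay, x / ay])
        fv = np.select(conds, [y / az, y / ax, y / az, y / ax, z / ay, -z / ay])
    face = np.select(conds, list(range(6)))

    half = 0.5 * face_size  # focal length of a 90-degree face
    fc = (face_size - 1) / 2.0
    cols = np.clip(np.rint(half * fu + fc), 0, face_size - 1).astype(np.int64)
    rows = np.clip(np.rint(half * fv + fc), 0, face_size - 1).astype(np.int64)
    return face, rows, cols


def cube2pers(faces, fov_deg, yaw_deg, pitch_deg, out_size):
    """faces: array of shape (6, S, S[, C]) in FACES order."""
    face, rows, cols = cube2pers_map(faces.shape[1], out_size, fov_deg, yaw_deg, pitch_deg)
    return faces[face, rows, cols]
```

With a 90° FOV and an output the size of a face, looking straight at a face returns that face unchanged, which is a quick sanity check for the face orientations. Bilinear sampling would need the fractional coordinates instead of `np.rint`.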

fangchuan commented 1 week ago

Hi, thanks for the advice. I have not managed to generate perspective images by warping the cubemap faces; here is an example of one of my failure cases:

I tried to warp the surrounding cubemap faces into my target subview (yaw = 0.0°, pitch = -26.43°). The front and down faces warp reasonably, but applying the homography to the left and right faces produces a messy overlay on the perspective image. My guess is that these incorrectly warped regions come from the scale ambiguity of the homography: since it operates on the image plane at z = 1.0, it can make some pixels visible that should lie behind the image plane.
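This guess can be checked numerically. A homography divides by the third homogeneous coordinate w, so a ray that points behind the target camera (w < 0) still lands on a finite pixel, and `cv2.warpPerspective` draws it like any other. A minimal NumPy sketch, assuming x right, y down, z forward, camera-to-world rotations composed as yaw then pitch, and the subview from the failure case; `in_front_mask` is a hypothetical helper. It picks a left-face pixel whose ray points behind the subview, shows that it still maps inside the subview, and builds a per-pixel w > 0 mask that could be applied to each face before warping.

```python
import numpy as np


def rot_y(deg):
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_x(deg):
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def intrinsics(fov_deg, size):
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)
    c = (size - 1) / 2.0
    return np.array([[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]])


def in_front_mask(H, size):
    """True for face pixels whose ray lies in front of the target view (w > 0).

    Only meaningful when H = K_view @ R_rel @ K_face^-1 has not been rescaled,
    so that w equals the ray's depth in the view camera.
    """
    v, u = np.mgrid[0:size, 0:size]
    w = H[2, 0] * u + H[2, 1] * v + H[2, 2]
    return w > 0


size = 512
K = intrinsics(90.0, size)                    # faces and subview are both 90 degrees here
R_face = rot_y(-90.0)                         # left face, camera -> world
R_view = rot_y(0.0) @ rot_x(-26.43)           # subview from the failure case
H = K @ R_view.T @ R_face @ np.linalg.inv(K)  # face pixel -> subview pixel

# A left-face pixel whose ray points up and towards the back, i.e. behind the subview.
d_face = R_face.T @ np.array([-1.0, -0.95, -0.95])
p_face = K @ (d_face / d_face[2])
p_view = H @ p_face
u, v = p_view[:2] / p_view[2]
print(p_view[2] < 0, 0 <= u < size and 0 <= v < size)  # both True

# Zeroing such pixels before cv2.warpPerspective keeps them out of the subview:
# face_img = face_img * in_front_mask(H, size)[..., None]
```

Strictly, it is the sign of w, not its scale, that gets lost, which is why the mask is computed from H before the division rather than from the warped image.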

Additionally, I finally got the cube2equi followed by equi2pers pipeline working:

However, this pipeline is rather slow: it takes at least 1.25 s in my torch dataloader, where I run cube2equi + equi2pers 32 times to get the perspective images I want.

haruishi43 commented 1 week ago

Can you preprocess your cubemaps offline and convert them to high-resolution panoramas (e.g., 1024x2048) for faster loading in your dataloader? Loading a panorama and cropping arbitrary perspective views should be fast enough. This isn’t an elegant solution, but I recall it worked well with a 1024x2048 panorama and cropping a few 256x256 perspective views. Regarding the cube2pers pipeline, I’ll look for an efficient approach when I have time!
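Along the lines of the offline-panorama suggestion, here is a sketch of cropping perspective views from a precomputed panorama. The sampling indices depend only on the panorama size, output size, FOV, yaw and pitch, not on pixel values, so they can be computed once and cached with `functools.lru_cache`; repeated crops then reduce to a single fancy-indexing gather. This is a NumPy stand-in, not equilib's `equi2pers`: it assumes the panorama's centre column faces forward, longitude increases to the right, the top row points straight up, and it uses nearest-neighbour sampling. `equi2pers_map` and `equi2pers_nearest` are hypothetical names.

```python
from functools import lru_cache

import numpy as np


def rot_y(deg):
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_x(deg):
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


@lru_cache(maxsize=128)
def equi2pers_map(equi_h, equi_w, out_size, fov_deg, yaw_deg, pitch_deg):
    """Nearest-neighbour (rows, cols) into an equirectangular image, cached per setting."""
    f = 0.5 * out_size / np.tan(np.radians(fov_deg) / 2)
    c = (out_size - 1) / 2.0
    v, u = np.mgrid[0:out_size, 0:out_size].astype(np.float64)
    rays = np.stack([(u - c) / f, (v - c) / f, np.ones_like(u)], axis=-1)
    rays = rays @ (rot_y(yaw_deg) @ rot_x(pitch_deg)).T  # camera -> world (x right, y down, z forward)
    lon = np.arctan2(rays[..., 0], rays[..., 2])  # 0 at the panorama's centre column
    sin_lat = np.clip(-rays[..., 1] / np.linalg.norm(rays, axis=-1), -1.0, 1.0)
    lat = np.arcsin(sin_lat)  # +pi/2 at the top row
    cols = np.rint((lon / (2 * np.pi) + 0.5) * equi_w - 0.5).astype(np.int64) % equi_w
    rows = np.clip(np.rint((0.5 - lat / np.pi) * equi_h - 0.5), 0, equi_h - 1).astype(np.int64)
    rows.flags.writeable = False  # cached arrays are shared between calls
    cols.flags.writeable = False
    return rows, cols


def equi2pers_nearest(equi, out_size, fov_deg, yaw_deg, pitch_deg):
    """Crop a perspective view from an (H, W[, C]) equirectangular panorama."""
    rows, cols = equi2pers_map(equi.shape[0], equi.shape[1], int(out_size),
                               float(fov_deg), float(yaw_deg), float(pitch_deg))
    return equi[rows, cols]
```

The cache only pays off when camera settings repeat across samples, for example the fixed yaw grid used in `prepare_rgb_mvimages`; randomly drawn angles per sample would miss it every time.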

fangchuan commented 1 week ago

That would be great! Yes, I am now precomputing all the panoramas beforehand. I still haven't given up on warping perspectives from cubemaps directly, so if you have any plans, please let me know if I can help. Thanks a lot!

haruishi43 / equilib

Support for transforming cubemaps to perspectives #23