isaac-sim / IsaacLab

Unified framework for robot learning built on NVIDIA Isaac Sim
https://isaac-sim.github.io/IsaacLab

[Question] Camera and TiledCamera producing different images #493

Closed: titoirfan closed this issue 2 months ago

titoirfan commented 2 months ago

Question

Hi there,

While migrating my code from Orbit to IsaacLab, I changed my RGB Camera sensors to RGB TiledCamera sensors. However, while testing the TiledCamera sensor, I noticed that it produces very different images from Camera.

To reproduce this, I added an RGB TiledCamera sensor to source/standalone/tutorials/04_sensors/add_sensors_on_robot.py and visualized the images rendered by both the RGB TiledCamera and the RGB Camera using matplotlib.pyplot.

Aside from adding matplotlib.pyplot as plt and TiledCameraCfg to the imports, I added these lines to SensorsSceneCfg:

    # TiledCamera
    tiled_camera = TiledCameraCfg(
        prim_path="{ENV_REGEX_NS}/Robot/base/front_cam_tiled",
        update_period=0.1,
        height=480,
        width=640,
        data_types=["rgb"],
        spawn=sim_utils.PinholeCameraCfg(
            focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 1.0e5)
        ),
        offset=TiledCameraCfg.OffsetCfg(pos=(0.510, 0.0, 0.015), rot=(0.5, -0.5, 0.5, -0.5), convention="ros"),
    )

and these to the end of run_simulator():

        # Render RGB Camera output: (num_envs, H, W, 4), RGBA, uint8
        image = scene["camera"].data.output["rgb"].clone()
        image = image[1][..., :-1]  # Select env index 1 and discard the alpha channel
        plt.imshow(image.cpu())
        plt.show()

        # Render RGB TiledCamera output: (num_envs, H, W, 3), RGB, float32
        tiled_image = scene["tiled_camera"].data.output["rgb"].clone()
        tiled_image = tiled_image[1]  # Select env index 1
        plt.imshow(tiled_image.cpu())
        plt.show()
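For reference, here is a minimal sketch of how the two outputs could be normalized and shown side by side inside run_simulator() (assuming the Camera returns RGBA uint8 and the TiledCamera returns RGB float32 in [0, 1], as noted in the comments above; variable names are illustrative):

    import matplotlib.pyplot as plt

    # Camera: (num_envs, H, W, 4) uint8 RGBA -> select env 1, drop alpha
    cam_img = scene["camera"].data.output["rgb"][1][..., :3].cpu().numpy()

    # TiledCamera: (num_envs, H, W, 3) float32 -> select env 1, clip to [0, 1] for imshow
    tiled_img = scene["tiled_camera"].data.output["rgb"][1].cpu().numpy().clip(0.0, 1.0)

    fig, (ax_cam, ax_tiled) = plt.subplots(1, 2)
    ax_cam.set_title("Camera")
    ax_cam.imshow(cam_img)
    ax_tiled.set_title("TiledCamera")
    ax_tiled.imshow(tiled_img)
    plt.show()

matplotlib.pyplot handles both dtypes natively: uint8 images are interpreted over [0, 255] and float images over [0, 1].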

I then invoke the script with: python source/standalone/tutorials/04_sensors/add_sensors_on_robot.py --enable_cameras

Image produced by Camera: Camera Image

Image produced by TiledCamera: TiledCamera Image


Following similar steps, I also compared the images produced by both classes in the CartpoleRGBCameraEnvCfg environment.

Image produced by Camera: Camera Image

Image produced by TiledCamera: TiledCamera Image


I am not sure whether this is a bug; in the Cartpole case, the images produced by TiledCamera might even be preferable. However, in other cases (e.g., quadruped locomotion), I don't think the images produced by TiledCamera are usable, as they don't reflect how a camera behaves in the real world.

I read the docs, but all I could gather from them is that TiledCamera is meant to be a faster, more efficient alternative to Camera for multi-camera simulation.
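For reference, both sensor classes expose the same batched interface; the difference is that TiledCamera renders all environment cameras in a single tiled render pass that is then split per environment, which is where the speedup comes from. A minimal sketch of the shapes under the configuration above (illustrative, not taken from the docs):

    # Both sensors return one image per environment:
    rgb = scene["camera"].data.output["rgb"]              # (num_envs, 480, 640, 4), uint8 RGBA
    tiled_rgb = scene["tiled_camera"].data.output["rgb"]  # (num_envs, 480, 640, 3), float32 RGB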

Hence, my questions are:

  1. Is this difference of produced images between TiledCamera and Camera intended?
  2. If it is intended, is there a way to make TiledCamera produce images similar to Camera?

Thank you in advance!

Edit: Here is my setup; I use the conda environment provided by IsaacLab.

ArneKlages4444 commented 2 months ago

We encountered similar issues with the TiledCamera. For us, the CartpoleRGBCamera returns only completely black images. In a second test, we added a TiledCamera sensor to the FrankaCubeLift environment; there, we get noisy images that don't correspond to the scene in any way. When the camera is swapped for the standard non-tiled version, everything works fine.

System Info:

Setup1:

Setup2:

The target system for our experiments would be an HPC with L40S and H100 GPUs.

Code:

Test1 CartpoleRGBCamera:

import argparse

from omni.isaac.lab.app import AppLauncher

parser = argparse.ArgumentParser(description="TEST")
parser.add_argument("--num_envs", type=int, default=2, help="Number of environments to spawn.")
AppLauncher.add_app_launcher_args(parser)
args_cli = parser.parse_args()
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

import torch

import omni.replicator.core as rep
from omni.isaac.lab.utils import convert_dict_to_backend
from omni.isaac.lab_tasks.direct.cartpole.cartpole_camera_env import CartpoleRGBCameraEnvCfg, CartpoleCameraEnv

env_cfg = CartpoleRGBCameraEnvCfg()
env_cfg.scene.num_envs = args_cli.num_envs
env_cfg.write_image_to_file = True
env = CartpoleCameraEnv(cfg=env_cfg, render_mode="rgb_array")

rep_writer = rep.BasicWriter(
    output_dir="/workspace/isaaclab/camera_output",
    frame_padding=3
)

for i in range(100):
    # Step the environment with random actions.
    a = torch.rand(env.num_envs, env.num_actions, device=env.device)
    observation, reward, terminated, truncated, info = env.step(a)
    camera = env.scene["tiled_camera"]
    # Write each per-environment image to disk via the Replicator writer.
    for camera_index in range(camera.data.output.shape[0]):
        single_cam_data = convert_dict_to_backend(camera.data.output[camera_index], backend="numpy")
        rep_output = {"annotators": {}}
        for key, data in single_cam_data.items():
            if data.shape[-1] == 4:
                data = data[..., :-1]  # discard the alpha channel
            rep_output["annotators"][key] = {"render_product": {"data": data}}
        rep_output["trigger_outputs"] = {"on_time": camera.frame[camera_index]}
        rep_writer.write(rep_output)
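As an aside, since the tiled sensor reports float32 RGB while the standard camera reports uint8 RGBA (see the original post), it may be worth converting to 8-bit before writing, in case the writer expects uint8 image data (an assumption, not something confirmed in this thread). A minimal sketch of such a conversion, which could be applied to data inside the writer loop above:

    import numpy as np

    def to_uint8(rgb: np.ndarray) -> np.ndarray:
        # Convert float RGB in [0, 1] to uint8; pass uint8 data through unchanged.
        if rgb.dtype == np.uint8:
            return rgb
        return (np.clip(rgb, 0.0, 1.0) * 255.0).astype(np.uint8)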

Tiled camera output:

rgb_11_000

Test2 FrankaCubeLift:

import argparse

from omni.isaac.lab.app import AppLauncher

parser = argparse.ArgumentParser(description="TEST")
parser.add_argument("--num_envs", type=int, default=2, help="Number of environments to spawn.")
AppLauncher.add_app_launcher_args(parser)
args_cli = parser.parse_args()
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

import torch

import omni.replicator.core as rep
import omni.isaac.lab.sim as sim_utils
from omni.isaac.lab.sensors import CameraCfg, TiledCameraCfg
from omni.isaac.lab.managers import ObservationTermCfg as ObsTerm
from omni.isaac.lab.managers import SceneEntityCfg
from omni.isaac.lab_tasks.manager_based.manipulation.lift.config.franka.joint_pos_env_cfg import FrankaCubeLiftEnvCfg
from omni.isaac.lab.envs import ManagerBasedRLEnv
from omni.isaac.lab.utils import convert_dict_to_backend

def camera_img(
        env: ManagerBasedRLEnv,
        cam_cfg: SceneEntityCfg = SceneEntityCfg("camera"),
) -> torch.Tensor:
    camera_output = env.scene[cam_cfg.name].data.output["rgb"]
    if camera_output.shape[-1] == 4:
        camera_output = camera_output[..., :-1]  # discard the alpha channel (Camera returns RGBA)
    return camera_output.flatten(start_dim=1)

class FrankaObjectLiftEnvCfg(FrankaCubeLiftEnvCfg):
    def __post_init__(self):
        super().__post_init__()
        self.decimation = 6
        self.episode_length_s = 2.5
        self.sim.dt = 0.01

        self.scene.camera = CameraCfg(
            prim_path="{ENV_REGEX_NS}/Table/camera_sensor",
            update_period=self.decimation * self.sim.dt,
            height=84,
            width=84,
            data_types=["rgb"],
            spawn=sim_utils.PinholeCameraCfg(
                focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 1.0e5)
            ),
            offset=CameraCfg.OffsetCfg(pos=(0.0, -0.4, 0.4),
                                       rot=(0.6532814862382034, -0.2705980584036595,
                                            0.2705980656543066, 0.6532814687335938),
                                       convention="world"),
        )

        # NOTE: this second assignment overrides the CameraCfg above; comment out
        # whichever sensor is not under test.
        self.scene.camera = TiledCameraCfg(
            prim_path="{ENV_REGEX_NS}/Table/camera_sensor",
            update_period=self.decimation * self.sim.dt,
            offset=TiledCameraCfg.OffsetCfg(pos=(0.0, -0.4, 0.4),
                                            rot=(0.6532814862382034, -0.2705980584036595,
                                                 0.2705980656543066, 0.6532814687335938),
                                            convention="world"),
            data_types=["rgb"],
            spawn=sim_utils.PinholeCameraCfg(
                focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 1.0e5)
            ),
            width=84,
            height=84,
        )

        self.observations.policy.image = ObsTerm(func=camera_img)

env_cfg = FrankaObjectLiftEnvCfg()
env_cfg.scene.num_envs = args_cli.num_envs
env = ManagerBasedRLEnv(cfg=env_cfg, render_mode="rgb_array")

rep_writer = rep.BasicWriter(
    output_dir="/workspace/isaaclab/camera_output",
    frame_padding=3
)

for i in range(100):
    # Step the environment with random actions.
    a = torch.randn_like(env.action_manager.action)
    observation, reward, terminated, truncated, info = env.step(a)
    camera = env.scene["camera"]
    # Write each per-environment image to disk via the Replicator writer.
    for camera_index in range(camera.data.output.shape[0]):
        single_cam_data = convert_dict_to_backend(camera.data.output[camera_index], backend="numpy")
        rep_output = {"annotators": {}}
        for key, data in single_cam_data.items():
            if data.shape[-1] == 4:
                data = data[..., :-1]  # discard the alpha channel
            rep_output["annotators"][key] = {"render_product": {"data": data}}
        rep_output["trigger_outputs"] = {"on_time": camera.frame[camera_index]}
        rep_writer.write(rep_output)

Console command: ./isaaclab.sh -p test.py --headless --enable_cameras --num_envs 2

Tiled camera output:

rgb_30_001

Standard camera output:

rgb_18_000

titoirfan commented 2 months ago

Thanks @ArneKlages4444 for sharing and confirming the issue!

Could the developers please take another look at this issue?

kellyguo11 commented 2 months ago

The tiled camera output is expected to differ from what we get from the renderer. Currently, the RGB tiled rendering API provides only ambient RGB, which does not capture any lighting or shadow information. We are working on a full tiled rendering API that will provide the same data as the renderer.
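For anyone hitting this in the meantime, one possible workaround (a sketch, not an official recommendation) is to keep the standard Camera where lighting and shadows matter and swap back once the full tiled rendering API lands; the two configs mirror each other field for field:

    import omni.isaac.lab.sim as sim_utils
    from omni.isaac.lab.sensors import CameraCfg, TiledCameraCfg

    # Illustrative toggle, not an IsaacLab option: pick the sensor class once
    # and build the otherwise identical config from it.
    USE_TILED = False
    camera_cls = TiledCameraCfg if USE_TILED else CameraCfg

    camera = camera_cls(
        prim_path="{ENV_REGEX_NS}/Robot/base/front_cam",
        height=480,
        width=640,
        data_types=["rgb"],
        spawn=sim_utils.PinholeCameraCfg(
            focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 1.0e5)
        ),
        offset=camera_cls.OffsetCfg(pos=(0.510, 0.0, 0.015), rot=(0.5, -0.5, 0.5, -0.5), convention="ros"),
    )

Downstream code should then tolerate both output formats (uint8 RGBA vs. float32 RGB), e.g. with a guarded slice like the one in camera_img above.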

titoirfan commented 2 months ago

Thank you for the clarification, looking forward to the full tiled rendering API.