isaac-sim / OmniIsaacGymEnvs

Reinforcement Learning Environments for Omniverse Isaac Gym

Running out of memory with 1 env, 16x16x3 image on a RTX 3070 Ti (8GB) GPU #90

Closed: Pipe-Runner closed this issue 8 months ago

Pipe-Runner commented 8 months ago

Hello, thanks a lot for the updates in v2023. Official support for cameras is precisely what I need for my thesis. I tried running your Cartpole Camera training example; I had to dial the number of envs back to 10, but with that it ran and trained just fine.
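
For reference, the only change I needed for the Cartpole Camera run was reducing the env count, which in the TASK_CFG override style I use below is roughly:

TASK_CFG["task"]["env"]["numEnvs"] = 10  # reduced from the default so the example fits on my 8GB card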

I am planning to retrofit the Franka cabinet example with the camera code. For now I am not changing the obs buffer; I am just using the demo code to save images taken from the envs. But even with a single env and a 16x16x3 image, it fails to even start the sim, complaining about memory overflow.

This is my retrofit code. For clarity, I have only included the two functions I modified for the camera addition. The rest of the code is just a refactored version of what you have and is tested, i.e. it works if I set the flag ENABLE_CAMERA=False.

class FrankaCabinetCameraTask(RLTask):
    def set_up_scene(self, scene) -> None:
        self.get_robots = spawn_robot(self)
        self.get_cabinets = spawn_cabinet(self)

        super().set_up_scene(scene, filter_collisions=False)

        # articulation views
        self._robots = self.get_robots(scene)
        self._cabinets = self.get_cabinets(scene)

        # start replicator to capture image data
        self.rep.orchestrator._orchestrator._is_started = True

        # set up cameras
        if ENABLE_CAMERA:
            self.render_products = []
            env_pos = self._env_pos.cpu()
            for i in range(self._num_envs):
                camera = self.rep.create.camera(
                    position=(-4.2 + env_pos[i][0], env_pos[i][1], 3.0),
                    look_at=(env_pos[i][0], env_pos[i][1], 2.55),
                )
                render_product = self.rep.create.render_product(
                    camera, resolution=(self.camera_width, self.camera_height)
                )
                self.render_products.append(render_product)

            # initialize pytorch writer for vectorized collection
            self.pytorch_listener = self.PytorchListener()
            self.pytorch_writer = self.rep.WriterRegistry.get("PytorchWriter")
            self.pytorch_writer.initialize(
                listener=self.pytorch_listener, device="cuda"
            )
            self.pytorch_writer.attach(self.render_products)

        self.init_data()

    def get_observations(self) -> dict:
        # -------------- COMPUTATION FOR OBSERVATION BUFFER ----------------#

        hand_pos, hand_rot = self._robots._hands.get_world_poses(clone=False)
        drawer_pos, drawer_rot = self._cabinets._drawers.get_world_poses(
            clone=False
        )

        robot_dof_pos = self._robots.get_joint_positions(clone=False)
        robot_dof_vel = self._robots.get_joint_velocities(clone=False)

        (
            franka_lfinger_pos,
            franka_lfinger_rot,
        ) = self._robots._lfingers.get_world_poses(clone=False)

        cabinet_dof_pos = self._cabinets.get_joint_positions(clone=False)
        cabinet_dof_vel = self._cabinets.get_joint_velocities(clone=False)

        (
            robot_grasp_rot,
            robot_grasp_pos,
            drawer_grasp_rot,
            drawer_grasp_pos,
        ) = compute_grasp_transforms(
            hand_rot,
            hand_pos,
            self.robot_local_grasp_rot,
            self.robot_local_grasp_pos,
            drawer_rot,
            drawer_pos,
            self.drawer_local_grasp_rot,
            self.drawer_local_grasp_pos,
        )

        to_target = drawer_grasp_pos - robot_grasp_pos

        dof_pos_scaled = (
            2.0
            * (robot_dof_pos - self.robot_dof_lower_limits)
            / (self.robot_dof_upper_limits - self.robot_dof_lower_limits)
            - 1.0
        )
        dof_vel_scaled = robot_dof_vel * self.dof_vel_scale

        self.obs_buf = torch.cat(
            (
                dof_pos_scaled,  # size 9
                dof_vel_scaled,  # size 9
                to_target,  # size 3
                cabinet_dof_pos[:, 3].unsqueeze(
                    -1
                ),  # drawer joint pos - size 1
                cabinet_dof_vel[:, 3].unsqueeze(
                    -1
                ),  # drawer joint vel - size 1
            ),
            dim=-1,
        )

        if ENABLE_CAMERA:
            images = self.pytorch_listener.get_rgb_data()
            if images is not None:
                if self._export_images:
                    from torchvision.utils import save_image, make_grid

                    # scale uint8 frames to [0, 1] before saving a grid of the env images
                    img = images / 255
                    save_image(make_grid(img, nrow=2), "cartpole_export.png")

                # self.obs_buf = torch.swapaxes(images, 1, 3).clone().float() / 255.0
            else:
                print("Image tensor is NONE!")

        # -------------- PRE-COMPUTATION FOR REWARD BUFFER ------------- #
        self.robot_dof_pos = robot_dof_pos
        self.cabinet_dof_pos = cabinet_dof_pos
        self.robot_grasp_pos, self.robot_grasp_rot = (
            robot_grasp_pos,
            robot_grasp_rot,
        )
        self.drawer_grasp_pos, self.drawer_grasp_rot = (
            drawer_grasp_pos,
            drawer_grasp_rot,
        )
        hand_pos, hand_rot = self._robots._hands.get_world_poses(clone=False)
        (
            self.robot_lfinger_pos,
            self.robot_lfinger_rot,
        ) = self._robots._lfingers.get_world_poses(clone=False)
        (
            self.robot_rfinger_pos,
            self.robot_rfinger_rot,
        ) = self._robots._rfingers.get_world_poses(clone=False)

        return {self._robots.name: {"obs_buf": self.obs_buf}}
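
Eventually I plan to swap the state observation for the image, along the lines of the commented-out line above; a rough sketch of what I have in mind (not active yet, and the observation size in the config would have to change accordingly):

self.obs_buf = torch.swapaxes(images, 1, 3).clone().float() / 255.0  # channel-first, scaled to [0, 1]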

The following is the config that modifies the defaults of FrankaCabinet:

TASK_CFG["task"]["env"]["numEnvs"] = 1
TASK_CFG["task"]["env"]["cameraWidth"] = 16
TASK_CFG["task"]["env"]["cameraHeight"] = 16
TASK_CFG["task"]["env"]["exportImages"] = True
TASK_CFG["task"]["env"]["envSpacing"] = 20

TASK_CFG["task"]["sim"]["rendering_dt"] = 0.0166  # 1/60 half of physics step
TASK_CFG["task"]["sim"]["enable_cameras"] = True
TASK_CFG["task"]["sim"]["add_ground_plane"] = True
TASK_CFG["task"]["sim"]["add_distant_light"] = True

The following is a screenshot of my error: [screenshot of the out-of-memory error]

GPU specs are here: [screenshot of GPU specs: RTX 3070 Ti, 8GB]

It is totally possible that I have done something stupid, so my sincere apologies in advance.

kellyguo11 commented 8 months ago

Hi there, 8GB of GPU memory is likely quite limited for the FrankaCabinet environment with camera. Compared with Cartpole, the FrankaCabinet environment has many more higher-fidelity meshes in the scene. You can try reducing the complexity of the scene to save some memory, such as setting numProps to 0 to reduce the number of blocks in the drawer, and removing the ground plane.
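
For context, a single 16x16x3 uint8 frame is only 16 * 16 * 3 = 768 bytes, so the render product itself is negligible; the memory goes into loading and rendering the scene assets. In the TASK_CFG override style from your post, the two suggestions would look roughly like this (assuming the default FrankaCabinet config keys):

TASK_CFG["task"]["env"]["numProps"] = 0               # no blocks spawned inside the drawer
TASK_CFG["task"]["sim"]["add_ground_plane"] = False   # skip the ground plane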

Pipe-Runner commented 8 months ago

@kellyguo11 I suspected this to be the issue. I have removed the props, but I should simplify the scene further. What GPU are you guys using to test this, by the way?

Pipe-Runner commented 8 months ago

@kellyguo11 on second thought, if I run a single env, is there a way of doing the physics on the CPU instead of the GPU? That way, I may still be able to have enough memory for the camera-related tasks.
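
Something like this is what I have in mind; I am assuming the top-level config exposes the same device keys as the default OIGE config.yaml (sim_device, pipeline) and that the task sim section has the usual use_gpu_pipeline / physx.use_gpu entries; I have not verified that this works with enable_cameras:

TASK_CFG["sim_device"] = "cpu"   # assumed key: run PhysX on the CPU
TASK_CFG["pipeline"] = "cpu"     # assumed key: keep the tensor pipeline on the CPU
TASK_CFG["task"]["sim"]["use_gpu_pipeline"] = False
TASK_CFG["task"]["sim"]["physx"]["use_gpu"] = False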