isaac-sim / IsaacLab

Unified framework for robot learning built on NVIDIA Isaac Sim
https://isaac-sim.github.io/IsaacLab

[Question] Training speed of Orbit compared to IsaacGym #373

Open hojae-io opened 4 months ago

hojae-io commented 4 months ago

Hi all,

I was using IsaacGym for my humanoid learning before and have recently moved to Orbit.

One thing that has been bothering me for a while is that training in Orbit is about 40% slower than in IsaacGym.

Here are the specifications for comparing the training speed of Orbit and IsaacGym (legged_gym from RSL).

Robot type: Humanoid (same URDF on both sides)

Parameters                    Orbit        IsaacGym
num_steps_per_env             24           24
sim_dt                        0.001 [s]    0.001 [s]
decimation                    10           10
collection time / iteration   1.38 [s]     0.85 [s]
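For reference, the settings above imply a few derived quantities. The following is a minimal sketch; the names `control_dt` and `physics_steps_per_iteration` are chosen here for illustration and are not identifiers taken from either framework.

```python
# Values copied from the parameter table above.
sim_dt = 0.001          # physics step [s]
decimation = 10         # physics steps per policy action
num_steps_per_env = 24  # policy steps collected per environment per iteration

# Illustrative derived quantities (names are not from Orbit or legged_gym).
control_dt = sim_dt * decimation
physics_steps_per_iteration = num_steps_per_env * decimation
simulated_seconds_per_iteration = physics_steps_per_iteration * sim_dt

print(f"control_dt                  = {control_dt:.3f} s")
print(f"physics steps per iteration = {physics_steps_per_iteration}")
print(f"simulated time per iteration = {simulated_seconds_per_iteration:.2f} s")
```

Both setups therefore run the decimation loop body 240 times per environment in every collection phase, so any per-substep overhead is paid 240 times per iteration.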

Although I won't list every parameter here, I double-checked that almost all of them (PhysX settings, runner settings, etc.) are the same.

While tracking down where the difference originates, I ended up at the simulator update code in the decimation for-loop.

Here's Orbit's simulator update code in rl_task_env.py:

for _ in range(self.cfg.decimation):
    # set actions into buffers
    self.action_manager.apply_action()   # Line 1
    # set actions into simulator
    self.scene.write_data_to_sim()  # Line 2
    # simulate
    self.sim.step(render=False)  # Line 3
    # update buffers at sim dt
    self.scene.update(dt=self.physics_dt)  # Line 4

Here's IsaacGym's simulator update code in legged_robot.py:

for _ in range(self.cfg.control.decimation):
    self.torques = self._compute_torques(self.actions).view(self.torques.shape)  # Line 1
    self.gym.set_dof_actuation_force_tensor(self.sim, gymtorch.unwrap_tensor(self.torques))  # Line 2
    self.gym.simulate(self.sim)  # Line 3
    if self.device == 'cpu':
        self.gym.fetch_results(self.sim, True)
    self.gym.refresh_dof_state_tensor(self.sim)  # Line 4

As you can see, Lines 1 to 4 correspond functionally across the two snippets. So I measured the execution time of each line for a line-by-line comparison, and here are the results.

Line      Orbit [s]    IsaacGym [s]
Line 1    0.00027      0.00186
Line 2    0.00417      0.00076
Line 3    0.03805      0.03389
Line 4    0.01720      0.00065
Total     0.05983      0.03725
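One caveat when timing GPU-backed code line by line: GPU work is typically queued asynchronously, so a plain wall-clock timer can attribute one line's cost to a later line that happens to wait for the device. The thread does not say how the numbers above were measured. The following is a minimal sketch of a timer that synchronizes before and after each section; `SectionTimer` and its `sync` parameter are hypothetical names, and with PyTorch on a CUDA device one would pass `torch.cuda.synchronize` as `sync`.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulates wall-clock time per named section (hypothetical helper)."""

    def __init__(self, sync=None):
        # `sync` should block until pending device work has finished.
        # The default is a no-op, which is only appropriate for CPU-only code.
        self._sync = sync if sync is not None else (lambda: None)
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def section(self, name):
        self._sync()  # do not bill earlier queued work to this section
        start = time.perf_counter()
        try:
            yield
        finally:
            self._sync()  # wait for this section's own work to finish
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean(self, name):
        return self.totals[name] / self.counts[name]


# Usage with a stand-in workload in place of the real simulator calls.
timer = SectionTimer()
for _ in range(3):
    with timer.section("line_4"):
        sum(range(10_000))
print(f"line_4 mean: {timer.mean('line_4'):.6f} s over {timer.counts['line_4']} calls")
```

Synchronizing around every section adds overhead of its own, so this kind of timer is suited to locating a bottleneck rather than to measuring end-to-end throughput.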

As you can see, multiplying the difference in total time by num_steps_per_env ((0.05983 - 0.03725) x 24 = 0.542 s) accounts for the gap in collection time (1.38 s vs. 0.85 s) and explains why Orbit is roughly 40% slower than IsaacGym in my case.
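The arithmetic can be checked with a short script. This is a sketch that uses only the numbers reported in this post; the variable names are illustrative.

```python
# Totals from the line-by-line timing table [s].
orbit_total = 0.05983
isaacgym_total = 0.03725
num_steps_per_env = 24

# Extra time predicted by the per-step difference.
predicted_gap = (orbit_total - isaacgym_total) * num_steps_per_env

# Gap actually observed in collection time per iteration [s].
orbit_collection = 1.38
isaacgym_collection = 0.85
measured_gap = orbit_collection - isaacgym_collection

# Two ways of expressing the slowdown.
throughput_drop = 1.0 - isaacgym_collection / orbit_collection
extra_time = orbit_collection / isaacgym_collection - 1.0

print(f"predicted gap  : {predicted_gap:.3f} s")
print(f"measured gap   : {measured_gap:.3f} s")
print(f"throughput drop: {throughput_drop:.1%}")
print(f"extra time     : {extra_time:.1%}")
```

The predicted gap (about 0.54 s) closely matches the measured one (0.53 s). The "roughly 40%" figure corresponds to the drop in throughput; expressed as additional wall-clock time, Orbit's collection phase takes about 62% longer.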

Here are my questions:

  1. Is Orbit slower than IsaacGym by design? If so, 40% seems like quite a slowdown. I'm wondering if this will be improved in the future.
  2. If not (if the training speed should be more or less the same), I'm wondering what I am missing here. From my understanding, Line 3 and Line 4 are significantly slower. Line 3 is related to the PhysX update, and Line 4 is related to self.scene, where we update the robot articulation and the sensors. I would appreciate any guidance on whether I'm missing an important parameter setting that would make Orbit faster.

Thank you for the help in advance! Best,

inkyant commented 1 month ago

I have noticed the same thing. When running legged_gym with the default ANYmal C rough terrain task, the iteration time is around 1.1 seconds, while the same task in Isaac Lab takes at least 2 seconds per iteration. According to the printout and my summaries, most of the increase is in the collection time, not the learning time, so I also suspect it is an issue with Orbit/Isaac Lab. Note that I was running in a containerized cluster environment, so I know I had the same hardware and software as the default config. As far as I can tell, the tasks are the same in legged_gym and in Isaac Lab, so there shouldn't be an increase like this. This nearly doubles the training time, so I would really appreciate it if this could be investigated.

Mayankm96 commented 1 month ago

Yes, some slowdown relative to legged_gym is expected, as there are changes in the environment's implementation. Off the top of my head, these are as follows:

Recently, we updated the framework to use lazy tensors, so we hope that closes some of the performance gap you see.
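To illustrate the general idea behind lazy evaluation of buffers (not Isaac Lab's actual implementation), here is a minimal sketch in which a derived quantity is recomputed only when it is read after being marked stale. `LazyBuffer`, `invalidate`, and `expensive_update` are hypothetical names used for this example only.

```python
class LazyBuffer:
    """Caches a derived value and recomputes it only when read while stale."""

    def __init__(self, compute_fn):
        self._compute_fn = compute_fn
        self._value = None
        self._stale = True
        self.num_computes = 0

    def invalidate(self):
        # Called after every simulation step: cheap, no computation happens here.
        self._stale = True

    @property
    def data(self):
        # The expensive computation runs only on the first read after a step.
        if self._stale:
            self._value = self._compute_fn()
            self._stale = False
            self.num_computes += 1
        return self._value


def expensive_update():
    # Stand-in for an expensive per-step computation (hypothetical).
    return [0.0, 0.0, 0.0, 0.0]


buf = LazyBuffer(expensive_update)

# Ten physics substeps (decimation = 10) each mark the buffer as stale...
for _ in range(10):
    buf.invalidate()

# ...but the value is computed once, when the observation is finally read.
first_read = buf.data
second_read = buf.data
print(buf.num_computes)  # prints 1
```

With an eager update, the same quantity would be recomputed on all ten substeps even though it is only read once per environment step.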

inkyant commented 1 month ago

Interesting, thank you for the timely response. Would you recommend I disable the ray casting calculations somehow? Or is that not possible?