isaac-sim / IsaacLab

Unified framework for robot learning built on NVIDIA Isaac Sim
https://isaac-sim.github.io/IsaacLab

[Bug Report] Problem with offscreen_render for recording video during training #230

Closed MiladShafiee closed 5 months ago

MiladShafiee commented 7 months ago

Thanks for the amazing software.

Describe the bug

I would like to record video during training to check progress. When I run the RL task with off-screen rendering, I receive the following error after 7 learning iterations. Without the --offscreen_render argument, training works properly:

Steps to reproduce

./orbit.sh -p source/standalone/workflows/rsl_rl/train.py --task=Isaac-Velocity-Rough-Anymal-C-v0 --headless --num_envs 4096 --video --offscreen_render

Setting seed: 42
2024-02-05 19:51:56 [50,563ms] [Error] [omni.kit.app._impl] [py stderr]: /data/shafiee/isaac_sim-2023.1.0-hotfix.1/extscache/omni.pip.torch-2_0_1-2.0.2+105.1.lx64/torch-2-0-1/torch/nn/modules/module.py:1501: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters(). (Triggered internally at ../aten/src/ATen/native/cudnn/RNN.cpp:982.)
  return forward_call(*args, **kwargs)

/data/shafiee/isaac_sim-2023.1.0-hotfix.1/extscache/omni.pip.torch-2_0_1-2.0.2+105.1.lx64/torch-2-0-1/torch/nn/modules/module.py:1501: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters(). (Triggered internally at ../aten/src/ATen/native/cudnn/RNN.cpp:982.)
  return forward_call(*args, **kwargs)
2024-02-05 19:51:58 [52,445ms] [Warning] [omni.hydra] Disabling DLSS Frame Generation for at least one view due to incompatible render outputs and OmniGraph postprocessing being active
.
.
.
.

          Learning iteration 7/1500                        

                       Computation: 15505 steps/s (collection: 6.249s, learning 0.091s)
               Value function loss: 0.0071
                    Surrogate loss: -0.0102
             Mean action noise std: 0.91
                       Mean reward: -3.50
               Mean episode length: 172.67
Episode Reward/track_lin_vel_xy_exp: 0.0359
Episode Reward/track_ang_vel_z_exp: 0.0282
       Episode Reward/lin_vel_z_l2: -0.0297
      Episode Reward/ang_vel_xy_l2: -0.0319
     Episode Reward/dof_torques_l2: -0.0222
         Episode Reward/dof_acc_l2: -0.0463
     Episode Reward/action_rate_l2: -0.0387
      Episode Reward/feet_air_time: -0.0050
 Episode Reward/undesired_contacts: -0.0594
Episode Reward/flat_orientation_l2: 0.0000
     Episode Reward/dof_pos_limits: 0.0000
         Curriculum/terrain_levels: 3.2462
Metrics/base_velocity/error_vel_xy: 0.2704
Metrics/base_velocity/error_vel_yaw: 0.2551
      Episode Termination/time_out: 3.7917
  Episode Termination/base_contact: 2.0000
--------------------------------------------------------------------------------
                   Total timesteps: 786432
                    Iteration time: 6.34s
                        Total time: 52.16s
                               ETA: 9733.5s

2024-02-05 18:54:05 [107,594ms] [Error] [__main__] 'RLTaskEnv' object has no attribute 'action_manager'
2024-02-05 18:54:05 [107,595ms] [Error] [__main__] Traceback (most recent call last):
  File "/data/shafiee/softwares/Orbit/source/standalone/workflows/rsl_rl/train.py", line 137, in <module>
    main()
  File "/data/shafiee/softwares/Orbit/source/standalone/workflows/rsl_rl/train.py", line 128, in main
    runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True)
  File "/data/shafiee/softwares/anaconda3/envs/orbit/lib/python3.10/site-packages/rsl_rl/runners/on_policy_runner.py", line 112, in learn
    obs, rewards, dones, infos = self.env.step(actions)
  File "/data/shafiee/softwares/Orbit/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/wrappers/rsl_rl/vecenv_wrapper.py", line 161, in step
    obs_dict, rew, terminated, truncated, extras = self.env.step(actions)
  File "/data/shafiee/softwares/anaconda3/envs/orbit/lib/python3.10/site-packages/gymnasium/wrappers/record_video.py", line 155, in step
    ) = self.env.step(action)
  File "/data/shafiee/softwares/anaconda3/envs/orbit/lib/python3.10/site-packages/gymnasium/wrappers/order_enforcing.py", line 56, in step
    return self.env.step(action)
  File "/data/shafiee/softwares/Orbit/source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/rl_task_env.py", line 162, in step
    self.action_manager.process_action(action)
AttributeError: 'RLTaskEnv' object has no attribute 'action_manager'

2024-02-05 18:54:08 [110,298ms] [Warning] [carb] Plugin interface for a client: omni.hydratexture.plugin was already released.
[110.941s] Simulation App Shutting Down
2024-02-05 18:54:09 [111,037ms] [Warning] [omni.core.ITypeFactory] Module /data/shafiee/isaac_sim-2023.1.0-hotfix.1/kit/exts/omni.activity.core/bin/libomni.activity.core.plugin.so remained loaded after unload request.

WillMandil001 commented 7 months ago

Exactly the same here, with the same error! Perhaps we are not initialising record_video.py correctly? I am using sb3 rather than rsl_rl, if that matters.

Let me know if you figure something out!

Mayankm96 commented 7 months ago

Hi @WillMandil001 @MiladShafiee ,

I have been unable to reproduce this issue on my end. Could it be because of insufficient RAM that the code crashes? On my PC, running the above command already takes 32 GB RAM during training. Quite a lot for such an example though.

I suggest reducing the image resolution in the ViewerCfg. Hopefully that reduces the consumption.

WillMandil001 commented 6 months ago

Hi @MiladShafiee @Mayankm96 I have a sketchy fix:

RAM wasn't the issue.

The error occurs because when close() is called on the recorder in the gymnasium script video_recorder.py, it actually closes the whole environment (line 148 for me: self.env.close()).

That's why none of the env.manager functions can be found.

Commenting out this line is a gross but easy fix...
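
For anyone who would rather not edit the installed package, here is a minimal sketch of the same idea without touching gymnasium's source. It is an illustration only: ToyEnv, ToyRecorder and keep_env_open are hypothetical names, not part of gymnasium or Orbit, and whether the pattern transfers to the real recorder has not been tested here.

```python
from contextlib import contextmanager


class ToyEnv:
    """Hypothetical stand-in for RLTaskEnv: close() tears down its managers."""

    def __init__(self):
        self.action_manager = object()

    def step(self, action):
        # Raises AttributeError once close() has removed the manager.
        return self.action_manager is not None

    def close(self):
        del self.action_manager


class ToyRecorder:
    """Hypothetical stand-in for a recorder whose close() also closes the env."""

    def __init__(self, env):
        self.env = env

    def close(self):
        self.env.close()  # the kind of call that was commented out above


@contextmanager
def keep_env_open(env):
    """Temporarily replace env.close with a no-op, then restore it."""
    original_close = env.close
    env.close = lambda: None
    try:
        yield env
    finally:
        env.close = original_close


env = ToyEnv()
with keep_env_open(env):
    ToyRecorder(env).close()  # the recorder finishes, the env survives
still_usable = env.step(action=None)
```

The real close() is restored when the block exits, so the environment can still be shut down normally at the end of training.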

MiladShafiee commented 5 months ago

Hi @WillMandil001 @Mayankm96, sorry for the very late answer; I did not receive a notification (my GitHub settings).

Thank you so much. RAM was not my issue either. Commenting out the env.close() call solved my problem too. For future reference, I modified video_recorder.py in the following directory (there are multiple copies of video_recorder.py in different places): /anaconda3/envs/orbit/lib/python3.10/site-packages/gymnasium/wrappers/monitoring

Since this is more or less solved, I am closing this issue.

GiulioRomualdi commented 3 months ago

Thank you @MiladShafiee for the hint

yuqiang-yang commented 1 month ago

Yes, there are many copies of video_recorder.py. In my case, I had to modify the one in .local/share/ov/pkg/isaac-sim4.0.0/exts/omni.isaac.ml_archive/pip_prebundle/gymnasium/wrappers/monitoring
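
Since several copies of video_recorder.py can exist on one machine, it may help to ask the interpreter which file it would actually load before editing anything. This is a rough sketch: module_file is a hypothetical helper, and the dotted module name is inferred from the directories quoted in this thread, so it may differ in other gymnasium versions. Run it with the same Python used for training (for example through ./orbit.sh -p).

```python
import importlib.util


def module_file(name):
    """Return the file path Python would load for `name`, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # A parent package (e.g. gymnasium) is not importable in this interpreter.
        return None
    return spec.origin if spec is not None else None


# Module name assumed from the paths mentioned above.
print(module_file("gymnasium.wrappers.monitoring.video_recorder"))
```

The printed path is the copy that the running interpreter resolves; the other copies on disk are not used by that interpreter.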