haosulab / ManiSkill2-Learn


how to run gail code #20

lijinming2018 opened this issue 1 year ago (status: Open)

lijinming2018 commented 1 year ago

How do I run the GAIL code?

lijinming2018 commented 1 year ago

When I run

python maniskill2_learn/apis/run_rl.py configs/mfrl/gail/maniskill2_pn.py \
  --work-dir ./logs/bc_PickCube_pointcloud_128bs_ee --gpu-ids 1 --sim-gpu-ids 0 \
  --cfg-options "env_cfg.env_name=PickCube-v0" "env_cfg.obs_mode=pointcloud" "env_cfg.n_points=1200" \
  "env_cfg.control_mode=pd_joint_delta_pos" \
  "replay_cfg.buffer_filenames=../ManiSkill2/demos/rigid_body/PickCube-v0/trajectory.none.pd_joint_delta_pos_pointcloud_ee.h5" \
  "env_cfg.obs_frame=ee" "eval_cfg.num=100" "eval_cfg.save_traj=False" "eval_cfg.save_video=False" \
  "train_cfg.n_eval=5000" "train_cfg.total_steps=500000" "train_cfg.n_checkpoint=10000" "train_cfg.n_updates=500"

I get:

Traceback (most recent call last):
  File "maniskill2_learn/apis/run_rl.py", line 522, in <module>
    main()
  File "maniskill2_learn/apis/run_rl.py", line 486, in main
    run_one_process(0, 1, args, cfg)
  File "maniskill2_learn/apis/run_rl.py", line 461, in run_one_process
    main_rl(rollout, evaluator, replay, args, cfg, expert_replay=expert_replay, recent_traj_replay=recent_traj_replay)
  File "maniskill2_learn/apis/run_rl.py", line 296, in main_rl
    train_rl(
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/apis/train_rl.py", line 209, in train_rl
    replay.push_batch(trajectories)
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/env/replay_buffer.py", line 196, in push_batch
    self.memory.assign(slice(self.position, self.position + len(items)), items)
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/utils/data/dict_array.py", line 830, in assign
    self.memory = self._assign(self.memory, indices, value)
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/utils/data/dict_array.py", line 471, in _assign
    memory[key] = cls._assign(memory[key], indices, value[key], ignore_list)
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/utils/data/dict_array.py", line 471, in _assign
    memory[key] = cls._assign(memory[key], indices, value[key], ignore_list)
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/utils/data/dict_array.py", line 477, in _assign
    memory[indices] = value
ValueError: could not broadcast input array from shape (4000,1200,3) into shape (4000,1250,3)

followed twice by the same shutdown error:

Exception ignored in: <function SharedGDict.__del__ at 0x7f0da445a0d0>
Traceback (most recent call last):
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/utils/data/dict_array.py", line 928, in __del__
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/utils/data/dict_array.py", line 913, in _unlink
  File "/opt/conda/lib/python3.8/multiprocessing/shared_memory.py", line 239, in unlink
ImportError: sys.meta_path is None, Python is likely shutting down

and finally:

/opt/conda/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 29 leaked shared_memory objects to clean up at shutdown

xuanlinli17 commented 1 year ago

I think you forgot "env_cfg.n_goal_points=50", since your demos seem to contain the goal points.
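For example, your original command with that one option added should produce rollout point clouds that match the demos (1200 environment points + 50 goal points = 1250 points, which is the (..., 1250, 3) shape the broadcast error expected):

# same command as before, with env_cfg.n_goal_points=50 added
python maniskill2_learn/apis/run_rl.py configs/mfrl/gail/maniskill2_pn.py \
  --work-dir ./logs/bc_PickCube_pointcloud_128bs_ee --gpu-ids 1 --sim-gpu-ids 0 \
  --cfg-options "env_cfg.env_name=PickCube-v0" "env_cfg.obs_mode=pointcloud" \
  "env_cfg.n_points=1200" "env_cfg.n_goal_points=50" "env_cfg.control_mode=pd_joint_delta_pos" \
  "replay_cfg.buffer_filenames=../ManiSkill2/demos/rigid_body/PickCube-v0/trajectory.none.pd_joint_delta_pos_pointcloud_ee.h5" \
  "env_cfg.obs_frame=ee" "eval_cfg.num=100" "eval_cfg.save_traj=False" "eval_cfg.save_video=False" \
  "train_cfg.n_eval=5000" "train_cfg.total_steps=500000" "train_cfg.n_checkpoint=10000" "train_cfg.n_updates=500"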

lijinming2018 commented 1 year ago

When I run

python maniskill2_learn/apis/run_rl.py configs/mfrl/gail/maniskill2_pn.py \
  --work-dir ./logs/bc_PickCube_pointcloud_gail --gpu-ids 1 --sim-gpu-ids 0 \
  --cfg-options "env_cfg.env_name=PickCube-v0" "env_cfg.obs_mode=pointcloud" "env_cfg.n_points=1200" \
  "env_cfg.control_mode=pd_ee_delta_pose" "env_cfg.n_goal_points=50" \
  "replay_cfg.buffer_filenames=../ManiSkill2/demos/rigid_body/PickCube-v0/trajectory.none.pd_ee_delta_pose_pointcloud3.h5" \
  "env_cfg.obs_frame=ee" "eval_cfg.save_traj=False"

I get:

PickCube-v0-train - (run_rl.py:261) - INFO - 2023-11-24,14:02:35 - Num of parameters: 1.33M, Model Size: 5.32M
[2023-11-24 14:02:36.662] [svulkan2] [error] GLFW error: X11: Failed to open display 0
[2023-11-24 14:02:36.662] [svulkan2] [warning] Continue without GLFW.
[2023-11-24 14:02:36.745] [svulkan2] [error] GLFW error: X11: Failed to open display 0
[2023-11-24 14:02:36.745] [svulkan2] [warning] Continue without GLFW.
[2023-11-24 14:02:36.792] [svulkan2] [error] GLFW error: X11: Failed to open display 0
[2023-11-24 14:02:36.792] [svulkan2] [warning] Continue without GLFW.
[2023-11-24 14:02:36.897] [svulkan2] [error] GLFW error: X11: Failed to open display 0
[2023-11-24 14:02:36.897] [svulkan2] [warning] Continue without GLFW.
[2023-11-24 14:02:37.018] [svulkan2] [error] GLFW error: X11: Failed to open display 0
[2023-11-24 14:02:37.018] [svulkan2] [warning] Continue without GLFW.
PickCube-v0-train - (run_rl.py:289) - INFO - 2023-11-24,14:02:37 - Work directory of this run ./logs/bc_PickCube_pointcloud_gail
PickCube-v0-train - (run_rl.py:291) - INFO - 2023-11-24,14:02:37 - Train over GPU [1]!
PickCube-v0-train - (train_rl.py:180) - INFO - 2023-11-24,14:02:37 - Rollout state dim: {'xyz': (4, 1250, 3), 'rgb': (4, 1250, 3), 'frame_related_states': (4, 4, 3), 'to_frames': (4, 2, 4, 4), 'state': (4, 30)}, action dim: (4, 7)!
PickCube-v0-train - (train_rl.py:202) - INFO - 2023-11-24,14:02:37 - Begin 8000 warm-up steps with random policy!
Evaluation-PickCube-v0-train-env-0 - (evaluation.py:294) - INFO - 2023-11-24,14:02:40 - The Evaluation environment has seed in 345236826!
Evaluation-PickCube-v0-train-env-0 - (evaluation.py:330) - INFO - 2023-11-24,14:02:40 - Size of image in the rendered video (512, 512, 3)
PickCube-v0-train - (train_rl.py:210) - INFO - 2023-11-24,14:03:06 - Warm up samples stats: rewards:31.3[3.4, 88.9], max_single_R:0.52[0.18, 0.80], lens:200[200, 200], success:0.00!
{'obs': {'xyz': (8000, 1250, 3), 'rgb': (8000, 1250, 3), 'frame_related_states': (8000, 4, 3), 'to_frames': (8000, 2, 4, 4), 'state': (8000, 30)}, 'next_obs': {'xyz': (8000, 1250, 3), 'rgb': (8000, 1250, 3), 'frame_related_states': (8000, 4, 3), 'to_frames': (8000, 2, 4, 4), 'state': (8000, 30)}, 'actions': (8000, 7), 'rewards': (8000, 1), 'dones': (8000, 1), 'infos': {'elapsed_steps': (8000, 1), 'is_obj_placed': (8000, 1), 'is_robot_static': (8000, 1), 'success': (8000, 1), 'reward': (8000, 1), 'TimeLimit.truncated': (8000, 1)}, 'episode_dones': (8000, 1), 'worker_indices': (8000, 1)}
PickCube-v0-train - (train_rl.py:225) - INFO - 2023-11-24,14:03:06 - Finish 8000 warm-up steps!
PickCube-v0-train - (train_rl.py:244) - INFO - 2023-11-24,14:03:06 - Begin training!
PickCube-v0-train - (train_rl.py:285) - INFO - 2023-11-24,14:03:09 - Replay buffer shape: {'actions': (60000, 7), 'dones': (60000, 1), 'episode_dones': (60000, 1), 'is_truncated': (60000, 1), 'next_obs': {'frame_related_states': (60000, 4, 3), 'rgb': (60000, 1250, 3), 'state': (60000, 30), 'to_frames': (60000, 2, 4, 4), 'xyz': (60000, 1250, 3)}, 'obs': {'frame_related_states': (60000, 4, 3), 'rgb': (60000, 1250, 3), 'state': (60000, 30), 'to_frames': (60000, 2, 4, 4), 'xyz': (60000, 1250, 3)}, 'rewards': (60000, 1), 'worker_indices': (60000, 1)}.

PickCube-v0-train - (train_rl.py:374) - INFO - 2023-11-24,14:09:05 - 20800/20000000(0%) Passed time:5m58s ETA:6d11h33m36s samples_stats: rewards:22.4[2.5, 67.2], max_single_R:0.48[0.20, 0.85], lens:200[200, 200], success:0.00 gpu_mem_ratio: 41.3% gpu_mem: 9.92G gpu_mem_this: 0.00G gpu_util: 0% discriminator_rewards: 0.886 critic_loss: 0.224 max_critic_abs_err: 1.323 actor_loss: -4.383 alpha: 0.178 alpha_loss: 2.098 q: 3.531 q_target: 3.572 entropy: 4.753 target_entropy: -7.000 critic_grad: 10.267 actor_grad: 0.185 episode_time: 358.256 collect_sample_time: 96.606 memory: 18.54G
PickCube-v0-train - (train_rl.py:374) - INFO - 2023-11-24,14:15:00 - 33600/20000000(0%) Passed time:11m53s ETA:6d10h36m48s samples_stats: rewards:35.4[6.1, 70.8], max_single_R:0.66[0.19, 0.86], lens:200[200, 200], success:0.00 gpu_mem_ratio: 41.3% gpu_mem: 9.92G gpu_mem_this: 0.00G gpu_util: 21% discriminator_rewards: 1.462 critic_loss: 0.334 max_critic_abs_err: 2.835 actor_loss: -8.181 alpha: 0.144 alpha_loss: 1.674 q: 7.497 q_target: 7.524 entropy: 4.656 target_entropy: -7.000 critic_grad: 29.183 actor_grad: 0.138 episode_time: 354.416 collect_sample_time: 91.741 memory: 18.38G
PickCube-v0-train - (train_rl.py:374) - INFO - 2023-11-24,14:20:56 - 46400/20000000(0%) Passed time:17m49s ETA:6d10h24m23s samples_stats: rewards:40.2[2.4, 108.9], max_single_R:0.71[0.28, 2.43], lens:200[200, 200], success:0.00 gpu_mem_ratio: 41.3% gpu_mem: 9.92G gpu_mem_this: 0.00G gpu_util: 0% discriminator_rewards: 1.253 critic_loss: 0.882 max_critic_abs_err: 4.930 actor_loss: -9.764 alpha: 0.117 alpha_loss: 1.348 q: 9.191 q_target: 9.233 entropy: 4.484 target_entropy: -7.000 critic_grad: 54.392 actor_grad: 0.136 episode_time: 355.570 collect_sample_time: 92.506 memory: 18.38G
PickCube-v0-train - (train_rl.py:374) - INFO - 2023-11-24,14:26:54 - 59200/20000000(0%) Passed time:23m48s ETA:6d10h30m11s samples_stats: rewards:125.7[33.8, 169.0], max_single_R:1.53[0.82, 2.54], lens:200[200, 200], success:0.00 gpu_mem_ratio: 41.3% gpu_mem: 9.92G gpu_mem_this: 0.00G gpu_util: 0% discriminator_rewards: 0.380 critic_loss: 0.988 max_critic_abs_err: 5.150 actor_loss: -8.505 alpha: 9.713e-02 alpha_loss: 1.057 q: 7.967 q_target: 8.030 entropy: 3.869 target_entropy: -7.000 critic_grad: 47.301 actor_grad: 0.209 episode_time: 357.881 collect_sample_time: 94.864 memory: 18.38G
PickCube-v0-train - (train_rl.py:374) - INFO - 2023-11-24,14:32:51 - 72000/20000000(0%) Passed time:29m44s ETA:6d10h20m44s samples_stats: rewards:99.5[16.5, 250.2], max_single_R:1.25[0.50, 2.87], lens:200[200, 200], success:0.00 gpu_mem_ratio: 41.3% gpu_mem: 9.92G gpu_mem_this: 0.00G gpu_util: 0% discriminator_rewards: 0.206 critic_loss: 0.606 max_critic_abs_err: 3.439 actor_loss: -8.337 alpha: 8.139e-02 alpha_loss: 0.838 q: 7.814 q_target: 7.870 entropy: 3.288 target_entropy: -7.000 critic_grad: 19.566 actor_grad: 0.242 episode_time: 355.850 collect_sample_time: 93.060 memory: 18.38G
PickCube-v0-train - (train_rl.py:374) - INFO - 2023-11-24,14:39:58 - 84800/20000000(0%) Passed time:36m51s ETA:6d15h17m15s samples_stats: rewards:112.8[46.8, 187.7], max_single_R:1.22[0.77, 2.60], lens:200[200, 200], success:0.00 gpu_mem_ratio: 41.3% gpu_mem: 9.92G gpu_mem_this: 0.00G gpu_util: 0% discriminator_rewards: 0.192 critic_loss: 0.569 max_critic_abs_err: 3.061 actor_loss: -9.402 alpha: 6.837e-02 alpha_loss: 0.690 q: 8.959 q_target: 9.018 entropy: 3.085 target_entropy: -7.000 critic_grad: 21.825 actor_grad: 0.237 episode_time: 426.309 collect_sample_time: 94.114 memory: 21.33G
PickCube-v0-train - (train_rl.py:374) - INFO - 2023-11-24,14:45:55 - 97600/20000000(0%) Passed time:42m48s ETA:6d14h29m18s samples_stats: rewards:115.0[15.5, 192.7], max_single_R:1.49[0.75, 2.77], lens:200[200, 200], success:0.00 gpu_mem_ratio: 41.3% gpu_mem: 9.92G gpu_mem_this: 0.00G gpu_util: 0% discriminator_rewards: 0.204 critic_loss: 0.635 max_critic_abs_err: 3.376 actor_loss: -9.904 alpha: 5.747e-02 alpha_loss: 0.566 q: 9.537 q_target: 9.601 entropy: 2.841 target_entropy: -7.000 critic_grad: 26.638 actor_grad: 0.228 episode_time: 356.770 collect_sample_time: 93.589 memory: 21.33G
PickCube-v0-train - (train_rl.py:374) - INFO - 2023-11-24,14:51:52 - 110400/20000000(1%) Passed time:48m45s ETA:6d13h50m43s samples_stats: rewards:107.3[9.4, 244.3], max_single_R:1.56[0.54, 2.90], lens:200[200, 200], success:0.00 gpu_mem_ratio: 41.3% gpu_mem: 9.92G gpu_mem_this: 0.00G gpu_util: 0% discriminator_rewards: 0.280 critic_loss: 0.702 max_critic_abs_err: 3.900 actor_loss: -9.667 alpha: 4.821e-02 alpha_loss: 0.478 q: 9.375 q_target: 9.434 entropy: 2.903 target_entropy: -7.000 critic_grad: 29.599 actor_grad: 0.223 episode_time: 356.411 collect_sample_time: 93.377 memory: 21.33G
PickCube-v0-train - (train_rl.py:374) - INFO - 2023-11-24,14:57:52 - 123200/20000000(1%) Passed time:54m45s ETA:6d13h29m18s samples_stats: rewards:172.6[36.8, 403.4], max_single_R:1.75[0.46, 2.83], lens:200[200, 200], success:0.00 gpu_mem_ratio: 41.3% gpu_mem: 9.92G gpu_mem_this: 0.00G gpu_util: 0% discriminator_rewards: 0.373 critic_loss: 0.971 max_critic_abs_err: 4.578 actor_loss: -9.738 alpha: 4.059e-02 alpha_loss: 0.374 q: 9.436 q_target: 9.497 entropy: 2.210 target_entropy: -7.000 critic_grad: 39.959 actor_grad: 0.247 episode_time: 359.880 collect_sample_time: 96.928 memory: 21.33G

Traceback (most recent call last):
  File "maniskill2_learn/apis/run_rl.py", line 522, in <module>
    main()
  File "maniskill2_learn/apis/run_rl.py", line 486, in main
    run_one_process(0, 1, args, cfg)
  File "maniskill2_learn/apis/run_rl.py", line 461, in run_one_process
    main_rl(rollout, evaluator, replay, args, cfg, expert_replay=expert_replay, recent_traj_replay=recent_traj_replay)
  File "maniskill2_learn/apis/run_rl.py", line 296, in main_rl
    train_rl(
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/apis/train_rl.py", line 313, in train_rl
    disc_update_applied = agent.update_discriminator(expert_replay, recent_traj_replay, n_ep)
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/methods/mfrl/gail.py", line 142, in update_discriminator
    self.update_discriminator_helper(expert_replay, recent_traj_replay)
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/methods/mfrl/gail.py", line 115, in update_discriminator_helper
    expert_sampled_batch = expert_replay.sample(self.discriminator_batch_size // 2).to_torch(
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/env/replay_buffer.py", line 231, in sample
    assert self.position == 0, "cache size should equals to buffer size"
AssertionError: cache size should equals to buffer size

followed twice by the same shutdown error:

Exception ignored in: <function SharedGDict.__del__ at 0x7fe1a5891280>
Traceback (most recent call last):
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/utils/data/dict_array.py", line 928, in __del__
  File "/data/private/ljm/ManiSkill2-Learn/maniskill2_learn/utils/data/dict_array.py", line 913, in _unlink
  File "/opt/conda/lib/python3.8/multiprocessing/shared_memory.py", line 239, in unlink
ImportError: sys.meta_path is None, Python is likely shutting down

and finally:

/opt/conda/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 50 leaked shared_memory objects to clean up at shutdown

xuanlinli17 commented 1 year ago

It looks like your config was modified from the original one, since the replay buffer size is different.

The error comes from the assertion in the expert replay buffer: assert self.position == 0, "cache size should equals to buffer size". As mentioned in the README, configure the expert replay with capacity == cache_size, e.g.,

demo_replay_cfg=dict(
    type="ReplayMemory",
    capacity=int(2e4),      # total size of the expert demo buffer
    num_samples=-1,
    cache_size=int(2e4),    # must equal capacity, as required above
    dynamic_loading=True,
    synchronized=False,
    keys=["obs", "actions", "dones", "episode_dones"],
    buffer_filenames=[
        "PATH_TO_DEMO.h5",
    ],
),
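If you prefer not to edit the config file, the same fields should also be overridable on the command line via --cfg-options, like the other options in your command. Note this is a sketch and assumes the GAIL config exposes the expert buffer under the demo_replay_cfg key shown above; check configs/mfrl/gail/maniskill2_pn.py for the actual key name.

# assumes the expert demo buffer is named demo_replay_cfg in the GAIL config; int(2e4) == 20000
python maniskill2_learn/apis/run_rl.py configs/mfrl/gail/maniskill2_pn.py \
  --work-dir ./logs/bc_PickCube_pointcloud_gail --gpu-ids 1 --sim-gpu-ids 0 \
  --cfg-options "demo_replay_cfg.capacity=20000" "demo_replay_cfg.cache_size=20000" \
  "env_cfg.env_name=PickCube-v0" "env_cfg.obs_mode=pointcloud" "env_cfg.n_points=1200" \
  "env_cfg.n_goal_points=50" "env_cfg.control_mode=pd_ee_delta_pose" "env_cfg.obs_frame=ee" \
  "eval_cfg.save_traj=False"

The key point is simply that the two values match; the remaining options stay as in your original run.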