Closed: DanielTakeshi closed this issue 1 year ago
Hi Daniel, I believe it is due to an oversight in the excavate env. This env somehow sets a completely unnecessary particle cap at `excavate.py` line 31. You can delete the `max_particles` argument from the parameter list and from the `super().__init__` call so that it uses the default particle count (65536). This should resolve the issue.
I will try to patch it together with the release of other environments.
Thanks @fbxiang, including that patch would be helpful.
BTW, MPM environments use sparse reward by default, even though dense rewards have been implemented. Please pass in `env_cfg.reward_mode=dense` for training.
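For example, with the `run_rl.py` entry point from the ManiSkill2-Learn README, the override goes into `--cfg-options` (a sketch; the config file path is assumed to be the README's point-cloud PPO config, and the other flags are whatever you already pass):

```bash
# Sketch: request dense rewards via a config override on the PPO command.
# run_rl.py and --cfg-options come from the ManiSkill2-Learn README; the
# config path is assumed to be the point-cloud PPO config used there.
python maniskill2_learn/apis/run_rl.py configs/mfrl/ppo/maniskill2_pn.py \
    --cfg-options "env_cfg.env_name=Excavate-v0" "env_cfg.reward_mode=dense"
```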
Thanks for the fast responses @fbxiang and @xuanlinli17!
I see the dense reward is used by default here in this pull request: https://github.com/haosulab/ManiSkill2/pull/5
Also to clarify @fbxiang, for anyone who's reading this: until it gets patched, you want to delete both lines 31 and 38 here:
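Roughly, the shape of the change (a sketch; the import path and the surrounding signature are assumptions, and the cap value shown is illustrative, not the one in the repo):

```python
from mani_skill2.envs.mpm.base_env import MPMBaseEnv  # import path assumed


class ExcavateEnv(MPMBaseEnv):
    # Line 31: delete the max_particles kwarg from the signature
    # (the default value here is illustrative only) ...
    def __init__(self, *args, max_particles=30000, **kwargs):
        # ... and line 38: drop it from the super().__init__ call, so the
        # base env falls back to its default particle count (65536).
        super().__init__(*args, max_particles=max_particles, **kwargs)
```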
The issue should be fixed in v0.2.0 by https://github.com/haosulab/ManiSkill2/pull/12
On an Ubuntu 20.04 machine with RTX 3090 GPUs (each has 24G of memory), and having installed ManiSkill2 (and ManiSkill2-Learn) as per both READMEs, I am training PPO to get a sense of the task difficulty and to better understand the code. I put this in a bash script:
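Roughly, the script looks like this (a sketch: the entry point and config path are taken from the ManiSkill2-Learn README, and the overrides mirror the `train.py` dump further below; exact flags are assumptions):

```bash
#!/bin/bash
# Sketch of the PPO launch script; entry point and config file follow the
# ManiSkill2-Learn README, and the overrides match the train.py dump below.
LOGDIR="logs/Excavate-v0_ppo_pn"
ENVCFG="env_cfg.env_name=Excavate-v0"

python maniskill2_learn/apis/run_rl.py configs/mfrl/ppo/maniskill2_pn.py \
    --work-dir "${LOGDIR}" --gpu-ids 0 \
    --cfg-options "${ENVCFG}" "env_cfg.obs_mode=pointcloud" \
    "env_cfg.n_points=1200" "env_cfg.control_mode=pd_joint_delta_pos" \
    "rollout_cfg.num_procs=10" "eval_cfg.num_procs=10"
```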
This is based on the PPO instructions in the README here. The only minor differences are that I format the `LOGDIR` and `ENVCFG` variables to make it easier to swap around different environments. Also, I use 10 processes instead of 5 (but I'm running this on a machine where `htop` shows 80 CPUs).

If I put this in `ppo.sh` and run it via `./ppo.sh` on the command line, it seems like it will train, but then I get this near what seems like the end of training:

Also, here is the log directory that has been created from this:

Here is the output of the `train.py` file, which shows training details:
```
(mani_skill2) seita@lambda-dual2:~/ManiSkill2/ManiSkill2-Learn (main) $ cat logs/Excavate-v0_ppo_pn/20220805_203734-train.py
agent_cfg = dict(
    type='PPO',
    gamma=0.95,
    lmbda=0.95,
    critic_coeff=0.5,
    entropy_coeff=0,
    critic_clip=False,
    obs_norm=False,
    rew_norm=True,
    adv_norm=True,
    recompute_value=True,
    num_epoch=2,
    critic_warmup_epoch=4,
    batch_size=330,
    detach_actor_feature=False,
    max_grad_norm=0.5,
    eps_clip=0.2,
    max_kl=0.2,
    dual_clip=None,
    shared_backbone=True,
    ignore_dones=True,
    actor_cfg=dict(
        type='ContinuousActor',
        head_cfg=dict(
            type='GaussianHead',
            init_log_std=-1,
            clip_return=True,
            predict_std=False),
        nn_cfg=dict(
            type='Visuomotor',
            visual_nn_cfg=dict(
                type='PointNet',
                feat_dim='pcd_all_channel',
                mlp_spec=[64, 128, 512],
                feature_transform=[]),
            mlp_cfg=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=['512 + agent_shape', 256, 256, 'action_shape'],
                inactivated_output=True,
                zero_init_output=True)),
        optim_cfg=dict(
            type='Adam',
            lr=0.0003,
            param_cfg=dict({'(.*?)visual_nn(.*?)': None}))),
    critic_cfg=dict(
        type='ContinuousCritic',
        nn_cfg=dict(
            type='Visuomotor',
            visual_nn_cfg=None,
            mlp_cfg=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=['512 + agent_shape', 256, 256, 1],
                inactivated_output=True,
                zero_init_output=True)),
        optim_cfg=dict(type='Adam', lr=0.0003)))
train_cfg = dict(
    on_policy=True,
    total_steps=5000000,
    warm_steps=0,
    n_steps=20000,
    n_updates=1,
    n_eval=5000000,
    n_checkpoint=1000000,
    ep_stats_cfg=dict(info_keys_mode=dict(success=[True, 'max', 'mean'])))
env_cfg = dict(
    type='gym',
    env_name='Excavate-v0',
    obs_mode='pointcloud',
    ignore_dones=True,
    n_points=1200,
    control_mode='pd_joint_delta_pos')
replay_cfg = dict(type='ReplayMemory', capacity=20000)
rollout_cfg = dict(
    type='Rollout',
    num_procs=10,
    with_info=True,
    multi_thread=False)
eval_cfg = dict(
    type='Evaluation',
    num_procs=10,
    num=100,
    use_hidden_state=False,
    save_traj=False,
    save_video=True,
    log_every_step=False,
    env_cfg=dict(ignore_dones=False))
work_dir = None
resume_from = None
expert_replay_cfg = None
recent_traj_replay_cfg = None
(mani_skill2) seita@lambda-dual2:~/ManiSkill2/ManiSkill2-Learn (main) $
```

I have successfully trained PPO from scratch for `PickCube-v0`, `PegInsertionSide-v0`, `PlugCharger-v0`, and `StackCube-v0`, so I don't know if this is specific to the soft-body environments. Also, it seems like it happened near the end of training (I think 5M steps is the default), so it might be hard to reproduce. But just to ask: are there known ways to counter this (or is this a known issue with this environment)?