Open ErikKrauter opened 10 months ago
whether it is more efficient to train and simulate on 5 GPUs, or whether it is more efficient to separate training and simulation so that I train on 2 GPUs and simulate on 2 different GPUs
I think it's better to train and simulate on all 5 GPUs; it doesn't cost much extra GPU memory. The bottleneck is mainly on the CPU.
Do I need to make changes to any of the hyperparameters, like capacity and cache_size, because I am training on 5 GPUs?
For demo_replay_cfg, you don't need to change its configs. However, you do need to change other agent configs and the main replay buffer config.
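In case it's useful, the dynamic-loading demo buffer config referenced in the question would look roughly like the sketch below (key names such as cache_size and dynamic_loading should be double-checked against the README and your version of the repo):

demo_replay_cfg=dict(
    type="ReplayMemory",
    capacity=int(2e4),     # demo transitions kept in the buffer
    cache_size=int(2e4),   # transitions held in memory at a time when loading dynamically
    dynamic_loading=True,  # assumed key name for the dynamic-loading switch
    num_samples=-1,
    keys=["obs", "actions", "dones", "episode_dones"],
    buffer_filenames=[
        "PATH_TO_DEMO.h5",
    ],
),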
For 5-GPU training, I also recommend using a larger effective batch size (#GPUs x batch_size) and effective rollout size (#GPUs x n_steps) than in the single-GPU case; otherwise the multi-GPU setting has little advantage over single-GPU training.
Example config for 5-GPU training:
agent_cfg = dict(
    type="PPO",
    gamma=0.95,
    lmbda=0.95,
    critic_coeff=0.5,
    entropy_coeff=0,
    critic_clip=False,
    obs_norm=False,
    rew_norm=True,
    adv_norm=True,
    recompute_value=True,
    num_epoch=2,
    critic_warmup_epoch=4,
    batch_size=256,  # this will be effectively multiplied by # GPUs
    detach_actor_feature=False,
    max_grad_norm=0.5,
    eps_clip=0.2,
    max_kl=0.2,
    dual_clip=None,
    shared_backbone=True,
    ignore_dones=True,
    dapg_lambda=0.1,
    dapg_damping=0.995,
    actor_cfg=dict(
        type="ContinuousActor",
        head_cfg=dict(
            type="GaussianHead",
            init_log_std=-1,
            clip_return=True,
            predict_std=False,
        ),
        nn_cfg=dict(
            type="Visuomotor",
            visual_nn_cfg=dict(type="PointNet", feat_dim="pcd_all_channel", mlp_spec=[64, 128, 512], feature_transform=[]),
            mlp_cfg=dict(
                type="LinearMLP",
                norm_cfg=None,
                mlp_spec=["512 + agent_shape", 256, 256, "action_shape"],
                inactivated_output=True,
                zero_init_output=True,
            ),
        ),
        optim_cfg=dict(type="Adam", lr=3e-4, param_cfg={"(.*?)visual_nn(.*?)": None}),
    ),
    critic_cfg=dict(
        type="ContinuousCritic",
        nn_cfg=dict(
            type="Visuomotor",
            visual_nn_cfg=None,
            mlp_cfg=dict(
                type="LinearMLP", norm_cfg=None, mlp_spec=["512 + agent_shape", 256, 256, 1], inactivated_output=True, zero_init_output=True
            ),
        ),
        optim_cfg=dict(type="Adam", lr=3e-4),
    ),
    demo_replay_cfg=dict(
        type="ReplayMemory",
        capacity=int(2e4),
        num_samples=int(2e4),
        keys=["obs", "actions", "dones", "episode_dones"],
        buffer_filenames=[
            "PATH_TO_DEMO.h5",
        ],
    ),
)
train_cfg = dict(
    on_policy=True,
    total_steps=int(5e6),  # this will be effectively multiplied by # GPUs, so 5 GPUs = 25e6
    warm_steps=0,
    n_steps=int(8e3),  # this will be effectively multiplied by # GPUs
    n_updates=1,
    n_eval=int(1e6),  # this will be effectively multiplied by # GPUs
    n_checkpoint=int(5e5),  # this will be effectively multiplied by # GPUs
    ep_stats_cfg=dict(
        info_keys_mode=dict(
            success=[True, "max", "mean"],
        )
    ),
)
env_cfg = dict(
    type="gym",
    env_name="PickCube-v0",
    obs_mode='pointcloud',
    ignore_dones=True,
)
rollout_cfg = dict(
    type="Rollout",
    num_procs=3,  # this will be effectively multiplied by # GPUs
    with_info=True,
    multi_thread=False,
)
replay_cfg = dict(
    type="ReplayMemory",
    capacity=int(8e3),  # this will be effectively multiplied by # GPUs, and should be kept the same as train_cfg.n_steps
    sampling_cfg=dict(type="OneStepTransition", with_replacement=False),
)
eval_cfg = dict(
    type="Evaluation",
    num_procs=5,  # evaluation is always performed by the first GPU, so configs will not be multiplied by the # GPUs
    num=100,
    use_hidden_state=False,
    save_traj=False,
    save_video=True,
    log_every_step=False,
    env_cfg=dict(ignore_dones=False),
)
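As a quick sanity check (illustrative arithmetic only, not part of the config), the effective quantities implied by the per-GPU values above on 5 GPUs are:

num_gpus = 5
per_gpu = dict(batch_size=256, n_steps=int(8e3), total_steps=int(5e6), n_checkpoint=int(5e5), rollout_procs=3)
effective = {k: v * num_gpus for k, v in per_gpu.items()}
print(effective)
# {'batch_size': 1280, 'n_steps': 40000, 'total_steps': 25000000, 'n_checkpoint': 2500000, 'rollout_procs': 15}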
Then, when launching experiments, use the argument --gpu-ids 0 1 2 3 4.
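For example, assuming the usual ManiSkill2-Learn entry point (adjust the script and config paths to your setup):

python maniskill2_learn/apis/run_rl.py PATH_TO_YOUR_CONFIG.py --gpu-ids 0 1 2 3 4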
Hello,
I want to train a DAPG+PPO agent using 5 GPUs. I want to use the same hyperparameters as specified in the ManiSkill2 paper, Appendix F.8, Table 9: Hyperparameters for DAPG+PPO. My question is how I need to modify these hyperparameters given that I want to train on multiple GPUs. Additionally, I would like advice on whether it is more efficient to train and simulate on all 5 GPUs, or whether it is more efficient to separate training and simulation so that I train on 2 GPUs and simulate on 2 different GPUs. In the code it is required that
len(args.sim_gpu_ids) == len(args.gpu_ids)
meaning I could not make use of the fifth GPU I have if I were to separate training and simulation. I would like to train for 25e6 steps (total_steps), with 2e4 samples per PPO step (n_steps), a minibatch size of 300 (batch_size), 4 critic warm-up epochs (critic_warmup_epoch), 2 PPO update epochs (num_epoch), a replay buffer capacity of 2e4 (I believe it must be the same size as n_steps, correct me if I am wrong), a model checkpoint every 1e6 steps, and a final evaluation after the entire training is over.
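For concreteness, if each per-GPU value is effectively multiplied by the number of GPUs, my tentative per-GPU settings for these targets on 5 GPUs would be the following (please correct me if this assumption is wrong):

num_gpus = 5
total_steps = int(25e6 // num_gpus)   # 5e6 per GPU -> 25e6 effective
n_steps = int(2e4 // num_gpus)        # 4e3 per GPU -> 2e4 effective
n_checkpoint = int(1e6 // num_gpus)   # 2e5 per GPU -> checkpoint every 1e6 effective steps
replay_capacity = n_steps             # kept the same as n_steps per GPU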
Also, I want to use a demonstration buffer with dynamic loading. From the README I got the following configuration. Do I need to make changes to any of the hyperparameters, like capacity and cache_size, because I am training on 5 GPUs? Does the capacity of the demonstration replay buffer need to match the capacity of the experience replay buffer?
In the README, the following information is provided. I am wondering whether the README already names all of the hyperparameters that are affected by simulation and training parallelism?