haosulab / ManiSkill2-Learn


Questions regarding Simulation and Network Parallelism #21

Open ErikKrauter opened 10 months ago

ErikKrauter commented 10 months ago

Hello,

I want to train a DAPG+PPO agent using 5 GPUs, with the same hyperparameters as specified in the ManiSkill2 paper in Appendix F.8, Table 9 (Hyperparameters for DAPG+PPO). My question is how I need to modify these hyperparameters given that I want to train on multiple GPUs. Additionally, I would like advice on whether it is more efficient to train and simulate on 5 GPUs or whether it is more efficient to separate training and simulation so that I train on 2 GPUs and simulate on 2 different GPUs. The code requires that len(args.sim_gpu_ids) == len(args.gpu_ids), meaning I could not make use of the fifth GPU that I have if I were to separate training and simulation.

I would like to train for 25e6 steps (total_steps), with 2e4 samples per PPO step (n_steps), a minibatch size of 300 (batch_size), 4 critic warm-up epochs (critic_warmup_epoch), 2 PPO update epochs (num_epoch), a replay buffer capacity of 2e4 (I believe it must be the same size as n_steps, correct me if I am wrong), a model checkpoint every 1e6 steps, and a final evaluation after the entire training is over.

Also, I want to use a demonstration buffer with dynamic loading; I took the following configuration from the README. Do I need to change any of the hyperparameters, such as capacity and cache_size, because I am training on 5 GPUs? Does the capacity of the demonstration replay buffer need to match the capacity of the experience replay buffer?

demo_replay_cfg=dict(
    type="ReplayMemory",
    capacity=int(2e4),
    num_samples=-1,
    cache_size=int(2e4),
    dynamic_loading=True,
    synchronized=False,
    keys=["obs", "actions", "dones", "episode_dones"],
    buffer_filenames=[
        "PATH_TO_DEMO.h5",
    ],
),

The README provides the following information. I am wondering whether it already names all of the hyperparameters that are affected by simulation and training parallelism:

"Note that during training, if multiple simulation GPUs are used, then some arguments in the configuration file (e.g. train_cfg.n_steps, replay_cfg.capacity) are effectively multiplied by the number of simulation GPUs. Similarly, if multiple network-training GPUs are used, then some arguments (e.g. agent_cfg.batch_size) are effectively multiplied by the number of network-training GPUs."

xuanlinli17 commented 10 months ago
whether it is more efficient to train and simulate on 5 GPUs or whether it is more efficient to separate training and simulation so that I train on 2 GPUs and simulate on 2 different GPUs

I think it's better to train and simulate on all 5 GPUs; it doesn't cost much extra GPU memory. The bottleneck is mainly on the CPU.

Do I need to make changes to any of the hyperparameters, like capacity and cache_size because I am training on 5 GPUs?

For demo_replay_cfg, you don't need to change its configs. However, you do need to change other agent configs and the main replay buffer config.

For 5-GPU training, I also recommend using a larger effective batch size (#GPUs x batch_size) and a larger effective rollout size (#GPUs x n_steps) than in the single-GPU case; otherwise, multi-GPU training isn't really advantageous over single-GPU training.

Example config for 5-GPU training:

agent_cfg = dict(
    type="PPO",
    gamma=0.95,
    lmbda=0.95,
    critic_coeff=0.5,
    entropy_coeff=0,
    critic_clip=False,
    obs_norm=False,
    rew_norm=True,
    adv_norm=True,
    recompute_value=True,
    num_epoch=2,
    critic_warmup_epoch=4,
    batch_size=256, # this will be effectively multiplied by # GPUs
    detach_actor_feature=False,
    max_grad_norm=0.5,
    eps_clip=0.2,
    max_kl=0.2,
    dual_clip=None,
    shared_backbone=True,
    ignore_dones=True,
    dapg_lambda=0.1,
    dapg_damping=0.995,
    actor_cfg=dict(
        type="ContinuousActor",
        head_cfg=dict(
            type="GaussianHead",
            init_log_std=-1,
            clip_return=True,
            predict_std=False,
        ),
        nn_cfg=dict(
            type="Visuomotor",
            visual_nn_cfg=dict(type="PointNet", feat_dim="pcd_all_channel", mlp_spec=[64, 128, 512], feature_transform=[]),
            mlp_cfg=dict(
                type="LinearMLP",
                norm_cfg=None,
                mlp_spec=["512 + agent_shape", 256, 256, "action_shape"],
                inactivated_output=True,
                zero_init_output=True,
            ),
        ),
        optim_cfg=dict(type="Adam", lr=3e-4, param_cfg={"(.*?)visual_nn(.*?)": None}),
    ),
    critic_cfg=dict(
        type="ContinuousCritic",
        nn_cfg=dict(
            type="Visuomotor",
            visual_nn_cfg=None,
            mlp_cfg=dict(
                type="LinearMLP", norm_cfg=None, mlp_spec=["512 + agent_shape", 256, 256, 1], inactivated_output=True, zero_init_output=True
            ),
        ),
        optim_cfg=dict(type="Adam", lr=3e-4),
    ),
    demo_replay_cfg=dict(
        type="ReplayMemory",
        capacity=int(2e4),
        num_samples=int(2e4),
        keys=["obs", "actions", "dones", "episode_dones"],
        buffer_filenames=[
            "PATH_TO_DEMO.h5",
        ],
    ),
)

train_cfg = dict(
    on_policy=True,
    total_steps=int(5e6), # this will be effectively multiplied by #GPUs, so 5 GPUs -> 25e6 in total
    warm_steps=0,
    n_steps=int(8e3), # this will be effectively multiplied by # GPU
    n_updates=1,
    n_eval=int(1e6), # this will be effectively multiplied by # GPU
    n_checkpoint=int(5e5), # this will be effectively multiplied by # GPU
    ep_stats_cfg=dict(
        info_keys_mode=dict(
            success=[True, "max", "mean"],
        )
    ),
)

env_cfg = dict(
    type="gym",
    env_name="PickCube-v0",
    obs_mode='pointcloud',
    ignore_dones=True,
)

rollout_cfg = dict(
    type="Rollout",
    num_procs=3, # this will be effectively multiplied by # GPU
    with_info=True,
    multi_thread=False,
)

replay_cfg = dict(
    type="ReplayMemory",
    capacity=int(8e3), # this will be effectively multiplied by #GPUs, and should keep the same as train_cfg.n_steps
    sampling_cfg=dict(type="OneStepTransition", with_replacement=False),
)

eval_cfg = dict(
    type="Evaluation",
    num_procs=5, # evaluation is always performed by the first GPU, so configs will not be multiplied by the #GPUs
    num=100,
    use_hidden_state=False,
    save_traj=False,
    save_video=True,
    log_every_step=False,
    env_cfg=dict(ignore_dones=False),
)
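As a sanity check on the example above, the effective quantities with 5 GPUs work out as follows (my arithmetic, simply applying the multiplication rule quoted from the README to the per-GPU values in the config):

# Effective quantities for the 5-GPU example config above
# (per-GPU config values multiplied by the number of GPUs)
num_gpus = 5

effective_total_steps = num_gpus * int(5e6)  # 25e6 total environment steps
effective_n_steps = num_gpus * int(8e3)      # 4e4 samples collected per PPO iteration
effective_batch_size = num_gpus * 256        # 1280 samples per gradient minibatch
effective_capacity = num_gpus * int(8e3)     # 4e4, matching the effective n_steps
effective_num_procs = num_gpus * 3           # 15 parallel rollout processes in total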

Then, when launching the experiment, pass the argument --gpu-ids 0 1 2 3 4.
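For reference, the launch command would then look something like python maniskill2_learn/apis/run_rl.py YOUR_CONFIG.py --work-dir YOUR_WORK_DIR --gpu-ids 0 1 2 3 4, assuming the run_rl.py entry point described in the README; YOUR_CONFIG.py and YOUR_WORK_DIR are placeholders for your own config file and output directory.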