Hi there, I'm trying to run a multiworld env with HER-SAC. For the pushing environment SawyerPushAndReachEnvEasy-v0, I've run training multiple times, but it does not converge. The parameters I use are:
variant = dict(
    algorithm='HER-SAC',
    version='normal',
    algo_kwargs=dict(
        batch_size=256,
        num_epochs=1500,
        num_eval_steps_per_epoch=5000,
        num_expl_steps_per_train_loop=1000,
        num_trains_per_train_loop=1000,
        min_num_steps_before_training=1000,
        max_path_length=50,
    ),
    sac_trainer_kwargs=dict(
        discount=0.99,
        soft_target_tau=5e-3,
        target_update_period=1,
        policy_lr=3e-4,
        qf_lr=3e-4,
        reward_scale=1,
        use_automatic_entropy_tuning=True,
    ),
    replay_buffer_kwargs=dict(
        max_size=int(1e6),
        fraction_goals_rollout_goals=0.2,  # equal to k = 4 in the HER paper
        fraction_goals_env_goals=0,
    ),
    qf_kwargs=dict(
        hidden_sizes=[400, 300],
    ),
    policy_kwargs=dict(
        hidden_sizes=[400, 300],
    ),
)
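For context on the `# equal to k = 4` comment: assuming the rlkit convention that `fraction_goals_rollout_goals` is the share of sampled goals kept as the original rollout goals (with the remainder relabeled to future achieved goals), a ratio of k relabeled goals per original goal corresponds to keeping a fraction 1 / (1 + k). A minimal sketch of that arithmetic:

```python
# Hedged sketch: relate HER's k (relabeled goals per original goal) to
# fraction_goals_rollout_goals, assuming this fraction is the share of
# batch goals that keep their original rollout goal (rlkit convention).
k = 4  # HER paper's setting: 4 relabeled goals per original goal
fraction_goals_rollout_goals = 1 / (1 + k)
print(fraction_goals_rollout_goals)  # 0.2
```

So 0.2 is consistent with k = 4 under that convention.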