Hi there, I'm trying to run a multiworld env with HER-SAC. For the pushing environment SawyerPushAndReachEnvEasy-v0, I've run training multiple times, but it does not converge. The parameters I use are:
variant = dict(
    algorithm='HER-SAC',
    version='normal',
    algo_kwargs=dict(
        batch_size=256,
        num_epochs=1500,
        num_eval_steps_per_epoch=5000,
        num_expl_steps_per_train_loop=1000,
        num_trains_per_train_loop=1000,
        min_num_steps_before_training=1000,
        max_path_length=50,
    ),
    sac_trainer_kwargs=dict(
        discount=0.99,
        soft_target_tau=5e-3,
        target_update_period=1,
        policy_lr=3e-4,
        qf_lr=3e-4,
        reward_scale=1,
        use_automatic_entropy_tuning=True,
    ),
    replay_buffer_kwargs=dict(
        max_size=int(1e6),
        fraction_goals_rollout_goals=0.2,  # equal to k = 4 in the HER paper
        fraction_goals_env_goals=0,
    ),
    qf_kwargs=dict(
        hidden_sizes=[400, 300],
    ),
    policy_kwargs=dict(
        hidden_sizes=[400, 300],
    ),
)
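For context on the `# equal to k = 4` comment: assuming the rlkit convention that `fraction_goals_rollout_goals` is the share of sampled goals kept as the original rollout goals (with the remainder relabeled to future achieved goals), a ratio of k relabeled goals per original goal corresponds to keeping a fraction 1 / (1 + k). A minimal sketch of that arithmetic:

```python
# Hedged sketch: relate HER's k (relabeled goals per original goal) to
# fraction_goals_rollout_goals, assuming this fraction is the share of
# batch goals that keep their original rollout goal (rlkit convention).
k = 4  # HER paper's setting: 4 relabeled goals per original goal
fraction_goals_rollout_goals = 1 / (1 + k)
print(fraction_goals_rollout_goals)  # 0.2
```

So 0.2 is consistent with k = 4 under that convention.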