Closed BaitingLuo closed 2 months ago
Hi Marc,
I'm trying to reproduce the results for the deterministic environments by running commands like:

    1R2R run_example examples.development --config examples.config._1R2R.mujoco.hopper_medium_expert --seed 1 --gpus 1
I use the following configuration:

    params = deepcopy(base_params)
    params.update({
        'domain': 'hopper',
        'task': 'medium-expert-v2',
        'exp_name': 'hopper_medium_expert'
    })
    params['kwargs'].update({
        'pool_load_path': 'd4rl/hopper-medium-expert-v2',
        'dynamic_risk': 'cvar_0.9',  # or 'wang_0.1'
        'rollout_length': 5,
    })
and
    'epoch_length': 1000,
    'n_epochs': 2000,
    'train_every_n_steps': 1,
    'n_train_repeat': 1,
    'eval_render_mode': None,
    'eval_n_episodes': 20,
    'eval_deterministic': False,
    'separate_mean_var': True,
    'evaluate_interval': 10,
    'discount': 0.99,
    'tau': 5e-3,
    'reward_scale': 1.0,
    'critic_lr': 3e-4,
    'actor_lr': 1e-4,
    'adv_lr': 3e-4,
    'real_ratio': 0.5,
    'model_train_freq': 1000,
    'model_retain_epochs': 5,
    'rollout_batch_size': 50e3,
    'deterministic': False,
    'num_networks': 7,
    'num_elites': 5,
    'max_model_t': None,
    'pretrain_bc': True
However, the results don't quite meet expectations. The paper mentions that "we optimise the standard expected value objective", so I wonder whether I need to change certain parameters (for example `dynamic_risk`) for the deterministic MuJoCo environments?
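To make the question concrete, here is a minimal NumPy sketch of the difference I have in mind. It is only an illustration under my own assumptions, not code from the repo: `cvar` is a hypothetical helper, and I am assuming that `'cvar_0.9'` means averaging over the worst 90% of outcomes, so that a level of 1.0 would recover the plain expected value. I don't know which config value, if any, selects the expected value objective in the actual code.

```python
import numpy as np

def cvar(values, alpha):
    """Mean of the worst (lowest) alpha-fraction of the values.

    Hypothetical helper for illustration only; alpha = 1.0 gives the plain mean.
    """
    values = np.sort(np.asarray(values, dtype=float))
    # Number of lowest samples to keep; the small offset guards against
    # floating-point error in alpha * len(values).
    k = max(1, int(np.ceil(alpha * len(values) - 1e-9)))
    return values[:k].mean()

# Toy set of sampled returns (made-up numbers, not experimental results).
returns = np.arange(1.0, 11.0)

expected = returns.mean()      # standard expected value objective
cvar_09 = cvar(returns, 0.9)   # drops the single best outcome out of ten
cvar_10 = cvar(returns, 1.0)   # coincides with the expected value

print(expected, cvar_09, cvar_10)
```

With these toy numbers the risk-averse value is lower than the mean, which is why I suspect the `dynamic_risk` setting matters when comparing against results obtained with the expected value objective.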
Thanks!