marc-rigter / 1R2R

Official code for "One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning", NeurIPS 2023
MIT License

Obtain deterministic env results #2

Closed BaitingLuo closed 2 months ago

BaitingLuo commented 2 months ago

Hi Marc,

I'm trying to obtain the results for the deterministic environments by running commands like:

    1R2R run_example examples.development --config examples.config._1R2R.mujoco.hopper_medium_expert --seed 1 --gpus 1

with the following configuration:

    params = deepcopy(base_params)
    params.update({
        'domain': 'hopper',
        'task': 'medium-expert-v2',
        'exp_name': 'hopper_medium_expert'
    })
    params['kwargs'].update({
        'pool_load_path': 'd4rl/hopper-medium-expert-v2',
        'dynamic_risk': 'cvar_0.9',  # or 'wang_0.1'
        'rollout_length': 5,
    })

and

    'epoch_length': 1000,
    'n_epochs': 2000,
    'train_every_n_steps': 1,
    'n_train_repeat': 1,
    'eval_render_mode': None,
    'eval_n_episodes': 20,
    'eval_deterministic': False,
    'separate_mean_var': True,
    'evaluate_interval': 10,

    'discount': 0.99,
    'tau': 5e-3,
    'reward_scale': 1.0,

    'critic_lr': 3e-4,
    'actor_lr': 1e-4,
    'adv_lr': 3e-4,
    'real_ratio': 0.5,
    'model_train_freq': 1000,
    'model_retain_epochs': 5,
    'rollout_batch_size': 50e3,
    'deterministic': False,
    'num_networks': 7,
    'num_elites': 5,
    'max_model_t': None,
    'pretrain_bc': True

But the results don't quite meet expectations. The paper mentions that "we optimise the standard expected value objective" for the deterministic environments, so I wonder whether I need to change certain parameters for the deterministic MuJoCo tasks?
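To make the question concrete, here is the kind of override I have in mind, as a minimal sketch. It assumes that `dynamic_risk` is the parameter controlling the risk measure and that a CVaR level of 1.0 reduces to the plain expectation; whether the config parser accepts the string `'cvar_1.0'` (or uses some other token for the risk-neutral objective) is an assumption, not something I've confirmed in the repo.

```python
from copy import deepcopy

# Hypothetical stand-in for the repo's base_params; only the fields
# relevant to the question are shown.
base_params = {
    'kwargs': {
        'pool_load_path': 'd4rl/hopper-medium-expert-v2',
        'dynamic_risk': 'cvar_0.9',
        'rollout_length': 5,
    }
}

# For the deterministic MuJoCo tasks, the paper optimises the expected
# value, so presumably the risk measure should be neutral. CVaR at
# level 1.0 is mathematically the plain expectation; the exact string
# accepted by the config is an assumption here.
params = deepcopy(base_params)
params['kwargs'].update({'dynamic_risk': 'cvar_1.0'})
```

Is a change along these lines what's needed, or are other parameters (e.g. `rollout_length`) also tuned differently for the deterministic benchmarks?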

Thanks!