pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl

[BUG] dreamer example broken by ``ObservationNorm`` #1694

Open FrankTianTT opened 11 months ago

FrankTianTT commented 11 months ago

Describe the bug

The dreamer example uses ObservationNorm with a fixed obs_norm_state_dict, which is estimated from a rollout of only cfg.init_env_steps steps. When the observation is pixels, some pixels never change during that whole trajectory, which yields zero scales.

Although eps=1e-6 keeps the scales non-zero, the normalized observations still become far too large, which makes the Encoder output None and causes huge actor and model losses.
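
For illustration, here is a minimal sketch of the failure mode (the numbers are made up, not taken from the example): when a pixel stays constant over the init rollout, its estimated scale collapses to roughly eps, so any later deviation from the mean explodes after normalization.

    import torch

    pixels = torch.full((1000,), 0.42)        # a pixel that never changes during collection
    loc, scale = pixels.mean(), pixels.std()  # scale == 0 for a constant pixel
    eps = 1e-6

    # ObservationNorm-style standardization: (x - loc) / (scale + eps)
    later_value = 0.9                         # a value this pixel takes later in training
    normalized = (later_value - loc) / (scale + eps)
    print(normalized)                         # ~4.8e5: large enough to blow up the encoder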

To Reproduce

Run the dreamer example with exactly its default config:

python examples/dreamer/dreamer.py

Expected behavior

By setting obs_norm_state_dict to a fixed, sensible value, we can avoid this issue:

    # key, init_env_steps, stats = None, None, None
    # if not cfg.vecnorm and cfg.norm_stats:
    #     if not hasattr(cfg, "init_env_steps"):
    #         raise AttributeError("init_env_steps missing from arguments.")
    #     key = ("next", "pixels") if cfg.from_pixels else ("next", "observation_vector")
    #     init_env_steps = cfg.init_env_steps
    #     stats = {"loc": None, "scale": None}
    # elif cfg.from_pixels:
    #     stats = {"loc": 0.5, "scale": 0.5}
    # proof_env = transformed_env_constructor(
    #     cfg=cfg, use_env_creator=False, stats=stats
    # )()
    # initialize_observation_norm_transforms(
    #     proof_environment=proof_env, num_iter=init_env_steps, key=key
    # )
    # _, obs_norm_state_dict = retrieve_observation_norms_state_dict(proof_env)[0]
    # proof_env.close()
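    # Workaround: skip the rollout-based stat estimation above and hard-code the stats.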
    obs_norm_state_dict = {"loc": 0.5, "scale": 0.5}
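
For context on why these numbers are reasonable: assuming the example applies the transform as (obs - loc) / scale (i.e. standard_normal=True), loc=0.5 and scale=0.5 map float pixels in [0, 1] to roughly [-1, 1]. A minimal standalone sketch (key name and tensor shape are just examples):

    import torch
    from tensordict import TensorDict
    from torchrl.envs.transforms import ObservationNorm

    norm = ObservationNorm(loc=0.5, scale=0.5, in_keys=["pixels"], standard_normal=True)
    td = TensorDict({"pixels": torch.rand(3, 64, 64)}, batch_size=[])  # float pixels in [0, 1]
    td = norm(td)
    print(td["pixels"].min(), td["pixels"].max())  # roughly -1 and 1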

Screenshots

Dreamer__8044d54b_23_11_13-01_54_04 is the original run, and Dreamer__8c83e177_23_11_13-02_02_24 is the modified one.

[two screenshots: logged training curves for the original and modified runs]

loss_model_kl, loss_world_model, loss_model_reco, loss_model_reward and grad_world_model are None in the original run, and r_training stays low; it seems the original run learned nothing.

BTW, the modified run also breaks at around 155k steps, where loss_actor suddenly increases hugely. However, I am not an expert on dreamer; could someone tell me why?

System info

torchrl.__version__ = 0.2.1
numpy.__version__ = 1.26.1 
sys.version = 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
sys.platform =  linux

Reason and Possible fixes

See Expected behavior above.


vmoens commented 11 months ago

Thanks for pointing this out! Fixing dreamer is one of my top priorities for the next release; I'll do my best to address this ASAP.

FrankTianTT commented 11 months ago

@vmoens Cool! I noticed that there is an issue listing some potential improvements (https://github.com/pytorch/rl/issues/916); I am really looking forward to it!

And just like MPC, dreamer also fails to deal with early-stopping envs; I am addressing that now. But I have a question: WorldModelWrapper wraps the transition and reward models but does not cover a terminated model. Why?

To make dreamer sensitive to done, we need to learn it explicitly. To do that, we need a new WorldModelWrapper including transition_model, terminated_model and reward_model; maybe we could take terminated_model as an optional arg?
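
A rough sketch of what such a wrapper could look like, following the same pattern as the current WorldModelWrapper (this is only a suggestion, not an existing TorchRL API; the class name, key names and network sizes are made up):

    from typing import Optional

    import torch
    from tensordict import TensorDict
    from tensordict.nn import TensorDictModule, TensorDictSequential
    from torch import nn

    class WorldModelWithTermination(TensorDictSequential):
        """World model wrapping transition, reward and (optionally) terminated models."""

        def __init__(
            self,
            transition_model: TensorDictModule,
            reward_model: TensorDictModule,
            terminated_model: Optional[TensorDictModule] = None,
        ):
            modules = [transition_model, reward_model]
            if terminated_model is not None:
                modules.append(terminated_model)
            super().__init__(*modules)

    class ConcatMLP(nn.Module):
        """Toy transition network that concatenates state and action."""

        def __init__(self, state_dim: int, action_dim: int, out_dim: int):
            super().__init__()
            self.net = nn.Linear(state_dim + action_dim, out_dim)

        def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([state, action], dim=-1))

    # Illustrative sub-modules with made-up key names and sizes.
    state_dim, action_dim = 32, 4
    transition = TensorDictModule(
        ConcatMLP(state_dim, action_dim, state_dim),
        in_keys=["state", "action"],
        out_keys=[("next", "state")],
    )
    reward = TensorDictModule(
        nn.Linear(state_dim, 1), in_keys=[("next", "state")], out_keys=[("next", "reward")]
    )
    terminated = TensorDictModule(
        nn.Sequential(nn.Linear(state_dim, 1), nn.Sigmoid()),
        in_keys=[("next", "state")],
        out_keys=[("next", "terminated_prob")],
    )
    world_model = WorldModelWithTermination(transition, reward, terminated)

    td = TensorDict(
        {"state": torch.randn(8, state_dim), "action": torch.randn(8, action_dim)},
        batch_size=[8],
    )
    td = world_model(td)  # fills ("next", "state"), ("next", "reward"), ("next", "terminated_prob")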