FrankTianTT opened 11 months ago
Thanks for pointing this out! Fixing Dreamer is one of my top priorities for the next release; I'll do my best to address this ASAP.
@vmoens Cool! I have noticed that there is an issue listing some potential improvements (https://github.com/pytorch/rl/issues/916); I am really looking forward to it!
And just like MPC, Dreamer also fails to handle early-stop envs; I am addressing that now. But I have a question: `WorldModelWrapper` wraps the transition and reward models, but does not cover a terminated model. Why? To make Dreamer sensitive to `done`, we need to learn it explicitly. To do that, we need a new `WorldModelWrapper` including a `transition_model`, a `terminated_model` and a `reward_model`; maybe we could take `transition_model` as an optional arg?
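Reading the suggestion as making the terminated model the optional piece (so existing two-model usage keeps working), here is a minimal, framework-free sketch of the idea; `WorldModelWithTermination` and its signature are hypothetical, not torchrl's actual API:

```python
from typing import Callable, List, Optional

State = List[float]  # toy stand-in for a latent-state tensor


class WorldModelWithTermination:
    """Hypothetical wrapper bundling transition, reward and an
    optional terminated model (a sketch, not torchrl code)."""

    def __init__(
        self,
        transition_model: Callable[[State, float], State],
        reward_model: Callable[[State], float],
        terminated_model: Optional[Callable[[State], bool]] = None,
    ):
        self.transition_model = transition_model
        self.reward_model = reward_model
        # Default: never terminate, matching the current behavior
        # where no explicit done signal is learned.
        self.terminated_model = terminated_model or (lambda s: False)

    def step(self, state: State, action: float):
        """One imagined step: next state, reward, terminated flag."""
        next_state = self.transition_model(state, action)
        return next_state, self.reward_model(next_state), self.terminated_model(next_state)


# Toy usage: the "state" is a single number in a list.
wm = WorldModelWithTermination(
    transition_model=lambda s, a: [s[0] + a],
    reward_model=lambda s: float(s[0]),
    terminated_model=lambda s: s[0] >= 3,  # imagined episode ends past a threshold
)
s, r, done = wm.step([2], 1)
print(s, r, done)  # [3] 3.0 True
```

Keeping `terminated_model` optional would let existing two-model call sites construct the wrapper unchanged.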
Describe the bug
`ObservationNorm` with a fixed `obs_norm_state_dict` is used in the dreamer example, and this `obs_norm_state_dict` is estimated from a rollout of only `cfg.init_env_steps` steps. When the observation is `pixels`, some pixels never change during the whole trajectory, which makes their scales zero. Although `eps=1e-6` keeps the scales non-zero, the normalized observations are still far too large, which makes the `Encoder` output None and causes huge actor and model losses.
To Reproduce
Run the dreamer example with exactly its default config.
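The zero-scale problem is also easy to reproduce in isolation: a pixel that never changes over the init rollout gets a scale of roughly `eps`, so any later deviation blows up after normalization. A schematic stand-in for the normalization (toy numbers, not the actual example's values):

```python
# A pixel channel that stays constant over the init_env_steps rollout.
init_rollout = [0.0] * 1000

mean = sum(init_rollout) / len(init_rollout)
var = sum((x - mean) ** 2 for x in init_rollout) / len(init_rollout)
eps = 1e-6
scale = var ** 0.5 + eps  # std estimate with the eps guard: ~1e-6

# Later in training this pixel takes a slightly different value.
new_pixel = 0.5
normalized = (new_pixel - mean) / scale
print(normalized)  # ~5e5 -- far outside any sane input range for an encoder
```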
Expected behavior
By setting `obs_norm_state_dict` to a normal value, we can avoid this issue.
Screenshots
`Dreamer__8044d54b_23_11_13-01_54_04` is the original one, and `Dreamer__8c83e177_23_11_13-02_02_24` is the modified one. `loss_model_kl`, `loss_world_model`, `loss_model_reco`, `loss_model_reward` and `grad_world_model` are None in the original one, and `r_training` is low; it seems the original run learned nothing. BTW, the modified one also breaks at around 155k steps, where `loss_actor` suddenly increases hugely. However, I am not an expert on Dreamer; could someone tell me why?
System info
Reason and Possible fixes
See Expected behavior.
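One possible fix in the spirit of "setting `obs_norm_state_dict` to a normal value" is to floor the estimated per-pixel scale before using it, so near-constant pixels cannot produce exploding normalized values. A sketch, where `clamp_scale` and the `min_scale` threshold are hypothetical, not torchrl code:

```python
def clamp_scale(std: float, min_scale: float = 1e-2) -> float:
    """Hypothetical helper: floor a per-pixel std estimate so that
    near-constant pixels do not blow up after normalization."""
    return max(std, min_scale)


# A constant pixel over the init rollout gives a raw std estimate of ~0.
raw_std = 0.0
safe = clamp_scale(raw_std)

# With the floored scale, a later deviation stays bounded.
normalized = (0.5 - 0.0) / safe
print(normalized)  # 50.0 instead of ~5e5
```

The right value of `min_scale` is a judgment call; anything that keeps normalized pixels within a few tens should avoid the None losses described above.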
Checklist