Closed hai-h-nguyen closed 2 years ago
Hello,
Do you have any thoughts on how to handle that case (an episode of length 1)? Currently you assume it never happens, and the code raises an error when it does. However, in some domains, such as MiniGrid Lava Crossing, it can happen quite often. Thanks!
A quick fix is to manually force the episode length to be >= 2 by adding a dummy transition when it is shorter.
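For concreteness, the padding could look something like the sketch below. This is a minimal illustration of the suggested fix, not code from the repo; the array layout (`obs` of shape `(T+1, obs_dim)`, `acts` of shape `(T, act_dim)`, `rews` of shape `(T,)`) and the function name are assumptions for the example.

```python
import numpy as np

def pad_short_episode(obs, acts, rews, min_len=2):
    """Pad an episode shorter than min_len with dummy (zero) transitions.

    Illustrative only -- not the repo's actual API.
    obs:  (T+1, obs_dim) observations, acts: (T, act_dim), rews: (T,).
    """
    T = len(acts)
    if T >= min_len:
        return obs, acts, rews
    pad = min_len - T
    # Repeat the terminal observation, append zero actions and rewards.
    obs = np.concatenate([obs, np.repeat(obs[-1:], pad, axis=0)])
    acts = np.concatenate([acts, np.zeros((pad,) + acts.shape[1:], acts.dtype)])
    rews = np.concatenate([rews, np.zeros(pad, rews.dtype)])
    return obs, acts, rews
```

With zero reward and a repeated terminal observation, the dummy step carries no learning signal beyond satisfying the minimum-length assumption.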
Also, why do you have to extend the actions and the observations in https://github.com/twni2016/pomdp-baselines/blob/819f7c51dff8045c93c6580c90f6e743d6337ba6/policies/models/policy_rnn.py#L383?
You can check the meaning of these variables in
Since at the first timestep the agent only receives the initial observation, without a previous action or reward, I have to add a dummy previous action and reward to the trajectory for training.
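The alignment described above can be sketched as follows. This is a hedged illustration of the idea, not the actual code at the linked line; the function name and array shapes are assumed for the example.

```python
import numpy as np

def align_prev_act_rew(acts, rews):
    """Build (prev_action, prev_reward) sequences aligned with observations.

    At t=0 there is no previous action or reward, so a zero dummy is
    prepended; the result has length T+1, matching the T+1 observations.
    Illustrative only -- see policy_rnn.py for the actual variables.
    acts: (T, act_dim), rews: (T,).
    """
    zero_act = np.zeros((1,) + acts.shape[1:], acts.dtype)
    zero_rew = np.zeros(1, rews.dtype)
    prev_acts = np.concatenate([zero_act, acts])  # (T+1, act_dim)
    prev_rews = np.concatenate([zero_rew, rews])  # (T+1,)
    return prev_acts, prev_rews
```

After this shift, `(obs[t], prev_acts[t], prev_rews[t])` is a consistent RNN input at every step, including t=0.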
Thank you!