twni2016 / pomdp-baselines

Simple (but often Strong) Baselines for POMDPs in PyTorch, ICML 2022
https://sites.google.com/view/pomdp-baselines
MIT License

Handle cases when the episode length < 2 #12

Closed hai-h-nguyen closed 2 years ago

hai-h-nguyen commented 2 years ago

Hello,

I wonder if you have any thoughts on how to handle this case? Currently, you assume it never happens, and the code raises an error when it does. However, in some domains, such as MiniGrid Lava Crossing, this can happen quite often. Thanks!

hai-h-nguyen commented 2 years ago

Also, why do you have to extend the actions and the observations in https://github.com/twni2016/pomdp-baselines/blob/819f7c51dff8045c93c6580c90f6e743d6337ba6/policies/models/policy_rnn.py#L383?

twni2016 commented 2 years ago

Hello,

> I wonder if you have any thoughts on how to handle this case? Currently, you assume it never happens, and the code raises an error when it does. However, in some domains, such as MiniGrid Lava Crossing, this can happen quite often. Thanks!

A quick fix is to manually force the episode length to be >= 2 by adding a dummy transition whenever an episode is shorter.
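
For illustration, a minimal sketch of such padding (the helper name and array layout are hypothetical, not the repo's API):

```python
import numpy as np

def pad_short_episode(obs, acts, rews, dones, min_len=2):
    """Pad one episode with dummy transitions until it has at least
    `min_len` steps. Illustrative only: obs/acts/rews/dones are assumed
    to be (T, dim) numpy arrays for a single episode."""
    T = len(rews)
    while T < min_len:
        # Repeat the terminal observation with a zero action and zero
        # reward, and keep done=True, so the padding adds only a
        # zero-reward terminal step.
        obs = np.concatenate([obs, obs[-1:]], axis=0)
        acts = np.concatenate([acts, np.zeros_like(acts[-1:])], axis=0)
        rews = np.concatenate([rews, np.zeros_like(rews[-1:])], axis=0)
        dones = np.concatenate([dones, np.ones_like(dones[-1:])], axis=0)
        T += 1
    return obs, acts, rews, dones
```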

twni2016 commented 2 years ago

> Also, why do you have to extend the actions and the observations in
> https://github.com/twni2016/pomdp-baselines/blob/819f7c51dff8045c93c6580c90f6e743d6337ba6/policies/models/policy_rnn.py#L383?

You can check the meaning of these variables in

https://github.com/twni2016/pomdp-baselines/blob/629180d56641810d99653a116cca41ede65172eb/policies/models/policy_rnn.py#L172

Since at the first timestep the agent only receives the initial observation, without a previous action or reward, I have to add a dummy previous action and reward to the trajectory for training.
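
A minimal sketch of the idea (not the repo's exact code; tensor names and shapes here are assumptions for illustration):

```python
import torch

def align_inputs(actions, rewards, observations):
    """The RNN input at step t is (prev_action, prev_reward, obs_t),
    but at t = 0 there is no previous action or reward, so a zero
    dummy is prepended to align the sequences.

    Assumed shapes: actions (T, B, act_dim), rewards (T, B, 1),
    observations (T+1, B, obs_dim), where B is the batch size.
    """
    T, B, act_dim = actions.shape
    dummy_act = torch.zeros(1, B, act_dim)  # stand-in for the nonexistent action before t=0
    dummy_rew = torch.zeros(1, B, 1)        # stand-in for the nonexistent reward before t=0
    prev_actions = torch.cat([dummy_act, actions], dim=0)  # (T+1, B, act_dim)
    prev_rewards = torch.cat([dummy_rew, rewards], dim=0)  # (T+1, B, 1)
    # Now prev_actions[t], prev_rewards[t], and observations[t] line up
    # for all t in [0, T], so they can be fed to the recurrent encoder.
    return torch.cat([prev_actions, prev_rewards, observations], dim=-1)
```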

hai-h-nguyen commented 2 years ago

Thank you!