twni2016 / pomdp-baselines

Simple (but often Strong) Baselines for POMDPs in PyTorch, ICML 2022
https://sites.google.com/view/pomdp-baselines
MIT License

Hidden state for subsequence #19

Closed kbkartik closed 1 year ago

kbkartik commented 1 year ago

For the temporal credit assignment problem, I see that you randomly choose a start position from the sampled episodes. Say the subsequence is s4, s5, ..., s10. What do you pass as the hidden state h3 to the LSTM/GRU?

Also, when choosing the end position, you add the context length to the randomly chosen start position. However, the start position plus the context length can exceed the episode length, right? In that case, the end position would overflow into the next episode. How are you masking those steps out?

Thanks, kb

twni2016 commented 1 year ago

I initialized the hidden state with zeros, following the recurrent DQN paper. Although a burn-in strategy might be more effective, zero initialization is simple and works well in many tasks.
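A minimal sketch of this zero initialization, assuming a GRU encoder and hypothetical dimensions (batch size, context length, observation and hidden sizes are illustrative, not the repo's actual values):

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration.
batch_size, seq_len, obs_dim, hidden_dim = 32, 7, 10, 128

gru = nn.GRU(input_size=obs_dim, hidden_size=hidden_dim, batch_first=True)

# Subsequence, e.g. s4..s10: the true hidden state h3 is unknown at
# sampling time, so it is replaced by zeros (as in recurrent DQN).
subseq = torch.randn(batch_size, seq_len, obs_dim)
h0 = torch.zeros(1, batch_size, hidden_dim)  # (num_layers, batch, hidden)

out, hn = gru(subseq, h0)
# out: (batch, seq_len, hidden), hn: (num_layers, batch, hidden)
```

Passing `h0` of zeros is equivalent to PyTorch's default when the initial hidden state is omitted, which is why this choice is simple to implement.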

The replay buffer's sample function is implemented to prevent sampling across episode boundaries. You can check the corresponding code.
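The idea can be sketched as follows; this is not the repo's actual implementation, just an assumed scheme where each sampled window is clipped at its episode's end and a boolean mask marks the valid (in-episode) timesteps:

```python
import numpy as np

def sample_subseq_masks(episode_lengths, context_len, rng=np.random):
    """Hypothetical sketch: sample one subsequence per episode and build a
    validity mask so that steps past the episode end are ignored in the loss."""
    starts, masks = [], []
    for ep_len in episode_lengths:
        start = rng.randint(0, ep_len)          # random start within the episode
        end = min(start + context_len, ep_len)  # clip at the episode boundary
        mask = np.zeros(context_len, dtype=bool)
        mask[: end - start] = True              # only in-episode steps are valid
        starts.append(start)
        masks.append(mask)
    return np.array(starts), np.stack(masks)

starts, masks = sample_subseq_masks([5, 20], context_len=8)
# masks[i] is False wherever the window would overflow episode i.
```

Because the end index is clipped to the episode length, no sampled window ever reads transitions from the next episode, and the mask zeroes out the padded tail when computing losses.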