twni2016 / pomdp-baselines

Simple (but often Strong) Baselines for POMDPs in PyTorch, ICML 2022
https://sites.google.com/view/pomdp-baselines
MIT License

Hidden state for subsequence #19

Closed kbkartik closed 1 year ago

kbkartik commented 1 year ago

For the temporal credit assignment problem, I see that you randomly choose a start position from the sampled episodes. Say the subsequence is s4, s5, ..., s10. What do you pass as the hidden state h3 to the LSTM/GRU?

Also, when choosing the end position, you add the context length to the randomly chosen start position. However, the start position plus the context length can exceed the episode length, right? In that case, the end position would overflow into the next episode. How are you masking those steps out?

Thanks, kb

twni2016 commented 1 year ago

I initialized the hidden state with zeros, following the recurrent DQN paper. Although a burn-in strategy might be more effective, zero initialization is simple and works well in many tasks.
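A minimal sketch of this zero initialization, assuming a GRU encoder and hypothetical dimensions (batch size, context length, observation and hidden sizes are illustrative, not the repo's actual values):

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration.
batch_size, seq_len, obs_dim, hidden_dim = 32, 7, 10, 128

gru = nn.GRU(input_size=obs_dim, hidden_size=hidden_dim, batch_first=True)

# Subsequence, e.g. s4..s10: the true hidden state h3 is unknown at
# sampling time, so it is replaced by zeros (as in recurrent DQN).
subseq = torch.randn(batch_size, seq_len, obs_dim)
h0 = torch.zeros(1, batch_size, hidden_dim)  # (num_layers, batch, hidden)

out, hn = gru(subseq, h0)
# out: (batch, seq_len, hidden), hn: (num_layers, batch, hidden)
```

Passing `h0` of zeros is equivalent to PyTorch's default when the initial hidden state is omitted, which is why this choice is simple to implement.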

The replay buffer's sample function is implemented to prevent sampling across episode boundaries. You can check the corresponding code.
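The idea can be sketched as follows; this is not the repo's actual implementation, just an assumed scheme where each sampled window is clipped at its episode's end and a boolean mask marks the valid (in-episode) timesteps:

```python
import numpy as np

def sample_subseq_masks(episode_lengths, context_len, rng=np.random):
    """Hypothetical sketch: sample one subsequence per episode and build a
    validity mask so that steps past the episode end are ignored in the loss."""
    starts, masks = [], []
    for ep_len in episode_lengths:
        start = rng.randint(0, ep_len)          # random start within the episode
        end = min(start + context_len, ep_len)  # clip at the episode boundary
        mask = np.zeros(context_len, dtype=bool)
        mask[: end - start] = True              # only in-episode steps are valid
        starts.append(start)
        masks.append(mask)
    return np.array(starts), np.stack(masks)

starts, masks = sample_subseq_masks([5, 20], context_len=8)
# masks[i] is False wherever the window would overflow episode i.
```

Because the end index is clipped to the episode length, no sampled window ever reads transitions from the next episode, and the mask zeroes out the padded tail when computing losses.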