This is so much fun! :)
But there's one more error: in format_rollouts (utils.py) we define 'rewards': []. However, during training, slice_data tries to select the validation indexes from every key-value pair, and 'rewards' is never filled (at least not in LunarLander). Because the 'rewards' list stays empty, indexing into it throws an IndexError.
I simply commented out the 'rewards' definition. I hope the model still learns; based on the paper it should, but I haven't checked that part of the code yet.
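To illustrate the failure, here is a minimal sketch. The functions format_rollouts_sketch and slice_data_sketch, their signatures, and the sample data are hypothetical stand-ins, not the actual code in utils.py. The skip_empty guard is an assumed alternative to commenting out 'rewards': it skips any key whose list is empty instead of indexing into it.

```python
def format_rollouts_sketch():
    # Hypothetical stand-in for format_rollouts: 'rewards' is declared but never filled
    return {
        "observations": [[0.0, 1.0], [0.5, 0.2], [0.3, 0.9], [0.1, 0.4]],
        "actions": [0, 2, 1, 3],
        "rewards": [],
    }


def slice_data_sketch(data, indexes, skip_empty=True):
    # Hypothetical stand-in for slice_data: select the given indexes from every key-value pair
    out = {}
    for key, values in data.items():
        if skip_empty and len(values) == 0:
            # An empty list has nothing to index, so skip it rather than raise IndexError
            continue
        out[key] = [values[i] for i in indexes]
    return out


data = format_rollouts_sketch()
val_indexes = [1, 3]

# With the guard, the empty 'rewards' entry is simply dropped
print(slice_data_sketch(data, val_indexes))

# Without the guard, indexing the empty 'rewards' list reproduces the IndexError
try:
    slice_data_sketch(data, val_indexes, skip_empty=False)
except IndexError as err:
    print("IndexError:", err)
```

Skipping empty keys keeps 'rewards' available for environments that do fill it, whereas commenting the definition out removes it everywhere.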