format_rollouts - Githubissues

This is so much fun! :) But there's one more error: format_rollouts (utils.py) defines 'rewards': [] in the data dict. During training, slice_data tries to select the validation indexes from every key-value pair, but 'rewards' is never filled (at least not for LunarLander). The list stays empty, so indexing it throws an IndexError.

I simply commented out the rewards definition. I hope the model still learns (based on the paper it should, but I haven't checked that part of the code yet):

def format_rollouts(rollouts, env):
  data = {
    'obses': [],
    'actions': [],
    'next_obses': [],
    #'rewards': []
  }
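
For anyone who wants to reproduce the failure without running the full training loop, here is a minimal sketch in plain Python. It does not use the repo's actual slice_data; `slice_data_sketch`, `val_idxes`, and the toy values are hypothetical stand-ins I made up for illustration. It shows why an empty 'rewards' list breaks index selection, and one possible alternative to commenting the key out: skipping fields that were never filled.

```python
def slice_data_sketch(data, idxes):
    """Hypothetical stand-in for slice_data (not the repo's code).

    Selects the entries at `idxes` from every field, skipping fields
    that were never filled, such as an empty 'rewards' list.
    """
    sliced = {}
    for key, values in data.items():
        if len(values) == 0:
            # Nothing to select from; leave this field out of the result.
            continue
        sliced[key] = [values[i] for i in idxes]
    return sliced


# Toy rollout data shaped like the dict built in format_rollouts,
# with 'rewards' left empty as described in this issue.
data = {
    'obses': [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]],
    'actions': [0, 1, 0],
    'next_obses': [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]],
    'rewards': [],
}
val_idxes = [0, 2]  # assumed validation indexes, for illustration only

# Naive selection over every key fails on the empty list.
try:
    naive = {k: [v[i] for i in val_idxes] for k, v in data.items()}
    naive_failed = False
except IndexError:
    naive_failed = True

# The guarded version simply drops the unfilled field.
sliced = slice_data_sketch(data, val_idxes)
```

Whether dropping 'rewards' is safe depends on whether anything downstream reads it, which I haven't verified.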

rddy / mimi

format_rollouts #2