rddy / mimi

Code for the paper, "First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization"
MIT License

format_rollouts #2

Closed: guyko81 closed this issue 2 years ago

guyko81 commented 2 years ago

This is so much fun! :) But there's one more error: in format_rollouts (utils.py) we define 'rewards': []. However, during training, slice_data tries to select the validation indices from every key-value pair, and the 'rewards' list is never filled (at least not in LunarLander). Because 'rewards' stays empty, indexing into it throws an IndexError.
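The failure can be reproduced with a minimal sketch. It assumes slice_data picks the same indices out of every list in the data dict; the slice_data below is a hypothetical stand-in written for illustration, not the function from utils.py, and the toy data is made up.

```python
# Hypothetical stand-in for slice_data in utils.py; the real function may differ.
def slice_data(data, idxes):
    """Select the entries at idxes from every list in data."""
    return {key: [vals[i] for i in idxes] for key, vals in data.items()}

# Toy rollout data: 'rewards' is declared but never filled,
# mirroring what the issue reports for LunarLander.
data = {
    'obses': [[0.0], [0.1], [0.2]],
    'actions': [0, 1, 0],
    'next_obses': [[0.1], [0.2], [0.3]],
    'rewards': [],
}
val_idxes = [0, 2]

try:
    slice_data(data, val_idxes)
except IndexError as err:
    # Raised when indexing the empty 'rewards' list.
    print('IndexError:', err)
```

The non-empty keys slice fine; only the empty 'rewards' list triggers the error.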

I simply commented out the rewards definition. I hope the model still learns (based on the paper it should, but I haven't checked that part of the code yet).

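def format_rollouts(rollouts, env):
  data = {
    'obses': [],
    'actions': [],
    'next_obses': [],
    #'rewards': []  # never filled for LunarLander, so slice_data raised an IndexError
  }
  # ... (rest of format_rollouts unchanged)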
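An alternative to commenting the key out would be to skip any key that was never populated before slicing. This is a minimal sketch that assumes the data is a dict of lists as above. drop_empty is a hypothetical helper and slice_data is again a hypothetical stand-in. Neither comes from the repository.

```python
def drop_empty(data):
    """Return a copy of data without keys whose lists are empty (hypothetical helper)."""
    return {key: vals for key, vals in data.items() if len(vals) > 0}

def slice_data(data, idxes):
    """Hypothetical stand-in: select the entries at idxes from every list in data."""
    return {key: [vals[i] for i in idxes] for key, vals in data.items()}

data = {
    'obses': [[0.0], [0.1], [0.2]],
    'actions': [0, 1, 0],
    'next_obses': [[0.1], [0.2], [0.3]],
    'rewards': [],  # never filled for LunarLander rollouts
}

val_data = slice_data(drop_empty(data), [0, 2])
print(sorted(val_data))      # ['actions', 'next_obses', 'obses']
print(val_data['actions'])   # [0, 0]
```

This keeps 'rewards' available for environments that do fill it. The trade-off is that any key that happens to be empty is dropped silently instead of raising an error.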
guyko81 commented 2 years ago

Oh, sorry, you had already fixed it 5 hours before I opened this issue. Sorry for not checking first!