Closed CatIIIIIIII closed 4 years ago
Hey, this repo does not use GAE. The returns are simply the Monte Carlo estimate.
Either way, we store all the data in one buffer and also store the masks, i.e. is_terminals, dones, etc. These masks are used to determine where each episode has ended, so the returns are calculated per episode even though the data sits in one buffer.
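To make the masking idea concrete, here is a minimal sketch of Monte Carlo returns computed over a single buffer that holds several episodes. It is not the repo's actual code: the function name discounted_returns, its parameters, and the example values are made up for illustration. The only assumption is that is_terminals[t] is True on the last step of each episode.

```python
from typing import List, Sequence


def discounted_returns(
    rewards: Sequence[float],
    is_terminals: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Monte Carlo returns for a flat buffer containing several episodes.

    Hypothetical helper: is_terminals[t] is assumed to be True on the
    final step of an episode.
    """
    returns = []
    running = 0.0
    # Walk the buffer backwards so each return only needs the one after it.
    for reward, done in zip(reversed(rewards), reversed(is_terminals)):
        if done:
            # Episode boundary: do not carry return over from the next episode.
            running = 0.0
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()  # restore chronological order
    return returns


# Two episodes of two steps each, stored back to back in one buffer.
print(discounted_returns([1.0, 1.0, 1.0, 1.0], [False, True, False, True], gamma=0.5))
# → [1.5, 1.0, 1.5, 1.0]
```

Without the reset at the mask, the first episode's returns would wrongly include rewards from the second episode (the first entry would be 1.875 instead of 1.5).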
Dear nik: I noticed that your code stores training data from different episodes in one buffer, but uses GAE to calculate the cumulative reward. I am a little confused here, because shouldn't GAE be applied within a single episode? Regards.