vietnh1009 / Super-mario-bros-PPO-pytorch

Proximal Policy Optimization (PPO) algorithm for Super Mario Bros
MIT License

size issue on GAE process #18

Open davincibj opened 2 years ago

davincibj commented 2 years ago

While studying your Mario PPO code (https://github.com/uvipen/Super-mario-bros-PPO-pytorch/blob/master/train.py), I find the following code hard to understand:

################################################################################
values = torch.cat(values).detach()    # torch.Size([4096])
states = torch.cat(states)
gae = 0
R = []
for value, reward, done in list(zip(values, rewards, dones))[::-1]:    # len(list(zip(values, rewards, dones))[::-1]) is 512
    gae = gae * opt.gamma * opt.tau
    gae = gae + reward + opt.gamma * next_value.detach() * (1 - done) - value.detach()
    next_value = value
    R.append(gae + value)
################################################################################

Question: with --num_local_steps=512 and --num_processes=8, after 'values = torch.cat(values).detach()' the values.shape is torch.Size([4096]). But the length of list(zip(values, rewards, dones))[::-1] is only 512, which means only the first 512 entries of values are used in the for loop and the rest are discarded.

So, in every rollout of 512 local steps, only the values from the first 64 (= 512/8) steps are used to calculate GAE and R. Is this a problem, or am I misunderstanding something?
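
To make the mismatch concrete, here is a toy reproduction with dummy tensors (not the repo's actual rollout buffers, just the shapes I described above):

import torch

# Dummy buffers with the shapes from my run (num_local_steps=512, num_processes=8):
values = torch.zeros(512 * 8)        # after torch.cat(...): 4096 flattened scalars
rewards = [torch.zeros(8)] * 512     # one batched reward tensor per local step
dones = [torch.zeros(8)] * 512       # one batched done-flag tensor per local step

pairs = list(zip(values, rewards, dones))
print(len(pairs))  # 512 -- zip stops at the shortest input, so 3584 value entries are never paired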

Looking forward to your answer, thanks!
42kun commented 2 years ago

This code design is completely wrong!

42kun commented 2 years ago

Also, gae is not reset to zero when done is true.
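
For comparison, a common way to write the GAE backward pass keeps values, rewards and dones as per-step batched tensors of shape [num_processes] and masks the carried advantage with (1 - done), so every step is used and the advantage resets at episode boundaries. This is only a sketch under those assumptions, not the author's code; the gamma and tau defaults are illustrative:

import torch

def compute_gae(values, rewards, dones, next_value, gamma=0.9, tau=1.0):
    # values / rewards / dones: lists of length num_local_steps,
    # each element a tensor of shape [num_processes]; next_value: [num_processes].
    gae = torch.zeros_like(next_value)
    returns = []
    for value, reward, done in reversed(list(zip(values, rewards, dones))):
        delta = reward + gamma * next_value * (1 - done) - value
        # carry the advantage backwards, zeroing it across episode ends
        gae = delta + gamma * tau * (1 - done) * gae
        next_value = value
        returns.insert(0, gae + value)
    return torch.cat(returns)  # shape: [num_local_steps * num_processes]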

zhuzhu18 commented 1 year ago

see here