Closed YingxiaoKong closed 3 years ago
Hi, I highly recommend TL2.0, so you can use the new RL code
Thanks! I will have a look at it now!
Hi, PPO generally does not require trajectories to have a fixed or equal length, whether the update is performed in a batch manner or from the samples of a single episode. Because PPO uses an on-policy update, it need not consume a replay batch (as off-policy algorithms like DDPG and SAC usually do); it can update from a single episodic trajectory, as shown in our tutorial. So the trajectory length is never required to be fixed: a trajectory simply ends when the episode is done. Note that even for other implementations where PPO is updated in a batch manner, a fixed length is still not required, since a batch can be filled with samples drawn from trajectories of different lengths.
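To illustrate the point above, here is a minimal sketch (plain Python, not TensorLayer's actual API) of the on-policy data-collection step: each episode runs until it is done, discounted returns are computed per episode at whatever length it happens to have, and episodes of different lengths can still be flattened into one update batch. The function and variable names here are hypothetical, for illustration only.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute discounted returns G_t for one episode of any length."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards through the episode so each step accumulates
    # the discounted sum of all future rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Two episodes with different lengths -- both are valid PPO training data.
episode_a = [1.0, 1.0, 1.0]              # episode of length 3
episode_b = [1.0, 0.0, 1.0, 1.0, 0.5]    # episode of length 5

# Flatten per-episode returns into a single batch for the policy update;
# no padding or truncation to a common length is needed.
batch_returns = []
for episode in (episode_a, episode_b):
    batch_returns.extend(discounted_returns(episode))
```

The key property is that `discounted_returns` is applied per trajectory, so nothing in the update pipeline ever assumes a common episode length.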
Best, Zihan
Hi, in the papers I have read, they all say the trajectories should have the same, or "fixed", length. I may have misunderstood what this means, but in all the code I have seen, each episode is given a fixed length. Can PPO still work if the lengths are different?