tensorlayer / TensorLayer

Deep Learning and Reinforcement Learning Library for Scientists and Engineers
http://tensorlayerx.com

Is PPO still applicable if the length of each trajectory is different? #1074

Closed YingxiaoKong closed 3 years ago

YingxiaoKong commented 4 years ago


Hi, in the papers I have read, they all say the trajectories should have the same, or a 'fixed', length. I may have misunderstood what this means, but in all the code I have seen, each episode is given a fixed length. Can PPO still work if the trajectory lengths are different?

zsdonghao commented 4 years ago

Hi, I highly recommend TL 2.0, so you can use the new RL code.

YingxiaoKong commented 4 years ago

Thanks! I will have a look at it now!



quantumiracle commented 3 years ago

Hi, PPO generally does not require trajectories to have a fixed or equal length, whether the update is done in a batch manner or on the samples of a single episode/trajectory. Since PPO uses an on-policy update, the update need not take a batch (as is usual in off-policy algorithms like DDPG or SAC) but can use a single episodic trajectory, as shown in our tutorial. So the trajectory length is never required to be fixed; a trajectory simply ends when the episode is done. Note that even for other implementations in which PPO is updated in a batch manner, a fixed length is not required: the batch can be filled with samples drawn from different trajectories.

Best, Zihan
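
As an illustration of the batched case, here is a minimal NumPy sketch (a hypothetical example, not taken from the TensorLayer tutorial) showing that a PPO update batch can be assembled from trajectories of different lengths: discounted returns are computed per trajectory, and the samples are then concatenated into one flat batch.

import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute discounted returns for a single trajectory of arbitrary length."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Three trajectories of different lengths (episodes that terminated at different steps).
trajectories = [
    {"obs": np.random.randn(5, 4), "rewards": np.random.randn(5)},
    {"obs": np.random.randn(9, 4), "rewards": np.random.randn(9)},
    {"obs": np.random.randn(3, 4), "rewards": np.random.randn(3)},
]

# Returns (or advantages) are computed per trajectory, then all samples are
# concatenated into one flat batch for the PPO update; no padding to a fixed
# length is needed.
batch_obs = np.concatenate([traj["obs"] for traj in trajectories], axis=0)
batch_returns = np.concatenate([discounted_returns(traj["rewards"]) for traj in trajectories])

print(batch_obs.shape, batch_returns.shape)  # (17, 4) (17,)

The key point is that each (state, action, return/advantage) sample is an independent row in the update batch, so trajectories of different lengths simply contribute different numbers of rows.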