tensorlayer / TensorLayer

Deep Learning and Reinforcement Learning Library for Scientists and Engineers
http://tensorlayerx.com

Is PPO still applicable if the length of each trajectory is different? #1074

Closed YingxiaoKong closed 3 years ago

YingxiaoKong commented 4 years ago


Hi, in the papers I have read, they all say the trajectories should have the same, or a 'fixed', length. I may have misunderstood what this means, but in all the code I have seen, each episode is given a fixed length. Can PPO still work if the trajectory lengths are different?

zsdonghao commented 4 years ago

Hi, I highly recommend TL 2.0, so you can use the new RL code.

YingxiaoKong commented 4 years ago

Thanks! I will have a look at it now!



quantumiracle commented 3 years ago

Hi, PPO generally does not require trajectories to have a fixed or equal length, whether the update is done in a batch manner or on the samples of a single episode/trajectory. Since PPO uses an on-policy update, the update need not take a batch (as is usual in off-policy algorithms like DDPG or SAC) but can use a single episodic trajectory, as shown in our tutorial. So the trajectory length is never required to be fixed; a trajectory simply ends when the episode is done. Note that even for other implementations in which PPO is updated in a batch manner, a fixed length is not required: the batch can be filled with samples drawn from different trajectories.

Best, Zihan
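
As an illustration of the batched case, here is a minimal NumPy sketch (a hypothetical example, not taken from the TensorLayer tutorial) showing that a PPO update batch can be assembled from trajectories of different lengths: discounted returns are computed per trajectory, and the samples are then concatenated into one flat batch.

import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute discounted returns for a single trajectory of arbitrary length."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Three trajectories of different lengths (episodes that terminated at different steps).
trajectories = [
    {"obs": np.random.randn(5, 4), "rewards": np.random.randn(5)},
    {"obs": np.random.randn(9, 4), "rewards": np.random.randn(9)},
    {"obs": np.random.randn(3, 4), "rewards": np.random.randn(3)},
]

# Returns (or advantages) are computed per trajectory, then all samples are
# concatenated into one flat batch for the PPO update; no padding to a fixed
# length is needed.
batch_obs = np.concatenate([traj["obs"] for traj in trajectories], axis=0)
batch_returns = np.concatenate([discounted_returns(traj["rewards"]) for traj in trajectories])

print(batch_obs.shape, batch_returns.shape)  # (17, 4) (17,)

The key point is that each (state, action, return/advantage) sample is an independent row in the update batch, so trajectories of different lengths simply contribute different numbers of rows.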