openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

What are the pros and cons of using entire episodes to train a PPO with LSTM at each step? #1026

Open xlnwel opened 4 years ago

xlnwel commented 4 years ago

In this code (https://github.com/openai/baselines/blob/665b888eeb688396894455a0d94febc4f712e0c0/baselines/ppo2/ppo2.py#L174), I spotted that PPO with LSTM uses the entire trajectories for each gradient descent step, where the input data is of shape [envsperbatch, nsteps, *]. I'm wondering whether this is good practice for long trajectories. Why not truncate the trajectories, i.e. for every gradient step use batches of shape [n_envs, stepsperbatch, *] (of course, we would have to keep the intermediate LSTM states in this case)? What are the pros and cons of each method?
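For concreteness, the truncated scheme being proposed might look roughly like this (a minimal numpy sketch; `train_on_window`, the shapes, and the state size are hypothetical illustrations, not the baselines API):

```python
import numpy as np

def train_on_window(window, initial_state):
    """Hypothetical stand-in for one PPO gradient step on a truncated window.
    Returns the LSTM state at the end of the window (fed to the next one)."""
    return initial_state  # placeholder: a real step would run the LSTM forward

n_envs, nsteps, obs_dim, state_dim = 8, 1024, 24, 256
stepsperbatch = 128  # truncation window for backprop through time

rollout = np.zeros((n_envs, nsteps, obs_dim), dtype=np.float32)  # full trajectories
state = np.zeros((n_envs, state_dim), dtype=np.float32)          # carried LSTM state

for start in range(0, nsteps, stepsperbatch):
    window = rollout[:, start:start + stepsperbatch]  # [n_envs, stepsperbatch, obs_dim]
    # Gradients flow only within this window; `state` enters as plain data,
    # so backprop is truncated at the window boundary.
    state = train_on_window(window, initial_state=state)
```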

gomlfx commented 4 years ago

Hi guys, can you tell me if there's a PPO regressor I can call like the scikit-learn-style LGBMRegressor()?

christopherhesse commented 4 years ago

Won't using stepsperbatch instead of nsteps change the amount of LSTM unrolling, since you don't backprop through those intermediate states?

christopherhesse commented 4 years ago

Also, I think the trajectories are already truncated; the * in the shape you mentioned should be a constant regardless of the length of the actual trajectories (the time for an entire episode).

xlnwel commented 4 years ago

I actually intend to truncate the unrolling. To the best of my knowledge, most RNN architectures, even LSTMs, are not good at capturing long-term dependencies, so I doubt it is necessary to unroll the LSTM for thousands of steps. The reason I want to use a smaller sequence length is that I can then use a larger batch size to gain some speed-up. However, my latest attempt to do so did not end up with good performance :-( That is why I asked the question. BTW, the environment I used to test my code is BipedalWalker-v2 from Gym, whose maximum episode length is 1600 steps (but the actual length can be much shorter thanks to the done signal).

christopherhesse commented 4 years ago

You can set nsteps to a smaller number, which will truncate the unrolling. If you need higher GPU utilization you can lower nminibatches or use more parallel environments.
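In baselines terms, that advice corresponds to something like the following (a hedged sketch: the parameter names match ppo2.learn, but the DummyVecEnv setup and the hyperparameter values are only illustrative; note that for recurrent policies nenvs must be divisible by nminibatches):

```python
import gym
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.ppo2 import ppo2

nenvs = 16  # more parallel environments raise throughput without longer unrolls
env = DummyVecEnv([lambda: gym.make('BipedalWalker-v2') for _ in range(nenvs)])

model = ppo2.learn(
    network='lstm',
    env=env,
    total_timesteps=int(1e6),
    nsteps=128,      # shorter rollout segments => shorter LSTM unroll
    nminibatches=4,  # recurrent minibatches are over envs: 16 / 4 = 4 envs each
)
```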

xlnwel commented 4 years ago

Hi, thanks for the response. I know how to do that, but I found that it impaired performance somehow (I'm not sure whether that is due to an issue in my implementation)... Therefore, I'm wondering whether it is good practice to truncate the sequence length. For example, in OpenAI Five, is the sequence length truncated to a smaller size? How does it handle the scenario where trajectories have different sequence lengths? I personally tried padding with zeros, but some sequences may be much shorter than others, which makes me wonder whether such a padding mechanism is a good choice.
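For reference, a zero-padding-plus-mask scheme like the one described might look as follows (a minimal numpy sketch with made-up lengths; masking the per-step losses keeps the padded steps from contributing to the gradient):

```python
import numpy as np

obs_dim = 24  # BipedalWalker-v2 observation size, used here just for shapes
trajectories = [np.random.randn(t, obs_dim).astype(np.float32) for t in (1600, 400, 37)]

max_len = max(len(traj) for traj in trajectories)
batch = np.zeros((len(trajectories), max_len, obs_dim), dtype=np.float32)
mask = np.zeros((len(trajectories), max_len), dtype=np.float32)

for i, traj in enumerate(trajectories):
    batch[i, :len(traj)] = traj  # zero-pad on the right
    mask[i, :len(traj)] = 1.0    # 1 for real steps, 0 for padding

# Placeholder per-step losses; a real PPO loss would be computed per timestep.
per_step_loss = np.ones((len(trajectories), max_len), dtype=np.float32)

# Average over real steps only, so short episodes aren't drowned in padding.
loss = (per_step_loss * mask).sum() / mask.sum()
```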