ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

RLlib: getting rid of rollout_fragment_length and replacing it with something more dynamic/interpretable #28452

Open avnishn opened 2 years ago

avnishn commented 2 years ago

Description

rollout_fragment_length is super confusing for RL users.

It isn't used in PPO for controlling sampling; there, train_batch_size is what controls sampling.

It is used by DQN, but there rollout_fragment_length is effectively a function of the train batch size, which in off-policy algorithms is the number of samples to collect from the replay buffer before running a training step.

We could potentially size this dynamically, or express it as a ratio with respect to train_batch_size, which roughly represents the replacement speed (e.g., every few training steps we replace the whole replay buffer).
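As a rough illustration only (not a concrete proposal), here is a minimal sketch assuming the Ray 2.x AlgorithmConfig API; the `sample_to_train_ratio` knob in the commented-out line is hypothetical and does not exist in RLlib:

```python
from ray.rllib.algorithms.dqn import DQNConfig

# Today both knobs are set independently, even though for off-policy
# algorithms rollout_fragment_length mostly controls how many new env
# steps enter the replay buffer between training steps.
config = (
    DQNConfig()
    .rollouts(num_rollout_workers=2, rollout_fragment_length=10)
    .training(train_batch_size=50)
)

# Hypothetical alternative discussed above (NOT an existing RLlib option):
# express sampling as a ratio of train_batch_size, e.g. collect
# train_batch_size * sample_to_train_ratio new steps per training step.
# config = config.rollouts(sample_to_train_ratio=0.2)
```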

Use case

No response

mvindiola1 commented 2 years ago

Hi @avnishn,

I agree that rollout_fragment_length is confusing at first, but it serves a few purposes that are not mentioned in this issue, so I thought I would bring them up; any replacement should be careful to preserve these functionalities.

In the algorithms that use a replay buffer, the rollout fragment length makes it possible to express how many new environment steps to sample and add to the replay buffer between each training step. It is rollout_fragment_length that indicates how often to train, not train_batch_size. So for example, if num_steps_sampled_before_learning_starts=50, train_batch_size=50, and rollout_fragment_length=10, it will do something like the following: collect num_steps_sampled_before_learning_starts (50) steps, train with a batch size of 50, collect 10 * num_workers steps and add them to the buffer, train with a batch size of 50, collect 10 * num_workers new steps and add them to the buffer, and so on (see the sketch below). If you change rollout_fragment_length to 1, then it will sample num_workers new steps between updates. I used to think that train_batch_size was dictating the amount of samples to collect between updates, but it is not; rollout_fragment_length is.
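A schematic sketch of that interleaving (plain Python, not actual RLlib internals; the names and numbers just mirror the example above):

```python
import random

def off_policy_loop(num_train_iters=3, num_workers=2,
                    num_steps_sampled_before_learning_starts=50,
                    train_batch_size=50, rollout_fragment_length=10):
    buffer = []

    def collect(n):
        # Stand-in for sampling n environment steps across all workers.
        return [f"step_{len(buffer) + i}" for i in range(n)]

    # Warm-up: fill the buffer before the first update.
    buffer.extend(collect(num_steps_sampled_before_learning_starts))

    for it in range(num_train_iters):
        # Train on train_batch_size transitions drawn from the buffer.
        batch = random.sample(buffer, min(train_batch_size, len(buffer)))
        print(f"iter {it}: trained on {len(batch)} samples, buffer size {len(buffer)}")

        # rollout_fragment_length (not train_batch_size) sets how many NEW
        # steps are added to the buffer between updates.
        buffer.extend(collect(rollout_fragment_length * num_workers))

off_policy_loop()
```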

If you take the FACMAC paper for example (https://arxiv.org/abs/2003.06709), they use three environments: MAParticles, SMAC, and MA-MuJoCo. In the first two environments they sample a complete episode and then perform a training step; in the MuJoCo case they update after every step of the environments. I know RLlib does not support FACMAC; I just happened to have this paper open when typing this. The point is that with rollout_fragment_length we can currently express something like "sample this many new steps, then update". I consider it important to make sure that RLlib keeps that ability.

The second use case is in on-policy algorithms (A2C, PPO, etc.) that calculate a value function: in those cases rollout_fragment_length determines the size of a trajectory for the postprocess_fn and the postprocess_trajectory callback. Those are in turn used to compute the return and advantages on a fragment. They also bootstrap when the fragment does not contain a terminal state. So it is used to control sampling because it lets you express how many steps to include in the n-step return and advantage computations (see the sketch below). This is often called the horizon in the literature, which is different from RLlib's horizon config parameter, because that one ends an episode.
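To make the bootstrapping point concrete, here is a small NumPy sketch (not RLlib's actual postprocessing code) of discounted returns over a single fragment, bootstrapped from the value of the last observation when the fragment is cut off before a terminal state:

```python
import numpy as np

def fragment_returns(rewards, dones, last_value, gamma=0.99):
    """Discounted returns over one fragment of length rollout_fragment_length.

    last_value is V(s_T), used to bootstrap when the fragment is truncated
    mid-episode; the (1 - done) mask zeroes it out on terminal steps.
    """
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = last_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns

# A 4-step fragment cut off mid-episode, so the last step bootstraps from V(s_T)=5.
print(fragment_returns(
    rewards=np.array([1.0, 1.0, 1.0, 1.0]),
    dones=np.array([0.0, 0.0, 0.0, 0.0]),
    last_value=5.0,
))
```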

stale[bot] commented 1 year ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity within the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public Slack channel.

stale[bot] commented 1 year ago

Hi again! The issue will be closed because there has been no further activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public Slack channel.

Thanks again for opening the issue!

Rohan138 commented 1 year ago

Reopening: I agree that rollout_fragment_length and train_batch_size need more documentation; we should keep this open for now.

stale[bot] commented 11 months ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity within the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public Slack channel.