I just checked and this is not an issue with A2C. Maybe it is just PPO-related?
input_dict["obs_flat"] is already padded. so shouldn't flat.shape[0] // seq_lens.shape[0] basically be seq_lens.max() assuming the padding is correct? let me give your script a try.
Hey @smorad, when I run your example as-is, I get a train_batch with batch dim=0 inside the PPO loss function. Something is definitely not right here, but I'm not sure it's a generic RLlib problem.
@gjoliver You're probably right in general. But in my example, if you set USE_CORRECT_SHAPE=True, you'll see that the two values are not the same.
@sven1977 If you set USE_CORRECT_SHAPE to True, you should see a crash and hopefully the issue will be more clear. I'm using ray 1.7.0, which might differ from master.
Hi @sven1977, @gjoliver and @smorad,
I was able to reproduce both Sven's and Steven's errors in master.
The error, or at least part of it, seems to arise as follows:
simple_list_collector is creating SampleBatches with max_seq_len set to the configured value: https://github.com/ray-project/ray/blob/9dba5e0eadd3a065023d6cc7cafff631355c980a/rllib/evaluation/collectors/simple_list_collector.py#L329-L337
During training, the SampleBatch uses this value to construct indices into the batch: https://github.com/ray-project/ray/blob/9dba5e0eadd3a065023d6cc7cafff631355c980a/rllib/policy/sample_batch.py#L871-L874
But if rnn_sequencing's pad_batch_to_sequences_of_same_size sets dynamic_max to True, which it does in this repro script, then the indices will be off whenever none of the rollout episodes lasts max_seq_len (200) steps. Hint: at the beginning of training, none of them do.
This leads to one of two error conditions (sketched in the snippet after this list):
If the adjusted start and stop indices are within the bounds of the SampleBatch, you will get a batch, but it will likely not be aligned to the actual start of a sequence because the max_seq_len is too large (@smorad's error).
If the adjusted start or stop of the minibatch exceeds the bounds of the SampleBatch, you will get an empty SampleBatch (@sven1977's error).
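Here is a standalone sketch of that index arithmetic with made-up numbers (this is not RLlib code, just an illustration of the two conditions):

padded_t = 60                     # dynamic max: longest sequence actually collected
num_seqs = 4
batch_len = padded_t * num_seqs   # 240 padded timesteps in the SampleBatch
stale_max_seq_len = 200           # configured value stored by simple_list_collector

# Condition 1: the slice fits inside the batch but is misaligned with the
# true sequence boundaries (which fall every 60 steps, not every 200).
start, stop = 0, stale_max_seq_len
print(stop <= batch_len)          # True: a batch comes back...
print(stop % padded_t == 0)       # False: ...but it cuts a sequence mid-way

# Condition 2: the next slice runs past the end of the batch -> empty batch.
start, stop = stale_max_seq_len, 2 * stale_max_seq_len
print(stop <= batch_len)          # False: nothing left to slice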
P.S. @smorad If you are using 1 or fewer GPUs, then adding "simple_optimizer": True to the config seems to avoid this issue.
Thanks @mvindiola1, I can reproduce these pretty reliably too. I guess the start or stop indices shouldn't be conditioned on max_seq_len, right? I need to read SampleBatch a bit more, but we should clean this up.
@gjoliver, I changed the code in pad_batch_to_sequences_of_same_size to the snippet below, and I think it fixes the misalignment issue. I did not see it anymore in my quick test.
feature_sequences, initial_states, seq_lens = \
    chop_into_sequences(
        feature_columns=[batch[k] for k in feature_keys_],
        state_columns=[batch[k] for k in state_keys],
        episode_ids=batch.get(SampleBatch.EPS_ID),
        unroll_ids=batch.get(SampleBatch.UNROLL_ID),
        agent_indices=batch.get(SampleBatch.AGENT_INDEX),
        seq_lens=batch.get(SampleBatch.SEQ_LENS),
        max_seq_len=max_seq_len,
        dynamic_max=dynamic_max,
        states_already_reduced_to_init=states_already_reduced_to_init,
        shuffle=shuffle)
for i, k in enumerate(feature_keys_):
    batch[k] = feature_sequences[i]
for i, k in enumerate(state_keys):
    batch[k] = initial_states[i]
batch[SampleBatch.SEQ_LENS] = np.array(seq_lens)
if dynamic_max:
    batch.max_seq_len = max(seq_lens)
As for @smorad's original issue where the max of seq_lens does not match the padded seq len, you will still see that happen. The reason is as follows: padding is done according to the longest sequence in the entire SampleBatch collected during rollouts, but the model's forward function is given sub-sequences sampled based on sgd_minibatch_size. So if the batch's max sequence length is, let's say, 90, a randomly sampled minibatch may well have a shorter max sequence, e.g. with seq_lens [20, 32, 15, 25, 90, 45, 15]. If we sample two contiguous sequences from that list, there are 2 ways of drawing a minibatch that contains the 90 and 4 ways of drawing one that does not, which is what the repro script is detecting (see the sketch below).
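A quick sketch of that counting argument, using the same hypothetical seq_lens:

seq_lens = [20, 32, 15, 25, 90, 45, 15]  # the whole batch is padded to 90
# Minibatches of two contiguous sequences: 6 possible windows.
windows = [seq_lens[i:i + 2] for i in range(len(seq_lens) - 1)]
with_max = [w for w in windows if 90 in w]
print(len(with_max), len(windows) - len(with_max))  # 2 windows contain 90, 4 do not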
@mvindiola1 with respect to simple_optimizer: if I use the following config, I get intermittent crashes at around 50 iters:
CFG = {
    "env_config": {},
    "framework": "torch",
    "model": {
        "custom_model": TestRNN,
        "max_seq_len": MAX_SEQ_LEN,
    },
    "num_workers": 0,
    "num_gpus": 0,
    "env": StatelessCartPole,
    "horizon": MAX_SEQ_LEN,
    "simple_optimizer": True,
}
Is this the second error condition?
2021-11-06 13:33:58,307 ERROR trial_runner.py:924 -- Trial PPO_StatelessCartPole_71918_00000: Error processing event.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/ray/tune/trial_runner.py", line 890, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/usr/local/lib/python3.8/dist-packages/ray/tune/ray_trial_executor.py", line 788, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/usr/local/lib/python3.8/dist-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/ray/worker.py", line 1625, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AssertionError): ray::PPO.train_buffered() (pid=45224, ip=172.17.0.2, repr=PPO)
  File "/usr/local/lib/python3.8/dist-packages/ray/tune/trainable.py", line 224, in train_buffered
    result = self.train()
  File "/usr/local/lib/python3.8/dist-packages/ray/rllib/agents/trainer.py", line 682, in train
    raise e
  File "/usr/local/lib/python3.8/dist-packages/ray/rllib/agents/trainer.py", line 668, in train
    result = Trainable.train(self)
  File "/usr/local/lib/python3.8/dist-packages/ray/tune/trainable.py", line 283, in train
    result = self.step()
  File "/usr/local/lib/python3.8/dist-packages/ray/rllib/agents/trainer_template.py", line 206, in step
    step_results = next(self.train_exec_impl)
  File "/usr/local/lib/python3.8/dist-packages/ray/util/iter.py", line 756, in __next__
    return next(self.built_iterator)
  File "/usr/local/lib/python3.8/dist-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/usr/local/lib/python3.8/dist-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/usr/local/lib/python3.8/dist-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/usr/local/lib/python3.8/dist-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/usr/local/lib/python3.8/dist-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/usr/local/lib/python3.8/dist-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/usr/local/lib/python3.8/dist-packages/ray/util/iter.py", line 791, in apply_foreach
    result = fn(item)
  File "/usr/local/lib/python3.8/dist-packages/ray/rllib/execution/train_ops.py", line 64, in __call__
    learner_info = do_minibatch_sgd(
  File "/usr/local/lib/python3.8/dist-packages/ray/rllib/utils/sgd.py", line 104, in do_minibatch_sgd
    for minibatch in minibatches(batch, sgd_minibatch_size):
  File "/usr/local/lib/python3.8/dist-packages/ray/rllib/utils/sgd.py", line 53, in minibatches
    all_slices = samples._get_slice_indices(sgd_minibatch_size)
  File "/usr/local/lib/python3.8/dist-packages/ray/rllib/utils/annotations.py", line 101, in _ctor
    return obj(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/ray/rllib/policy/sample_batch.py", line 904, in _get_slice_indices
    assert np.all(self[SampleBatch.SEQ_LENS] < slice_size), \
AssertionError: ERROR: `slice_size` must be larger than the max. seq-len in the batch!
@smorad That is unfortunate. I only ran for 10 iterations in my test. Did you try adding the snippet mentioned above with the standard optimizer?
if dynamic_max:
    batch.max_seq_len = max(seq_lens)
@mvindiola1 Adding that line doesn't seem to stop it from crashing. This time it crashed at iter 88.
Ok, it seems there are two bugs here. What do you guys think?
@gjoliver, I think you are right that it should be <=, but I don't think the <= is going to fix that error. I just eyeballed the code, so I could be wrong, but if you look at where it is called: it is checking that the seq lengths are shorter than sgd_minibatch_size (128), but in this example the horizon and max_seq_len are 200. When the policy gets good enough, it will generate sequences longer than 127 timesteps and trigger the exception (see the sketch below).
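A tiny sketch of the failing check with hypothetical sequence lengths, mirroring the assertion from the traceback above:

import numpy as np

sgd_minibatch_size = 128             # the slice_size used for minibatching
seq_lens = np.array([80, 110, 128])  # policy improved: one episode hit 128 steps
# Fails as soon as any sequence reaches sgd_minibatch_size:
assert np.all(seq_lens < sgd_minibatch_size), \
    "ERROR: `slice_size` must be larger than the max. seq-len in the batch!"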
@smorad you could try running with an sgd_minibatch_size > 200 and see if the error goes away.
Hmm, yeah, but this doesn't make any sense. Why should sgd_minibatch_size have anything to do with how long the seqs from the environment are??
@gjoliver I think the point of max_seq_len is to truncate BPTT, but if that seq len is larger than the minibatch size, you could not actually backprop through the sequence length the user requested.
My thought on the solution here is to put a check in validate_config: if the policy is recurrent, then make sure the minibatch size is >= the max_seq_len. A rough sketch is below.
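A hypothetical sketch of that check (function name and defaults are illustrative, not the actual RLlib implementation):

def validate_config(config):
    # Only relevant for recurrent policies (built-in LSTM/attention wrappers
    # or a custom RNN model).
    model_cfg = config.get("model", {})
    is_recurrent = model_cfg.get("use_lstm") or model_cfg.get("use_attention")
    if is_recurrent and \
            config.get("sgd_minibatch_size", 128) < model_cfg.get("max_seq_len", 20):
        raise ValueError(
            "For recurrent models, sgd_minibatch_size must be >= "
            "model.max_seq_len; otherwise a full sequence cannot fit "
            "into a single SGD minibatch.")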
@mvindiola1 that sounds like a good solution. It still allows for long or infinite-length episodes while chunking trajectories into manageable sizes.
@gjoliver @sven1977,
What is the status on this? I think this should get fixed before 1.9 is cut. I am happy to start working on a PR if it is not already in progress.
I went ahead and prepared a PR, just in case.
For those reading now, you need to do the following in your config for PPO to work correctly with recurrent models:
max_seq_len = some_value
config = {
    "use_simple_optimizer": True,
    "horizon": max_seq_len - 1,
    "model": {
        "max_seq_len": max_seq_len
    }
}
@smorad,
Did the way horizon works change? My understanding from looking at it in the past was that horizon would terminate the episode. What if the max_seq_len is shorter than the episode length?
@smorad I tried this for PPO, but it still doesn't work for me. I did some tests and found out that if you set sgd_minibatch_size >= num_gpus * max_seq_len, it would work:
config = {
    "sgd_minibatch_size": sgd_minibatch_size,
    "model": {
        "use_lstm": True,
        "max_seq_len": max_seq_len
    }
}
Hello, is this also true for attention models? I seem to get the same error. I would like to use a shorter sequence length; should it be a divisor of horizon + 1?
I am getting the same error, AssertionError: ERROR: `slice_size` must be larger than the max. seq-len in the batch!, when I use batch_mode="complete_episodes" with PPO multi-agent training. This happens even when sgd_minibatch_size == num_gpus * max_seq_len. This is with ray 2.32.0. There is no horizon parameter anymore to set to make PPO work correctly with recurrent models. Can anyone help with what config should be used with the new versions of ray?
Ray Component
RLlib
What happened + What you expected to happen
I would expect that, given two sequences A and B:
[A, A, A, B, B]; seq_lens = [3, 2], obs.shape = [5, 1]
would be padded to
[A, A, A, B, B, *]; seq_lens = [3, 2], obs.shape = [2, 3, 1]
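(For reference, a small numpy sketch of that expected right-padding; the values are illustrative.)

import numpy as np

obs = np.array([[1.], [2.], [3.], [4.], [5.]])  # [A, A, A, B, B]; shape [5, 1]
seq_lens = np.array([3, 2])
T = int(seq_lens.max())                         # 3

padded = np.zeros((len(seq_lens), T, 1))        # zero right-padding per sequence
offset = 0
for i, length in enumerate(seq_lens):
    padded[i, :length] = obs[offset:offset + length]
    offset += length
print(padded.shape)                             # (2, 3, 1)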
This does not appear to be the case. For some reason RLlib zero-pads obs to something besides seq_lens.max(). Even more worrisome is calling torch.nonzero() on the input_dict, which shows zeros front-padded onto the observations. For example, printing input_dict['obs'].reshape(B, T, -1) == 0 produces output where the zero-padding is clearly messed up: the first five observations have been zero-padded, and then we have the real observations offset by five.
Versions / Dependencies
Linux, Ray 1.7.0
Reproduction script
Feel free to play with the USE_CORRECT_SHAPE flag.
Anything else
Every train step
Are you willing to submit a PR?