jsuarez5341 opened 2 years ago
May be related to: https://github.com/ray-project/ray/issues/23202. Can you help create a minimal repro script so we can add it as a unit test? Thanks.
I think this is related to a problem I've experienced: if the same sequence is sampled twice in the same training batch, both copies will have the same `episode_ids`, `unroll_ids`, and `agent_indices`, and will therefore be grouped together by the `chop_into_sequences` function. This is caused by `seq_lens` not being passed in the call to `chop_into_sequences`, so the function has to deduce the lengths itself. The symptom is a sequence that is double the length it should be; if `dynamic_max` is set, the maximum sequence length is automatically adjusted up to this doubled value.
The fix is to include `seq_lens` in that call (in `qmix_policy.py`), as in the sketch below:
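A minimal sketch of the proposed change, assuming the `chop_into_sequences` call site in `QMixTorchPolicy.learn_on_batch` (argument names as in ray 1.10; the surrounding code is abbreviated, so treat this as a sketch rather than a drop-in diff):

```python
# Sketch of the proposed fix in rllib/agents/qmix/qmix_policy.py
# (learn_on_batch). The only change is forwarding the seq_lens already
# stored in the sample batch, so chop_into_sequences does not have to
# re-deduce lengths from (episode_id, unroll_id, agent_index) keys,
# which collide when the same sequence is sampled twice in one batch.
from ray.rllib.policy.rnn_sequencing import chop_into_sequences
from ray.rllib.policy.sample_batch import SampleBatch

output_list, _, seq_lens = chop_into_sequences(
    episode_ids=samples[SampleBatch.EPS_ID],
    unroll_ids=samples[SampleBatch.UNROLL_ID],
    agent_indices=samples[SampleBatch.AGENT_INDEX],
    feature_columns=input_list,
    state_columns=[],  # RNN states are not chopped here
    max_seq_len=self.config["model"]["max_seq_len"],
    seq_lens=samples.get(SampleBatch.SEQ_LENS),  # <- the added line
    dynamic_max=True)
```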
Search before asking
Ray Component
RLlib
What happened + What you expected to happen
This issue is part of the ongoing effort to add Neural MMO to RLlib's integration tests. I am trying to get QMix working with Neural MMO and have simplified down to a fixed setup of two agents in one group. The observation and action spaces of NMMO are still hierarchical (nested dicts); a toy example of the shape of such a space is sketched below.
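For illustration only, a toy nested observation space in the same spirit (this is not NMMO's actual space definition, which is larger and built by the environment):

```python
# Toy example of a hierarchical (nested Dict) obs space, illustrative
# only; key names and shapes here are hypothetical.
import numpy as np
from gym import spaces

obs_space = spaces.Dict({
    "Entity": spaces.Dict({
        "Continuous": spaces.Box(-1.0, 1.0, shape=(100, 23), dtype=np.float32),
    }),
    "Tile": spaces.Dict({
        "Continuous": spaces.Box(-1.0, 1.0, shape=(225, 4), dtype=np.float32),
    }),
})
```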
First problem: RLlib aggressively instantiates models here https://github.com/ray-project/ray/blob/master/rllib/agents/trainer.py#L2381 (I'm not sure for what purpose). This then calls the obs flattening logic here https://github.com/ray-project/ray/blob/31ed9e5d02e0b5c8bbbdd3126e1b54dd25f477b9/rllib/agents/qmix/qmix_policy.py#L541, which fails on NMMO obs. You can patch the flattening logic (which shouldn't be called in the first place, given that we want structured obs and are using a custom model) to fully flatten obs, along the lines of the sketch below.
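A sketch of such a patch, assuming the failing step is the per-group concatenation in `_unpack_observation` (the names `unpacked` and `obs_batch` are taken from that function; this is a workaround sketch, not the proper fix):

```python
# Hypothetical patch sketch for qmix_policy.py's _unpack_observation.
# The stock code concatenates the flattened obs leaves along axis 1,
# which breaks when leaves of a nested Dict obs have different ranks.
# Reshaping every leaf to [B, -1] first fully flattens each agent's obs.
import numpy as np
import tree  # dm_tree, already a Ray dependency

B = len(obs_batch)
unpacked_obs = [
    np.concatenate(
        [np.reshape(leaf, [B, -1]) for leaf in tree.flatten(u["obs"])],
        axis=1)
    for u in unpacked
]
```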
Even once this is fixed, there are additional errors in calling the model. It appears `agents/qmix/qmix_policy` acts as a wrapper over custom models, which are recurrent by default. For some reason, it passes `seq_lens=None` as an argument (https://github.com/ray-project/ray/blob/31ed9e5d02e0b5c8bbbdd3126e1b54dd25f477b9/rllib/agents/qmix/qmix_policy.py#L615), even though recurrent models need the sequence lengths for padding, masking, and time-major reshaping.
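A possible workaround, assuming the call site is `_mac` (which steps the model one timestep at a time, so a vector of ones is a plausible stand-in for the missing lengths; `B`, `n_agents`, `obs_agents_as_batches`, and `h_flat` are the names used in that function):

```python
# Hypothetical workaround in qmix_policy.py's _mac: the controller is
# stepped one timestep at a time, so each row of the flattened
# [B * n_agents] batch is a length-1 sequence.
import torch

seq_lens = torch.ones(B * n_agents, dtype=torch.long)
q_flat, h_flat = model(obs_agents_as_batches, h_flat, seq_lens)
```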
Versions / Dependencies
`ray==1.10.0`, Python 3.9, Ubuntu 20.04
Reproduction script
https://github.com/NeuralMMO/baselines/blob/rllib-debug/repro_qmix_lstm_lens.py
You can skip the first ~400 lines -- mostly network definitions to process structured obs.
You can get the NMMO dependency with `pip install nmmo[rllib]`
Anything else
I am willing to help PR this and other QMix fixes, but I'd need some idea of why the flattening logic is being called, and where to get `seq_lens` from in order to pass them along.
Are you willing to submit a PR?