ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Bug] QMix flattening obs, not passing seq_lens to recurrent models #22790

Open jsuarez5341 opened 2 years ago

jsuarez5341 commented 2 years ago


Ray Component

RLlib

What happened + What you expected to happen

This issue is part of the ongoing effort to add Neural MMO to RLlib's integration tests.

I am trying to get QMix working with Neural MMO. I have simplified the setup down to a fixed 2 agents in one group. The observation and action spaces of NMMO are still hierarchical (dicts).

First problem: RLlib eagerly instantiates models here https://github.com/ray-project/ray/blob/master/rllib/agents/trainer.py#L2381 (not sure for what purpose). This then calls the obs-flattening logic here https://github.com/ray-project/ray/blob/31ed9e5d02e0b5c8bbbdd3126e1b54dd25f477b9/rllib/agents/qmix/qmix_policy.py#L541, which fails on NMMO's obs.

You can patch the flattening logic (which shouldn't even be called in the first place, given that we want structured obs and are using a custom model) like this:

import numpy as np
import tree  # dm-tree, already an RLlib dependency

unpacked_obs = []
for u in unpacked:
    # Flatten each leaf tensor to (batch, -1), then concatenate features.
    agent_obs = [e.reshape(e.shape[0], -1) for e in tree.flatten(u['obs'])]
    unpacked_obs.append(np.concatenate(agent_obs, 1))
# Stack the agents of the group along axis 1.
unpacked_obs = np.stack(unpacked_obs, 1)

to fully flatten obs.
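As a sanity check, the patch can be exercised on a toy batch of structured obs. The shapes below are hypothetical (not NMMO's real spaces), and `flatten_leaves` is a minimal stand-in for dm-tree's `tree.flatten` so the sketch is self-contained:

```python
import numpy as np

def flatten_leaves(struct):
    # Minimal stand-in for tree.flatten: returns leaf arrays in
    # sorted-key order, matching dm-tree's dict traversal.
    if isinstance(struct, dict):
        leaves = []
        for k in sorted(struct):
            leaves.extend(flatten_leaves(struct[k]))
        return leaves
    return [struct]

B = 4  # hypothetical batch size
# Hypothetical structured obs for the 2 grouped agents.
unpacked = [
    {"obs": {"terrain": np.zeros((B, 5, 5)), "stats": np.zeros((B, 7))}},
    {"obs": {"terrain": np.ones((B, 5, 5)), "stats": np.ones((B, 7))}},
]

unpacked_obs = []
for u in unpacked:
    # Flatten each leaf to (B, -1) and concatenate along the feature axis.
    agent_obs = [e.reshape(e.shape[0], -1) for e in flatten_leaves(u["obs"])]
    unpacked_obs.append(np.concatenate(agent_obs, 1))
# Stack agents along axis 1: (B, n_agents, flat_obs_dim).
unpacked_obs = np.stack(unpacked_obs, 1)

print(unpacked_obs.shape)  # (4, 2, 32): 5*5 + 7 = 32 features per agent
```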

Even once this is fixed, there are additional errors in calling the model. It appears agents/qmix/qmix_policy acts as a wrapper over custom models, which are recurrent by default. For some reason, it passes seq_lens=None as an argument (https://github.com/ray-project/ray/blob/31ed9e5d02e0b5c8bbbdd3126e1b54dd25f477b9/rllib/agents/qmix/qmix_policy.py#L615), which recurrent models need, e.g. to add the time dimension and to mask padded timesteps in the loss.
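For context on why seq_lens=None is a problem: recurrent model code typically derives a padding mask from seq_lens over the time-padded batch, which is impossible when it's None. A minimal sketch of that derivation (the lengths here are illustrative):

```python
import numpy as np

# Actual lengths of each sequence in a time-padded batch.
seq_lens = np.array([3, 1, 2])
max_len = int(seq_lens.max())  # padded time dimension T = 3

# Boolean mask of shape (num_seqs, T): True on real steps, False on padding.
mask = np.arange(max_len)[None, :] < seq_lens[:, None]
print(mask.astype(int))
# [[1 1 1]
#  [1 0 0]
#  [1 1 0]]
```

With seq_lens=None, neither this mask nor correct per-sequence RNN state resets can be computed.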

Versions / Dependencies

ray==1.10.0, python 3.9, Ubuntu 20.04

Reproduction script

https://github.com/NeuralMMO/baselines/blob/rllib-debug/repro_qmix_lstm_lens.py

You can skip the first ~400 lines; they are mostly network definitions to process structured obs.

You can get the NMMO dependency with `pip install nmmo[rllib]`

Anything else

I am willing to help PR this and other QMix fixes, but I'd need some idea of why the flattening logic is being called, and where to get seq_lens from in order to pass them along.

Are you willing to submit a PR?

gjoliver commented 2 years ago

May be related to: https://github.com/ray-project/ray/issues/23202. Can you help create a minimal repro script, so we can add it as a unit test etc.? Thanks.

davidADSP commented 2 years ago

I think this is related to a problem I've experienced: if the same sequence is sampled twice into the same training batch, the two copies will have the same episode_ids, unroll_ids, and agent_indices, and will therefore be grouped together by the chop_into_sequences function.

This is caused by seq_lens not being passed in the call to the chop_into_sequences function so it has to try to deduce the lengths itself. The symptom of this is that there is now a sequence that is double the length that it should be and therefore if dynamic_max is set, it will automatically adjust the maximum sequence length to this doubled value.
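The doubling effect can be illustrated with a toy sketch of length deduction by grouping identical consecutive keys (a simplification of what chop_into_sequences has to fall back on when seq_lens is not passed):

```python
# Two copies of the same 3-step sequence sampled into one train batch.
episode_ids   = [7, 7, 7, 7, 7, 7]
agent_indices = [0, 0, 0, 0, 0, 0]

# Deduce sequence lengths by grouping identical consecutive keys.
deduced = []
prev, count = None, 0
for key in zip(episode_ids, agent_indices):
    if key == prev:
        count += 1
    else:
        if prev is not None:
            deduced.append(count)
        prev, count = key, 1
deduced.append(count)

print(deduced)  # [6] -- one doubled sequence instead of [3, 3]
```

With dynamic_max set, that deduced length of 6 then becomes the new maximum sequence length.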

The fix is to include this line (in qmix_policy.py) as shown highlighted below:

(screenshot of the proposed one-line change in qmix_policy.py; image not preserved)