ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] `KeyError: 'infos'` in `_process_observations` when Using Custom Multi-Agent Environment #45291

Open clemenjuan opened 5 months ago

clemenjuan commented 5 months ago

What happened + What you expected to happen

I am experiencing a persistent issue with my custom multi-agent environment in RLlib: the `infos` dictionary is not found, which leads to a `KeyError: 'infos'`. The error is raised while observations are processed in the `_process_observations` function in `env_runner_v2.py`.

The error occurs consistently across different configurations, even after ensuring the environment returns the expected structures for observations, rewards, terminations, truncations, and infos.

Versions / Dependencies

Reproduction script

Steps to Reproduce

  1. Environment Setup: I have a custom multi-agent environment with satellites as agents. Observations, rewards, terminations, truncations, and infos are handled per agent.
  2. RLlib Setup: PPO with a multi-agent configuration. The environment is registered and used within a standard RLlib training loop (a sketch of this setup is shown after this list).
  3. Error Encounter: When training starts, the first iteration of sampling from the environment raises `KeyError: 'infos'` in `_process_observations`.
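
A minimal sketch of the kind of setup described in step 2. The class name `SatelliteEnv`, the registered environment name, and the shared-policy mapping are illustrative placeholders, not the actual code from this report:

# Hypothetical multi-agent PPO setup; "SatelliteEnv" stands in for the custom
# environment and is not defined in this snippet.
from ray.tune.registry import register_env
from ray.rllib.algorithms.ppo import PPOConfig

def env_creator(env_config):
    # Placeholder: construct the custom multi-agent satellite environment.
    return SatelliteEnv(env_config)  # illustrative name only

register_env("satellite_env", env_creator)

config = (
    PPOConfig()
    .environment("satellite_env")
    .multi_agent(
        policies={"shared_policy"},
        # Map every agent to a single shared policy.
        policy_mapping_fn=lambda agent_id, *args, **kwargs: "shared_policy",
    )
)

algo = config.build()
result = algo.train()  # KeyError: 'infos' is raised during this first sampling iteration.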

Code Snippets

# Sample environment's step and reset methods:
def step(self, actions):
    # Process actions...
    observations, rewards, terminations, truncations, infos = {}, {}, {}, {}, {}
    # Logic to fill the above dictionaries based on the environment's dynamics
    return observations, rewards, terminations, truncations, infos

def reset(self, seed=None, options=None):
    observations, infos = {}, {}
    # Logic to fill the dictionaries
    return observations, infos
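
For completeness, a self-contained sketch of the return structure a MultiAgentEnv is expected to produce (agent IDs, spaces, and the episode length below are illustrative placeholders, not my actual satellite environment): every returned dict is keyed by agent ID, terminateds/truncateds additionally carry an "__all__" key, and infos has an entry for each agent that appears in observations.

# Illustrative stand-in environment (agent IDs and spaces are placeholders).
import gymnasium as gym
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class DummySatelliteEnv(MultiAgentEnv):
    def __init__(self, config=None):
        super().__init__()
        self._agent_ids = {"sat_0", "sat_1"}
        self.observation_space = gym.spaces.Box(-1.0, 1.0, (3,), np.float32)
        self.action_space = gym.spaces.Discrete(2)
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._t = 0
        observations = {aid: self.observation_space.sample() for aid in self._agent_ids}
        infos = {aid: {} for aid in self._agent_ids}
        return observations, infos

    def step(self, actions):
        self._t += 1
        done = self._t >= 10
        observations = {aid: self.observation_space.sample() for aid in actions}
        rewards = {aid: 0.0 for aid in actions}
        terminateds = {aid: done for aid in actions}
        truncateds = {aid: False for aid in actions}
        terminateds["__all__"] = done
        truncateds["__all__"] = False
        # Provide an infos entry for every agent that has an observation.
        infos = {aid: {} for aid in actions}
        return observations, rewards, terminateds, truncateds, infos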

Issue Severity

High: It blocks me from completing my task.

clemenjuan commented 5 months ago

I found that line 573 in env_runner_v2.py was commented out, so I uncommented it and now it seems to work, but I think this should be checked.

values_dict = {
    SampleBatch.T: episode.length,  # Episodes start at -1 before we
    # add the initial obs. After that, we infer from initial obs at
    # t=0 since that will be our new episode.length.
    SampleBatch.ENV_ID: env_id,
    SampleBatch.AGENT_INDEX: episode.agent_index(agent_id),
    # Last action (SampleBatch.ACTIONS) column will be populated by
    # StateBufferConnector.
    # Reward received after taking action at timestep t.
    SampleBatch.REWARDS: rewards[env_id].get(agent_id, 0.0),
    # After taking action=a, did we reach terminal?
    SampleBatch.TERMINATEDS: agent_terminated,
    # Was the episode truncated artificially
    # (e.g. b/c of some time limit)?
    SampleBatch.TRUNCATEDS: agent_truncated,
    # This line was previously commented out:
    SampleBatch.INFOS: infos[env_id].get(agent_id, {}),
    SampleBatch.NEXT_OBS: obs,
}
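
If that line is restored, a slightly more defensive lookup (just a sketch, not the actual RLlib source) would also tolerate an `env_id` that is missing from `infos`:

# Hypothetical defensive variant of the uncommented line:
SampleBatch.INFOS: infos.get(env_id, {}).get(agent_id, {}),
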
simonsays1980 commented 5 months ago

@clemenjuan Thanks for filing this issue. Between the release of ray-2.10.0 and the current release we have changed a lot in the code of the EnvRunner API. Could you give the current version a try and see if the error persists?
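
For reference, upgrading to the latest release is usually just the following (adjust the pin to your setup):

pip install -U "ray[rllib]"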