Support for POMDP Observation Settings

Giovannibriglia commented 2 months ago

Issue: I've noticed that the current implementation primarily supports observation settings typical of MDPs, where observations are provided from an absolute perspective.

To enhance the flexibility and utility of the library, especially for reinforcement learning (RL) practitioners, I would propose to introduce support for Partially Observable Markov Decision Processes (POMDPs). This feature would allow users to test and compare different approaches across MDP and POMDP settings, potentially leading to new insights and developments in RL.

Questions:

Is there any existing functionality in the current codebase that can be leveraged to achieve POMDP settings?
Can you already imagine what the foreseeable challenges will be in implementing this feature?

Suggestions:

Feature: Introduce an option to enable POMDP settings within the environment configuration via the make_env function. What are your thoughts on this?
Implementation: Modify the observations so that the current observation is expressed relative to the previous one. The first observation would remain absolute, while each subsequent observation would be adjusted by subtracting the previous observation.

Other: So far, I’ve been using VMAS in POMDP settings by handling observation subtraction outside of the VMAS object. While this approach is simple, I believe it could be more efficient if implemented directly within the object. What do you think about?

Thank you for the clarification. I'm ready to start working on this :)

matteobettini commented 2 months ago

Hello Giovanni!

Thanks for openeing this.

This seems quite a peculiar definition of POMDPs. Let me try to explain how I think about this.

In the multi-agent setting the most general abstraction in a Partially Observable Markov Game. This is a multi-agent extension of the POMDP.

What makes these formulations partially observable is the fact that the agent receives an observation that is a subset of the whole state, and thus they do not have the Markovian property anymore.

So tasks in VMAS are all partially observable, because agents do not usually have data about other agents.

What you are proposing seems to be a particular observation structure that makes it even more partially observable as agents need memory even more.

My opinion is that if users want to implement this structure they can:

create a scneario like this
do it outside of vmas by saving the observations after reset and then updating them to get the deltas

I personally would not add this as a feature of the simulator but happy to be convinced otherwise. If you have a paper describing this please send it over.

smarianimore commented 2 months ago

My two cents (hello everybody)

The POMDP definition (and, consequentially, the Markov game extension) actually encompasses several "degrees" of what term "partial" means:

could be limited in space
could be noisy
could be failing
could be limited in any other way

All the above in reference to "full" observability that usually means that the "internal" state of the environment and the "external" one perceived by agents exactly match (maybe can differ in representation, need to double-check this...)

Imho, when in a MAS the agents can perceive all of the true environment state with no noise, but not other agents' states, it is technically a POMDP setting still, but the "least" possible (hope I've explained myself). In MAS literature is usually a basic assumption that agents do not perceive other agents' states.

The setting @Giovannibriglia is working on (I think) is a slightly more restricting POMDP setting where agents do also have limited capabilities of observation of the environment (e.g. a limited field of view, a restricted set of sensors, a different representation, etc.)

I agree with @matteobettini that the kind of "incremental" observations that @Giovannibriglia is describing are quite peculiar. I can't recall any paper I read doing this. The most similar (?) setting I can recall is in some deep MARL paper where to limit non-stationarity agents are allowed to keep an history of state observations, instead of the most recent one only (and obviously creating a "delta" given such history is trivial).

I would say that this kind of "incremental" setting is still a POMDP setting, as the agent does not know the full state (but I agree that it can recover it by chaining observations properly...), but can be persuaded otherwise.

Keep up the good work, you both are doing great :D

matteobettini commented 2 months ago

Yes very agreed.

The types of partial observability are limitless and many beyond not observing other agents. For example here are some examples from vmas:

there are scenarios that add observation noise
scenarios that use lidars (limited sensors)
scenarios where global properties are not observable (e.g. opponents data in football)

My point is that I think VMAS should be agnostic of the type of observability the scenario decides to implement. It is up to the scenario creator or editor to choose what to expose and how to expose it to agents.

The structure described by @Giovannibriglia is something that can be added to all scenarios to make them even more partially observable, but I think that this structure is not used enough in the community to make it a feature of vmas. Instead, I think users that need it can easily add it locally. If, on the other hand, we get more requests for something like this we can add it.

Giovannibriglia commented 2 months ago

Thank you all for the valuable discussion! Looking forward to have another one as soon as possible :)

matteobettini commented 2 months ago

I'll close this since it seems we are aligned, feel free to reopen if needed

proroklab / VectorizedMultiAgentSimulator

Support for POMDP Observation Settings #136