proroklab / VectorizedMultiAgentSimulator

VMAS is a vectorized differentiable simulator designed for efficient Multi-Agent Reinforcement Learning benchmarking. It is comprised of a vectorized 2D physics engine written in PyTorch and a set of challenging multi-robot scenarios. Additional scenarios can be implemented through a simple and modular interface.
https://vmas.readthedocs.io
GNU General Public License v3.0

Feat request: Separate sensor measuring from observation function #110

Closed Zartris closed 4 months ago

Zartris commented 4 months ago

Hi again

I am running into a problem: I need a temporal aspect to my sensor (recording the last N lidar measurements). A hack would be to store the result of lidar.measure() in the observation function, but there is no guarantee that this function is called only once per agent during a step, so I cannot rely on it to append the latest lidar measurement to a history tensor.

This is only one of many calls that I would like to make exactly once after world.step() is called; it just happens to be the newest problem. For some time now I have simply cloned VMAS and added scenario.prestep() and scenario.poststep() calls to Environment.step(), like so:

class Environment(TorchVectorizedObject):
    def step(self, actions: Union[List, Dict]):
        ...
        # Scenarios can define a custom action processor. This step also takes care of scripted agents automatically
        for agent in self.world.agents:
            self.scenario.env_process_action(agent)

        # advance world state, with the new hooks called once before and once after
        self.scenario.prestep()
        self.world.step()
        self.scenario.poststep()

        self.steps += 1
        ...

class BaseScenario(ABC):
    ...
    def prestep(self):
        # no-op by default; a custom scenario can override it
        pass

    def poststep(self):
        # no-op by default; a custom scenario can override it
        pass

This gives the user a way to manage all the "once per step" functions that a custom scenario might need, e.g. storing temporal data.
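To illustrate the use case, here is a minimal sketch of how a poststep hook could maintain a history of the last N lidar readings. It does not use VMAS itself: `FakeLidar`, `HistoryScenario`, and `env_step` are hypothetical stand-ins that only mimic the call order proposed above, and plain Python lists stand in for tensors.

```python
from collections import deque


class FakeLidar:
    """Hypothetical stand-in for a sensor; not the VMAS Lidar class."""

    def __init__(self):
        self.calls = 0

    def measure(self):
        self.calls += 1
        # Return a dummy 3-ray reading that changes with every call.
        return [float(self.calls)] * 3


class HistoryScenario:
    """Hypothetical scenario that keeps the last N lidar readings."""

    def __init__(self, lidar, history_len=3):
        self.lidar = lidar
        # A bounded deque drops the oldest reading automatically.
        self.history = deque(maxlen=history_len)

    def prestep(self):
        pass

    def poststep(self):
        # Runs once per step, so the history gains one entry per step.
        self.history.append(self.lidar.measure())

    def observation(self):
        # Safe to call any number of times: it only reads the stored history.
        return [ray for reading in self.history for ray in reading]


def env_step(scenario, world_step=lambda: None):
    """Mimics the proposed call order: prestep, world step, poststep."""
    scenario.prestep()
    world_step()
    scenario.poststep()


lidar = FakeLidar()
scenario = HistoryScenario(lidar, history_len=3)
for _ in range(5):
    env_step(scenario)
    scenario.observation()
    scenario.observation()  # a repeated call does not add a measurement
```

Because the measurement happens only in poststep, the number of sensor reads equals the number of steps, no matter how often the observation is requested.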

matteobettini commented 4 months ago

Hey

I like the idea of prestep and poststep, we could add those!

As a side note, in VMAS we do guarantee that all functions are called once per step. In particular:

I usually store poststep logic in reward and prestep logic in process action

The observation function is also called after a reset (which is done at init time)

I'll add prestep and poststep anyway as they are nicer, thanks for the suggestion!
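For readers who want the workaround described in this comment, a rough sketch follows. It assumes the action-processing and reward functions are each invoked once per agent per step, and it guards the shared logic so it runs only for the first agent. That guard, the class `WorkaroundScenario`, and the simplified `env_step` are assumptions made for illustration and are not taken from the VMAS code base.

```python
class WorkaroundScenario:
    """Hypothetical scenario; only the hook names mirror the thread."""

    def __init__(self, agents):
        self.agents = agents
        self.pre_count = 0
        self.post_count = 0

    def process_action(self, agent):
        # Assumed to be called once per agent, before the world advances.
        if agent == self.agents[0]:
            self.pre_count += 1  # "prestep" logic would go here

    def reward(self, agent):
        # Assumed to be called once per agent, after the world advances.
        if agent == self.agents[0]:
            self.post_count += 1  # "poststep" logic would go here
        return 0.0


def env_step(scenario):
    """Simplified stand-in for the environment's step order."""
    for agent in scenario.agents:
        scenario.process_action(agent)
    # ... the world would advance here ...
    return [scenario.reward(agent) for agent in scenario.agents]


scenario = WorkaroundScenario(agents=["agent_0", "agent_1", "agent_2"])
for _ in range(4):
    env_step(scenario)
```

With three agents and four steps, each guarded block runs four times. Dedicated prestep and poststep hooks make the same intent explicit without the per-agent guard.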

matteobettini commented 4 months ago

Btw, if you would like to open a PR to contribute this feature yourself, let me know. We will just need some docstrings for those functions, with examples in the same format as the other scenario functions.

Zartris commented 4 months ago

Sure, I'll do the PR now