proroklab / VectorizedMultiAgentSimulator

VMAS is a vectorized differentiable simulator designed for efficient Multi-Agent Reinforcement Learning benchmarking. It is comprised of a vectorized 2D physics engine written in PyTorch and a set of challenging multi-robot scenarios. Additional scenarios can be implemented through a simple and modular interface.
https://vmas.readthedocs.io
GNU General Public License v3.0

using heuristic with MARL #120

Open majid5776 opened 6 days ago

majid5776 commented 6 days ago

Hello. How can we use a heuristic with multi-agent RL? For example, I want to use `rllib.py` and `run_heuristic.py` from the examples directory at the same time, i.e. use MAPPO together with the heuristic function. Does `rllib.py` use this heuristic automatically? Thanks.

matteobettini commented 6 days ago

Hello!

I don't fully understand the question:

The heuristic is an alternative to RL. It exists so that you can compare your RL agents against the performance of the heuristic.

Either your agents are controlled by an RL policy or by the heuristic policy.

What do you mean when you say you want to mix the two?

majid5776 commented 6 days ago

OK, thank you for your answer. You are right. But as you know, solving problems with deep reinforcement learning algorithms is very time-consuming. My idea was to use search methods in some envs, like discovery or flocking, to compute actions and use those instead of epsilon-greedy exploration. In my opinion this could reduce training time.

matteobettini commented 5 days ago

Oh, OK, got it! You would like to use the heuristic to bootstrap exploration.

Yes this is a really good idea!

Unfortunately I do not know if there is a default way to do this in RLlib; I think you might need to code something custom.

The way I would do it in TorchRL and BenchMARL is by coding a custom callback that fills the replay buffer with data collected by the heuristic policy.

Zartris commented 2 days ago

Hey @matteobettini Do you have an example of the TorchRL callback you talk about here? Would love to see an example.

matteobettini commented 2 days ago

I usually write all my custom code here https://github.com/facebookresearch/BenchMARL/blob/main/benchmarl/experiment/callback.py

Examples can be found in my recent project https://github.com/proroklab/ControllingBehavioralDiversity/blob/main/het_control/callback.py

I don't have an example for this specific case, but it should be easy to do rollouts in the env with a given policy upon setup and store those in the buffer.
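To make the pattern discussed above concrete, here is a minimal, self-contained sketch of a callback whose setup hook rolls out a heuristic policy and pre-fills a replay buffer before RL training starts. Note that `ToyEnv`, `ReplayBuffer`, `HeuristicPrefillCallback`, and the `on_setup` hook are all hypothetical stand-ins for illustration, not the actual BenchMARL/TorchRL/VMAS APIs; a real implementation would subclass BenchMARL's `Callback` and use the experiment's own buffer and environment objects.

```python
# Sketch only: all class and method names here are hypothetical stand-ins,
# not the real BenchMARL/TorchRL API.
import random
from collections import deque


class ToyEnv:
    """Minimal 1-D environment: the state is a position, the goal is 0."""

    def reset(self):
        self.pos = random.uniform(-1.0, 1.0)
        return self.pos

    def step(self, action):
        self.pos += action
        reward = -abs(self.pos)          # closer to the goal is better
        done = abs(self.pos) < 0.05      # episode ends near the goal
        return self.pos, reward, done


def heuristic_policy(obs):
    """Hand-coded controller: take a fixed-size step toward the goal."""
    return -0.1 if obs > 0 else 0.1


class ReplayBuffer:
    """Toy FIFO buffer standing in for a real TorchRL replay buffer."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def __len__(self):
        return len(self.storage)


class HeuristicPrefillCallback:
    """Fills the buffer with heuristic rollouts once, before training."""

    def __init__(self, env, buffer, n_episodes=10, max_steps=100):
        self.env = env
        self.buffer = buffer
        self.n_episodes = n_episodes
        self.max_steps = max_steps

    def on_setup(self):
        # Roll out the heuristic policy and store the transitions, so the
        # first RL updates already train on sensible behaviour.
        for _ in range(self.n_episodes):
            obs = self.env.reset()
            for _ in range(self.max_steps):
                action = heuristic_policy(obs)
                next_obs, reward, done = self.env.step(action)
                self.buffer.add((obs, action, reward, next_obs, done))
                obs = next_obs
                if done:
                    break


buffer = ReplayBuffer(capacity=10_000)
callback = HeuristicPrefillCallback(ToyEnv(), buffer, n_episodes=5)
callback.on_setup()  # pre-fill happens once, before any RL updates
```

After `on_setup()` runs, the buffer already contains heuristic transitions, so the learner's first gradient updates draw on them instead of purely random exploration; the RL collector then keeps appending its own data as usual.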