Open majid5776 opened 6 days ago
Hello!
I don't quite understand the question:
The heuristic is an alternative to RL. It exists so that you can compare your RL agents against the performance of the heuristic.
Either your agents are controlled by an RL policy or by the heuristic policy.
What do you mean when you say you want to mix the two?
OK, thank you for your answer, you are right. But as you know, solving problems with deep reinforcement learning algorithms is very time-consuming. My idea was to use search methods in some envs like discovery or flocking to compute an action and execute it instead of using epsilon-greedy exploration. In my opinion this could reduce the training time.
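The idea can be sketched in a few lines: wherever epsilon-greedy would pick a uniformly random action, fall back to the heuristic instead. This is a minimal, library-agnostic sketch; `heuristic_action` is a hypothetical stand-in for a search-based heuristic like the ones mentioned above, not an API from any library.

```python
import random

def heuristic_action(obs):
    # Hypothetical placeholder for a search-based heuristic
    # (e.g. a flocking/discovery heuristic from the examples).
    return 0 if obs < 0.5 else 1

def greedy_action(q_values):
    # Pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

def select_action(obs, q_values, epsilon):
    # Standard epsilon-greedy would return a random action here;
    # instead, explore by following the heuristic policy.
    if random.random() < epsilon:
        return heuristic_action(obs)
    return greedy_action(q_values)
```

With `epsilon=1.0` this reduces to pure heuristic rollouts, and with `epsilon=0.0` it is the plain greedy policy, so you can anneal between the two during training.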
Oh ok, got it! You would like to use the heuristic to bootstrap exploration.
Yes, this is a really good idea!
Unfortunately I do not know if there is a default way to do this in RLlib; I think you might need to code something custom.
The way I would do it in TorchRL and BenchMARL is to code a custom callback that fills the replay buffer with data collected by the heuristic policy.
Hey @matteobettini Do you have an example of the TorchRL callback you talk about here? Would love to see an example.
I usually write all my custom code here https://github.com/facebookresearch/BenchMARL/blob/main/benchmarl/experiment/callback.py
Examples can be found in my recent project https://github.com/proroklab/ControllingBehavioralDiversity/blob/main/het_control/callback.py
I don't have an example for this specific case, but it should be easy to do rollouts in the env with a given policy upon setup and store those in the buffer.
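The callback pattern described above could look roughly like this. This is a library-agnostic sketch, not BenchMARL or TorchRL API: `on_setup`, the env's `reset`/`step` signature, and the list-like replay buffer are all hypothetical placeholders for the real interfaces.

```python
class PrefillHeuristicCallback:
    """Sketch: on setup, roll out a heuristic policy and push the
    transitions into the replay buffer before RL training starts.
    The env/buffer interfaces here are hypothetical placeholders."""

    def __init__(self, heuristic_policy, n_rollouts=10):
        self.heuristic_policy = heuristic_policy
        self.n_rollouts = n_rollouts

    def on_setup(self, env, replay_buffer):
        # Collect full episodes with the heuristic policy.
        for _ in range(self.n_rollouts):
            obs = env.reset()
            done = False
            while not done:
                action = self.heuristic_policy(obs)
                next_obs, reward, done = env.step(action)
                # Store the transition for off-policy training.
                replay_buffer.append((obs, action, reward, next_obs, done))
                obs = next_obs
```

In BenchMARL you would hook this logic into one of the `Callback` methods linked above and write into the experiment's actual replay buffer instead of a plain list.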
Hello. How can we use a heuristic with multi-agent RL? For example, I want to use rllib.py and run_heuristic.py from the examples directory at the same time; that is, I want to use MAPPO with a heuristic function. Does rllib.py use this heuristic automatically? Thanks.