hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[Question] Can I train agents in a nested loop in SB3? #1183

Closed j-thib closed 1 year ago

j-thib commented 1 year ago

I am interested in training two agents, where the actions of each affect the other's environment. I understand that SB3 is not designed for multi-agent systems, but I'm not sure whether this necessarily has to be treated as a multi-agent problem. The pseudo-code for my learning problem is the following:

Initialize environment

Repeat for X episodes:
  Agent 1 takes action a1
  Repeat for N steps:
    Agent 2 takes action a2
    Environment is updated according to a1, a2
    Agent 2 collects reward r2
  Agent 1 collects reward r1

  Update environment
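For concreteness, the nested loop above could be sketched in plain Python roughly as follows. This is only an illustration of the control flow, not SB3 code: `ToyEnv`, its `step` and `update` methods, the reward formulas, and the random action choices are all hypothetical placeholders standing in for the custom environment and the two learned policies.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the custom environment."""

    def __init__(self):
        self.state = 0.0

    def step(self, a1, a2):
        # Environment is updated according to both a1 and a2.
        self.state += a1 * a2
        r2 = -abs(self.state)  # placeholder reward for Agent 2
        return self.state, r2

    def update(self):
        # Placeholder for the end-of-episode "Update environment" step.
        self.state = 0.0


def run(num_episodes, inner_steps, seed=0):
    rng = random.Random(seed)
    env = ToyEnv()  # Initialize environment
    log = []
    for _ in range(num_episodes):
        # Agent 1 acts once per episode (random placeholder policy).
        a1 = rng.choice([-1, 1])
        r2_total = 0.0
        for _ in range(inner_steps):
            # Agent 2 acts at every inner step (random placeholder policy).
            a2 = rng.choice([-1, 1])
            _, r2 = env.step(a1, a2)
            r2_total += r2
        # Agent 1 collects its reward only after the inner loop finishes.
        r1 = -abs(env.state)  # placeholder reward for Agent 1
        log.append((a1, r1, r2_total))
        env.update()
    return log


if __name__ == "__main__":
    for a1, r1, r2_total in run(num_episodes=3, inner_steps=5):
        print(a1, r1, r2_total)
```

In this sketch Agent 1 sees one transition per episode while Agent 2 sees `inner_steps` transitions, which is the timescale mismatch the question is about.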

I am training on a custom env that passes the env_checker without errors. Any advice on how I could proceed in SB3 would be much appreciated; if this is certainly impossible to do in SB3, it would also be useful to know that.