I am interested in training two agents, where the actions of each one affect the environment of the other. I understand that SB3 is not designed for multi-agent systems, but I'm not sure whether this necessarily has to be treated as a multi-agent problem. The pseudo-code for my learning problem is the following:
    Initialize environment
    Repeat for X episodes:
        Agent 1 takes action a1
        Repeat for N steps:
            Agent 2 takes action a2
            Environment is updated according to a1, a2
            Agent 2 collects reward r2
        Agent 1 collects reward r1
        Update environment
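To make the loop above concrete, here is a minimal plain-Python sketch of how I picture one episode. It does not use SB3 at all: `ToyTwoLevelEnv`, `step_inner`, `outer_reward` and `run_episode` are placeholder names I made up for illustration, the dynamics and rewards are toy ones, and the two policies are simple callables standing in for the trained agents.

```python
import random


class ToyTwoLevelEnv:
    """Hypothetical stand-in for my custom env; not part of SB3."""

    def __init__(self):
        self.state = 0.0

    def reset(self):
        # "Update environment" between episodes is modelled as a plain reset.
        self.state = 0.0
        return self.state

    def step_inner(self, a1, a2):
        # Environment is updated according to a1 and a2 (toy dynamics).
        self.state += a1 * a2
        r2 = -abs(self.state)  # toy reward for agent 2
        return self.state, r2

    def outer_reward(self):
        # Toy reward for agent 1, available only after the inner loop ends.
        return self.state


def run_episode(env, n_steps, policy1, policy2):
    """Run one episode: agent 1 acts once, agent 2 acts n_steps times."""
    obs = env.reset()
    a1 = policy1(obs)                # Agent 1 takes action a1
    r2_total = 0.0
    for _ in range(n_steps):         # Repeat for N steps
        a2 = policy2(obs, a1)        # Agent 2 takes action a2
        obs, r2 = env.step_inner(a1, a2)
        r2_total += r2               # Agent 2 collects reward r2
    r1 = env.outer_reward()          # Agent 1 collects reward r1
    return r1, r2_total


if __name__ == "__main__":
    rng = random.Random(0)
    env = ToyTwoLevelEnv()
    for episode in range(3):         # Repeat for X episodes
        r1, r2_total = run_episode(
            env,
            n_steps=5,
            policy1=lambda obs: rng.uniform(-1.0, 1.0),
            policy2=lambda obs, a1: rng.uniform(-1.0, 1.0),
        )
        print(episode, r1, r2_total)
```

In this sketch agent 1 sees a single step per episode, while agent 2 sees an N-step episode in which a1 is fixed and could be treated as part of its observation.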
I am training on a custom env that passes the env_checker without any errors. Any advice on how I could proceed in SB3 would be much appreciated. If this is definitely impossible to do in SB3, it would also be useful to know that.