[Question] Can I train agents in a nested loop in SB3?

I am interested in training two agents, where the actions of each affect the environment of the other. I understand that SB3 is not designed for multi-agent systems, but I'm not sure whether this necessarily has to be treated as a multi-agent problem. The pseudo-code for my learning problem is the following:

Initialize environment

Repeat for X episodes:
  Agent 1 takes action a1
  Repeat for N steps:
    Agent 2 takes action a2
    Environment is updated according to a1, a2
    Agent 2 collects reward r2
  Agent 1 collects reward r1

  Update environment

I am training on a custom env that passes the env_checker without errors or warnings. Any advice on how I could proceed in SB3 would be much appreciated. If this is definitely impossible in SB3, that would also be useful to know.
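
To make the structure concrete, here is a minimal plain-Python sketch of one way I imagine the loop above could be framed: agent 1 sees an "outer" environment whose single `step(a1)` runs agent 2's N inner steps. This is only an illustration under my own assumptions, not SB3 API: `ToySharedEnv`, `OuterEnv`, `inner_policy`, and the reward formulas are all hypothetical placeholders, and the classes only mimic a Gym-style `reset`/`step` interface without inheriting from any library class.

```python
class ToySharedEnv:
    """Hypothetical stand-in for the custom environment (not an SB3 class)."""

    def __init__(self):
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return self.state

    def update(self, a1, a2):
        # Placeholder transition: the state depends on both agents' actions.
        self.state += a1 - a2
        return self.state


class OuterEnv:
    """Gym-style view for agent 1: one step(a1) runs N inner steps of agent 2."""

    def __init__(self, shared_env, inner_policy, n_inner_steps):
        self.shared_env = shared_env
        self.inner_policy = inner_policy  # assumed callable: (state, a1) -> a2
        self.n_inner_steps = n_inner_steps
        self.inner_rewards = []  # r2 values collected by agent 2

    def reset(self):
        self.inner_rewards.clear()
        return self.shared_env.reset()

    def step(self, a1):
        state = self.shared_env.state
        for _ in range(self.n_inner_steps):
            a2 = self.inner_policy(state, a1)       # Agent 2 takes action a2
            state = self.shared_env.update(a1, a2)  # Env updated by a1, a2
            r2 = -abs(state)                        # placeholder reward r2
            self.inner_rewards.append(r2)
        r1 = -abs(state)  # placeholder reward r1 for agent 1
        done = True       # one outer action per episode, as in the pseudo-code
        return state, r1, done, {}


def run_episodes(env, outer_policy, n_episodes):
    """Repeat for X episodes; returns the list of r1 values."""
    outer_rewards = []
    for _ in range(n_episodes):
        state = env.reset()
        a1 = outer_policy(state)  # Agent 1 takes action a1
        _, r1, _, _ = env.step(a1)
        outer_rewards.append(r1)
    return outer_rewards
```

If something like this is viable, I assume the SB3 version would subclass `gymnasium.Env`, define `observation_space` and `action_space`, and use a trained model's `predict` in place of `inner_policy`, but that still leaves open how agent 2 itself would be trained inside the loop.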
