Question about step() in gymwrapper5

I'm sorry for the late reply.

In Unity ML-Agents, an agent in a multi-agent environment that has reached its terminal step, that is, an agent that will no longer take actions, is reported in terminal_step. (Conversely, an agent that has not reached the terminal step and still needs to act is reported in decision_step.) The current behavior comes from our environment: no agent can ever reach the terminal step, because no step limit is set for individual agents. We are aware of this and plan to update the code accordingly.
That's right. The official Unity ML-Agents repository does not yet provide a gym wrapper for multi-agent environments. Therefore, I developed a generic gym-style wrapper that also supports multi-agent environments, and I use that instead. It should work in most multi-agent environments, although I have not verified this.
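
To make the decision_step / terminal_step distinction above concrete, here is a minimal sketch of how a gym-style step() could merge the two groups into per-agent observations, rewards, and done flags. It is not the code from the wrapper in this repository. `StepBatch` and `merge_steps` are hypothetical names used only for illustration, `StepBatch` is a simplified stand-in rather than the real ML-Agents class, and the sketch assumes agent ids run from 0 to n_agents - 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StepBatch:
    """Hypothetical stand-in for a batch of agent steps (not the ML-Agents class)."""
    agent_id: List[int]        # ids of the agents contained in this batch
    obs: List[List[float]]     # one observation per agent, same order as agent_id
    reward: List[float]        # one reward per agent, same order as agent_id


def merge_steps(
    decision: StepBatch, terminal: StepBatch, n_agents: int
) -> Tuple[List[Optional[List[float]]], List[float], List[bool]]:
    """Combine decision and terminal batches into gym-style per-agent lists.

    Assumes agent ids are 0 .. n_agents - 1. Agents that appear in neither
    batch keep obs=None, reward=0.0, done=False.
    """
    obs: List[Optional[List[float]]] = [None] * n_agents
    rewards = [0.0] * n_agents
    dones = [False] * n_agents

    # The terminal batch is processed last, so if an agent id appears in
    # both batches, the terminal entry wins in this sketch.
    for batch, done in ((decision, False), (terminal, True)):
        for i, agent in enumerate(batch.agent_id):
            obs[agent] = batch.obs[i]
            rewards[agent] = batch.reward[i]
            dones[agent] = done

    return obs, rewards, dones


# Usage: agents 0 and 2 still need an action, agent 1 has finished.
decision = StepBatch(agent_id=[0, 2], obs=[[0.1], [0.3]], reward=[1.0, 0.5])
terminal = StepBatch(agent_id=[1], obs=[[0.2]], reward=[-1.0])
obs, rewards, dones = merge_steps(decision, terminal, n_agents=3)
print(dones)
```

In an environment like the one described above, where no per-agent step limit is set, the terminal batch would always be empty and every done flag would stay False.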

leehe228 / LogisticsEnv