Closed Ppig01 closed 7 months ago
Hi @Ppig01
1) To use your own customized training code, it is best to start by familiarising yourself with how to interact with the multi-agent SMARTS environment.
2) A multi-agent SMARTS example is given here.
3) Given the observations, rewards, terminateds, and truncateds returned by env.step(actions), you can train your own policy to yield the next actions, thereby replacing the default RandomLanerAgent policy used in the example.
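The step-loop described above can be sketched as follows. This is a minimal, hedged illustration: `DummyMultiAgentEnv`, `MyPolicy`, the agent ids, and the action strings are all hypothetical stand-ins, not the real SMARTS API; only the `env.step(actions)` return signature (observations, rewards, terminateds, truncateds, infos) is taken from the example above.

```python
import random

class DummyMultiAgentEnv:
    """Hypothetical stand-in for the SMARTS multi-agent environment."""
    def __init__(self, agent_ids, episode_len=5):
        self.agent_ids = agent_ids
        self.episode_len = episode_len
        self._t = 0

    def reset(self):
        self._t = 0
        observations = {aid: {"speed": 0.0} for aid in self.agent_ids}
        return observations, {}

    def step(self, actions):
        self._t += 1
        done = self._t >= self.episode_len
        observations = {aid: {"speed": float(self._t)} for aid in self.agent_ids}
        rewards = {aid: 1.0 for aid in self.agent_ids}
        terminateds = {aid: done for aid in self.agent_ids}
        terminateds["__all__"] = done
        truncateds = {aid: False for aid in self.agent_ids}
        truncateds["__all__"] = False
        return observations, rewards, terminateds, truncateds, {}

class MyPolicy:
    """Custom policy taking the place of the example's RandomLanerAgent."""
    def act(self, obs):
        # Placeholder action choice; a trained policy would map obs -> action.
        return random.choice(["keep_lane", "slow_down"])

agent_ids = ["Agent_0", "Agent_1"]
env = DummyMultiAgentEnv(agent_ids)
policies = {aid: MyPolicy() for aid in agent_ids}

observations, _ = env.reset()
total_reward = 0.0
while True:
    # Build one action per agent from the latest observations.
    actions = {aid: policies[aid].act(observations[aid]) for aid in agent_ids}
    observations, rewards, terminateds, truncateds, _ = env.step(actions)
    total_reward += sum(rewards.values())
    # A learning algorithm would update each policy from (obs, action, reward) here.
    if terminateds["__all__"] or truncateds["__all__"]:
        break
print(total_reward)
```

The training update itself (the commented line in the loop) is where any algorithm of your choosing plugs in; the surrounding loop stays the same.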
High Level Description
I noticed that the three examples in the codebase all train with the PPO algorithm, while the paper uses several algorithms. How can I use algorithms other than those in the examples, such as multi-agent algorithms, to train and evaluate a model?
Version
smarts2.01
Operating System
ubuntu 20.04
Problems
No response