Closed Ppig01 closed 7 months ago
Hi @Ppig01
1) To use your own customized training code, it is best to start by familiarising yourself with how to interact with the multi-agent SMARTS environment.
2) A multi-agent SMARTS example is given here.
3) Given the observations, rewards, terminateds, and truncateds returned by env.step(actions), you can train your own policy to yield the next actions, thereby replacing the default RandomLanerAgent policy used in the example.
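The step-loop described above can be sketched as follows. This is a minimal, hedged illustration: `DummyMultiAgentEnv`, `MyPolicy`, the agent ids, and the action strings are all hypothetical stand-ins, not the real SMARTS API; only the `env.step(actions)` return signature (observations, rewards, terminateds, truncateds, infos) is taken from the example above.

```python
import random

class DummyMultiAgentEnv:
    """Hypothetical stand-in for the SMARTS multi-agent environment."""
    def __init__(self, agent_ids, episode_len=5):
        self.agent_ids = agent_ids
        self.episode_len = episode_len
        self._t = 0

    def reset(self):
        self._t = 0
        observations = {aid: {"speed": 0.0} for aid in self.agent_ids}
        return observations, {}

    def step(self, actions):
        self._t += 1
        done = self._t >= self.episode_len
        observations = {aid: {"speed": float(self._t)} for aid in self.agent_ids}
        rewards = {aid: 1.0 for aid in self.agent_ids}
        terminateds = {aid: done for aid in self.agent_ids}
        terminateds["__all__"] = done
        truncateds = {aid: False for aid in self.agent_ids}
        truncateds["__all__"] = False
        return observations, rewards, terminateds, truncateds, {}

class MyPolicy:
    """Custom policy taking the place of the example's RandomLanerAgent."""
    def act(self, obs):
        # Placeholder action choice; a trained policy would map obs -> action.
        return random.choice(["keep_lane", "slow_down"])

agent_ids = ["Agent_0", "Agent_1"]
env = DummyMultiAgentEnv(agent_ids)
policies = {aid: MyPolicy() for aid in agent_ids}

observations, _ = env.reset()
total_reward = 0.0
while True:
    # Build one action per agent from the latest observations.
    actions = {aid: policies[aid].act(observations[aid]) for aid in agent_ids}
    observations, rewards, terminateds, truncateds, _ = env.step(actions)
    total_reward += sum(rewards.values())
    # A learning algorithm would update each policy from (obs, action, reward) here.
    if terminateds["__all__"] or truncateds["__all__"]:
        break
print(total_reward)
```

The training update itself (the commented line in the loop) is where any algorithm of your choosing plugs in; the surrounding loop stays the same.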
High Level Description
I noticed that the three examples in the codebase all train with the PPO algorithm, while the paper uses several algorithms. How can I use algorithms other than those in the examples, such as multi-agent algorithms, to train and evaluate a model?
Version
smarts2.01
Operating System
ubuntu 20.04
Problems
No response