Problems encountered during training

hanruihua / rl_rvo_nav

The source code of the [RA-L] paper "Reinforcement Learned Distributed Multi-Robot Navigation with Reciprocal Velocity Obstacle Shaped Rewards"

MIT License


Problems encountered during training #23

Closed BlueTuox23 closed 3 months ago

BlueTuox23 commented 3 months ago

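Hello. After installation I ran the code exactly as described in your README, without changing any parameters. With 4 robots, the best successful rate was 93.00%. Continuing training from that policy with 10 robots, one checkpoint reached policy_name: r10_1_100, successful rate: 47.00%, average EpLen: 97.83, std length 7.88, average speed: 1.03, std speed 0.11. After that, most successful rates were 0.00% and none exceeded 47.00%, which is far from the 98% reported in the paper. Even with 6 robots trained for 2000 epochs, the best result was only a successful rate of 95.00%, short of the paper's 100%. Is my training procedure wrong, or do I need to re-tune the parameters?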

hanruihua commented 3 months ago

Hi,

In this situation you need to adjust the parameters. Try changing reward_parameter first, on this line:

par_env.add_argument('--reward_parameter', type=float, default=(3.0, 0.3, 0.0, 6.0, 0.3, 3.0, -0, 0), nargs='+')

In particular, increasing the collision-avoidance-related parameters (p1, currently 3.0, and p4, currently 6.0) can improve the success rate.
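As a rough sketch of what that adjustment could look like, the snippet below rebuilds only the one argument quoted above in a standalone parser and overrides p1 and p4 from the command line. It assumes that the flag is spelled `--reward_parameter`, and that p1 and p4 are the first and fourth entries of the tuple, as the quoted values 3.0 and 6.0 suggest. The helper `scaled_reward_args` and the 1.5 scale factor are hypothetical and only illustrate the mechanics; they are not recommended values.

```python
import argparse

# Assumption: a standalone parser that mirrors only the single argument
# quoted in the reply; the real training script defines many more.
parser = argparse.ArgumentParser()
parser.add_argument('--reward_parameter', type=float,
                    default=(3.0, 0.3, 0.0, 6.0, 0.3, 3.0, -0, 0), nargs='+')


def scaled_reward_args(defaults, p1_scale, p4_scale):
    """Hypothetical helper: build CLI tokens with p1 and p4 scaled.

    p1 is taken to be index 0 and p4 index 3 of the parameter tuple.
    """
    values = [float(v) for v in defaults]
    values[0] *= p1_scale
    values[3] *= p4_scale
    return ['--reward_parameter'] + [str(v) for v in values]


defaults = parser.get_default('reward_parameter')
cli_tokens = scaled_reward_args(defaults, p1_scale=1.5, p4_scale=1.5)
args = parser.parse_args(cli_tokens)

# With nargs='+', values given on the command line arrive as a list of floats.
print(args.reward_parameter)
```

Because the argument uses `nargs='+'`, all eight values have to be passed together when overriding it on the command line; editing the `default` tuple in the script is the other option.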

BlueTuox23 commented 3 months ago

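Hello. I read your paper carefully and noticed that the reward function in the code differs slightly from the one given in the paper. Do I need to change the code to match the paper's reward function?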

hanruihua commented 3 months ago

The parameters in the code have the same meaning as in the paper. The specific values need to be tuned for your own setup; adjusting the reward parameter values in the existing code should be enough.

BlueTuox23 commented 3 months ago

Thanks, I'll give it a try.

BlueTuox23 commented 3 months ago

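Hello. Is the 16-robot policy also trained by continuing from the 10-robot policy? I have trained for a long time and it does not learn anything. If I instead continue training for 16 robots starting from the 4-robot policy, the results are not very good.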

hanruihua commented 3 months ago

Hi,

Normally, a policy trained with 10 robots generalizes to some degree and can be used directly in the 16-robot case. In dense scenarios, however, reducing neighbors_num can improve the success rate. If you want to train directly with more robots, I recommend enlarging the environment.
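The reply does not say how much larger the environment should be. One possible rule of thumb, which is an assumption and not something stated in this thread or confirmed against the code, is to keep the number of robots per unit area constant. The sketch below assumes a square environment; the function `scaled_side` and the 10 x 10 base size are hypothetical.

```python
import math


def scaled_side(base_side, base_robots, target_robots):
    """Side length of a square arena that keeps robots-per-area constant."""
    return base_side * math.sqrt(target_robots / base_robots)


# Hypothetical base case: 10 robots in a 10 x 10 arena, scaled up to 16 robots.
side = scaled_side(10.0, base_robots=10, target_robots=16)
print(round(side, 2))  # prints 12.65
```

Under that assumption, moving from 10 to 16 robots would mean growing each side by a factor of about 1.26 rather than 1.6, since area scales with the square of the side.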