le-liang / MARLspectrumSharingV2X

Spectrum sharing in vehicular networks based on multi-agent reinforcement learning, IEEE Journal on Selected Areas in Communications

Question about the values of lambdda_c and lambdda_d in the reward design #1

Closed. zyy341 closed this issue 4 years ago.

zyy341 commented 4 years ago

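Hello, Prof. Liang. In the reward, lambdda is set to 0. Wouldn't this mean that the V2I capacity is never optimized during training? Or did your earlier experiments show that setting lambdda to 0 gives the best performance? https://github.com/le-liang/MARLspectrumSharingV2X/blob/4e9e3289fd4e8389165dcfe7559eb877d135c1a6/Environment_marl.py#L446-L454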
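To make the concern concrete, here is a minimal sketch of a reward that weights the summed V2I rates by lambda_c and the summed V2V rates by lambda_d. The function name `weighted_reward` and the plain-sum form are assumptions for illustration only; the repository's actual reward (linked above) may normalize or shape the terms differently. It shows the point of the question: with lambda_c = 0, the reward does not depend on the V2I rates at all.

```python
def weighted_reward(v2i_rates, v2v_rates, lambda_c, lambda_d):
    """Hypothetical per-step reward: lambda_c * sum(V2I rates) + lambda_d * sum(V2V rates).

    Illustrative sketch only; not the repository's exact reward expression.
    """
    return lambda_c * sum(v2i_rates) + lambda_d * sum(v2v_rates)


# With lambda_c = 0, very different V2I rates give the same reward,
# so the V2I term provides no direct training signal.
r_low_v2i = weighted_reward([1.0, 2.0], [3.0, 4.0], lambda_c=0.0, lambda_d=1.0)
r_high_v2i = weighted_reward([10.0, 20.0], [3.0, 4.0], lambda_c=0.0, lambda_d=1.0)
print(r_low_v2i, r_high_v2i)  # both depend only on the V2V rates
```

Any V2I improvement under lambda_c = 0 must therefore come indirectly, through how the learned V2V behavior affects interference, which is what the reply below discusses.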

le-liang commented 4 years ago

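Hi, you can tune this parameter and run more experiments to compare the results. I found that this setting gave the best overall performance (including V2I). It seems somewhat similar to the centralized maxV2V benchmark in my paper (which maximizes the V2V sum rate at each step without considering V2I). I added a remark about it at the time; discussion is welcome: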

We also note that the centralized maxV2V scheme attains remarkable performance in terms of V2I performance. This could be due to the packet delivery rates of V2V links having been substantially enhanced with centralized maxV2V and the V2V links incurring no interference to V2I links once their payload delivery has finished. This is an interesting observation that warrants further investigation into the performance tradeoff between V2I and V2V links.
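The mechanism in the remark (V2V links stop interfering with V2I links once their payload is delivered) can be illustrated with a toy single-subchannel Shannon-rate calculation. Everything here is an assumption for illustration: the function `v2i_rate`, the linear-scale powers and gains, and the numbers are hypothetical and do not reproduce the paper's channel model. The sketch only shows that deactivating finished V2V transmitters removes their interference term and raises the V2I rate.

```python
import math


def v2i_rate(p_cellular, g_cellular, noise, v2v_powers, v2v_gains, active):
    """Toy V2I rate (bits/s/Hz) on one subchannel shared with V2V links.

    Only V2V links still transmitting (active=True) add interference.
    All quantities are in linear scale; this is a hypothetical model.
    """
    interference = sum(p * g for p, g, a in zip(v2v_powers, v2v_gains, active) if a)
    sinr = p_cellular * g_cellular / (noise + interference)
    return math.log2(1 + sinr)


powers, gains = [0.5, 0.4], [1.0, 1.0]
# Both V2V links still have payload left and keep transmitting.
rate_busy = v2i_rate(1.0, 1.0, 0.1, powers, gains, [True, True])
# Both V2V links have finished delivery and are silent.
rate_done = v2i_rate(1.0, 1.0, 0.1, powers, gains, [False, False])
print(rate_busy, rate_done)
```

Under this toy model, a policy that finishes V2V deliveries quickly (as maxV2V tends to) leaves more of the episode with silent V2V transmitters, which is one plausible way V2I performance can benefit even when the reward puts no weight on V2I.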