xlnwel / model-free-algorithms

TD3, SAC, IQN, Rainbow, PPO, Ape-X, etc. in TF1.x

How does RL+PER perform in your library? #2

Closed kaixindelele closed 3 years ago

kaixindelele commented 3 years ago

Hi, Merry Christmas! Thank you for sharing this model-free RL library. Recently, I've been interested in using PER with continuous-control RL algorithms. However, I found that the performance of TD3+PER and SAC+PER is not good. I don't know whether the problem is in my code or whether these two algorithms simply don't work well with PER. So I'd like to ask about the performance of your code with PER. Have you ever evaluated it? Thank you!

xlnwel commented 3 years ago

I'm not so sure. The code was written a year ago, and I did test it back then.

Your question is a little bit vague. Which environment did you test on? And what is the performance of the agent with PER and with a uniform replay, respectively?

btw, there is no guarantee that PER is better than a uniform replay. It can perform worse than the uniform one in some environments.
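For context, proportional PER (as described in the original paper) samples transitions with probability proportional to |TD error|^α and corrects the resulting sampling bias with importance-sampling weights annealed by β. Below is a minimal numpy sketch of that idea; the class and names are illustrative only, not this repository's actual implementation:

```python
# Illustrative proportional PER sketch (numpy only); not this repo's implementation.
import numpy as np

class SimplePER:
    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps              # keeps every priority strictly positive
        self.data = [None] * capacity
        self.prios = np.zeros(capacity, dtype=np.float64)
        self.idx = 0
        self.size = 0

    def add(self, transition):
        # New transitions get the current max priority so they are seen at least once.
        max_prio = self.prios[:self.size].max() if self.size > 0 else 1.0
        self.data[self.idx] = transition
        self.prios[self.idx] = max_prio
        self.idx = (self.idx + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, beta=0.4):
        # P(i) is proportional to p_i^alpha; beta anneals the IS correction toward 1.
        probs = self.prios[:self.size] ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(self.size, batch_size, p=probs)
        weights = (self.size * probs[idxs]) ** (-beta)
        weights /= weights.max()    # normalize so weights only scale losses down
        batch = [self.data[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # After a training step, refresh priorities with the new absolute TD errors.
        self.prios[idxs] = np.abs(td_errors) + self.eps
```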

kaixindelele commented 3 years ago

I tested on HalfCheetah-v2; the results are at https://blog.csdn.net/hehedadaq/article/details/111600080#_240. PER really can perform worse than uniform replay in some environments or with some RL algorithms.

xlnwel commented 3 years ago

If you refer to the original paper, PER indeed makes things worse in some cases, but overall it is better than a uniform replay. Moreover, PER introduces several new hyperparameters, which may require further fine-tuning for new environments.
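For reference, the extra hyperparameters PER adds on top of a uniform buffer look roughly like this; the values are the commonly used defaults from the PER paper, not settings verified against this repository:

```python
# Hypothetical PER config; values follow common defaults from the PER paper,
# not settings verified against this repository.
per_config = dict(
    alpha=0.6,        # prioritization exponent: 0 -> uniform, 1 -> fully greedy on TD error
    beta0=0.4,        # initial importance-sampling exponent
    beta_steps=1e6,   # number of steps over which beta is annealed to 1.0
    eps=1e-6,         # small constant added to |TD error| to avoid zero priority
)

def beta_schedule(step, beta0=0.4, beta_steps=1e6):
    # Linear annealing of beta toward 1.0, as suggested in the PER paper.
    frac = min(step / beta_steps, 1.0)
    return beta0 + frac * (1.0 - beta0)
```

Each of these knobs can change results noticeably, which is one reason PER sometimes needs re-tuning per environment.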

kaixindelele commented 3 years ago

Yes, that's unfortunate for me.