xbpeng / awr

Implementation of advantage-weighted regression.

Why Normalization of vf #6

Open im-Kitsch opened 2 years ago

im-Kitsch commented 2 years ago

Hello,

Thanks for the code. While re-implementing the program, I found that there is a step that normalizes the value function vf here. It is implemented as $v_{\text{predict}} = V(s;\theta)\,(1-\gamma)$, and the critic update is $\min_\theta \big[V(s;\theta)\,(1-\gamma) - v_{\text{estimate}}\big]^2$.
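
For reference, a minimal sketch of that scaled regression (PyTorch-style, with hypothetical names and a toy value network; not the repo's actual code):

```python
import torch
import torch.nn as nn

gamma = 0.99  # discount factor (illustrative value)

# Hypothetical small value network; the actual architecture lives in the repo.
value_net = nn.Sequential(nn.Linear(17, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(value_net.parameters(), lr=1e-2)

def critic_loss(states, scaled_returns):
    # Prediction scaled by (1 - gamma), matching the normalization above:
    # v_predict = V(s; theta) * (1 - gamma).
    v_pred = value_net(states).squeeze(-1) * (1.0 - gamma)
    # Squared-error regression against the (likewise scaled) value targets.
    return ((v_pred - scaled_returns) ** 2).mean()

# One gradient step on a dummy batch (17-dim observations, as in HalfCheetah-v2).
states = torch.randn(32, 17)
scaled_returns = torch.rand(32)  # targets already multiplied by (1 - gamma)
loss = critic_loss(states, scaled_returns)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```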

Is there any reason to normalize the value function's output? I tested removing the normalization term and rescaling the learning rate (by $1-\gamma$), and it seems to cause no problems on HalfCheetah-v2; performance is similar to the original version.
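
To spell out why a step-size rescale can compensate (assuming the targets $v_{\text{estimate}}$ carry the same $(1-\gamma)$ factor, which is not stated explicitly above): the normalization only rescales the gradient of the squared loss,

$$
\nabla_\theta \big[(1-\gamma)V(s;\theta) - (1-\gamma)v_{\text{estimate}}\big]^2
= (1-\gamma)^2\,\nabla_\theta \big[V(s;\theta) - v_{\text{estimate}}\big]^2,
$$

so an SGD step of size $\alpha$ on the normalized loss corresponds to a step of $\alpha(1-\gamma)^2$ on the unnormalized one. With momentum or adaptive optimizers the effective factor differs, which may be why rescaling by $(1-\gamma)$ alone works in practice.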

Best,

xbpeng commented 2 years ago

The value scaling is mainly a convention; I generally like to keep things normalized between 0 and 1. Training should work just as well without the normalization, but it might need some tuning of the other hyperparameters, like the step size.
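
For context, the $[0, 1]$ range follows from the standard bound on discounted returns: if per-step rewards lie in $[0, 1]$, then

$$
V(s) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t r_t\right]
\le \sum_{t=0}^{\infty} \gamma^t = \frac{1}{1-\gamma},
$$

so multiplying the value by $(1-\gamma)$ keeps it in $[0, 1]$ for any $\gamma \in [0, 1)$.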