vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

clamp in C51 #443

Open XinJingHao opened 6 months ago

XinJingHao commented 6 months ago

Hi! The C51 implementation in this repo really helps a lot!

However, I have a question about lines 226 and 227:

    l = b.floor().clamp(0, args.n_atoms - 1)
    u = b.ceil().clamp(0, args.n_atoms - 1)

It seems that the clamp() calls are redundant, because t_z already lies within [v_min, v_max]. Why do we still use them here?
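To make the question concrete, here is a minimal plain-Python sketch of the index computation as I understand it. The helper names `clamp`, `bounds_from_b`, and `projection_indices` are my own for illustration and are not from the repo. It also assumes that b is computed as (t_z - v_min) / delta_z with delta_z = (v_max - v_min) / (n_atoms - 1). One case where the extra clamp might matter (this is only a guess) is when floating-point rounding pushes b slightly outside [0, n_atoms - 1], so that floor() or ceil() would give an out-of-range index.

```python
import math


def clamp(x, lo, hi):
    # Scalar stand-in for Tensor.clamp(lo, hi).
    return max(lo, min(x, hi))


def bounds_from_b(b, n_atoms):
    # Lower and upper atom indices for a fractional position b,
    # each clamped to the valid index range [0, n_atoms - 1].
    l = clamp(math.floor(b), 0, n_atoms - 1)
    u = clamp(math.ceil(b), 0, n_atoms - 1)
    return l, u


def projection_indices(t_z, v_min, v_max, n_atoms):
    # Assumed formula for b (see the note above); t_z is clamped first.
    delta_z = (v_max - v_min) / (n_atoms - 1)
    t_z = clamp(t_z, v_min, v_max)
    b = (t_z - v_min) / delta_z
    return bounds_from_b(b, n_atoms)


# b exactly at the lower edge: both indices are 0.
print(projection_indices(-10.0, -10.0, 10.0, 51))  # (0, 0)

# Hypothetical b that ended up slightly above n_atoms - 1:
# ceil() alone would give 51, the clamp brings it back to 50.
print(bounds_from_b(50.0000001, 51))  # (50, 50)
```

If b can never leave [0, n_atoms - 1] in practice, then the clamp would indeed do nothing, which is what I would like to confirm.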

Looking forward to your reply~ Thanks