Why do you use the term On-Policy Value Function instead of On-Policy State-Value Function? The latter would at least be symmetric with On-Policy Action-Value Function, even though this symmetry is missing from Rich Sutton's book Reinforcement Learning: An Introduction.