Open thiagopbueno opened 5 years ago
I'm wondering whether this is really necessary, as I've noticed improvements using unconstrained Q-networks in TD3. In TensorBoard you can even see the initial period during which the network's predictions become all negative.
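One way to track that transition numerically is to log the fraction of non-positive Q-value predictions per batch. The sketch below assumes the critic's outputs for a batch are available as an array. `fraction_non_positive` is a hypothetical helper, not part of TD3 or TensorBoard, and its return value would be passed to whatever scalar logger the project already uses.

```python
import numpy as np


def fraction_non_positive(q_values):
    """Return the fraction of Q-value predictions in a batch that are <= 0.

    Hypothetical helper: a value near 1.0 means the unconstrained critic
    has already learned that returns are never positive.
    """
    q = np.asarray(q_values, dtype=float)
    if q.size == 0:
        raise ValueError("q_values must contain at least one prediction")
    return float(np.mean(q <= 0.0))


# Example: 3 of the 4 predictions are non-positive.
print(fraction_non_positive([-1.0, -0.5, 0.2, -3.0]))
```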
Make sure we constrain the set of function approximators for the Q-function in problems where the expected sum of rewards is bounded (e.g., never positive).
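If every reward satisfies r_t <= 0, then Q(s, a) = E[sum_t gamma^t r_t] <= 0 for any policy, so the critic only needs to represent non-positive values. A minimal sketch, assuming a plain MLP critic over concatenated state and action: pass the unconstrained scalar output through `-softplus`, which maps the real line into (-inf, 0). `NonPositiveQNetwork` and its layer sizes are hypothetical and illustrative only; this is not the project's actual critic, and a framework version (PyTorch, JAX) would apply the same output transform.

```python
import numpy as np


def softplus(x):
    # log(1 + exp(x)), computed stably via logaddexp(0, x)
    return np.logaddexp(0.0, x)


class NonPositiveQNetwork:
    """Hypothetical critic Q(s, a) whose output is always <= 0."""

    def __init__(self, obs_dim, act_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + act_dim
        # Randomly initialised weights; training is out of scope for this sketch.
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, obs, act):
        x = np.concatenate([obs, act], axis=-1)      # (batch, obs_dim + act_dim)
        h = np.maximum(0.0, x @ self.w1 + self.b1)   # ReLU hidden layer
        raw = h @ self.w2 + self.b2                  # unconstrained scalar, (batch, 1)
        return -softplus(raw)[..., 0]                # constrained to (-inf, 0], (batch,)


net = NonPositiveQNetwork(obs_dim=3, act_dim=2)
q = net(np.ones((5, 3)), np.zeros((5, 2)))
print(q.shape, bool(np.all(q <= 0.0)))
```

Using `-softplus` rather than a hard `min(raw, 0)` keeps gradients non-zero over the whole range, so the critic can still learn when its raw output would be positive. Whether this actually helps TD3 compared with an unconstrained critic is the open question in this thread.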