It occurred to me that this recent paper would be an interesting one to implement in brax.
One of the cool things about brax is its differentiability but, as I understand it, attempts to leverage that on the kinds of environments brax includes have not been very fruitful so far. This paper seems to bridge that gap between differentiable physics and RL quite nicely: it makes a lot of sense to truncate trajectories with a learned critic.
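
To make the "truncate trajectories with a learned critic" idea concrete, here is a minimal sketch in plain JAX. It is not the paper's implementation and it does not use the brax API: the point-mass `step`, the `reward`, the linear `policy`, the quadratic `critic`, and all parameter values are hypothetical stand-ins I made up for illustration. The only point is that the policy gradient flows through a short differentiable rollout, and a critic evaluated at the last state stands in for everything beyond the horizon.

```python
import jax
import jax.numpy as jnp

def step(state, action, dt=0.05):
    """Toy differentiable dynamics: a 1-D point mass (stand-in for an env step)."""
    vel = state[1] + dt * action
    pos = state[0] + dt * vel
    return jnp.stack([pos, vel])

def reward(state, action):
    """Hypothetical reward: stay near the origin, with a small action penalty."""
    return -(state[0] ** 2) - 0.01 * action ** 2

def policy(policy_params, state):
    """Hypothetical linear policy with a tanh squash; returns a scalar action."""
    return jnp.tanh(policy_params @ state)

def critic(critic_params, state):
    """Hypothetical critic: linear in quadratic features of the state."""
    feats = jnp.stack([state[0] ** 2, state[1] ** 2, state[0] * state[1]])
    return critic_params @ feats

def short_horizon_loss(policy_params, critic_params, state0, horizon=8, gamma=0.99):
    """Negative of: discounted rewards over `horizon` steps + discounted critic value."""
    def body(carry, _):
        state, discount = carry
        action = policy(policy_params, state)
        r = reward(state, action)
        next_state = step(state, action)
        return (next_state, discount * gamma), discount * r

    init = (state0, jnp.ones(()))
    (final_state, discount), rewards = jax.lax.scan(body, init, None, length=horizon)
    # The critic replaces the (truncated) tail of the trajectory.
    ret = rewards.sum() + discount * critic(critic_params, final_state)
    return -ret

policy_params = jnp.zeros(2)
critic_params = jnp.array([-1.0, -0.1, 0.0])  # made-up values, not learned here
state0 = jnp.array([1.0, 0.0])

# Gradient w.r.t. the policy, backpropagated through the dynamics and the critic.
grads = jax.grad(short_horizon_loss)(policy_params, critic_params, state0)
```

In a real setup the critic would be trained alongside the policy (for example by regressing on value targets from the same rollouts), and `step` would be replaced by the simulator's own differentiable step function. Keeping the horizon short is what keeps the backpropagated gradients manageable.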