I recently found a PyTorch implementation of MOPO (https://github.com/junming-yang/mopo-pytorch), but I cannot find the difference between MOPO and SAC. Is the only difference that the training data are sampled from the rollout replay buffer generated by the learned dynamics?
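To make the question concrete, here is a minimal sketch of the data flow as I understand it. It is not code from the linked repository: `learned_dynamics`, `policy`, `generate_rollouts`, and `sample_batch` are hypothetical names, and the toy model and random policy are placeholders for a trained dynamics network and the SAC actor.

```python
import random


def learned_dynamics(state, action):
    """Hypothetical stand-in for a learned dynamics model.

    Returns a (next_state, reward) pair predicted by the model.
    """
    next_state = state + action
    reward = -abs(next_state)
    return next_state, reward


def policy(state):
    """Hypothetical stand-in for the current SAC policy (random here)."""
    return random.uniform(-1.0, 1.0)


def generate_rollouts(start_states, horizon):
    """Roll the policy through the learned model, starting from dataset states."""
    buffer = []
    for state in start_states:
        for _ in range(horizon):
            action = policy(state)
            next_state, reward = learned_dynamics(state, action)
            buffer.append((state, action, reward, next_state))
            state = next_state
    return buffer


def sample_batch(buffer, batch_size):
    """Draw a batch of model-generated transitions for the SAC update."""
    return random.sample(buffer, batch_size)


random.seed(0)
offline_states = [0.0, 1.0, -2.0]  # assumed: states taken from the offline dataset
rollout_buffer = generate_rollouts(offline_states, horizon=5)
batch = sample_batch(rollout_buffer, batch_size=4)
```

If this picture is right, the SAC update itself would be unchanged and only the source of `batch` would differ, which is exactly the part I am unsure about.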