Hello, I am reimplementing AWR in my own PyTorch codebase, and I haven't been able to get it to work as well as in the paper (on PyBullet Gym environments). This is the only other implementation of AWR I have found apart from the original TF code, but it doesn't seem to work.
Have you been able to get it to train policies well?