philtabor / Youtube-Code-Repository

Repository for most of the code from my YouTube channel

Policy Gradient, SAC doesn't learn #65

Open Ling01234 opened 1 year ago

Ling01234 commented 1 year ago

Hi! I have a few more questions about the code that I don't quite get.

First, I was wondering what pybullet_envs is for. I installed the library but got errors when I tried to import it, and I don't see where it's being used.

Second, I was getting really bad scores when I ran the code. I cloned the code from your repository and changed a few things. First, I changed the environment to env = gym.make("InvertedPendulum-v4"), and as a result I also changed the reset and step calls to obs, _ = env.reset() and obs_, reward, done, *_ = env.step(action). Finally, I commented out the lines in sac_torch.py that use reparameterize=True, since I ran into NaN tensors when calling rsample().

That's all I've changed, and when I run the code, the score actually decreases (oddly enough). It starts at a score of approximately 10, like a random agent, and drops to 3 or 4 after 250 episodes.

Would you have any idea of why this is happening? It would be so greatly appreciated!

Thanks a lot for your time

philtabor commented 1 year ago

The pybullet_envs import was originally used for the test environment, InvertedPendulumBulletEnv-v0, but unfortunately pybullet hasn't updated its code to comply with the new gym specifications. Hence the errors when you try to import it.

I'm dealing with this problem in my Academy right now (and, spoiler, I'm writing a deep RL framework that I'll release a 0.1.dev build of very soon) and will be able to address these particular issues, and more, in the coming days.

But, to get you started, you want to make sure that you actually get the "truncated" boolean flag back from the env.step() function. The done flag doesn't flip to True when max_steps is reached; the truncated flag takes care of that instead. So your while loop should be while not (done or truncated), so that you don't get an infinite loop.
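A minimal sketch of that loop, for illustration only. ToyTimeLimitEnv and run_episode are hypothetical names, not part of the repository or of gym; the toy class only mimics the five-value return of env.step() so the example runs without MuJoCo installed. Note that the obs_, reward, done, *_ = env.step(action) unpacking from the question silently discards the truncated flag.

```python
import random


class ToyTimeLimitEnv:
    """Hypothetical stand-in for an env with the 5-tuple step() API.

    It never terminates on its own and only truncates at max_steps,
    which is exactly the case that loops forever if only `done` is checked.
    """

    def __init__(self, max_steps=50):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.steps += 1
        obs, reward = float(self.steps), 1.0
        terminated = False                        # no failure state in this toy env
        truncated = self.steps >= self.max_steps  # time limit reached
        return obs, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Play one episode, stopping on either termination or truncation."""
    obs, _ = env.reset()
    done = truncated = False
    score = 0.0
    while not (done or truncated):
        action = policy(obs)
        # Keep both flags instead of swallowing `truncated` with *_
        obs_, reward, done, truncated, _ = env.step(action)
        score += reward
        obs = obs_
    return score


score = run_episode(ToyTimeLimitEnv(max_steps=50),
                    lambda obs: random.choice([0, 1]))
print(score)  # 50.0: one reward of 1.0 per step until truncation
```

With a real environment, the same run_episode body should work unchanged as long as env.step() returns the five values in this order.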

As for the learning issues, I'll have to come back with an update. I'm validating the initial commit of my framework, and will test SAC today.

Ling01234 commented 1 year ago

Thank you so much for your answer!

Please keep me updated on the learning issues whenever you have time to test it; it'd be greatly appreciated. I hope you have a good day!