normandipalo / curiosity-robot

Curiosity based exploration and playing in RL with Gym Robotics envs.

Unable to pick the Block #1

Closed Ameyapores closed 5 years ago

Ameyapores commented 5 years ago

Hi @normandipalo, amazing implementation of PPO. I ran the code for 10000 episodes. By the end, the robot acquires a behaviour of moving the block around randomly, which is intuitive since it is trained only on curiosity rewards. However, even when I include an extrinsic reward (i.e. the distance between the block and the target position), it does not learn to pick the block up; a sketch of how I combine the rewards is below. Could you speculate on a reason for this? Also, the loss of the actor network goes down while the critic loss stays constant. Finally, I am wondering whether there is a reason to normalize the states?
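For reference, this is roughly how I am mixing the two signals. It is only a sketch, not the repo's exact code: the `FetchPickAndPlace-v1` env id, the `BETA_*` weights and the `compute_intrinsic_reward` stub are my own placeholders.

```python
import numpy as np
import gym

# Hypothetical weights for mixing the dense distance reward with curiosity.
BETA_EXTRINSIC = 1.0
BETA_INTRINSIC = 0.1

env = gym.make("FetchPickAndPlace-v1")
obs = env.reset()

def extrinsic_reward(obs):
    # Dense shaping: negative Euclidean distance between the block position
    # (achieved_goal) and the target position (desired_goal).
    return -np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"])

def compute_intrinsic_reward(state, action, next_state):
    # Placeholder for the curiosity signal (e.g. forward-model prediction
    # error); shown as a stub here.
    return 0.0

action = env.action_space.sample()
next_obs, _, done, info = env.step(action)
reward = (BETA_EXTRINSIC * extrinsic_reward(next_obs)
          + BETA_INTRINSIC * compute_intrinsic_reward(obs, action, next_obs))
```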

normandipalo commented 5 years ago

Hello @Ameyapores, thank you. This is a tricky topic: as you can see from the literature, learning to pick up a block is a hard task for a robot, and the problem is usually tackled with hierarchical RL techniques, imitation learning from examples, hindsight experience replay, or other methods. So I'm not surprised that the robot doesn't learn to pick up the block efficiently, since this general algorithm is not optimized for that.

The loss functions of the actor and critic can have quite different shapes compared to usual supervised learning, because the data distribution keeps changing (when the robot learns a new behaviour it reaches different states and takes different actions). If the reward goes up you should not worry too much; the hyperparameters should already be reasonably tuned for the task.

Normalizing inputs is almost always a good idea in deep learning. I tend to do it by default and it generally pays off. Normalizing the rewards is often a good idea too.
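To illustrate the kind of normalization I mean, here is a rough sketch of an online running mean/std normalizer for states and rewards. This is not the code in this repo; the `RunningNorm` name and the state dimension are just assumptions for the example.

```python
import numpy as np

class RunningNorm:
    """Online running mean/variance (parallel Welford update), commonly used
    to normalize observations or rewards during RL training."""
    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps
        self.eps = eps

    def update(self, x):
        # x: a batch of samples, shape (batch, *shape).
        x = np.asarray(x, dtype=np.float64)
        batch_mean, batch_var, batch_count = x.mean(0), x.var(0), x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta**2 * self.count * batch_count / total) / total
        self.count = total

    def normalize(self, x):
        return (np.asarray(x) - self.mean) / np.sqrt(self.var + self.eps)

# Usage: one normalizer for states, one (scalar) for rewards.
state_norm = RunningNorm(shape=(25,))   # hypothetical state dimension
reward_norm = RunningNorm(shape=())     # scalar rewards
```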