kaymen99 / Robot-arm-control-with-RL

Robot arm control using reinforcement learning algorithms: DDPG and TD3 with Hindsight Experience Replay (HER)
MIT License
ddpg-algorithm hindsight-experience-replay reinforcement-learning robot-control td3 twin-delayed-policy-gradient

Robot arm control with Reinforcement Learning


This project focuses on controlling a 7-DOF robot arm in the Reach environment of panda_gym using two continuous reinforcement learning algorithms: DDPG (Deep Deterministic Policy Gradients) and TD3 (Twin Delayed Deep Deterministic Policy Gradients). Hindsight Experience Replay (HER) is used to improve the learning process of both algorithms.

Continuous RL Algorithms

Continuous reinforcement learning deals with environments where actions are continuous, such as the precise control of robotic arm joints or controlling the throttle of an autonomous vehicle. The primary objective is to find policies that effectively map observed states to continuous actions, ultimately optimizing the accumulation of expected rewards. Several algorithms have been specifically developed to address this challenge, including DDPG, TD3, SAC, PPO, and more.
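The "accumulation of expected rewards" that these algorithms optimize is usually the discounted return G = r_0 + γ·r_1 + γ²·r_2 + …. The short sketch below computes it for one episode; the discount factor γ = 0.99 is an illustrative default, not a value taken from this repository.

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted return G_0 of one episode, accumulated from the last step backwards.

    gamma = 0.99 is an illustrative assumption, not this project's setting.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g   # G_t = r_t + gamma * G_{t+1}
    return g


# Three rewards of 1.0 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

Accumulating backwards avoids computing powers of γ explicitly and is the same recursion that value-based critics bootstrap from.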

1- DDPG (Deep Deterministic Policy Gradients)

DDPG is an actor-critic algorithm designed for continuous action spaces. It combines ideas from policy gradients and Q-learning: an actor network learns a deterministic policy, while a critic network approximates the action-value function (Q-function). The actor directly outputs a continuous action, the critic evaluates that action, and the actor is trained to shift its output in the direction that increases the critic's estimate. Because the action is produced directly instead of being selected from a discrete set, this allows fine-grained control.
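A minimal NumPy sketch of one DDPG update on a single transition is shown below. It assumes toy *linear* networks as hypothetical stand-ins for the neural networks a real implementation uses (actor μ(s) = tanh(W_a·s), critic Q(s, a) = w_c·[s, a]); γ, τ, the learning rate and the Gaussian exploration noise are illustrative choices (the original DDPG paper used Ornstein-Uhlenbeck noise). It shows the three ingredients of DDPG: a critic target computed with target networks, a deterministic policy gradient for the actor, and soft (Polyak) updates of the target networks.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 3, 2
GAMMA, TAU, LR = 0.99, 0.005, 1e-3   # illustrative hyperparameters (assumptions)


def actor(W_a, s):
    """Toy deterministic policy mu(s) = tanh(W_a s): a bounded continuous action."""
    return np.tanh(W_a @ s)


def critic(w_c, s, a):
    """Toy action-value estimate Q(s, a), linear in the concatenated state and action."""
    return w_c @ np.concatenate([s, a])


def critic_target(r, s_next, done, W_a_targ, w_c_targ):
    """Bootstrapped target y = r + gamma * (1 - done) * Q'(s', mu'(s')) using target networks."""
    a_next = actor(W_a_targ, s_next)
    return r + GAMMA * (1.0 - done) * critic(w_c_targ, s_next, a_next)


def actor_gradient(W_a, w_c, s):
    """Deterministic policy gradient dQ/dW_a = dQ/da * da/dW_a (chain rule through tanh)."""
    a = actor(W_a, s)
    dq_da = w_c[STATE_DIM:]                 # the toy critic is linear in the action
    return np.outer(dq_da * (1.0 - a ** 2), s)


def soft_update(target, online, tau=TAU):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return tau * online + (1.0 - tau) * target


# Online and target parameters start identical.
W_a = rng.normal(size=(ACTION_DIM, STATE_DIM))
w_c = rng.normal(size=STATE_DIM + ACTION_DIM)
W_a_targ, w_c_targ = W_a.copy(), w_c.copy()

# One transition (s, a, r, s') with Gaussian exploration noise on the action.
s, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
a = actor(W_a, s) + rng.normal(0.0, 0.1, size=ACTION_DIM)
y = critic_target(r=-1.0, s_next=s_next, done=0.0, W_a_targ=W_a_targ, w_c_targ=w_c_targ)

td_error = y - critic(w_c, s, a)
w_c = w_c + LR * td_error * np.concatenate([s, a])   # semi-gradient step on 0.5 * td_error**2
W_a = W_a + LR * actor_gradient(W_a, w_c, s)         # gradient *ascent* on Q(s, mu(s))
W_a_targ = soft_update(W_a_targ, W_a)
w_c_targ = soft_update(w_c_targ, w_c)
```

The small τ makes the target networks trail the online networks slowly, which keeps the bootstrapped target y from chasing itself.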

2- TD3 (Twin Delayed Deep Deterministic Policy Gradients)

TD3 is an extension of DDPG that addresses issues such as the overestimation bias of the critic. It learns two ("twin") critic networks instead of the single critic used in DDPG and takes the smaller of their two estimates when computing the target Q-value. It also updates the actor and the target networks less often than the critics (delayed updates) and adds clipped noise to the target action (target policy smoothing), both of which stabilize training. TD3 is known for its robustness and improved performance over DDPG.
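The sketch below shows, in NumPy, how TD3 changes the DDPG critic target. It again assumes hypothetical toy linear networks in place of neural networks, and the noise scale (0.2), noise clip (0.5) and policy delay (2) are commonly used defaults, not values read from this repository. Both critics are regressed toward the same target y; only the minimum of the two target critics enters it.

```python
import numpy as np

GAMMA = 0.99
MAX_ACTION = 1.0
POLICY_NOISE, NOISE_CLIP = 0.2, 0.5   # target policy smoothing (illustrative)
POLICY_DELAY = 2                      # actor/targets updated every 2nd critic step


def actor(W_a, s):
    """Toy bounded deterministic policy."""
    return MAX_ACTION * np.tanh(W_a @ s)


def critic(w_c, s, a):
    """Toy linear Q(s, a)."""
    return w_c @ np.concatenate([s, a])


def td3_target(r, s_next, done, W_a_targ, w_c1_targ, w_c2_targ, rng):
    """Target value shared by both critics."""
    # 1) Target policy smoothing: perturb the target action with clipped Gaussian noise.
    a_next = actor(W_a_targ, s_next)
    noise = np.clip(rng.normal(0.0, POLICY_NOISE, size=a_next.shape), -NOISE_CLIP, NOISE_CLIP)
    a_next = np.clip(a_next + noise, -MAX_ACTION, MAX_ACTION)
    # 2) Clipped double Q-learning: keep the smaller of the two target critics' estimates.
    q1 = critic(w_c1_targ, s_next, a_next)
    q2 = critic(w_c2_targ, s_next, a_next)
    return r + GAMMA * (1.0 - done) * min(q1, q2)


def should_update_actor(critic_step):
    """3) Delayed policy updates: actor and target networks change every POLICY_DELAY steps."""
    return critic_step % POLICY_DELAY == 0


rng = np.random.default_rng(0)
y = td3_target(-1.0, rng.normal(size=3), 0.0,
               rng.normal(size=(2, 3)), rng.normal(size=5), rng.normal(size=5), rng)
print([should_update_actor(t) for t in range(4)])  # [True, False, True, False]
```

Taking the minimum deliberately biases the target downward, which counteracts the upward bias a single critic accumulates when the actor exploits its errors.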

Hindsight Experience Replay

Hindsight Experience Replay (HER) is a technique developed to address sparse and binary rewards in RL environments. In many robotic tasks the agent rarely reaches the desired goal by chance, so traditional RL algorithms get almost no useful feedback: the reward is the same at every step unless the robot completes the task, which makes it difficult for the algorithm to tell whether its actions were bringing it closer to the goal.
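To make "sparse and binary" concrete, here is a sketch of a goal-conditioned reward in the style used in the HER paper: -1 on every step until the end-effector is within a tolerance of the goal, 0 once it is. The 0.05 threshold and the function name are illustrative assumptions, not this project's code.

```python
import numpy as np

DISTANCE_THRESHOLD = 0.05  # illustrative tolerance (assumption)


def sparse_reward(achieved_goal, desired_goal, threshold=DISTANCE_THRESHOLD):
    """-1 while the achieved position is farther than `threshold` from the goal, 0 otherwise.

    Works on a single goal vector or on a batch of them (last axis = coordinates).
    """
    diff = np.asarray(achieved_goal) - np.asarray(desired_goal)
    distance = np.linalg.norm(diff, axis=-1)
    return np.where(distance > threshold, -1.0, 0.0)


goal = np.array([0.10, 0.00, 0.20])
print(sparse_reward([0.40, 0.10, 0.20], goal))  # far from the goal: -1.0
print(sparse_reward([0.11, 0.01, 0.20], goal))  # within the tolerance: 0.0
```

Every failed step returns the same -1, whether the arm moved toward the goal or away from it, which is exactly the missing learning signal HER is designed to recover.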

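HER tackles this issue by reusing past experiences for learning, even those that did not reach the desired goal. When an episode is stored in the replay buffer, its transitions are also stored relabeled with goals that the arm actually reached later in that episode, so a failed attempt at the intended goal becomes a successful attempt at a different one. The agent thus learns from both successful and failed attempts, which significantly accelerates the learning process.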

Link to HER paper: https://arxiv.org/pdf/1707.01495.pdf
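The relabeling step can be sketched with the "future" strategy from the HER paper: each transition is stored once with the original goal and k more times with goals achieved later in the same episode. The tuple layout `(obs, action, next_achieved_goal)`, k = 4, the 0.05 threshold and the helper names are assumptions for illustration, not this repository's buffer format.

```python
import numpy as np

K_FUTURE = 4  # number of hindsight goals per transition (illustrative)


def sparse_reward(achieved_goal, goal, threshold=0.05):
    """Binary reward: 0 if the achieved position is within `threshold` of the goal, else -1."""
    return 0.0 if np.linalg.norm(achieved_goal - goal) <= threshold else -1.0


def her_relabel(episode, desired_goal, rng, k=K_FUTURE):
    """episode: list of (obs, action, next_achieved_goal) tuples.

    Returns (obs, action, reward, goal) transitions for the replay buffer.
    """
    transitions = []
    T = len(episode)
    for t, (obs, action, achieved) in enumerate(episode):
        # Original transition, scored against the goal the agent was actually given.
        transitions.append((obs, action, sparse_reward(achieved, desired_goal), desired_goal))
        # Hindsight transitions: pretend a position reached at step t or later was the goal.
        for idx in rng.integers(t, T, size=k):
            new_goal = episode[idx][2]
            transitions.append((obs, action, sparse_reward(achieved, new_goal), new_goal))
    return transitions


rng = np.random.default_rng(0)
goal = np.array([0.5, 0.0, 0.2])
# A failed 3-step episode: the arm moves along x but never gets within 0.05 of the goal.
episode = [(np.zeros(6), np.zeros(3), np.array([0.1 * (i + 1), 0.0, 0.2])) for i in range(3)]
buffer = her_relabel(episode, goal, rng)
print(len(buffer))                                  # 3 * (1 + 4) = 15
print(sum(r == 0.0 for _, _, r, _ in buffer) > 0)   # True: hindsight goals yield successes
```

Every original transition in this episode earns -1, but the relabeled copies contain zero-reward successes, so the critic sees a useful signal even though the real goal was never reached.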

How to run

Results

Both agents were trained in Google Colab.


Contact

If you have any questions, feedback, or issues, please don't hesitate to open an issue or reach out to me: aymenMir1001@gmail.com.

License

Distributed under the MIT License. See LICENSE.txt for more information.