Gizmotronn commented 4 years ago

This issue is for us to document how we will use reinforcement learning to train our rl-agent to avoid obstacles. The goals of this issue are to:

- Learn more about the reward function
- Help decide which parameters we are likely to need (also see PR #2)
- Learn how we will train the rl-agent

A useful link to get started: https://pdfs.semanticscholar.org/0fcd/a4e464c9d55ccd9f8e8e3521c286e4b47933.pdf

Gizmotronn commented 4 years ago

SemanticScholar - RL-agent Obstacle Avoidance

Abstract

Reinforcement Learning

Reinforcement Learning is learning how to map environment situations to actions, with the goal of maximising a reward signal/value.
It is a computational approach to learning from interaction. Learning from interaction is a foundational idea behind almost all learning methods.
The agent must learn from its own experience.
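To make "mapping situations to actions by learning from interaction" concrete, here is a minimal sketch using tabular Q-learning, one standard RL algorithm (not necessarily the one we will end up using). The grid layout, rewards, and hyperparameters are all hypothetical, chosen only to show an agent learning to reach a goal while avoiding obstacles in a toy world. Actions are picked with a simple epsilon-greedy rule (a random action with probability epsilon).

```python
import random

# Hypothetical 4x4 world: 'S' start, 'G' goal, '#' obstacle, '.' free cell
GRID = [
    "S..#",
    ".#..",
    "...#",
    "#..G",
]
START = (0, 0)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    # Leaving the grid or hitting an obstacle counts as a crash: big penalty, episode ends
    if not (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])) or GRID[nr][nc] == "#":
        return state, -10.0, True
    if GRID[nr][nc] == "G":
        return (nr, nc), 10.0, True
    return (nr, nc), -1.0, False  # small step cost favours short paths


def best_action(q, state):
    """Greedy choice: the action with the highest learned value (unseen pairs count as 0)."""
    return max(range(len(ACTIONS)), key=lambda a: q.get((state, a), 0.0))


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn Q(state, action) purely from the agent's own interaction with the world."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                action = best_action(q, state)        # exploit
            nxt, reward, done = step(state, action)
            future = 0.0 if done else max(q.get((nxt, a), 0.0) for a in range(len(ACTIONS)))
            old = q.get((state, action), 0.0)
            # Q-learning update: move the estimate toward reward + discounted best future value
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the learned policy from START and record the cells visited."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, best_action(q, state))
        path.append(state)
        if done:
            break
    return path
```

After `q = train()`, calling `greedy_path(q)` should trace a collision-free route from `S` to `G`.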
Exploration vs exploitation:
- To get the best cumulative reward, the agent must take actions that it knows give a higher reward score (on the reward function)
- However, to find the best actions in a given situation, the agent also needs to try actions that it has not selected before
- The agent cannot know what reward an action gives until it takes that action (otherwise it would be able to finish the program on the first try, every time)
- The agent therefore has to exploit the best known actions to obtain rewards, while also exploring unknown options (either to increase its reward or to get further)
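The exploration/exploitation trade-off above is often illustrated with an epsilon-greedy agent on a multi-armed bandit (several actions with unknown average rewards). This is a minimal sketch; the reward means, noise level, and `epsilon` value are hypothetical. With probability `epsilon` the agent explores a random action, otherwise it exploits the action with the best estimated reward, and it only learns an action's reward by actually taking it.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a k-armed bandit with Gaussian (unit-variance) rewards."""
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # running average reward observed for each action
    counts = [0] * k       # how many times each action has been taken
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                            # explore: try any action
        else:
            a = max(range(k), key=lambda i: estimates[i])   # exploit: best known action
        reward = rng.gauss(true_means[a], 1.0)  # reward is only revealed after acting
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # incremental mean update
        total += reward
    return estimates, counts, total


estimates, counts, total = run_bandit([0.2, 1.0, 0.5])
print(counts)  # most pulls should end up on the best action (index 1)
```

Setting `epsilon=0` gives a purely greedy agent, which can lock onto whichever action happens to look good first and never discover a better one.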

Gizmotronn commented 4 years ago

https://papers.nips.cc/paper/452-obstacle-avoidance-through-reinforcement-learning.pdf

Gizmotronn commented 4 years ago

These resources may be useful

Gizmotronn commented 4 years ago

SemanticScholar - RL-agent Obstacle Avoidance

Experiments

The experiment in this paper was not only about obstacle avoidance; it was also about wall following (i.e. the agent would try to stay as close to the wall as possible and "trace" its path using the wall as its target line).
I'm not sure whether our OSR agent needs to start from a specific location, but a wall-following policy is not practical because there are no walls on Mars. We could tell the agent something like "stay at the edge of the map unless an obstacle prevents you from doing so", but in real life Mars is a sphere, so there are no map edges and the system would fail. The challenge is to get the highest reward score, so wall/map-edge following would be eligible, but it would not do much good for the actual OSR, which is where our code will be going. Still, this paper is interesting to read and has useful information.
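Following on from the point about wall following, here is a minimal sketch of a reward function that would not encourage it. It assumes (hypothetically) that the rover has a goal position and knows its distance to the goal and to the nearest obstacle; none of the weights or thresholds come from the paper. Reward comes from progress toward the goal, while getting too close to an obstacle is penalised, so hugging walls or map edges is discouraged rather than rewarded.

```python
def obstacle_avoidance_reward(
    prev_dist_to_goal: float,
    dist_to_goal: float,
    min_obstacle_dist: float,
    collided: bool,
    reached_goal: bool,
    safe_dist: float = 0.5,  # hypothetical safety margin in metres
) -> float:
    """Hypothetical shaped reward: progress toward a goal, penalties for crashes and near-misses."""
    if collided:
        return -100.0  # terminal penalty for hitting an obstacle
    if reached_goal:
        return 100.0   # terminal bonus for reaching the goal
    # Positive when the rover moved closer to the goal this step
    reward = 10.0 * (prev_dist_to_goal - dist_to_goal)
    # Penalise being inside the safety margin (closer = bigger penalty)
    if min_obstacle_dist < safe_dist:
        reward -= 5.0 * (safe_dist - min_obstacle_dist)
    reward -= 0.1  # small per-step cost so the rover does not wander
    return reward
```

Because nothing in this reward refers to walls or map edges, a policy that follows edges gains nothing unless doing so also gets it to the goal.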

Gizmotronn commented 4 years ago

Might also want to have a look at these links:

Gizmotronn commented 4 years ago

Arxiv.org - Unmanned Aerial Vehicles

https://arxiv.org/pdf/1811.03307.pdf (or above comment)

Part 1

Introduction

To be able to avoid obstacles, UAVs (or rl-agents) need to be able to perceive the distance between themselves and the obstacles.
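As an illustration of how perceived distances could become the agent's input, here is a minimal sketch assuming a hypothetical range sensor that returns a list of distances (in metres) across its field of view. The number of sectors and the distance thresholds are illustrative choices, not values from the paper. Each sector is reduced to a danger level based on the nearest reading in it.

```python
def ranges_to_state(ranges, n_sectors=3, thresholds=(0.5, 1.5)):
    """Compress raw range readings into a small discrete state (one level per sector).

    Levels: 0 = danger (< 0.5 m), 1 = caution (< 1.5 m), 2 = clear, with the default thresholds.
    """
    if len(ranges) < n_sectors:
        raise ValueError("need at least one reading per sector")
    size = len(ranges) // n_sectors
    state = []
    for i in range(n_sectors):
        # The last sector absorbs any leftover readings
        end = len(ranges) if i == n_sectors - 1 else (i + 1) * size
        nearest = min(ranges[i * size:end])
        state.append(sum(nearest >= t for t in thresholds))
    return tuple(state)


# Left sector has something very close, middle is at medium range, right is clear
print(ranges_to_state([3.0, 3.0, 0.4, 1.0, 1.2, 2.0, 5.0, 5.0, 5.0]))
```

Discretising like this keeps the state small enough for a tabular method; deep RL approaches typically feed the raw readings into a neural network instead.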

Gizmotronn commented 4 years ago

https://research.google/pubs/pub48418/

Gizmotronn commented 4 years ago

Google Research - Comparison of DRL Policies for Moving Obstacle Avoidance

Abstract

Deep RL learns to simultaneously predict the motion of objects and corresponding avoidance actions directly from robotic sensors.
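A single sensor reading cannot show whether an obstacle is moving. One common way to give a policy access to motion is to feed it the last few observations stacked together; whether the policies compared in this paper do exactly this is an assumption here. The sketch below keeps a hypothetical `ObservationStack` of recent range readings and also estimates how fast each beam's distance is shrinking.

```python
from collections import deque


class ObservationStack:
    """Keep the last k sensor observations so a policy can infer obstacle motion."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frames drop off automatically

    def reset(self, first_obs):
        """Start an episode by filling the stack with copies of the first observation."""
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(list(first_obs))
        return self.observation()

    def push(self, obs):
        """Add the newest observation and return the stacked input vector."""
        self.frames.append(list(obs))
        return self.observation()

    def observation(self):
        # Flatten oldest-to-newest into one vector for the policy network
        return [x for frame in self.frames for x in frame]


def closing_rates(stack, dt):
    """Per-beam approach speed from the two newest frames (positive = getting closer). Needs k >= 2."""
    prev, newest = stack.frames[-2], stack.frames[-1]
    return [(p - n) / dt for p, n in zip(prev, newest)]
```

For example, after `stack.reset([2.0, 3.0])` and `stack.push([1.5, 3.0])` with `dt=0.5`, the first beam is closing at 1.0 m/s and the second is static.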

Gizmotronn commented 4 years ago

More resources:

ieeexplore.ieee.org/document/1026964
http://github.com/sichkar-valentyn/Reinforcement_Learning_In_Python

rishistyping / AWS_JPL_OSR_DRL

Reinforcement Learning for Robot Obstacle Avoidance #3