gutfeeling / practical_rl_for_coders

Learn reinforcement learning with Python, Gym and Keras.

Lesson: Reinforcement Learning Basics - Environment, Agent, Actions and Learning Goal #13


gutfeeling commented 4 years ago

What are we doing in this lesson and why?

  1. In this lesson, we are going to learn some basic concepts of Reinforcement Learning by looking at the first Reinforcement Learning problem that we will try to solve in this course. The problem is called CartPole-v1. You have already seen this name when we tested the Gym installation (a small sketch after this list shows how to create the environment by that name).

  2. Even though we will be working with this specific problem, the concepts you learn here apply to every Reinforcement Learning problem you will ever encounter. Knowing these basics will help us analyze Reinforcement Learning problems and find effective solutions.
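
As a concrete starting point, here is a minimal sketch that creates the environment by its id. This is only an illustration, assuming the classic Gym API (gym < 0.26) that this course is based on; nothing beyond the id CartPole-v1 comes from the lesson itself.

```python
import gym

# Create the CartPole-v1 environment by its id -- the same name we saw
# when testing the Gym installation.
env = gym.make("CartPole-v1")
print(env)  # a wrapped CartPole environment

env.close()
```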

Content

  1. The CartPole-v1 environment is described on this webpage. Here's how you get to it. First, go to the page containing the list of all Reinforcement Learning problems in the Gym package (I have included this link in the lecture notes). Then find the problem you want to understand. In our case, the problem is called CartPole-v1 and it is listed in the Classic Control section. Click on the name and you arrive at the webpage that describes the problem.

  2. CartPole-v1 is one of the classic problems in Reinforcement Learning and was described as early as 1983. The problem consists of an environment, which is the 2-dimensional mini world that you see here. The environment consists of a black wire, a black rectangular cart, and a brown pole attached to the cart.

  3. The black rectangular cart is called the Agent. The Agent is able to observe the environment; you can imagine that it has eyes. It can also take actions. In this case, it can apply a force of +1 or -1 at any given timestep in order to move left or right along the wire (the first sketch after this list shows how Gym exposes these actions).

  4. Even though the pole is attached to the cart, it can swing freely and typically falls under its own weight. The goal in this environment is to teach the Agent, the cart, how to move so that the pole stays upright for a certain length of time, let's say 500 timesteps. This is the learning goal of the environment.

  5. The environment starts in a state where the Agent is at the center and the pole is upright. Then, in every following timestep, the Agent can practice keeping the pole upright. The Agent can practice for a maximum of 500 timesteps, after which the environment terminates. If at any time the pole falls more than 15 degrees from the upright position or the cart moves out of the frame, the environment also terminates. The time from the start until the environment terminates is called an episode.

  6. What you are seeing here is a dumb agent, which moves randomly. Of course, the pole falls all the time and the episodes end abruptly. Then the next episode starts and the episode number goes up (the second sketch after this list reproduces this random behavior in code).

  7. However, once the Agent has learned to balance the pole, an episode goes on for much longer, nearing the full 500 timesteps. Look at how the Agent is balancing the pole gracefully, almost like a circus performer. We want to start from the dumb agent that you see here and teach it, via Reinforcement Learning, to become this graceful circus performer. This will be the first of the many projects that you will do in this course.
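
To make items 3 and 4 concrete, here is a short sketch that inspects the Agent's observations and actions. It assumes the classic Gym API (gym < 0.26); note that Gym encodes the -1/+1 forces from the lesson as the discrete actions 0 and 1.

```python
import gym

env = gym.make("CartPole-v1")

# The Agent's "eyes": each observation is a vector of four numbers
# (cart position, cart velocity, pole angle, pole angular velocity).
print(env.observation_space)

# The Agent's actions: Discrete(2). Action 0 pushes the cart left,
# action 1 pushes it right.
print(env.action_space)
print(env.action_space.sample())  # a random action, either 0 or 1

env.close()
```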
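
The dumb agent from items 5 and 6 can be sketched as a loop that samples random actions until the episode terminates. Again, this assumes the classic Gym API, where reset() returns an observation and step() returns four values; the 500-timestep limit and the termination conditions mirror the description above.

```python
import gym

env = gym.make("CartPole-v1")

for episode in range(5):
    observation = env.reset()  # start state: cart at the center, pole upright
    for t in range(500):       # an episode lasts at most 500 timesteps
        action = env.action_space.sample()  # move left or right at random
        observation, reward, done, info = env.step(action)
        if done:  # the pole fell too far or the cart left the frame
            print(f"Episode {episode + 1} ended after {t + 1} timesteps")
            break

env.close()
```

Because the actions are random, episodes typically end after just a handful of timesteps, which is exactly the abrupt behavior described in item 6.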

Summary

  1. Alright, let's summarize what we learned. A Reinforcement Learning problem has an environment, which is like a mini world inside which all the action happens. There is an Agent, which can observe the environment and take actions. In the case of CartPole-v1, the Agent, or the cart, can move left or right. Finally, there is a learning goal in the environment. In the case of CartPole-v1, the goal is to keep the pole upright. You will encounter these four components, Environment, Agent, Actions and Learning Goal, in every Reinforcement Learning problem.

  2. In the next few lessons, we are going to see how these concepts translate into Python code.