gutfeeling / practical_rl_for_coders

Learn reinforcement learning with Python, Gym and Keras.

Lesson: Setting up a Reinforcement Learning problem in Gym #15

Closed gutfeeling closed 3 years ago

gutfeeling commented 4 years ago

What is the focus of this lesson?

How to find and set up a Reinforcement Learning problem in Gym.

Text

  1. In this lesson, we are going to set up our first Reinforcement Learning problem using Gym and Jupyter Notebook. After we have set up the problem, we will explore it in the subsequent lessons and learn some basic concepts of Reinforcement Learning.

  2. The Gym package comes with many Reinforcement Learning problems, and there are two important ways of seeing the list of problems it contains. The easy, graphical way is to go to the url https://gym.openai.com/envs/ and look around. The page has several sections, each containing many diverse problems. So Acrobot-v1 is a Reinforcement Learning problem that you can solve inside Gym, and so is CartPole-v1. However, this list is not complete.

  3. To get the complete list, we have to use Python. Let's import gym first. To query the complete list, call gym.envs.registry.all(). This returns a dict_values object containing EnvSpec objects. We are not going to look at what dict_values and EnvSpec mean, because we will never use them directly. However, the text within each EnvSpec object is the name of a Reinforcement Learning problem. For example, CartPole-v0 is an RL problem, and so is CartPole-v1. CartPole-v1 was listed on the webpage above, but CartPole-v0 was not. Both are the same Reinforcement Learning problem, called CartPole, but they are different versions of the problem with slightly different settings. If we scroll through, we see EnvSpec objects corresponding to all Reinforcement Learning problems contained in gym.
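The registry query above can be sketched as follows. This assumes the classic Gym API (pre-0.26) used throughout this lesson; in newer Gym/Gymnasium versions the registry is a plain dict and you would iterate over its keys instead.

```python
import gym

# Query the complete list of registered Reinforcement Learning
# problems (classic Gym API, pre-0.26).
all_specs = gym.envs.registry.all()

# Each EnvSpec carries the name of one problem in its .id attribute.
env_ids = [spec.id for spec in all_specs]

print(len(env_ids), "environments registered")

# Both versions of the CartPole problem appear in the full list,
# even though the website only shows CartPole-v1.
print("CartPole-v0" in env_ids)
print("CartPole-v1" in env_ids)
```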

  4. The first Reinforcement Learning problem that we will tackle in this course is the CartPole-v0 problem, and we are going to set it up right away.

  5. To set it up, simply call gym.make(). It takes a single string argument: the name of the problem, which in our case is CartPole-v0. This command returns something called an environment, which we store in a variable aptly named env. We will find out why it is called an environment in the next lesson.
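A minimal sketch of this setup step, again assuming the classic Gym API:

```python
import gym

# Create the CartPole-v0 problem; gym.make() takes a single
# string argument, the name of the problem.
env = gym.make("CartPole-v0")

# The returned object is the environment. Its spec records
# which registered problem it was built from.
print(env.spec.id)  # CartPole-v0
```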

  6. Great, after setting up the problem, the very first thing we must do is initialize it. We do that by calling env.reset(). This returns something interesting: an array of four floating point numbers. Ignore this for the moment; we will inspect it in more detail in the next lesson.
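The initialization step can be sketched like this. It assumes the classic pre-0.26 Gym API, where reset() returns only the observation; in newer versions it returns an (observation, info) tuple instead.

```python
import gym

env = gym.make("CartPole-v0")

# reset() initializes the problem and returns the starting
# observation: for CartPole, an array of four floating point numbers.
observation = env.reset()

print(observation)
print(observation.shape)  # (4,)
```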

  7. Now, we want to get a visual representation of the problem. We do that by calling env.render(), which brings up a popup window showing the problem. Note that this popup window cannot be closed using its close button; instead, we close it by calling env.close().

  8. Remember, the env.reset() command is an essential part of the setup. If you skip it, the problem will not be initialized and you will get a black screen when calling env.render(). We can try that out in the Jupyter Notebook. A great thing about the Jupyter Notebook is that you can restart the Python interpreter without losing the commands you typed before: go to Kernel --> Restart & Clear Output. After a restart, the interpreter forgets what you did before, and you can rerun the commands as if you were doing it for the first time, cherry-picking which ones to run. So this time, I am going to import gym, run the gym.make() command, and skip the env.reset() command. As you can see, you get a black screen instead of a visual representation of the problem. So remember to use the env.reset() command to initialize the environment.

  9. Alright, let's summarize what we learned in this lesson. We found out how to get the list of RL problems in gym and learned how to set up a problem. We set up our first Reinforcement Learning problem, CartPole-v0, and got to the point where we saw its visual representation. In the next lesson, we will start exploring and understanding this visual representation.