gutfeeling / practical_rl_for_coders

Learn reinforcement learning with Python, Gym and Keras.

Lesson: Observations in Gym #14

Closed gutfeeling closed 4 years ago

gutfeeling commented 4 years ago

What and why

  1. In this lesson, we are going to set up the CartPole-v1 environment using Python and Gym, so that we can interact with the environment via code. During the setup process, we will also find out how the agent observes the environment, i.e. what it sees with its eyes.

Content

  1. As always, we will start by importing gym.

  2. In order to create an interactive environment in gym, we need to use the gym.make() function which takes a single string as argument and returns the interactive environment, which we will store in the variable env. The value of the string should be the name of the environment. In this case, the name is CartPole-v1.

  3. You can try this out with other environments listed in the gym environments page too. For example, to create the Acrobot-v1 environment, we must type acrobot_env = gym.make("Acrobot-v1").

  4. Remember that the starting state of the CartPole-v1 environment has the cart at the center of the screen and the pole standing vertically. At any point, we can get to this starting state by typing env.reset(). This returns the observation made by the agent in this starting state, which we will store in the variable observation. So the variable observation is how the agent sees the starting state.

  5. Let's print out the observation and see what it is. As you can see, it contains four floating point numbers. The first element is the cart position. During an episode it stays between -2.4 and +2.4, because the episode ends as soon as the cart leaves this range. Since the starting state places the cart at the center, the value is 0.something in this case, so the cart is nearly at the center but not exactly. The environment does this to add an element of randomness to the starting state.

  6. The second element is the cart velocity and can range from -infinity to +infinity. The third element is the pole angle, expressed in radians. It can range between about -0.418 and +0.418 radians, which is roughly -24 to +24 degrees. The vertical position has the value 0. When the pole is tilted to the right, the pole angle is positive. When it is tilted to the left, the pole angle is negative. We find that the current value is 0.something, which is very close to vertical. The tiny deviation from the vertical position is once again there to add an element of randomness and to make the pole fall under its own weight.

  7. The last element is the velocity of the tip of the pole. This can also range from -inf to +inf. The faster the pole falls, the higher the absolute value.

  8. None of these details are available on the CartPole-v1 page on Gym's website. Rather, they are available in the wiki of Gym's GitHub repository. If at any point you want to know more about an environment than is available on the webpage, it makes sense to check the wiki for more information.

  9. One thing is clear at this point: the way the agent observes the environment is very different from how we see the environment. If we want to see the environment in its current condition, we have to use the env.render() command, which will draw the current state of the environment in a popup window.

  10. We see the rectangular cart, the wire, the pole, the white space etc. However, the agent only sees these four numbers, which are the cart position and velocity, and the pole angle and velocity. Its view is also limited, i.e. it has no idea how high the wire is from the bottom of the screen. It also doesn't know the length of the pole. It has no idea about all this white space or the rectangular bounding box. In general, the agent's worldview and our worldview can be markedly different. But the agent has to act based on how it observes the system, so this is all it has.

  11. What is the Python type of the agent's observation? If we print type(observation), we find that it is a NumPy array (numpy.ndarray). The set of values that an observation can take is described by env.observation_space, which is a special data type called Box(4). This datatype is defined by gym. Here, Box tells us that an observation is a sequence of floating point numbers, each within given bounds. Box(4) means that it is a sequence of 4 floating point numbers. Box(2) on the other hand would mean a sequence of 2 floating point numbers. gym has many of these quirky data types, and we will discover more of them as we deal with more environments in the course.
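The steps in this lesson can be sketched in code roughly as follows. This is a minimal sketch, not the lesson's official code: the names CartPoleObservation, label_observation and first_observation are hypothetical helpers introduced here for illustration. The sketch also assumes that env.reset() may return either the observation alone (older gym releases) or an (observation, info) tuple (newer releases), and handles both.

```python
import math
from typing import NamedTuple, Sequence


class CartPoleObservation(NamedTuple):
    """Hypothetical helper: gives names to the four observation values."""
    cart_position: float
    cart_velocity: float
    pole_angle_rad: float      # 0 is vertical, positive means tilted right
    pole_tip_velocity: float

    @property
    def pole_angle_deg(self) -> float:
        # The environment reports the angle in radians; convert for reading.
        return math.degrees(self.pole_angle_rad)


def label_observation(observation: Sequence[float]) -> CartPoleObservation:
    """Turn a raw 4-element observation into a labelled record."""
    values = [float(v) for v in observation]
    if len(values) != 4:
        raise ValueError(f"expected 4 values, got {len(values)}")
    return CartPoleObservation(*values)


def first_observation(env_name: str = "CartPole-v1") -> CartPoleObservation:
    """Create the environment, reset it, and label the starting observation.

    Requires gym to be installed; it is imported here so that the helpers
    above can be used without it.
    """
    import gym

    env = gym.make(env_name)
    result = env.reset()
    # Assumption: newer gym versions return (observation, info) from reset(),
    # older ones return only the observation.
    observation = result[0] if isinstance(result, tuple) else result
    print(type(observation))        # the Python type of the observation
    print(env.observation_space)    # the Box space the observation belongs to
    env.close()
    return label_observation(observation)
```

With gym installed, calling first_observation() prints the observation's type and the observation space, and returns the starting observation with named fields. The labelling step can also be tried without gym on made-up numbers, for example label_observation([0.03, -0.01, 0.02, 0.04]).pole_angle_deg, which gives the pole angle in degrees.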

Summary