schrum2 closed this issue 5 years ago.
I solved this issue (#4) by adding the following line:
if i_episode % 10 == 0: env.render()
before the act = agent.step(obs0) statement. This renders the zeroth (initial) episode and every episode whose ID is a multiple of 10, rather than every single episode. Adding this line will solve the issue in any file where the following block of code appears before that statement:
env = gym.make('Pendulum-v0')
env.seed(1)
env = env.unwrapped
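A minimal sketch of where the i_episode % 10 check sits inside the episode loop. It assumes the old Gym API used with Pendulum-v0 (step returns a 4-tuple) and an agent exposing step(obs); StubEnv, StubAgent, and train are hypothetical stand-ins so the snippet runs without Gym installed.

```python
import random


class StubEnv:
    """Hypothetical stand-in for the Pendulum env: old Gym API (reset/step/render)."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.episode = -1        # incremented on every reset()
        self.t = 0
        self.rendered = set()    # episodes during which render() was called

    def reset(self):
        self.episode += 1
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return float(self.t), -abs(action), done, {}

    def render(self):
        self.rendered.add(self.episode)


class StubAgent:
    """Hypothetical agent exposing step(obs) -> action, like agent.step(obs0)."""

    def step(self, obs):
        return random.uniform(-2.0, 2.0)


def train(env, agent, num_episodes=30):
    for i_episode in range(num_episodes):
        obs0 = env.reset()
        done = False
        while not done:
            # Draw frames only for episodes 0, 10, 20, ...
            if i_episode % 10 == 0:
                env.render()
            act = agent.step(obs0)
            obs0, reward, done, info = env.step(act)


env = StubEnv()
train(env, StubAgent())
print(sorted(env.rendered))  # [0, 10, 20]
```

Swapping StubEnv for the real gym.make('Pendulum-v0') environment leaves the loop unchanged; only the rendered episodes pay the cost of drawing frames.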
Some notes:
This code does not work with A2C_mountain_car.py, but neither does a plain env.render() call. Since both fail with TypeError: output must be an array, I suspect the problem lies in the environment itself, which I will work on fixing later.
This code also does not work with HER_coin.py, because the variable env is not declared as above in that file. Instead, it is declared as:
bit_size = 15
env = Env(bit_size)
Because env is a completely different object here (perhaps because the "coin" task has no environment that can be represented visually), env.render() will not work at all.
Additionally, the agent may go into a "Not Responding" state during long episodes whose IDs are not multiples of 10, but it recovers when the next rendered episode starts. I wonder whether this is related to the abrupt way the rendered episodes end, which might leave the agent hanging until the next rendered episode.
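One untested workaround, assuming the "Not Responding" state comes from the render window sitting idle (no longer redrawn) while unrendered episodes run: close the window at the end of each rendered episode and let the next env.render() reopen it. In older Gym releases the classic-control environments appear to discard their viewer in close() and create a new one in render() when none exists, but treat that as an assumption to verify against your Gym version. StubEnv, StubAgent, and run_episode are hypothetical so the sketch runs without Gym.

```python
class StubEnv:
    """Hypothetical classic-control-style env whose viewer window is created lazily."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.t = 0
        self.viewer_open = False
        self.windows_created = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        return float(self.t), -1.0, self.t >= self.episode_length, {}

    def render(self):
        # Assumed behaviour: the first render() after close() opens a new window.
        if not self.viewer_open:
            self.viewer_open = True
            self.windows_created += 1

    def close(self):
        self.viewer_open = False


class StubAgent:
    def step(self, obs):
        return 0.0


def run_episode(env, agent, render):
    """Run one episode; when render is True, close the window afterwards."""
    obs0 = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        if render:
            env.render()
        act = agent.step(obs0)
        obs0, reward, done, info = env.step(act)
        total_reward += reward
    if render:
        # Tear the window down so it is not left idle (and un-redrawn)
        # during the episodes that are not rendered.
        env.close()
    return total_reward


env, agent = StubEnv(), StubAgent()
for i_episode in range(30):
    run_episode(env, agent, render=(i_episode % 10 == 0))
print(env.windows_created, env.viewer_open)  # 3 False
```

The trade-off is that a new window appears for every rendered episode instead of one persistent window.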
Some of these notes seem to correspond to different issues. However, the specific issue of not watching every episode seems to be resolved. Can this issue be closed?
Learning can seem slow if you watch every episode, but it is unsatisfying to just watch performance numbers scroll by. Make env.render() execute only when i_episode is divisible by a certain number (for example 10), so that you only look at the agent's performance every 10 episodes. We can tweak this number.
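To make "this number" easy to tweak without editing the loop, one option is to read it from the command line. This is a sketch only: the --render-every flag and the should_render helper are hypothetical names, not something the repository already provides, and treating 0 as "never render" is a design choice made here.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical --render-every flag (default 10, 0 disables rendering)."""
    parser = argparse.ArgumentParser(description="Render-interval option (illustrative).")
    parser.add_argument(
        "--render-every",
        type=int,
        default=10,
        help="render one episode out of every N; 0 turns rendering off",
    )
    return parser.parse_args(argv)


def should_render(i_episode, render_every):
    """True for episodes 0, N, 2N, ...; always False when render_every <= 0."""
    return render_every > 0 and i_episode % render_every == 0


args = parse_args(["--render-every", "25"])
print([i for i in range(100) if should_render(i, args.render_every)])  # [0, 25, 50, 75]
```

Inside the episode loop, the check then becomes if should_render(i_episode, args.render_every): env.render(), and the guard against non-positive values avoids a ZeroDivisionError from the modulo.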