Try creating a rectangular grid, for example 36 × 28 bins, to validate the positioning fix. This creates 36 bins along the first dimension (position) and 28 bins along the second (velocity). With a trained Q-table, every cell should have action values, not just the cells along the top.
create_uniform_grid(env.observation_space.low, env.observation_space.high, bins=(36, 28))
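A non-square grid is useful here because a swapped-axis bug cannot hide behind matching shapes: position must index axis 0 (36 bins) and velocity axis 1 (28 bins). The sketch below is a minimal, self-contained illustration under stated assumptions. The body of `create_uniform_grid` is a hypothetical implementation (only its name and call appear above), `discretize` is a hypothetical helper, and the bounds are assumed MountainCar-v0 limits hard-coded in place of `env.observation_space` so the example runs without Gym.

```python
import numpy as np

def create_uniform_grid(low, high, bins=(10, 10)):
    """Hypothetical implementation: interior split points of a uniform grid, per dimension."""
    # bins[d] intervals need bins[d] + 1 edges; drop the outer two so that
    # np.digitize returns indices 0 .. bins[d] - 1.
    return [np.linspace(low[d], high[d], bins[d] + 1)[1:-1] for d in range(len(bins))]

def discretize(sample, grid):
    """Hypothetical helper: map a continuous sample to a tuple of bin indices."""
    return tuple(int(np.digitize(s, g)) for s, g in zip(sample, grid))

# Assumed MountainCar-v0 bounds: position in [-1.2, 0.6], velocity in [-0.07, 0.07].
low = np.array([-1.2, -0.07])
high = np.array([0.6, 0.07])

grid = create_uniform_grid(low, high, bins=(36, 28))

# Q-table shaped (position bins, velocity bins, actions); MountainCar-v0 has 3 actions.
n_actions = 3
q_table = np.zeros((len(grid[0]) + 1, len(grid[1]) + 1, n_actions))

# Opposite corners of the state space land in opposite corners of the table.
# If position and velocity were swapped, the position index 35 would be out of
# range for the 28-wide velocity axis and NumPy would raise an IndexError.
low_corner = discretize([-1.2, -0.07], grid)
high_corner = discretize([0.59, 0.069], grid)
print(q_table.shape, low_corner, high_corner)
```

Once the Q-table is trained, plotting `q_table.max(axis=2)` (or `argmax`) as a 36 × 28 image should show values across the whole grid; if only one edge is populated, the state-to-cell mapping is likely still off.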