openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

Taxi-V3 observations do not match observation space size #2219

Closed psaintlaurent closed 3 years ago

psaintlaurent commented 3 years ago

Hello,

I am trying to solve the Taxi-V3 problem with different control methods, and I'm running into a problem: the number of distinct observations returned does not match the size of the observation space. I can run Generalized Policy Iteration using first-visit MC, TD methods, or Q-learning. In every case, the number of unique observations actually seen never matches the size of the observation space and is capped at 400. Is this a bug, or am I missing something?

RedTachyon commented 3 years ago

So I did a bunch of debugging and playing with the code until I realized the solution is staring at me mathematically. Well, almost.

With a brute-force script I found that there are actually 404 attainable states. That leaves 100 of the 500 states in the observation space missing, but 4 of those "missing" ones are actually reachable in a certain sense.

The 100 missing states correspond to situations where the pickup and dropoff locations are the same. I guess that makes sense: if the passenger's position is their destination, there's no real task for the agent, so this case is disabled in environment creation. There are 25 (taxi positions) x 4 (possible goal positions) = 100 states like this. So we'd have 400 states in total, as OP noted.
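The arithmetic above can be checked by enumerating the state tuples directly. This is a minimal sketch that does not import gym; it assumes the usual Taxi layout of a 5x5 grid, 5 passenger locations (4 landmarks plus "in the taxi", here index 4), and 4 destinations. The variable names are illustrative, not taken from the environment.

```python
from itertools import product

# Assumed state components: taxi row, taxi column, passenger location, destination.
# Passenger locations 0-3 are the landmarks; 4 means "in the taxi" (assumption).
N_ROWS, N_COLS, N_PASSENGER_LOCS, N_DESTINATIONS = 5, 5, 5, 4

all_states = list(
    product(range(N_ROWS), range(N_COLS), range(N_PASSENGER_LOCS), range(N_DESTINATIONS))
)

# States where the passenger is already waiting at the destination landmark.
same_pickup_and_dropoff = [s for s in all_states if s[2] == s[3]]

n_total = len(all_states)
n_same = len(same_pickup_and_dropoff)
print(n_total, n_same, n_total - n_same)  # 500 100 400
```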

The only time something like this can actually happen is right after the taxi drops off the passenger, i.e. when the passenger, the destination, and the taxi are all at the same position. There are 4 states like this. They only occur after the episode ends, so they can be neglected when actually learning the environment, which is probably why OP only saw 400 states.

tl;dr: there are 96 states that can't be reached due to environment logic, and 4 states that only appear after an episode is successfully finished.
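The brute-force script itself is not shown in this thread. The following is a sketch of what such a check could look like, not the original script: a breadth-first search over a small re-implementation of the Taxi dynamics. The map, landmark coordinates, and action numbering are written from memory of the environment source and should be treated as assumptions; `step` and `reachable_states` are hypothetical helpers.

```python
from collections import deque

# Assumed Taxi map; ':' between two cells means the taxi can pass, '|' is a wall.
MAP = [
    "+---------+",
    "|R: | : :G|",
    "| : | : : |",
    "| : : : : |",
    "| | : | : |",
    "|Y| : |B: |",
    "+---------+",
]
LOCS = [(0, 0), (0, 4), (4, 0), (4, 3)]  # landmarks R, G, Y, B as (row, col)
IN_TAXI = 4  # passenger index meaning "inside the taxi"


def step(state, action):
    """Apply one action; return (next_state, done)."""
    row, col, passenger, dest = state
    done = False
    if action == 0:  # south
        row = min(row + 1, 4)
    elif action == 1:  # north
        row = max(row - 1, 0)
    elif action == 2 and MAP[1 + row][2 * col + 2] == ":":  # east
        col += 1
    elif action == 3 and MAP[1 + row][2 * col] == ":":  # west
        col -= 1
    elif action == 4:  # pickup: only works on the passenger's landmark
        if passenger < IN_TAXI and (row, col) == LOCS[passenger]:
            passenger = IN_TAXI
    elif action == 5:  # dropoff: only works on a landmark with a passenger aboard
        if passenger == IN_TAXI and (row, col) in LOCS:
            passenger = LOCS.index((row, col))
            done = passenger == dest
    return (row, col, passenger, dest), done


def reachable_states():
    """BFS from every valid start state; terminal states are not expanded."""
    starts = [
        (r, c, p, d)
        for r in range(5) for c in range(5)
        for p in range(4) for d in range(4)
        if p != d  # pickup and dropoff locations always differ at reset
    ]
    seen, terminal = set(starts), set()
    queue = deque(starts)
    while queue:
        state = queue.popleft()
        for action in range(6):
            nxt, done = step(state, action)
            if done:
                terminal.add(nxt)
            elif nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, terminal


seen, terminal = reachable_states()
print(len(seen), len(terminal), len(seen | terminal))  # 400 4 404
```

Under these assumptions the search finds 400 states that can occur during an episode plus 4 that occur only on the final step, which matches the counts given above.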

So as far as I can tell, the environment works as intended, but could probably use an update in the description. I'll update the docstring and the docs, and I think that's everything this needs. It would probably be a bit more elegant to reparametrize the observation space so that the entirety of it can actually be used, but that would be a significantly bigger challenge, plus it would (imo unnecessarily) change the environment's interface.

jkterry1 commented 3 years ago

@alfred100p Please make sure this whole thing is incorporated into the docs PR you're working on for this