openai / gym-http-api

API to access OpenAI Gym from other languages via HTTP
MIT License

Environments terminate after 200 steps. How to change this behaviour? #41

Closed markusdumke closed 7 years ago

markusdumke commented 7 years ago

With the latest gym version, environments terminate after 200 time steps. In Python I can write

    env = gym.make("MountainCar-v0").env

to change this behaviour. See also: http://stackoverflow.com/questions/42787924/why-is-episode-done-after-200-time-steps-gym-environment-mountaincar

But how could I get this working using the gym API in R? In the example_agent.R I tried

    env_id = "MountainCar-v0.env"
    instance_id = env_create(client, env_id)

but this returns an error.
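For context, here is a minimal sketch of why the `.env` trick works in Python. It does not use gym itself: `CountingEnv`, `TimeLimit`, and `run` are simplified, hypothetical stand-ins, assuming the time limit is enforced by a wrapper object that keeps the raw environment in its `env` attribute.

```python
class CountingEnv:
    """Toy environment (hypothetical) that never ends on its own."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        # observation, reward, done, info
        return self.t, 0.0, False, {}


class TimeLimit:
    """Simplified stand-in for a time-limit wrapper, not gym's own class."""

    def __init__(self, env, max_episode_steps):
        self.env = env  # the unwrapped environment stays reachable here
        self.max_episode_steps = max_episode_steps
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_episode_steps:
            done = True  # force the episode to end at the limit
        return obs, reward, done, info


def run(env, max_steps):
    """Step until the episode reports done or max_steps is reached."""
    env.reset()
    for step in range(1, max_steps + 1):
        _, _, done, _ = env.step(0)
        if done:
            return step
    return max_steps


wrapped = TimeLimit(CountingEnv(), max_episode_steps=200)
steps_wrapped = run(wrapped, 1000)        # stopped by the wrapper
steps_unwrapped = run(wrapped.env, 1000)  # wrapper bypassed via .env
```

Under these assumptions, `.env` is an attribute of the Python object returned by `gym.make`, not part of the environment ID, which would explain why a string such as `"MountainCar-v0.env"` is rejected as an ID.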

paulhendricks commented 7 years ago

It looks like the code to create a new environment currently does not allow one to instantiate gym environments that don't terminate after 200 time steps (original code found here: https://github.com/openai/gym-http-api/blob/master/gym_http_server.py#L43).

We can modify the create function to allow this behavior with a keyword argument, perhaps terminate_early. For example:

    def create(self, env_id, terminate_early=True):
        try:
            env = gym.make(env_id)
        except gym.error.Error:
            raise InvalidUsage("Attempted to look up malformed environment ID '{}'".format(env_id))
        if not terminate_early:
            env = env.env
        instance_id = str(uuid.uuid4().hex)[:self.id_len]
        self.envs[instance_id] = env
        return instance_id

The resulting client code would be (e.g. in R):

    env_id = "MountainCar-v0"
    instance_id = env_create(client, env_id, terminate_early = FALSE)

@catherio Any thoughts? I can make a pull request with the code change (and appropriate unit tests and documentation changes) if approved!

catherio commented 7 years ago

Hi @paulhendricks and @markusdumke,

Overriding the time limit while keeping the same env name is actually not desirable. See https://github.com/openai/gym/wiki/FAQ for details. Basically, an environment that does not respect the time limit is incompatible with the original and should be given its own name. The workaround is to create your own environment and then reference it by name. If you have more questions once you've taken a look at the FAQ, let me know.
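A rough sketch of that workaround in Python, assuming a gym version whose `register` function accepts `max_episode_steps`. The ID `MountainCarLong-v0`, the step limit of 10000, and the helper name `register_long_mountain_car` are illustrative choices, not names from gym or this repository. The registration would have to run in the server process, since that is where `gym.make` is called.

```python
# Illustrative values only: the ID and the limit are assumptions.
LONG_MOUNTAIN_CAR_SPEC = {
    "id": "MountainCarLong-v0",  # its own name, distinct from MountainCar-v0
    "entry_point": "gym.envs.classic_control:MountainCarEnv",
    "max_episode_steps": 10000,  # longer limit than the default 200
}


def register_long_mountain_car():
    """Register the renamed environment (hypothetical helper).

    gym is imported here so the spec above can be inspected without it.
    """
    from gym.envs.registration import register

    register(**LONG_MOUNTAIN_CAR_SPEC)
```

After `register_long_mountain_car()` has been called once on the server side, a client could pass `"MountainCarLong-v0"` as `env_id` to `env_create` like any other environment name.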