need for more parameters in reset(), step(), etc.

osigaud commented 8 years ago

Hi,

I'm currently refactoring a more complicated environment to match gym's API, and I'm running into the limits of the current API.

For reset(), I may want a deterministic reset(), which always starts from the same point, or a stochastic one (the current behavior), which would mean adding a boolean "deterministic" parameter. Or I may have a list of starting states and want to pass the index of the state from which to restart. Or, even more complicated, I may perform active learning and choose any state to start from (in that case, I would rather use an additional set_state() method than do it through reset()).

For step(), I may want to pass additional information (in my case, a target_size used to decide whether the current reaching movement was a hit and to compute the reward). I believe many environments will require these kinds of extensions, and at the moment the only option is to move away from the standard API, which makes any pull request difficult.

I see two possible solutions:

1. Add a general-purpose "setContext(dictionary)" method through which you can pass anything to your environment, providing additional information that any API method can then use through dedicated attributes.
2. Add a (generally empty) dictionary parameter to all the standard methods (in the spirit of the "infos" output of the step() method) that programmers can use for their own purposes.

Everything coded so far would still work with both methods. I tend to believe the latter option is preferable, but of course the gym team may think differently (or find a different solution)!
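To make the second option concrete, here is a minimal sketch of what an optional dictionary parameter on reset() and step() could look like. It uses a plain Python class that only follows the reset/step convention, not a real gym environment; the class name, the keys "deterministic", "start_index" and "target_size", the list of starting states, and the target position are all hypothetical.

```python
import random


class ReachingEnv:
    """Toy 1-D reaching task. A stand-in class, not a real gym environment."""

    START_STATES = [0.0, 0.5, 1.0]  # hypothetical list of starting states
    TARGET = 2.0                    # hypothetical target position

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.state = 0.0

    def reset(self, params=None):
        # The dictionary is optional, so callers that pass nothing
        # still get the stochastic reset.
        params = params or {}
        if "start_index" in params:
            self.state = self.START_STATES[params["start_index"]]
        elif params.get("deterministic", False):
            self.state = self.START_STATES[0]
        else:
            self.state = self.rng.uniform(0.0, 1.0)
        return self.state

    def step(self, action, params=None):
        params = params or {}
        target_size = params.get("target_size", 0.1)
        self.state += action
        # A "hit" means the state landed within target_size of the target.
        hit = abs(self.state - self.TARGET) <= target_size
        reward = 1.0 if hit else 0.0
        return self.state, reward, hit, {}
```

Code that calls reset() and step(action) without the extra argument behaves as before, which is the backward-compatibility property claimed above.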

Looking forward to hearing from you on that.
Olivier Sigaud

gdb commented 8 years ago

Off the top of my head:

For step(), can you just make your action space more complicated, and include whatever info is required?
For reset(), though mildly hacky, why not have the first .step() of your environment be passing in configuration information like this?

Both approaches fit with the existing paradigm: everything your agent does to alter the environment is encapsulated in the action. This means all existing wrappers and expectations around the semantics of various calls will remain valid. And there's a decent argument that this is the semantically correct thing to do.
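As a rough illustration of the first suggestion, the sketch below folds the extra information into the action itself. It is a standalone toy class, not gym code; the dictionary keys "control" and "target_size", the default target size, and the target position are assumptions made for the example.

```python
def split_action(composite):
    """Separate the learner's control from the extra configuration."""
    return composite["control"], composite.get("target_size", 0.1)


class CompositeActionEnv:
    """Toy 1-D reaching task whose action carries extra information."""

    TARGET = 2.0  # hypothetical target position

    def __init__(self):
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        # The environment unpacks the composite action; the signature
        # of step() stays exactly (action) -> (obs, reward, done, info).
        control, target_size = split_action(action)
        self.state += control
        hit = abs(self.state - self.TARGET) <= target_size
        reward = 1.0 if hit else 0.0
        return self.state, reward, hit, {}
```

The cost is on the agent side: the learner only chooses "control", so something has to assemble the composite action before each call to step().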

What do you think?

osigaud commented 8 years ago

Thanks for the immediate reaction!

> For step(), can you just make your action space more complicated, and include whatever info is required?

Well, to me this is ugly because, from a machine learning point of view, keeping the action space as small as possible is a good idea. I would have to extract the "true action" from what I send to step() before passing it to the learner...

> For reset(), though mildly hacky, why not have the first .step() of your environment be passing in configuration information like this?

Even worse, because I would have to build a very specific "action" variable just for that step, containing for instance the state where I want to start...

If you don't like the idea of adding a general-purpose parameter everywhere, I find adding the "setContext()" method less hacky than what you suggest. And for you, in terms of effort, this just means adding one method to the generic env class...

Still not convinced? ;) Olivier

osigaud commented 8 years ago

A further thought: rather than calling it "setContext()", the method could be called "setEnvAttributes()". The idea is that there might be several varying features in your environment (is it deterministic? where is the initial state? what is the target size?...) and you may want the agent (or another experimental scheduling process) to have some control over these features...

Olivier

gdb commented 8 years ago

Another option would be to initialize a new environment each time.

(FWIW, you're welcome to add whatever methods you want to a specific environment.)

gdb commented 8 years ago

Err, sorry, early send. But the thing I was going to say is: we're definitely going to try our hardest to keep the core interface as simple as possible. We've found there's a huge amount of power in having a simple reset/step. Specific environments are welcome to grow more functionality, though they stop being automatically comparable in quite the same ways.

It's definitely a tradeoff, and we ultimately need to strike a balance between being flexible and being unified.

osigaud commented 8 years ago

> Another option would be to initialize a new environment each time.

OK: the init is my "setEnvAttributes()" method... But then I need to send parameters to my init.

> (FWIW, you're welcome to add whatever methods you want to a specific environment.)

Yes, but can these additional methods be called on an environment created with env = gym.make('myEnv')?

> It's definitely a tradeoff, and we ultimately need to strike a balance between being flexible and being unified.

Yes, I understand this and I agree with this general philosophy. It seems to me that adding my "setEnvAttributes()" will add a lot of power (thus unification) with nearly no loss in flexibility (all previous environments will just ignore it). But I won't insist more ;)

Olivier

gdb commented 8 years ago

Gotcha. I almost didn't want to mention it but there's also configure, which on closer read is actually probably exactly the same as setEnvAttributes.

I hoped it would not be the answer since the intent was only to use it for things which don't change the semantics of the environment -- thus we could be sure that two envs with the same ID are always comparable.

Let me know if that looks like what you want :).


osigaud commented 8 years ago

Yes, this is exactly what I need!

Through configure(), I can tell my environment: "now, your reset is deterministic", or "now, you start from that state", "now your target is that large", etc.
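For illustration, the usage described here might look like the sketch below. It is a standalone class showing the pattern only; it does not reproduce gym's own configure() implementation, and the attribute names (deterministic, start_state, target_size) are hypothetical.

```python
import random


class ConfigurableReachingEnv:
    """Toy environment whose reset/step behavior is tuned via configure()."""

    # Hypothetical set of attributes that configure() may change.
    _CONFIGURABLE = {"deterministic", "start_state", "target_size"}
    TARGET = 2.0  # hypothetical target position

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.deterministic = False
        self.start_state = None
        self.target_size = 0.1
        self.state = 0.0

    def configure(self, **kwargs):
        # Reject unknown names so that typos fail loudly.
        for name, value in kwargs.items():
            if name not in self._CONFIGURABLE:
                raise ValueError("unknown option: {}".format(name))
            setattr(self, name, value)

    def reset(self):
        if self.start_state is not None:
            self.state = self.start_state   # "now, you start from that state"
        elif self.deterministic:
            self.state = 0.0                # "now, your reset is deterministic"
        else:
            self.state = self.rng.uniform(0.0, 1.0)
        return self.state

    def step(self, action):
        self.state += action
        hit = abs(self.state - self.TARGET) <= self.target_size
        return self.state, (1.0 if hit else 0.0), hit, {}
```

With this shape, reset() and step() keep their standard signatures, and the experiment script calls env.configure(target_size=0.5) between episodes.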

I see the danger of using it too much, but I believe I have use cases where it is truly the solution.

Thanks! Olivier

hholst80 commented 8 years ago

Can you explain in more detail what a semantically compatible vs. incompatible use of configure would be?

tlbtlbtlb commented 8 years ago

'Semantically compatible' means it doesn't change the action/observation/reward behavior, but something else like the way it renders the visualization.

Compatible: env.configure(display=':0')
Incompatible: env.configure(gravity=9.7)

Since we want to be able to make rigorous comparisons between agents on the same environment, it's important that changes to the environment's semantics be clearly distinguished.
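One possible way to make that distinction explicit in code, sketched under assumptions: a configure() that accepts only options known not to affect actions, observations, or rewards. This is not how gym enforces it (nothing in this thread says gym enforces it at all); the class and the option names other than display and gravity are hypothetical.

```python
class RenderConfigEnv:
    """Toy environment that only allows non-semantic configuration."""

    # Options assumed not to change action/observation/reward behavior.
    NON_SEMANTIC = {"display", "window_size"}

    def __init__(self, gravity=9.8):
        # Semantic parameters are fixed at construction time.
        self.gravity = gravity
        self.render_options = {}

    def configure(self, **kwargs):
        semantic = sorted(set(kwargs) - self.NON_SEMANTIC)
        if semantic:
            raise ValueError(
                "these options would change the environment's semantics: "
                + ", ".join(semantic)
            )
        self.render_options.update(kwargs)
```

Here env.configure(display=':0') succeeds, while env.configure(gravity=9.7) raises, so a changed gravity can only come from constructing a different environment.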

hholst80 commented 8 years ago

Ok. I realise that we are abusing the API if we are, for instance, setting frame_skip in ALE via configure. How should such arguments be passed on to the environment? Is it impossible to use gym.make directly if we need to have this flexibility?

tlbtlbtlb commented 8 years ago

Define 2 separately named environments, with different kwargs to the constructor. See, for example, FrozenLake-v0 and FrozenLake8x8-v0.

For a more complex environment, see how kwargs are generated programmatically for the standard Atari environments.
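The pattern can be sketched with a toy registry. This is a simplified stand-in written for this example, not gym's registration code: register(), make(), GridEnv, and all the IDs below are hypothetical, and gym's real registry has its own signature and behavior.

```python
REGISTRY = {}


def register(env_id, entry_point, kwargs=None):
    """Bind an ID to a constructor plus a fixed set of kwargs."""
    if env_id in REGISTRY:
        raise ValueError("already registered: {}".format(env_id))
    REGISTRY[env_id] = (entry_point, dict(kwargs or {}))


def make(env_id):
    """Build a fresh environment from its registered ID."""
    entry_point, kwargs = REGISTRY[env_id]
    return entry_point(**kwargs)


class GridEnv:
    """Toy environment with two semantic parameters."""

    def __init__(self, size=4, frame_skip=1):
        self.size = size
        self.frame_skip = frame_skip


# Two separately named environments, same class, different kwargs.
register("Grid-v0", GridEnv, {"size": 4})
register("Grid8x8-v0", GridEnv, {"size": 8})

# Variants generated programmatically, one ID per frame_skip value.
for skip in (2, 4):
    register(
        "GridSkip{}-v0".format(skip),
        GridEnv,
        {"size": 4, "frame_skip": skip},
    )
```

Because every semantic variation gets its own ID, two results reported for "GridSkip4-v0" refer to the same dynamics, which is the comparability property discussed above.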

openai / gym

need for more parameters in reset(), step(), etc. #337