openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

[Proposal] Adding a Distribution space #2992

Closed ryanrudes closed 2 years ago

ryanrudes commented 2 years ago

Motivation

Gym is severely lacking numerous Space types that are commonly seen in RL, though I'm going to address just one here to keep things simple. Imagine a person wanting to implement a high-level baseline model that should be compatible with a wide range of environments; the user has to specify the mean and variance of the state separately in order for data processing to be done automatically (if you can't guess from my highly specific remarks, I'm that person). This could be easily resolved by having a space defined implicitly by a mean and variance, essentially an open-range "Box".

Pitch

I want to propose a Distribution space which represents values sampled from a Gaussian distribution defined by mean and std. The implementation would be quite similar to Box, with some obvious changes in sampling.

Alternatives

Alternatively, this could simply be implemented in the Box space as added functionality. Or maybe this would be too confusing for users, especially when it comes to sampling (i.e. when fixed interval bounds apply vs. when infinite-range Gaussian sampling applies). If it were to be merged with the Box space, I would imagine this is how we'd interpret the user's intentions based on which arguments were specified:

Scenario 1: If mean and std are provided, but low and/or high are not, then low and/or high will default to -inf and/or inf, respectively. When the sample() method is called, a value will be sampled from the distribution centered at mean with standard deviation std, and that value will then be clipped to [low, high].

Scenario 2: If mean, low, and high are all specified, but std is not, std will default to one quarter of the range of the interval, (high - low) / 4, in accordance with the range rule of thumb. The sampling procedure will then be the same as in Scenario 1.

Scenario 3: If low and high are specified, but mean and std are not, we simply sample uniformly from the interval [low, high].
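
The three scenarios above can be sketched as a single sampling function. This is only an illustration of the dispatch logic under stated assumptions, not code from Gym: the name `sample_box_like` and its parameters are hypothetical, it uses plain NumPy rather than the Box class, and it treats "std given without mean" (a case the proposal does not cover) the same as Scenario 3.

```python
import numpy as np


def sample_box_like(rng, low=None, high=None, mean=None, std=None, size=None):
    """Hypothetical sampler following the three scenarios described above."""
    # Missing bounds default to an unbounded interval (Scenario 1).
    lo = -np.inf if low is None else low
    hi = np.inf if high is None else high
    bounded = bool(np.all(np.isfinite(lo)) and np.all(np.isfinite(hi)))

    if mean is None:
        # Scenario 3: no distribution parameters, so draw uniformly from [low, high).
        # Assumption: a std passed without a mean is ignored here.
        if not bounded:
            raise ValueError("uniform sampling needs finite low and high")
        return rng.uniform(lo, hi, size=size)

    if std is None:
        # Scenario 2: infer std from the range rule of thumb, (high - low) / 4.
        if not bounded:
            raise ValueError("std can only be inferred from finite low and high")
        std = (np.asarray(hi) - np.asarray(lo)) / 4.0

    # Scenarios 1 and 2: Gaussian draw, then clip to [low, high].
    return np.clip(rng.normal(mean, std, size=size), lo, hi)


rng = np.random.default_rng(0)
unbounded = sample_box_like(rng, mean=0.0, std=1.0, size=5)           # Scenario 1
inferred = sample_box_like(rng, low=-1.0, high=1.0, mean=0.0, size=5)  # Scenario 2
uniform = sample_box_like(rng, low=2.0, high=3.0, size=5)              # Scenario 3
```

Note that clipping piles probability mass onto the bounds themselves, so a clipped Gaussian is not the same thing as a truncated Gaussian; which of the two is wanted is a design choice the proposal leaves open.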

balisujohn commented 2 years ago

Would this be for sampling from your action space, so you can have a more structured distribution for action space sampling? And the idea is that for each environment, recommended params for a (diagonal) multivariate normal action space sample distribution are provided?

pseudo-rnd-thoughts commented 2 years ago

Personally, I'm not in favour of this change. My understanding is that Gym spaces exist to provide bounds on all of the observations/actions for an environment, and that we are agnostic to the agents. Since this proposal is mainly scoped to a space for agents to sample with, rather than to describing an environment, I dislike this space.

This proposal would make sense in Stable-Baselines3 to help users. Alternatively, it could become a proposal for the Box sample mask, such that the mask specifies a non-uniform distribution.

RedTachyon commented 2 years ago

I'm moderately strongly against this. We have to be aware of what the purpose of a Space is and what its limits are.

It is not necessarily a full description of the observation space, in the sense that every obs which fulfills space.contains(obs) is actually in the mathematical set of valid observations. Instead, all we can expect is a surjection from observations into the Space, that is, for every mathematically valid observation we can have space.contains(obs), but the space can contain other stuff as well. That's why a somewhat similar prior proposal, #2662, was rejected.

So in short, the purpose of a space is to provide a general heuristic of whether something is a valid observation/action or not. It gives a template for the form, mostly ignoring the content, if that makes sense.

This specific proposal is even more niche because, as I understand it, it concerns sampling, which itself is a somewhat dodgy feature, and there were plans to remove it. It should not be used for any "serious" procedures; it's more of a "give me something roughly observation-shaped, I don't care what".

From the perspective that I described, this space would be exactly the same as Box, but it would raise some trickier questions about space.contains. Consider that we have a standard normal distribution space, and we check space.contains(1e8). The probability of sampling that is absurdly low; I'd estimate you could run all the supercomputers in the world for the duration of the age of the universe, and it still wouldn't happen. But technically speaking it could happen. Should we perform statistical testing?
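
To put a rough number on "absurdly low": the sketch below estimates the probability that a standard normal draw exceeds 1e8. The helper `log10_upper_tail` is hypothetical and uses the standard large-x approximation P(X > x) ≈ exp(-x²/2) / (x·sqrt(2π)), working in log space because the exact expression underflows in double precision.

```python
import math


def log10_upper_tail(x):
    """Approximate log10 of P(X > x) for a standard normal X and large x > 0.

    Uses the asymptotic form P(X > x) ~ exp(-x**2 / 2) / (x * sqrt(2 * pi)).
    """
    ln_p = -0.5 * x * x - math.log(x * math.sqrt(2.0 * math.pi))
    return ln_p / math.log(10.0)


x = 1e8
exact = 0.5 * math.erfc(x / math.sqrt(2.0))  # exact formula; underflows to 0.0 in float64
estimate = log10_upper_tail(x)               # roughly -2.17e15
print(exact, estimate)
```

In other words, the probability is on the order of 10 to the power of minus two quadrillion, which no realistic number of samples would ever reach, even though an unbounded Box would happily report that it contains 1e8.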

There is an argument to be made that maybe spaces should attempt to be more precise, that we should try to make it a bijective relation rather than surjective. I'm not convinced by that argument at the moment, but it's out there.

There is absolutely no precedent for adding a new core space just to affect the sampling distribution, and I don't think we should introduce one, since it's a step that comes after making spaces bijective in the first place.

My recommendation for you: just subclass Box, as in Normal(Box), and add whatever you want there. If you do it correctly, it should still work with any existing code, and you can add your features. But it's a niche of a niche, so it shouldn't be added natively in gym.
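
A minimal sketch of what such a subclass might look like, assuming the usual `Box(low, high, shape, dtype)` constructor and the `np_random` attribute that spaces expose. The class name `Normal` comes from the comment above; the `mean` and `std` attributes and the clipping behaviour are assumptions for illustration, and the `gymnasium` fallback import is only there in case `gym` itself is not installed.

```python
import numpy as np

try:
    from gym.spaces import Box
except ImportError:
    # Assumption: Gymnasium, the maintained fork, keeps a compatible Box API.
    from gymnasium.spaces import Box


class Normal(Box):
    """Hypothetical Box subclass whose sample() draws from a diagonal Gaussian."""

    def __init__(self, mean, std, low=-np.inf, high=np.inf, dtype=np.float32):
        mean = np.asarray(mean, dtype=dtype)
        self.mean = mean
        # A scalar std is broadcast to one value per dimension.
        self.std = np.broadcast_to(np.asarray(std, dtype=dtype), mean.shape)
        super().__init__(low=low, high=high, shape=mean.shape, dtype=dtype)

    def sample(self, mask=None):
        # `mask` is accepted only to stay call-compatible with newer Box
        # versions; this sketch ignores it.
        draw = self.np_random.normal(self.mean, self.std)
        # Clip so that every sample still satisfies Box.contains().
        return np.clip(draw, self.low, self.high).astype(self.dtype)


space = Normal(mean=[0.0, 1.0, -1.0], std=1.0)
space.seed(0)
obs = space.sample()
print(obs.shape, space.contains(obs))
```

Because `contains`, `shape`, `dtype`, `low`, and `high` are all inherited unchanged, code that only expects a Box should keep working; only the sampling distribution differs.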

PS For anyone from the future searching for a potential Distribution space where an element of the space is itself a probability distribution (which was my initial understanding from the title) - I think that's more interesting, but much more challenging, so if you have any ideas, feel free to suggest them in a new issue.