Closed ryanrudes closed 2 years ago
Would this be for sampling from your action space, so you can have a more structured distribution for action space sampling? And the idea is that for each environment, recommended params for a (diagonal) multivariate normal action space sample distribution are provided?
Personally, I'm not in favour of this change. My understanding is that Gym spaces provide bounds on all of the observations/actions for an environment and are agnostic to the agents. Since this PR is mainly about a space for agents to sample from, rather than one that describes an environment, I dislike this space.
This PR would make sense in Stable-Baselines3 to help users. Alternatively, this PR could be a proposal for a Box sample mask, such that the mask specifies a non-uniform distribution.
I'm moderately strongly against this. We have to be aware of what the purpose of a `Space` is and what its limits are.
It is not necessarily a full description of the observation space, in the sense that every obs which fulfills `space.contains(obs)` is actually in the mathematical set of valid observations. Instead, all we can expect is a surjection from observations into the `Space`, that is, for every mathematically valid observation we have `space.contains(obs)`, but the space can contain other stuff as well. That's why a somewhat similar prior proposal, #2662, was rejected.
So in short, the purpose of a space is to provide a general heuristic of whether something is a valid observation/action or not. It gives a template for the form, mostly ignoring the content, if that makes sense.
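To make the "template, not exact set" point concrete, here is a minimal sketch in plain NumPy. It does not use Gym itself: `box_contains` and `is_valid_obs` are hypothetical helpers, and the unit-circle observation is just an assumed example of an environment whose valid observations are a strict subset of the box.

```python
import numpy as np

# Bounds of a 2-D box, standing in for a Box-like space with low=-1, high=1.
low = np.array([-1.0, -1.0])
high = np.array([1.0, 1.0])

def box_contains(x):
    """Shape-and-bounds check, in the spirit of space.contains (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float64)
    return x.shape == low.shape and bool(np.all(x >= low) and np.all(x <= high))

def is_valid_obs(x):
    """Assumed environment rule: observations are (cos t, sin t), so they lie on the unit circle."""
    return bool(np.isclose(np.hypot(x[0], x[1]), 1.0))

theta = 0.3
valid = np.array([np.cos(theta), np.sin(theta)])  # a genuine observation
corner = np.array([1.0, 1.0])                     # inside the box, but never emitted

print(box_contains(valid), is_valid_obs(valid))    # both checks pass
print(box_contains(corner), is_valid_obs(corner))  # box accepts it, the environment rule does not
```

Every valid observation passes the box check, but the box also accepts points the environment can never produce.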
This specific proposal is even more niche because, as I understand it, it concerns sampling, which itself is a somewhat dodgy feature, and there were plans to remove it. It should not be used for any "serious" procedures; it's more of a "give me something roughly observation-shaped, I don't care what".
From the perspective that I described, this space would be exactly the same as `Box`, but it raises some trickier questions about `space.contains`. Consider a standard normal distribution space, and suppose we check `space.contains(1e8)`. The probability of sampling that value is absurdly low; I'd estimate you could run all the supercomputers in the world for the duration of the age of the universe, and it still wouldn't happen. But technically speaking it could happen. Should we perform statistical testing?
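For a rough sense of scale, the tail probability P(Z > 1e8) of a standard normal can be approximated in log space. This sketch uses the standard large-x asymptotic P(Z > x) ≈ pdf(x) / x; the helper name `log_upper_tail` is made up for illustration.

```python
import math

def log_upper_tail(x):
    """Approximate ln P(Z > x) for standard normal Z and large x, using P(Z > x) ~ pdf(x) / x."""
    return -0.5 * x * x - math.log(x) - 0.5 * math.log(2.0 * math.pi)

x = 1e8

# Direct evaluation underflows: the true value is far below the smallest positive double.
direct = 0.5 * math.erfc(x / math.sqrt(2.0))

# Working in log space avoids the underflow.
log10_tail = log_upper_tail(x) / math.log(10.0)

print(direct)      # 0.0
print(log10_tail)  # about -2.17e15, i.e. a probability near 10 ** (-2.17e15)
```

So the probability is nonzero but on the order of 10 to the power of minus two quadrillion, which is why a pure bounds check and a "could this have been sampled" check give such different answers here.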
There is an argument to be made that maybe spaces should attempt to be more precise, that we should try to make it a bijective relation rather than a surjective one. I'm not convinced by that argument at the moment, but it's out there.
There is absolutely no precedent for adding a new core space just to affect the sampling distribution, and I don't think we should introduce one, since it's a step that would come after making spaces bijective in the first place.
My recommendation for you: just subclass `Normal(Box)` and add whatever you want there. If you do it correctly, it should still work with any existing code, and you can add your features. But it's a niche of a niche, so it shouldn't be added natively in gym.
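A minimal sketch of what such a subclass could look like. To stay self-contained it uses a tiny stand-in `Box` class written in NumPy, which is NOT Gym's implementation; in real code you would presumably inherit from `gym.spaces.Box`, whose constructor and seeding differ. The `mean` and `std` parameters and the clip-to-bounds behaviour are assumptions taken from the proposal, not an existing API.

```python
import numpy as np

class Box:
    """Minimal stand-in for a bounded box space (not Gym's implementation)."""

    def __init__(self, low, high, shape, seed=None):
        self.shape = tuple(shape)
        self.low = np.broadcast_to(np.asarray(low, dtype=np.float64), self.shape).copy()
        self.high = np.broadcast_to(np.asarray(high, dtype=np.float64), self.shape).copy()
        self.rng = np.random.default_rng(seed)

    def contains(self, x):
        x = np.asarray(x)
        return x.shape == self.shape and bool(np.all(x >= self.low) and np.all(x <= self.high))

    def sample(self):
        # Uniform sampling; only meaningful when the bounds are finite.
        return self.rng.uniform(self.low, self.high)

class Normal(Box):
    """Same membership test as Box; only sample() changes."""

    def __init__(self, mean, std, shape, low=-np.inf, high=np.inf, seed=None):
        super().__init__(low, high, shape, seed)
        self.mean = np.broadcast_to(np.asarray(mean, dtype=np.float64), self.shape).copy()
        self.std = np.broadcast_to(np.asarray(std, dtype=np.float64), self.shape).copy()

    def sample(self):
        # Draw from a diagonal Gaussian, then clip into [low, high].
        draw = self.rng.normal(self.mean, self.std)
        return np.clip(draw, self.low, self.high)

space = Normal(mean=0.0, std=1.0, shape=(3,), seed=0)
x = space.sample()
print(x.shape, space.contains(x))
print(space.contains(np.full(3, 1e8)))  # still contained: contains() ignores the distribution
```

Because `contains` is inherited unchanged, any code that only relies on the `Box` interface keeps working, and the distribution affects sampling alone.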
PS: For anyone from the future searching for a potential `Distribution` space where an element of the space is itself a probability distribution (which was my initial understanding from the title): I think that's more interesting, but much more challenging, so if you have any ideas, feel free to suggest them in a new issue.
Proposal
A clear and concise description of the proposal.
Motivation
Gym is severely lacking numerous `Space` types that are commonly seen in RL, though I'm going to address just one here to keep things simple. Imagine a person wanting to implement a high-level baseline model that should be compatible with a wide range of environments; the user has to specify information regarding the mean and variance of the state separately in order for data processing to be done automatically (if you can't guess from my highly specific remarks, I'm that person). This could be easily resolved by having a space defined implicitly by a mean and variance, essentially an open-range "`Box`".
Pitch
I want to propose a `Distribution` space which represents values sampled from a Gaussian distribution defined by `mean` and `std`. This implementation would be quite similar to `Box`, with some obvious changes in sampling.
Alternatives
Alternatively, this could simply be implemented in the `Box` space as added functionality. Or maybe this would be too confusing for users, especially when it comes to sampling (i.e. when fixed interval bounds apply vs. when infinite-range Gaussian sampling applies). If it were to be merged with the `Box` space, I would imagine this is how we'd interpret the user's intentions based on which arguments were specified:
Scenario 1: If `mean` and `std` are provided, but `low` and/or `high` are not, then `low` and/or `high` will default to `-inf` and/or `inf`, respectively. When the `sample()` method is called, a value will be sampled from the distribution centered at `mean` with standard deviation `std`, and then that value will be clipped to `[low, high]`.
Scenario 2: If `mean`, `low`, and `high` are all specified, but `std` is not, `std` will default to one quarter of the range of the interval (in accordance with the Range Rule). The sampling procedure will then be the same as in Scenario 1.
Scenario 3: If `low` and `high` are specified, but `mean` and `std` are not, we simply sample from the interval `[low, high]`.
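The three scenarios can be sketched as a small argument-resolution step followed by sampling. This is a scalar-only illustration under stated assumptions: `resolve_params` and `sample` are hypothetical names, not part of Gym, and argument combinations the proposal does not cover (for example `std` without `mean`) simply raise an error here.

```python
import numpy as np

def resolve_params(low=None, high=None, mean=None, std=None):
    """Return (low, high, mean, std); mean and std are None when sampling is uniform."""
    if mean is None:
        if std is not None:
            raise ValueError("std without mean is not covered by the three scenarios")
        if low is None or high is None:
            raise ValueError("uniform sampling needs both low and high")
        return low, high, None, None           # Scenario 3: plain interval
    if std is None:
        if low is None or high is None:
            raise ValueError("std can only be inferred from a finite [low, high]")
        std = (high - low) / 4.0                # Scenario 2: quarter of the range
    low = -np.inf if low is None else low       # Scenario 1: missing bounds become infinite
    high = np.inf if high is None else high
    return low, high, mean, std

def sample(rng, low=None, high=None, mean=None, std=None):
    low, high, mean, std = resolve_params(low, high, mean, std)
    if mean is None:
        return float(rng.uniform(low, high))
    # Gaussian draw, then clip into [low, high].
    return float(np.clip(rng.normal(mean, std), low, high))

rng = np.random.default_rng(0)
s1 = sample(rng, mean=0.0, std=1.0, low=-0.5)   # Scenario 1
s2 = sample(rng, mean=2.0, low=0.0, high=4.0)   # Scenario 2, std resolves to 1.0
s3 = sample(rng, low=0.0, high=4.0)             # Scenario 3
print(s1, s2, s3)
```

Note that clipping puts extra probability mass exactly on the bounds, so the result is a clipped Gaussian rather than a truncated one; which of the two is intended would need to be settled in the proposal.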
Checklist