Create a separate repo for Wrappers ?

zuoxingdong commented 5 years ago

There are quite a few Issues/PRs for new features that could be implemented as a Wrapper, and the baselines repo already has very useful wrappers for RL algorithms, e.g. Atari preprocessing/DQN, continuous action clipping, normalizing observations, scaling rewards, etc.

Therefore, would it be a good idea to have another repo maintained by OpenAI that keeps a wide collection of Wrappers in one place (+ unit tests)? This would keep gym itself lightweight and give the community access to standardized, well-tested wrappers.

What do you think @pzhokhov ?

pzhokhov commented 5 years ago

Having a collection of standardized, well-tested wrappers sounds very useful, I agree with that. I am not sure a separate repo is justified though; rather, we can put them (and the unit tests) into gym. Per discussion with @christopherhesse, env-specific standard wrappers could live in gym, whereas wrappers that are algorithm-specific/related can live closer to the algorithms, i.e. in baselines. Let's keep this issue open until we complete the migration of the wrappers into gym and their testing.

zuoxingdong commented 5 years ago

Thanks @pzhokhov, that sounds exciting! I've made a small list here for further discussion:

Standard
- [x] ClipAction: clip the received action with upper and lower bounds. Valid for Box action space
- [x] SignReward: apply sign function to bin the reward
- [x] ClipReward: clip the raw reward with upper and lower bounds.
- [x] ScaleReward: rescale the reward by a factor
- [ ] WarpFrame: downsample the image observation. Note: ResizeFrame/DownsampleFrame might be a better name
- [x] FlattenObservation
- [x] GrayScaleObservation: convert pixel observation to gray scale
- [x] TimeAwareObservation: append the time step to the observation. See the paper "Time Limits in Reinforcement Learning"
Standardization (better applied to a VecEnv)
- [ ] StandardizeObservation: standardize the observation by online estimation of moments
- [ ] StandardizeReward: standardize the reward by online estimation of variance. Note: the mean is NOT subtracted
- [ ] VecMonitor: record episodic length, returns and elapsed time.
Specific
- [x] NoopResetEnv: atari
- [x] FireResetEnv: atari
- [x] EpisodicLifeEnv: atari
- [x] MaxAndSkipEnv: atari
- [x] FrameStack: DQN
- [x] ScaledFloatFrame: DQN
- [x] LazyFrames: DQN
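
To make the "Standard" items concrete, here is a rough sketch of how small such wrappers can be. It is only an illustration under stated assumptions, not the code that was merged: to stay dependency-free it uses a hypothetical `ToyEnv` and stand-in `Wrapper`/`RewardWrapper` base classes instead of gym's own, scalar actions instead of a Box space, and the classic `(obs, reward, done, info)` step signature.

```python
class ToyEnv:
    """Hypothetical stand-in env that replays a fixed list of rewards."""

    def __init__(self, rewards):
        self._rewards = list(rewards)
        self._t = 0

    def reset(self):
        self._t = 0
        return 0.0  # dummy observation

    def step(self, action):
        reward = self._rewards[self._t]
        self._t += 1
        done = self._t >= len(self._rewards)
        # The observation echoes the action so we can see what the env received.
        return action, reward, done, {}


class Wrapper:
    """Stand-in base class: forwards reset/step to the wrapped env."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class RewardWrapper(Wrapper):
    """Stand-in base class: subclasses only override reward()."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, self.reward(reward), done, info

    def reward(self, reward):
        raise NotImplementedError


class SignReward(RewardWrapper):
    def reward(self, reward):
        # Bin the reward into -1, 0 or +1.
        return (reward > 0) - (reward < 0)


class ClipReward(RewardWrapper):
    def __init__(self, env, low, high):
        super().__init__(env)
        self.low, self.high = low, high

    def reward(self, reward):
        return min(max(reward, self.low), self.high)


class ScaleReward(RewardWrapper):
    def __init__(self, env, scale):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        return self.scale * reward


class ClipAction(Wrapper):
    def __init__(self, env, low, high):
        super().__init__(env)
        self.low, self.high = low, high

    def step(self, action):
        # Clip the action before the wrapped env sees it.
        return self.env.step(min(max(action, self.low), self.high))
```

Because each wrapper only touches one thing, they compose by nesting, e.g. `ScaleReward(ClipReward(ToyEnv([5.0]), -1.0, 1.0), 0.1)`.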

zuoxingdong commented 5 years ago

@pzhokhov I'd be happy to implement some of them once the discussion settles on which wrappers to migrate.

AdilZouitine commented 5 years ago

Hi @zuoxingdong, I saw you did a lot of PRs implementing the wrappers you mentioned!

Are there still some wrappers left to implement? I would be happy to help :smile:

Thank you.

zuoxingdong commented 5 years ago

Hi @AdilZouitine, thanks a lot for your interest! I've updated the checklist above. For now, what is probably still missing is the standardization of observations and rewards in vectorized environments. Would you like to implement them?
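
As a starting point, the "online estimation of moments" could be sketched roughly like this. This is only an illustration under assumptions, not an agreed design: `RunningMoments`, `standardize_observation` and `standardize_reward` are hypothetical names, values are plain floats rather than arrays, and each `update` call receives one batch (e.g. one value per sub-environment of a VecEnv). Which quantity the reward variance is tracked over is left open here.

```python
import math


class RunningMoments:
    """Online mean/variance, updated one batch at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, batch):
        n = len(batch)
        if n == 0:
            return
        batch_mean = sum(batch) / n
        batch_m2 = sum((x - batch_mean) ** 2 for x in batch)
        # Merge the batch statistics into the running statistics.
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean += delta * n / total
        self.m2 += batch_m2 + delta ** 2 * self.count * n / total
        self.count = total

    @property
    def var(self):
        # Population variance; falls back to 1.0 before any data is seen.
        return self.m2 / self.count if self.count > 0 else 1.0


def standardize_observation(obs, moments, eps=1e-8):
    # Subtract the running mean and divide by the running std.
    return (obs - moments.mean) / math.sqrt(moments.var + eps)


def standardize_reward(reward, moments, eps=1e-8):
    # Only rescale by the running std; the mean is NOT subtracted.
    return reward / math.sqrt(moments.var + eps)
```

A wrapper would call `update` with every new batch of observations (or rewards) before standardizing them, keeping separate `RunningMoments` instances for the two.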

AdilZouitine commented 5 years ago

Hi @zuoxingdong, yes, one of them interests me; it will be an opportunity to learn more about vectorized environments.

For my internship, I had to implement a wrapper similar to the WarpFrame that you propose in your list. I can also take care of this one.

Best regards :smile:

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

openai / gym

Create a separate repo for Wrappers ? #1335