bionicles closed this issue 5 years ago
Hi, great to hear you are interested in testing these environments with TF-Agents. We are fully compatible with the gym interface, so the easiest path forward will be for you to put these into your own repo and register them with gym (using gym.envs.registration.register). Then, you can just modify one of the example trainers (e.g. agents/dqn/examples/v2/train_eval.py) to load your environment (modifying env_name to point at your env).
Once you get everything set up, you could try out some of the different agents on each of the environments to see how performance differs. If it looks good, we can include a pointer to your repo from the TF-Agents readme!
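For reference, a minimal sketch of the shape an environment needs before it can be registered this way. Everything here (the class name, the countdown logic) is illustrative, not from any actual repo; it only mirrors the `reset`/`step` interface that `gym.envs.registration.register` and TF-Agents' gym wrappers expect.

```python
import numpy as np

class CountdownEnv:
    """Toy gym-style env: observation counts down from 10; reward 1.0 per step."""

    def __init__(self):
        self._t = 10

    def reset(self):
        self._t = 10
        return np.array([self._t], dtype=np.float32)

    def step(self, action):
        self._t -= 1
        obs = np.array([self._t], dtype=np.float32)
        reward = 1.0
        done = self._t <= 0
        return obs, reward, done, {}

# With gym installed, registration would then look like (not executed here;
# "my_envs.countdown" is a hypothetical module path):
# from gym.envs.registration import register
# register(id="CountdownEnv-v0", entry_point="my_envs.countdown:CountdownEnv")
```

Once registered, `env_name="CountdownEnv-v0"` in a trainer like `train_eval.py` would pick it up via `gym.make`.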
ok sounds fair! thanks sam
how do you deal with the problem of many different input shapes? I want to train 1 agent to solve all the tasks
You can have one agent; the network just has to handle the different input shapes.
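One common way to handle this (a sketch, not TF-Agents code): flatten each field of a dict observation and project it to a fixed-width embedding, so a single network torso can consume environments with different observation shapes. In a real agent the per-field projections would be learned layers; here they are deterministic random matrices keyed on the field name, just to show the shape bookkeeping.

```python
import zlib
import numpy as np

EMBED_DIM = 32

def _projection(name, in_size):
    # Deterministic stand-in for a learned per-field linear layer.
    rng = np.random.default_rng(zlib.crc32(name.encode()))
    return rng.normal(size=(in_size, EMBED_DIM)) / np.sqrt(in_size)

def encode_observation(obs):
    """obs: dict of field name -> array-like, any shapes."""
    parts = []
    for key in sorted(obs):
        x = np.asarray(obs[key], dtype=np.float64).ravel()
        parts.append(x @ _projection(key, x.size))
    # mean-pool the per-field embeddings into one fixed-size vector
    return np.mean(parts, axis=0)
```

Whatever the environment, the agent then sees a fixed `(32,)` vector.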
Hi, I have around 12 OpenAI Gym environments which I can contribute to the project.
They are loosely based on Dr. Howard Gardner's theory of multiple intelligences... https://www.edutopia.org/video/howard-gardner-multiple-intelligences
Caveats: I developed them all myself, except for Mol.Py, which I worked on last year with Kamel from Tunisia; I reused some code from that project (it makes nice biomolecular GIFs), and I had to invent some new "spaces" for OpenAI Gym to support it (string and array spaces). Some of the environments use proprietary data and cannot be added without permission from the other authors of those datasets.
Also, some rely on big datasets or web scraping, which we might not be able to include in your corporate codebase, or which might need download scripts. I wrote some standard helper functions to wrap the observations in a dictionary space.
They all use NumPy arrays to stay agnostic in the PyTorch/TensorFlow debate, so the agent or a helper would need to convert these arrays into tensors. A few need a bit of work, but they all return observations, accept actions, and calculate rewards. For the text tasks, I figured I would let the agent type one letter at a time for faster reward feedback (but we can add sparse rewards too). The biomolecular folding and docking environments can be simplified: we got too fancy with orthogonal screenshots and can probably just use the atomic tensor (which includes amino acid info).
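The custom `String()` space mentioned above isn't shown in this thread, so here is only a guess at what a minimal gym-style string space could look like. A gym `Space` mainly needs `sample()` and `contains()`; subclassing `gym.Space` would additionally give seeding and serialization, which this sketch omits.

```python
import random
import string

class StringSpace:
    """Hypothetical gym-style space over variable-length strings."""

    def __init__(self, max_len=64,
                 charset=string.ascii_letters + string.digits + " "):
        self.max_len = max_len
        self.charset = charset

    def sample(self):
        # Draw a random length, then random characters from the charset.
        n = random.randint(0, self.max_len)
        return "".join(random.choice(self.charset) for _ in range(n))

    def contains(self, x):
        return (isinstance(x, str)
                and len(x) <= self.max_len
                and all(c in self.charset for c in x))
```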
Mostly I just figured noobs building AGI don't do TDD (when I brought up unit testing in an AGI Facebook group, a bunch of nerds flamed me about it), so I wanted to practice TDD, and I did. Mission accomplished. The code has actually been sitting around for a few months now because life happens. I still need to architect and build the agent to solve the environments. It would be fun to try architecture search; I built a graph generator for that, but I still need to compile its output into a computation graph. Open to publishing on this avenue. Let me know!
How can I contribute these environments?
Space details follow... they might be out of date; this is from a Google Doc because the code is on another rig, and I used Bitbucket instead of GitHub for privacy.
GreedBotEnv-v0 observation:
  chart - Box(256, 256, 4) - sample type: ndarray - shape: (256, 256, 4) - dtype: uint8
  greed - String() - sample type: str
  highest_change_call_data - Box(10,) - sample type: ndarray - shape: (10,) - dtype: float64
  highest_change_call_string - String() - sample type: str
  highest_change_put_data - Box(10,) - sample type: ndarray - shape: (10,) - dtype: float64
  highest_change_put_string - String() - sample type: str
  highest_volume_call_data - Box(10,) - sample type: ndarray - shape: (10,) - dtype: float64
  highest_volume_call_string - String() - sample type: str
  highest_volume_put_data - Box(10,) - sample type: ndarray - shape: (10,) - dtype: float64
  highest_volume_put_string - String() - sample type: str
  price/earnings - String() - sample type: str
  underlying_data - Box(6,) - sample type: ndarray - shape: (6,) - dtype: float64
  underlying_symbol - String() - sample type: str

GreedBotEnv-v0 action:
  option_to_buy - Discrete(5) - sample type: int - sample: 0
  time_to_sleep - Box(1,) - sample type: ndarray - shape: (1,) - dtype: float32
TSPEnv-v0 observation:
  cities - List(Tuple(Discrete(100), Discrete(100)), length=16) - sample type: list - length: 15 - dtype: tuple
  current - Tuple(Discrete(100), Discrete(100)) - sample type: tuple - sample: (36, 87)
  path - List(Tuple(Discrete(100), Discrete(100))) - sample type: list - length: 2 - dtype: tuple

TSPEnv-v0 action:
  next_city_index - Discrete(16) - sample type: int - sample: 15
DrawEnv-v0 observation:
  description - String() - sample type: str
  title - String() - sample type: str

DrawEnv-v0 action:
  image - Box(256, 256, 4) - sample type: ndarray - shape: (256, 256, 4) - dtype: float32
MusicEnv-v0 observation:
  rating - Box(4,) - sample type: ndarray - shape: (4,) - dtype: float64

MusicEnv-v0 action:
  note - Box(1,) - sample type: ndarray - shape: (1,) - dtype: float32
NBackEnv-v0 observation:
  back - Discrete(1) - sample type: int - sample: 1
  letter - Discrete(52) - sample type: str

NBackEnv-v0 action:
  match - Discrete(2) - sample type: int - sample: 0
BipedalWalker-v2 observation:
  BipedalWalker-v2 - Box(24,) - sample type: ndarray - shape: (24,) - dtype: float64

BipedalWalker-v2 action:
  BipedalWalker-v2 - Box(4,) - sample type: ndarray - shape: (4,) - dtype: float32
SatEnv-v0 observation:
  problem - String() - sample type: str

SatEnv-v0 action:
  candidate - Box(3,) - sample type: ndarray - shape: (3,) - dtype: uint8
  satisfiable - Discrete(2) - sample type: int - sample: 1

MontezumaRevenge-v0 observation:
  MontezumaRevenge-v0 - Box(210, 160, 3) - sample type: ndarray - shape: (210, 160, 3) - dtype: uint8

MontezumaRevenge-v0 action:
  MontezumaRevenge-v0 - Discrete(18) - sample type: int - sample: 5
DockEnv-v0 observation:
  aminos - Array(shape=(None, 7)) - sample type: ndarray - shape: (468, 7) - dtype: float64
  atoms - Array(shape=(None, 17)) - sample type: ndarray - shape: (3736, 17) - dtype: float64
  bonds - Array(shape=(None, 1)) - sample type: ndarray - shape: (936, 1) - dtype: float64
  chains - Array(shape=(None, 6)) - sample type: ndarray - shape: (2, 6) - dtype: float64
  image - Array(shape=(768, 256, 3)) - sample type: ndarray - shape: (768, 256, 3) - dtype: uint8

DockEnv-v0 action:
  potentials - Array(shape=(3736, 3)) - sample type: ndarray - shape: (3736, 3) - dtype: float64

FoldEnv-v0 observation:
  aminos - Array(shape=(None, 7)) - sample type: ndarray - shape: (441, 7) - dtype: float64
  atoms - Array(shape=(None, 17)) - sample type: ndarray - shape: (3376, 17) - dtype: float64
  bonds - Array(shape=(None, 1)) - sample type: ndarray - shape: (882, 1) - dtype: float64
  chains - Array(shape=(None, 6)) - sample type: ndarray - shape: (1, 6) - dtype: float64
  image - Array(shape=(768, 256, 3)) - sample type: ndarray - shape: (768, 256, 3) - dtype: uint8

FoldEnv-v0 action:
  potentials - Array(shape=(3376, 3)) - sample type: ndarray - shape: (3376, 3) - dtype: float64
PairsEnv-v0 observation:
  a - Array(shape=(None, 1)) - sample type: ndarray - shape: (1569,) - dtype: float32
  a_type - String() - sample type: str
  b - Array(shape=(None, 1)) - sample type: ndarray - shape: (1569,) - dtype: float32
  b_type - String() - sample type: str

PairsEnv-v0 action:
  details - Discrete(4) - sample type: int - sample: 3
  target - Discrete(3) - sample type: int - sample: 1

VocabEnv-v0 observation:
  proposed - String() - sample type: str
  word - String() - sample type: str

VocabEnv-v0 action:
  finished - Discrete(2) - sample type: int - sample: 1
  letter - Discrete(95) - sample type: int - sample: 9
SquadEnv-v0 observation:
  context - String() - sample type: str
  question - String() - sample type: str
  title - String() - sample type: str

SquadEnv-v0 action:
  done_typing - Discrete(2) - sample type: int - sample: 1
  is_impossible - Discrete(2) - sample type: int - sample: 0
  letter - Discrete(128) - sample type: int - sample: 64
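For the letter-at-a-time typing envs (SquadEnv, VocabEnv, StoriesEnv), here is a guess at how the per-step actions decode back into an answer string, assuming `letter` indexes the 128 ASCII codes and `done_typing == 1` ends the answer; the dict shape is inferred from the action spaces above, not from the actual env code.

```python
def decode_typing(steps):
    """steps: iterable of action dicts, e.g.
    {"letter": 72, "done_typing": 0, "is_impossible": 0}."""
    chars = []
    for action in steps:
        if action["done_typing"] == 1:
            break
        chars.append(chr(action["letter"]))
    return "".join(chars)
```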
StoriesEnv-v0 observation:
  character - String() - sample type: str
  context - String() - sample type: str
  linenum - Discrete(5) - sample type: int - sample: 4

StoriesEnv-v0 action:
  done_typing - Discrete(2) - sample type: int - sample: 0
  letter - Discrete(128) - sample type: int - sample: 28
  writing - Discrete(3) - sample type: int - sample: 2