tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Have a bunch of environments to contribute #82

Closed · bionicles closed this issue 5 years ago

bionicles commented 5 years ago

Hi, I have around 12 OpenAI Gym environments I can contribute to the project.

They are loosely based on Dr. Howard Gardner's theory of multiple intelligences: https://www.edutopia.org/video/howard-gardner-multiple-intelligences

Caveats: I developed them all myself, except for Mol.Py, which I worked on last year with Kamel from Tunisia; I reused some code from that project (it makes nice biomolecular GIFs). To support these environments I had to invent some new "spaces" for OpenAI Gym (string and array spaces). Some of the environments use proprietary data and cannot be added without permission from the other authors of those datasets.
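For reference, here is a minimal sketch of what such a String space can look like under the standard gym.Space interface. The real implementation is in my private repo, so the class name, defaults, and charset below are illustrative only:

```python
import random
import string

import gym


class String(gym.Space):
    """Variable-length string space (sketch; defaults are illustrative)."""

    def __init__(self, min_length=0, max_length=280, charset=string.printable):
        super().__init__(shape=None, dtype=None)
        self.min_length = min_length
        self.max_length = max_length
        self.charset = charset

    def sample(self):
        # Draw a random length, then fill it with random characters.
        length = random.randint(self.min_length, self.max_length)
        return "".join(random.choice(self.charset) for _ in range(length))

    def contains(self, x):
        return (isinstance(x, str)
                and self.min_length <= len(x) <= self.max_length
                and all(c in self.charset for c in x))
```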

Also, some rely on big datasets or web scraping, which we might not be able to include in your corporate codebase, or which might need download scripts. I wrote some standard helper functions to wrap the observations in a dictionary space; the sketch below shows the idea.
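A hypothetical version of such a helper (not the exact code), written as a standard gym.ObservationWrapper:

```python
import gym


class DictObservation(gym.ObservationWrapper):
    """Wrap a plain observation space into a single-key Dict space."""

    def __init__(self, env, key="observation"):
        super().__init__(env)
        self.key = key
        self.observation_space = gym.spaces.Dict({key: env.observation_space})

    def observation(self, observation):
        return {self.key: observation}


# Usage, e.g.: env = DictObservation(gym.make("BipedalWalker-v2"))
```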

They all use NumPy arrays so they stay agnostic in the PyTorch/TensorFlow debate; the agent or a helper needs to convert those arrays into tensors. A few need a bit of work, but they all return observations, accept actions, and calculate rewards. For text tasks, I figured we should just let the agent type one letter at a time for faster reward feedback (though we can add sparse rewards too). The biomolecular folding and docking environments can be simplified: we got too fancy with orthogonal screenshots and can probably just use the atomic tensor (which includes things like amino acid info).
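On the TensorFlow side that conversion is a one-liner (a sketch; a PyTorch user would reach for torch.from_numpy instead):

```python
import tensorflow as tf


def observation_to_tensors(observation):
    """Convert a dict of NumPy arrays (or strings) into TensorFlow tensors."""
    return {key: tf.convert_to_tensor(value) for key, value in observation.items()}
```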

Mostly I just figured noobs building AGI don't do TDD (when I brought up unit testing in an AGI Facebook group, a bunch of nerds flamed me about it), so I wanted to practice TDD, and I did. Mission accomplished. The code has actually been sitting around for a few months now because life happens. I still need to architect and build the agent to solve them. It would be fun to try architecture search; I built a graph generator for that, but I still need to compile its output into a computation graph. Open to publishing on this avenue. Let me know!

How can I contribute these environments?

Space details follow. They might be out of date; they come from a Google Doc because the code is on another rig, and I used Bitbucket instead of GitHub for privacy.

GreedBotEnv-v0 observation:
- chart - Box(256, 256, 4) - sample type: ndarray - shape: (256, 256, 4) - dtype: uint8
- greed - String() - sample type: str
- highest_change_call_data - Box(10,) - sample type: ndarray - shape: (10,) - dtype: float64
- highest_change_call_string - String() - sample type: str
- highest_change_put_data - Box(10,) - sample type: ndarray - shape: (10,) - dtype: float64
- highest_change_put_string - String() - sample type: str
- highest_volume_call_data - Box(10,) - sample type: ndarray - shape: (10,) - dtype: float64
- highest_volume_call_string - String() - sample type: str
- highest_volume_put_data - Box(10,) - sample type: ndarray - shape: (10,) - dtype: float64
- highest_volume_put_string - String() - sample type: str
- price/earnings - String() - sample type: str
- underlying_data - Box(6,) - sample type: ndarray - shape: (6,) - dtype: float64
- underlying_symbol - String() - sample type: str

GreedBotEnv-v0 action:
- option_to_buy - Discrete(5) - sample type: int - sample: 0
- time_to_sleep - Box(1,) - sample type: ndarray - shape: (1,) - dtype: float32

TSPEnv-v0 observation:
- cities - List(Tuple(Discrete(100), Discrete(100)), length=16) - sample type: list - length: 15 - dtype: tuple
- current - Tuple(Discrete(100), Discrete(100)) - sample type: tuple - sample: (36, 87)
- path - List(Tuple(Discrete(100), Discrete(100))) - sample type: list - length: 2 - dtype: tuple

TSPEnv-v0 action:
- next_city_index - Discrete(16) - sample type: int - sample: 15

DrawEnv-v0 observation:
- description - String() - sample type: str
- title - String() - sample type: str

DrawEnv-v0 action:
- image - Box(256, 256, 4) - sample type: ndarray - shape: (256, 256, 4) - dtype: float32

MusicEnv-v0 observation:
- rating - Box(4,) - sample type: ndarray - shape: (4,) - dtype: float64

MusicEnv-v0 action:
- note - Box(1,) - sample type: ndarray - shape: (1,) - dtype: float32

NBackEnv-v0 observation:
- back - Discrete(1) - sample type: int - sample: 1
- letter - Discrete(52) - sample type: str

NBackEnv-v0 action:
- match - Discrete(2) - sample type: int - sample: 0

BipedalWalker-v2 observation:
- BipedalWalker-v2 - Box(24,) - sample type: ndarray - shape: (24,) - dtype: float64

BipedalWalker-v2 action:
- BipedalWalker-v2 - Box(4,) - sample type: ndarray - shape: (4,) - dtype: float32

SatEnv-v0 observation:
- problem - String() - sample type: str

SatEnv-v0 action:
- candidate - Box(3,) - sample type: ndarray - shape: (3,) - dtype: uint8
- satisfiable - Discrete(2) - sample type: int - sample: 1

MontezumaRevenge-v0 observation:
- MontezumaRevenge-v0 - Box(210, 160, 3) - sample type: ndarray - shape: (210, 160, 3) - dtype: uint8

MontezumaRevenge-v0 action:
- MontezumaRevenge-v0 - Discrete(18) - sample type: int - sample: 5

DockEnv-v0 observation:
- aminos - Array(shape=(None, 7)) - sample type: ndarray - shape: (468, 7) - dtype: float64
- atoms - Array(shape=(None, 17)) - sample type: ndarray - shape: (3736, 17) - dtype: float64
- bonds - Array(shape=(None, 1)) - sample type: ndarray - shape: (936, 1) - dtype: float64
- chains - Array(shape=(None, 6)) - sample type: ndarray - shape: (2, 6) - dtype: float64
- image - Array(shape=(768, 256, 3)) - sample type: ndarray - shape: (768, 256, 3) - dtype: uint8

DockEnv-v0 action:
- potentials - Array(shape=(3736, 3)) - sample type: ndarray - shape: (3736, 3) - dtype: float64

FoldEnv-v0 observation:
- aminos - Array(shape=(None, 7)) - sample type: ndarray - shape: (441, 7) - dtype: float64
- atoms - Array(shape=(None, 17)) - sample type: ndarray - shape: (3376, 17) - dtype: float64
- bonds - Array(shape=(None, 1)) - sample type: ndarray - shape: (882, 1) - dtype: float64
- chains - Array(shape=(None, 6)) - sample type: ndarray - shape: (1, 6) - dtype: float64
- image - Array(shape=(768, 256, 3)) - sample type: ndarray - shape: (768, 256, 3) - dtype: uint8

FoldEnv-v0 action:
- potentials - Array(shape=(3376, 3)) - sample type: ndarray - shape: (3376, 3) - dtype: float64

PairsEnv-v0 observation:
- a - Array(shape=(None, 1)) - sample type: ndarray - shape: (1569,) - dtype: float32
- a_type - String() - sample type: str
- b - Array(shape=(None, 1)) - sample type: ndarray - shape: (1569,) - dtype: float32
- b_type - String() - sample type: str

PairsEnv-v0 action:
- details - Discrete(4) - sample type: int - sample: 3
- target - Discrete(3) - sample type: int - sample: 1

VocabEnv-v0 observation:
- proposed - String() - sample type: str
- word - String() - sample type: str

VocabEnv-v0 action:
- finished - Discrete(2) - sample type: int - sample: 1
- letter - Discrete(95) - sample type: int - sample: 9

SquadEnv-v0 observation:
- context - String() - sample type: str
- question - String() - sample type: str
- title - String() - sample type: str

SquadEnv-v0 action:
- done_typing - Discrete(2) - sample type: int - sample: 1
- is_impossible - Discrete(2) - sample type: int - sample: 0
- letter - Discrete(128) - sample type: int - sample: 64

StoriesEnv-v0 observation:
- character - String() - sample type: str
- context - String() - sample type: str
- linenum - Discrete(5) - sample type: int - sample: 4

StoriesEnv-v0 action:
- done_typing - Discrete(2) - sample type: int - sample: 0
- letter - Discrete(128) - sample type: int - sample: 28
- writing - Discrete(3) - sample type: int - sample: 2

samfishman commented 5 years ago

Hi, great to hear you're interested in testing these environments with TF-Agents. We are fully compatible with the Gym interface, so the easiest path forward will be for you to put these into your own repo and register them with Gym (using gym.envs.registration.register). Then you can modify one of the example trainers (e.g. agents/dqn/examples/v2/train_eval.py) to load your environment by pointing env_name at it.
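Concretely, the flow looks roughly like this; the entry_point module path is a placeholder for wherever your env classes actually live:

```python
from gym.envs.registration import register

# Hypothetical registration for one of the environments above.
register(
    id="NBackEnv-v0",
    entry_point="my_envs.nback:NBackEnv",
)

# TF-Agents can then load the registered env through its Gym suite:
from tf_agents.environments import suite_gym

env = suite_gym.load("NBackEnv-v0")
```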

Once you get everything set up, you could try out some of the different agents on each environment to see how performance differs. If it looks good, we can include a pointer to your repo in the TF-Agents README!

bionicles commented 5 years ago

ok sounds fair! thanks sam

bionicles commented 5 years ago

How do you deal with the problem of many different input shapes? I want to train one agent to solve all the tasks.

kbanoop commented 5 years ago

You can have one agent; the network just has to handle the different input shapes.
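One common pattern (a sketch of the idea, not a TF-Agents API): give each named observation its own small encoder that projects into a common embedding, then feed the combined embedding to one shared trunk:

```python
import tensorflow as tf


def build_shared_encoder(spec_shapes, embed_dim=64):
    """Project each named input to a common embedding size, then combine.

    spec_shapes maps observation keys to shapes, e.g. {"rating": (4,)}.
    """
    inputs = {name: tf.keras.Input(shape=shape, name=name)
              for name, shape in spec_shapes.items()}
    embeddings = [
        tf.keras.layers.Dense(embed_dim, activation="relu")(
            tf.keras.layers.Flatten()(tensor))
        for tensor in inputs.values()
    ]
    merged = (embeddings[0] if len(embeddings) == 1
              else tf.keras.layers.Add()(embeddings))
    return tf.keras.Model(inputs=inputs, outputs=merged)


# One trunk over two differently shaped observations (shapes are examples):
encoder = build_shared_encoder({"rating": (4,), "chart": (256, 256, 4)})
```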