Different agents need different things. In particular, parallel agents don't map easily to a simple API. I'd point people to the simple agents in gym/examples/agents/cem.py for examples of how to write a single-threaded rollout function.
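Roughly, the rollout there is just a plain loop; this is paraphrased from memory rather than copied verbatim from cem.py, and the helper name and arguments are illustrative:

```python
def do_rollout(agent, env, num_steps, render=False):
    """Run one episode for at most num_steps and return (total reward, steps taken)."""
    total_reward = 0
    observation = env.reset()
    for t in range(num_steps):
        action = agent.act(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
        if render:
            env.render()
        if done:
            break
    return total_reward, t + 1
```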
In cem.py, should `num_steps` be `env.timestep_limit` instead of 200?
cem.py is an interesting example because it learns over batches of agents. The most common Gym use-case is probably running a single agent and learning each timestep. Maybe second-most common is running single agents or batches of agents and learning at the end of each episode. A handful of configurations would probably cover the majority of situations and eliminate a lot of boilerplate. Changes like timestep_limit could be managed centrally without changing user code.
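For concreteness, the "single agent, learn each timestep" case is roughly the loop below. The `RandomAgent` class and its `act`/`learn` methods are placeholders I made up for illustration; they are not part of the Gym API.

```python
import gym

class RandomAgent(object):
    """Stand-in learner; a real agent would update its policy in learn()."""
    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, observation):
        return self.action_space.sample()

    def learn(self, observation, action, reward, next_observation, done):
        pass  # per-timestep update would go here

env = gym.make("CartPole-v0")
agent = RandomAgent(env.action_space)

for episode in range(100):
    observation = env.reset()
    done = False
    while not done:
        action = agent.act(observation)
        next_observation, reward, done, _ = env.step(action)
        agent.learn(observation, action, reward, next_observation, done)
        observation = next_observation
```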
examples/agents/tabular_q_agent.py looks interesting but there isn't any code to run it. I might add a main like in the other example agents.
BTW, cem.py is using the old Monitor API.
Thanks, Ben
The Monitor API has been updated in https://github.com/openai/gym/pull/453.
"rllab is a framework for developing and evaluating reinforcement learning algorithms, fully compatible with OpenAI Gym. " https://github.com/openai/rllab
Thanks @futurely and @tlbtlbtlb! rllab was the missing component in my mind.
So as I understand it now: roughly speaking, Gym is for environments and rllab is for agents. Gym is stand-alone and provides some example agents, but for additional work on agents, rllab is the OpenAI sister project. I'll try working within rllab and raise any agent-related issues there.
Would it make sense to link rllab from the gym readme?
Something like: Gym provides stand-alone environments for reinforcement learning; additional agents and a broader reinforcement learning framework can be found in the OpenAI sister project, rllab.
Cheers, Ben
Rllab has implemented many algorithms and is a good starting point, but its development has not been very active recently. https://github.com/openai/rllab/graphs/contributors https://github.com/openai/rllab/graphs/commit-activity https://github.com/openai/rllab/graphs/code-frequency
RL is hot today and choices are abundant. https://github.com/search?o=desc&q=reinforcement+learning&s=stars&type=Repositories&utf8=%E2%9C%93
Some of the latest research, such as dueling networks and prioritized replay, is not always implemented in the most popular projects. https://github.com/search?utf8=%E2%9C%93&q=Dueling+Network https://github.com/search?q=Dueling+Network&type=Code&utf8=%E2%9C%93 https://github.com/search?utf8=%E2%9C%93&q=prioritized+replay https://github.com/search?q=prioritized+replay&type=Code&utf8=%E2%9C%93
If rllab hasn't had enough recent activity, the release of Gym might be a good opportunity to renew interest. If my goal is to implement the latest research in a reusable framework so other people can use it, maybe rllab is the thing to work on.
Making a standalone module for such-and-such algorithm is good and all, but I would rather be contributing to something bigger.
So, does implementing prioritized replay and dueling networks in rllab sound like a worthwhile project, or is there some better framework out there that we should be using?
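For reference, the proportional variant of prioritized replay is small enough to sketch here. This is just illustrative code I wrote, not taken from rllab or any reference implementation, and it omits importance-sampling weights for brevity:

```python
import numpy as np

class PrioritizedReplayBuffer(object):
    """Proportional prioritized replay: sample transitions with probability
    proportional to priority**alpha (Schaul et al., 2015)."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.buffer = []
        self.priorities = []

    def add(self, transition, priority=1.0):
        # Drop the oldest transition once the buffer is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        scaled = np.array(self.priorities) ** self.alpha
        probs = scaled / scaled.sum()
        indices = np.random.choice(len(self.buffer), batch_size, p=probs)
        return [self.buffer[i] for i in indices], indices

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # New priority is the magnitude of the TD error plus a small epsilon.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + eps
```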
Thanks, Ben
Universe-starter-agent is intended as a simple agent where you can rip out the algorithm and replace it with something like DQN with prioritized replay. But you could start from scratch too. I don't think it's important to implement it as part of a particular framework now: if it works and produces better results on some environments, that'll be awesome.
Hi everybody,
Is there a plan to standardize a framework for runners and agents, especially for testing?
A standard API for agents like `new_episode()`, `observe(observation)`, and `act()` would mean you could write a standardized test runner that creates an environment, sets the seed, runs the agent, etc. Maybe it needs a few callbacks for customization, but it should ultimately cut down on the amount of boilerplate we have to write. Of course people could still write their own environment loops, but it wouldn't be required.

Looking at examples around the web, not everyone is respecting the done flag, resetting the environment properly, setting the seed, etc. An environment loop within the framework could help.
I would also add `learn(reward, done)` to the API, which the runner would call in training mode but not in test mode.

Please let me know your thoughts on the best way to put agents and environments into a loop in the most reusable way possible.
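To make that concrete, here is a rough sketch of the agent interface and runner I have in mind; none of this exists in Gym yet, it's just the shape of the proposal:

```python
class Agent(object):
    """Proposed minimal agent interface."""
    def new_episode(self):
        pass

    def observe(self, observation):
        self.observation = observation

    def act(self):
        raise NotImplementedError

    def learn(self, reward, done):
        pass  # called by the runner in training mode only


def run_episode(env, agent, train=True, seed=None):
    """Standard environment loop: handles seeding, reset, and the done flag."""
    if seed is not None:
        env.seed(seed)
    agent.new_episode()
    observation = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        agent.observe(observation)
        action = agent.act()
        observation, reward, done, _info = env.step(action)
        if train:
            agent.learn(reward, done)
        total_reward += reward
    return total_reward
```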
Cheers, Ben