Important Note: fastrl==2.* is being developed at [fastrl](https://github.com/josiahls/fastrl), which is the permanent home for all fastai version 2.0 changes as well as faster, refactored, and more stable models. Please go there for new information and code.
This repo is not affiliated with Jeremy Howard or his course, which can be found here. We will be using components from the fastai library for building and training our reinforcement learning (RL) agents.
Our goal is for fast_rl to make benchmarking easier, inference more efficient, and environment compatibility as decoupled as possible. This being version 1.0, we still have a lot of work to do to make RL training itself faster and more efficient. The goals for this repo can be seen in the RoadMap.
An important note is that training can use a lot of RAM. This will likely be addressed as more models are added, most likely by offloading experience to storage in the next few versions.
A simple example:
```python
from fast_rl.agents.dqn import create_dqn_model, dqn_learner
from fast_rl.agents.dqn_models import *
from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon
from fast_rl.core.data_block import MDPDataBunch
from fast_rl.core.metrics import RewardMetric, EpsilonMetric

# Experience replay buffer; reduce_ram lowers its memory footprint.
memory = ExperienceReplay(memory_size=1000000, reduce_ram=True)
# Epsilon-greedy exploration, annealing epsilon from 1.0 down to 0.1.
explore = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
# Wrap the OpenAI Gym environment in a fastai-style DataBunch.
data = MDPDataBunch.from_env('CartPole-v1', render='human', bs=64, add_valid=False)
# Fixed-target DQN with two hidden layers of 32 units each.
model = create_dqn_model(data=data, base_arch=FixedTargetDQNModule, lr=0.001, layers=[32, 32])
learn = dqn_learner(data, model, memory=memory, exploration_method=explore, copy_over_frequency=300,
                    callback_fns=[RewardMetric, EpsilonMetric])
learn.fit(450)
```
More complex examples might involve running an RL agent multiple times, generating episode snapshots as gifs, grouping reward plots, and finally showing the best and worst runs in a single graph.
```python
from fastai.basic_data import DatasetType

from fast_rl.agents.dqn import create_dqn_model, dqn_learner
from fast_rl.agents.dqn_models import *
from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon
from fast_rl.core.data_block import MDPDataBunch
from fast_rl.core.metrics import RewardMetric, EpsilonMetric
from fast_rl.core.train import GroupAgentInterpretation, AgentInterpretation

group_interp = GroupAgentInterpretation()
for i in range(5):
    memory = ExperienceReplay(memory_size=1000000, reduce_ram=True)
    explore = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
    data = MDPDataBunch.from_env('CartPole-v1', render='human', bs=64, add_valid=False)
    model = create_dqn_model(data=data, base_arch=FixedTargetDQNModule, lr=0.001, layers=[32, 32])
    learn = dqn_learner(data, model, memory=memory, exploration_method=explore, copy_over_frequency=300,
                        callback_fns=[RewardMetric, EpsilonMetric])
    learn.fit(450)

    # Collect this run's rewards and add them to the group for later comparison.
    interp = AgentInterpretation(learn, ds_type=DatasetType.Train)
    interp.plot_rewards(cumulative=True, per_episode=True, group_name='cartpole_experience_example')
    group_interp.add_interpretation(interp)
    group_interp.to_pickle(f'{learn.model.name.lower()}/', f'{learn.model.name.lower()}')
    # Save episode snapshot gifs for this run.
    for g in interp.generate_gif(): g.write(f'{learn.model.name.lower()}')

# Plot the best and worst runs across the group in a single (smoothed) graph.
group_interp.plot_reward_bounds(per_episode=True, smooth_groups=10)
```
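Prioritized experience replay (PER) also appears in the result tables at the end of this README. Below is a minimal sketch of swapping it in for the plain replay buffer, assuming `PriorityExperienceReplay` is the PER class exposed by `fast_rl.core.agent_core` and that it accepts the same basic arguments as `ExperienceReplay` (check your installed version for the exact name and signature):

```python
from fast_rl.agents.dqn import create_dqn_model, dqn_learner
from fast_rl.agents.dqn_models import *
from fast_rl.core.agent_core import PriorityExperienceReplay, GreedyEpsilon
from fast_rl.core.data_block import MDPDataBunch

# Assumption: PriorityExperienceReplay takes the same basic arguments as ExperienceReplay.
memory = PriorityExperienceReplay(memory_size=1000000, reduce_ram=True)
explore = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
data = MDPDataBunch.from_env('CartPole-v1', bs=64, add_valid=False)
model = create_dqn_model(data=data, base_arch=FixedTargetDQNModule, lr=0.001, layers=[32, 32])
learn = dqn_learner(data, model, memory=memory, exploration_method=explore, copy_over_frequency=300)
learn.fit(450)
```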
More examples can be found in docs_src, and the actual code used for generating the gifs can be found in the tests, in either test_dqn.py or test_ddpg.py.
As a note, here is a rundown of existing RL frameworks:

However, there are also frameworks in PyTorch:
fastai (semi-optional)\
Install Fastai or, if you are using Anaconda (which is a good idea), you can do:\
`conda install -c pytorch -c fastai fastai`
fast_rl\
Fastai will be installed if it does not exist. If it does exist, the versioning should be repaired by the setup.py.\
`pip install fast_rl`
OpenAI all gyms:\
`pip install gym[all]`

Mazes:\
`git clone https://github.com/MattChanTK/gym-maze.git`\
`cd gym-maze`\
`python setup.py install`
fast_rl (from source):\
`git clone https://github.com/josiahls/fast-reinforcement-learning.git`\
`cd fast-reinforcement-learning`\
`python setup.py install`
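After installing, a quick sanity check is to build a small `MDPDataBunch` with the same API used in the examples above; this is only a sketch that constructs the data object and trains nothing:

```python
# If these imports and the DataBunch construction succeed, fast_rl, fastai, and gym
# are wired up correctly.
from fast_rl.agents.dqn import create_dqn_model, dqn_learner
from fast_rl.core.data_block import MDPDataBunch

data = MDPDataBunch.from_env('CartPole-v1', bs=32, add_valid=False)
print(data)
```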
Many issues will likely fall under fastai installation issues.
Any other issues are likely environment related. It is important to note that Python 3.7 is not being tested due to an issue where Pyglet and gym do not work together. This issue will not stop you from training models, however it might impact using OpenAI environments.
gym.GoalEnv
Following fastai's guidelines would be desirable: Guidelines
While we hope that model additions will be added smoothly, all models will only be dependent on core.layers.py.
As time goes on, the model architecture will improve overall (we are, and will continue to be, still figuring things out).
Since fastai uses a different style from traditional PEP-8, we will be following its Style and Abbreviations guides. We will also use RL-specific abbreviations, shown in the table below and illustrated in the short hypothetical example that follows it:
| | Concept | Abbr. | Combination Examples |
|---|---|---|---|
| RL | State | st | |
| | Action | acn | |
| | Bounds | bb | Same as Bounding Box |
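As a purely hypothetical illustration of that naming style (the helper below is not part of fast_rl):

```python
import numpy as np

def clip_acn(acn: np.ndarray, bb: np.ndarray) -> np.ndarray:
    """Clip an action `acn` to its bounds `bb`, where bb[0] is the low and bb[1] the high bound."""
    return np.clip(acn, bb[0], bb[1])

st = np.zeros(4)                # st: current state observation
acn = np.array([2.5])           # acn: raw action proposed by the agent
bb = np.array([[-1.0], [1.0]])  # bb: action bounds, same idea as a bounding box
acn = clip_acn(acn, bb)         # -> array([1.])
```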
| Model |
|---|
| DQN |
| Dueling DQN |
| Double DQN |
| DDDQN |
| Fixed Target DQN |
| DDPG |
| Model | Gif (Early) | Gif (Mid) | Gif (Late) |
|---|---|---|---|
| DQN+ER | | | |
| DQN+PER | | | |
| FixedTargetDQN+ER | | | |
| FixedTargetDQN+PER | | | |
| DoubleDQN+ER | | | |
| DoubleDQN+PER | | | |
| DuelingDQN+ER | | | |
| DuelingDQN+PER | | | |
| DoubleDueling+ER | | | |
| DoubleDueling+PER | | | |
| DDPG+ER | | | |
| DDPG+PER | | | |