kengz / SLM-Lab

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
https://slm-lab.gitbook.io/slm-lab/

`train` mode with resume; `enjoy` mode refactor #455

Closed: kengz closed this 4 years ago

kengz commented 4 years ago

train mode with resume

Fixes #444. This adds the capability to resume training in a past-future-consistent manner. See the explanation below.

Suppose we run a training for 10 million (10M) frames to completion, and then see that further improvement might be possible if we ran it for longer, say 20M frames. If only we could go back in time and set the frame budget to 20M to begin with.

The resume mode allows us to do that without time traveling. We can edit the spec file in the present and resume training, so the run picks up where it left off as if it had been using the edited spec all along. Of course, the modification to the spec file must itself be consistent with the past and the future, e.g. we cannot suddenly modify the initial learning rate or variable values.
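For instance, extending a finished 10M-frame run to 20M frames only requires bumping the frame budget in the spec. The fragment below is a hypothetical spec snippet (the `max_frame` field follows the common SLM-Lab spec layout; the values are illustrative):

```json
"env": [{
  "name": "...",
  "max_frame": 20000000
}]
```

A past-future-inconsistent edit, by contrast, would be something like changing the optimizer's initial learning rate, since the past run could never have produced its checkpoints under that setting.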

To achieve this, the lab relies on 3 objects and their load methods:

- the trained networks (model weights)
- the training metrics
- the env.clock

Since everything in the lab runs according to env.clock, the above are all we need to restore to resume training. Once the networks and training metrics are restored and the clock is set correctly, everything runs from the designated point in time.
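A minimal sketch of what the restore step amounts to, assuming hypothetical `load()` helpers (the attribute names below are illustrative, not SLM-Lab's exact API):

```python
# Hypothetical sketch of the resume-restore step; names are illustrative,
# not SLM-Lab's exact API.
def resume(agent, env, predir):
    agent.algorithm.load(predir)  # restore network (and optimizer) weights
    agent.body.load(predir)       # restore training metrics
    env.clock.load(predir)        # restore the clock to the checkpointed frame
    # with the clock set, the run proceeds from the designated point in time
```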

NOTE: for off-policy algorithms the replay memory is not restored, simply because of the cost of storing replay data (GBs of data per session, and slow writes during frequent checkpoints). Hence the resume behavior of off-policy replay differs slightly: the memory needs to fill up again from the resume point, and training only restarts once the specified replay size threshold is reached, so we lose a small fraction of the total timesteps.
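In practice this is just the usual replay warm-up guard applied a second time. A minimal sketch, assuming a generic buffer with a `training_start_step` threshold (an assumed name, not necessarily the lab's):

```python
# Illustrative guard in an off-policy training loop after a resume.
def maybe_train(algorithm, memory, batch_size):
    if memory.size < memory.training_start_step:
        return  # replay is still refilling after the resume; skip training
    batch = memory.sample(batch_size)
    algorithm.train_step(batch)  # resume gradient updates as usual
```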

Usage example

Specify train mode as `train@{predir}`, where `{predir}` is the data directory of the last training run, or simply use `train@latest` to use the latest run, e.g.:

```
python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_cartpole train
# terminate run before its completion
# optionally edit the spec file in a past-future-consistent manner

# run resume with either of the commands:
python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_cartpole train@latest
# or to use a specific run folder
python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_cartpole train@data/reinforce_cartpole_2020_04_13_232521
```
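Under the hood, resolving the mode string only requires splitting on `@` and, for `latest`, picking the most recent run folder. A minimal sketch of such parsing (illustrative, not the lab's exact implementation):

```python
from glob import glob

def parse_train_mode(mode, spec_name):
    """Split e.g. 'train@latest' into (mode, predir). Illustrative only."""
    base_mode, _, predir = mode.partition('@')
    if predir == 'latest':
        # data dirs are timestamped, so lexicographic order is chronological
        runs = sorted(glob(f'data/{spec_name}_*'))
        predir = runs[-1] if runs else None
    return base_mode, predir

# e.g. parse_train_mode('train@latest', 'reinforce_cartpole')
# -> ('train', 'data/reinforce_cartpole_2020_04_13_232521')
```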

enjoy mode refactor

The `train@` resume mode API allows the enjoy mode to be refactored; both share a similar syntax. Continuing with the example above, to enjoy a trained model, we now use:

```
python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_cartpole enjoy@data/reinforce_cartpole_2020_04_13_232521/reinforce_cartpole_t0_s0_spec.json
```
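Conceptually, enjoy mode just runs the restored policy without gradient updates. A minimal sketch, assuming a policy object with a hypothetical `act()` method and the classic gym `step()` API (not the lab's exact code):

```python
import torch

def enjoy(policy, env, num_episodes=3):
    """Run a trained policy for a few episodes without any training updates."""
    policy.eval()
    for _ in range(num_episodes):
        state, done, total_reward = env.reset(), False, 0.0
        while not done:
            with torch.no_grad():
                action = policy.act(state)  # hypothetical act() method
            state, reward, done, _ = env.step(action)
            total_reward += reward
        print(f'episode return: {total_reward}')
```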

The changes from this refactor are summarized below:

Misc