Using short-horizon nonlinear dynamics for on-policy simulation to improve value estimation. See the paper for algorithmic details.
Here are setup-specific requirements that you really, really have to do yourself:

- MuJoCo 1.50: install it with `scripts/install-mujoco.sh`, run with the key `mjkey.txt` in the CWD.
- Python 3.5: `scripts/` assume this is the `python` and `pip` in `PATH`.
- glfw3

If you run into MuJoCo installation or linking problems (e.g., with glfw3), then reinstall everything in a shell where the following environment variables are set (and, for good measure, set them in the shell where you're launching experiments):

```
export LD_LIBRARY_PATH=~/.mujoco/mjpro150/bin
export LIBRARY_PATH=~/.mujoco/mjpro150/bin
```
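For reference, the MuJoCo step boils down to the following sketch (the key path is a placeholder for wherever your license file lives):

```
# place your MuJoCo license key in the current directory, then run the installer
cp /path/to/mjkey.txt .
./scripts/install-mujoco.sh
```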
Other system dependencies:

- `gym` - see the gym README
- `gym2` - see the gym2 README

Example installation:
```
# GPU version
# ... you install system packages here
conda create -y -n gpu-py3.5 python=3.5
source activate gpu-py3.5
pip install -r <( sed 's/tensorflow/tensorflow-gpu/' requirements.txt )

# CPU version
# ... you install system packages here
conda create -y -n cpu-py3.5 python=3.5
source activate cpu-py3.5
pip install -r requirements.txt

# Lazy version (defaults to CPU)
conda create -y -n cpu-py3.5 python=3.5
source activate cpu-py3.5
./scripts/install-mujoco.sh
./scripts/ubuntu-install.sh

# Lazy version (GPU)
conda create -y -n gpu-py3.5 python=3.5
source activate gpu-py3.5
./scripts/install-mujoco.sh
sed -i 's/tensorflow/tensorflow-gpu/' requirements.txt
./scripts/ubuntu-install.sh
```
All scripts are available in `scripts/`, and should be run from the repo root.
| script | purpose |
| --- | --- |
| `lint.sh` | invokes `pylint` with the appropriate flags for this repo |
| `ubuntu-install.sh` | installs all dependencies except MuJoCo/Python on Ubuntu 14.04 or 16.04 |
| `install-mujoco.sh` | installs MuJoCo 1.50 on Linux, assuming a key |
| `tests.sh` | runs tests |
| `fake-display.sh` | creates a dummy X11 display (to render on a server) |
| `launch-ray-aws.sh` | launches an AWS ray cluster at the current branch |
| `teardown-ray-aws.sh` | tears down a cluster |
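For example, linting and testing from the repo root:

```
./scripts/lint.sh
./scripts/tests.sh
```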
To run experiments locally, use `main_ray.py`. Note that the resource requirements specified here are used only for scheduling trials; the actual processes are free to create as many threads as they want, so you can oversubscribe the machines by setting `tf_parallelism` within the YAML config to a value larger than the number of guaranteed CPUs. To multiplex the GPUs locally, set `--self_host` to the total number of virtual GPUs.
```
python mve/main_ray.py --experiment_name hc0 --config experiments/hc0.yaml --ncpus 1 --ngpus 0 --self_host 1
```
Experiments can also be run in a distributed manner by connecting to a live ray cluster: change `--self_host` in the command above to `--port RAY_REDIS_PORT`. Multiple experiments can be run by the same driver. See `python mve/main_ray.py --help` for details.
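For instance, a distributed launch might look like the following sketch (the port value `6379` is an illustrative placeholder for your cluster's actual redis port):

```
# connect the driver to an existing ray cluster instead of self-hosting
python mve/main_ray.py --experiment_name hc0 --config experiments/hc0.yaml \
  --ncpus 1 --ngpus 0 --port 6379
```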
Multiple components of this code run in parallel:

- Evaluation environments are rolled out by `--env_parallelism` workers. I have found that peak performance is reached when there is some batching; i.e., the number of workers should be less than half the number of environments used for evaluation (default 8).
- TensorFlow parallelism is set by `--tf_parallelism` (default `nproc`).
- GPU selection is controlled by `CUDA_VISIBLE_DEVICES`, which, if left empty, uses the first GPU available on the machine. There is currently no support for actually using multiple GPUs in a single experiment.
- `OMP_NUM_THREADS` is overridden; there's no need to set it.

A combined invocation is sketched below.
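The flag values here are illustrative, and this assumes both parallelism knobs are accepted on the command line as listed above (`tf_parallelism` may alternatively live in the YAML config, as noted earlier):

```
# illustrative: pin the run to GPU 0 and batch evaluation rollouts
CUDA_VISIBLE_DEVICES=0 python mve/main_ray.py \
  --experiment_name hc0 --config experiments/hc0.yaml \
  --ncpus 2 --ngpus 1 --self_host 1 \
  --env_parallelism 4 --tf_parallelism 2
```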
For continuous integration with your own MuJoCo key, just use the manual encryption instructions here. `.travis.yml` is already configured to securely decrypt `mjkey.txt.gpg`.
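The encryption side might look like the following sketch (this assumes symmetric gpg encryption producing `mjkey.txt.gpg`; defer to the linked instructions for the exact recipe):

```
# sketch: encrypt the license key so the ciphertext can be committed safely
gpg --symmetric --cipher-algo AES256 --output mjkey.txt.gpg mjkey.txt
```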
To add a new fully observable environment:

- Implement a `FullyObservable<env>` class under `mve/envs`. See `FullyObservableHalfCheetah.py` for a link to the OpenAI Gym commit that contains the source code you should adapt.
- Expose the new class in `mve/envs/__init__.py`.
- Add it to `mve/env_info.py`'s `_env_class` function.
- Make sure to test that your environment works, and amend the tests in `scripts/tests.sh` to include a check that it runs correctly. A quick verification sketch follows this list.
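For example (the environment name `FullyObservableAnt` is a hypothetical placeholder):

```
# hypothetical: confirm the new class is wired into both registration points
grep -n "FullyObservableAnt" mve/envs/__init__.py mve/env_info.py
# then run the amended test suite from the repo root
./scripts/tests.sh
```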