MVE: Model-Based Value Estimation
Apache License 2.0

Using short-horizon nonlinear dynamics for on-policy simulation to improve value estimation. See the paper for algorithmic details.

Requirements

The following setup-specific requirements must be configured manually:

    export LD_LIBRARY_PATH=~/.mujoco/mjpro150/bin
    export LIBRARY_PATH=~/.mujoco/mjpro150/bin

Other system dependencies must also be installed; on Ubuntu, scripts/ubuntu-install.sh takes care of these.

Example installation:

    # GPU version
    # ... you install system packages here
    conda create -y -n gpu-py3.5 python=3.5
    source activate gpu-py3.5
    pip install -r <( sed 's/tensorflow/tensorflow-gpu/' requirements.txt )

    # CPU version
    # ... you install system packages here
    conda create -y -n cpu-py3.5 python=3.5
    source activate cpu-py3.5
    pip install -r requirements.txt

    # Lazy version (defaults to CPU)
    conda create -y -n cpu-py3.5 python=3.5
    ./scripts/install-mujoco.sh
    ./scripts/ubuntu-install.sh

    # Lazy version (GPU)
    conda create -y -n gpu-py3.5 python=3.5
    ./scripts/install-mujoco.sh
    sed -i 's/tensorflow/tensorflow-gpu/' requirements.txt
    ./scripts/ubuntu-install.sh
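After installation, a quick sanity check can confirm that the key packages import. This is a minimal sketch, not part of the repo; the module names (tensorflow, mujoco_py, ray) are assumptions based on the dependencies this README mentions, so adjust the list to match requirements.txt:

```python
import importlib

def check_imports(names):
    """Return a dict mapping each module name to whether it imports cleanly."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            # Any failure counts as missing (e.g. mujoco_py can fail for
            # reasons other than ImportError, such as a missing license key).
            results[name] = False
    return results

# Modules the MVE stack is assumed to use; adjust to match requirements.txt.
for name, ok in sorted(check_imports(["mujoco_py", "ray", "tensorflow"]).items()):
    print("{}: {}".format(name, "OK" if ok else "MISSING"))
```

If mujoco_py reports MISSING, double-check the LD_LIBRARY_PATH export above.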

Scripts

All scripts are available in scripts/, and should be run from the repo root.

| script | purpose |
| --- | --- |
| lint.sh | invokes pylint with the appropriate flags for this repo |
| ubuntu-install.sh | installs all dependencies except MuJoCo/Python on Ubuntu 14.04 or 16.04 |
| install-mujoco.sh | installs MuJoCo 1.50 on Linux, assuming a license key is available |
| tests.sh | runs tests |
| fake-display.sh | creates a dummy X11 display (to render on a server) |
| launch-ray-aws.sh | launches an AWS ray cluster from the current branch |
| teardown-ray-aws.sh | tears down a cluster |

Running experiments

To run experiments locally, use main_ray.py. Note that the resource requirements specified here are used only for scheduling trials; the actual processes are free to create as many threads as they want. To oversubscribe the machines, set tf_parallelism in the YAML config to a value larger than the number of guaranteed CPUs. To multiplex the GPUs locally, set --self_host to the total number of virtual GPUs.

    python mve/main_ray.py --experiment_name hc0 --config experiments/hc0.yaml --ncpus 1 --ngpus 0 --self_host 1

Experiments can also be run in a distributed manner against a live ray cluster by replacing --self_host with --port RAY_REDIS_PORT in the command above. A single driver can run multiple experiments. See python mve/main_ray.py --help for details.
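The GPU multiplexing above amounts to splitting each physical GPU into fractional shares for scheduling. A hypothetical helper (not part of the repo; per_trial_resources is an illustrative name) might compute the per-trial resource request like this:

```python
def per_trial_resources(ncpus, ngpus, self_host):
    """Split GPU resources across self_host virtual GPUs.

    ncpus/ngpus mirror the --ncpus/--ngpus flags. With self_host virtual
    GPUs, each trial requests a fractional share of a physical GPU; as the
    README notes, these figures are used only for scheduling trials, not
    for limiting the processes themselves.
    """
    if self_host < 1:
        raise ValueError("self_host must be >= 1")
    return {"cpu": ncpus, "gpu": ngpus / self_host}

# e.g. four trials sharing one physical GPU:
print(per_trial_resources(ncpus=1, ngpus=1, self_host=4))
```

This is only a sketch of the scheduling arithmetic; the actual mapping to ray's trial resources happens inside main_ray.py.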

Parallelism

Multiple components of this code run in parallel.

Adding MuJoCo key to CI securely

Just use the manual encryption instructions here. .travis.yml is already configured to securely decrypt mjkey.txt.gpg.

Adding New Fully Observable Environment

  1. Create a new FullyObservable<env> class under mve/envs. See FullyObservableHalfCheetah.py for a link to the OpenAI Gym commit that contains the source code you should adapt.
  2. Expose the new environment by importing it in mve/envs/__init__.py
  3. Make it accessible through mve/env_info.py's _env_class function.

Make sure to test that your environment works, and amend scripts/tests.sh to include a check that it runs correctly.