yandexdataschool / tinyverse

Universe RL trainer platform. Simple. Supple. Scalable.

Why should I care?

tinyverse is a reinforcement learning platform for gym/universe/custom environments that lets you use whatever compute resources you have to train reinforcement learning algorithms.

Key features

The core idea is to have two types of processes:

    • player processes that interact with the environment and record experience sessions;
    • trainer processes that sample those sessions and update the model weights.

Those processes revolve around a database that stores experience sessions and weights. The database is currently implemented with Redis, since it is simple to set up and fast at key-value operations. You can, however, implement the database interface on top of whatever database you prefer.
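
For concreteness, here is a minimal sketch of what such a Redis-backed store could look like. This is only an illustration, not the actual tinyverse database interface: the class name ExperienceDB, its methods, and the Redis key layout are assumptions made for this example, and sessions and weights are stored as pickled blobs.

    # A minimal sketch of a Redis-backed experience/weights store.
    # NOTE: illustration only, not the actual tinyverse database interface;
    # the class, its methods and the key layout are hypothetical.
    import pickle
    import random

    import redis

    class ExperienceDB:
        def __init__(self, host="localhost", port=6379, password=None):
            self.redis = redis.Redis(host=host, port=port, password=password)

        def save_session(self, session):
            # Append one recorded session (observations, actions, rewards)
            # and keep only the most recent 10000 sessions.
            self.redis.lpush("sessions", pickle.dumps(session))
            self.redis.ltrim("sessions", 0, 9999)

        def sample_sessions(self, batch_size):
            # Draw a random batch of stored sessions for the trainer.
            n = self.redis.llen("sessions")
            idx = [random.randrange(n) for _ in range(batch_size)]
            return [pickle.loads(self.redis.lindex("sessions", i)) for i in idx]

        def save_weights(self, weights):
            # Publish the latest network weights for the players to pick up.
            self.redis.set("weights", pickle.dumps(weights))

        def load_weights(self):
            raw = self.redis.get("weights")
            return pickle.loads(raw) if raw is not None else None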

Quickstart

  1. Install a redis server

    • (Ubuntu) sudo apt-get install redis-server
    • Mac OS version HERE.
    • Otherwise search "Install redis your_OS" or ask on gitter.
    • If you want to run on multiple machines, configure redis-server to listen on 0.0.0.0 (and consider setting a password)
  2. Install python packages

    • gym and universe
    • pip install gym[atari]
    • pip install universe (most likely needs extra dependencies; see the links above)
    • install bleeding-edge theano, lasagne, and agentnet for the agentnet examples to work.
    • Preferably, set up theano to use floatX=float32 in your .theanorc
    • pip install joblib redis prefetch_generator six
    • examples require opencv: conda install -y -c https://conda.binstar.org/menpo opencv3
  3. Spawn several player processes. Each process simply interacts with the environment and saves the results; -b stands for batch size.

    # launch 10 player processes in the background
    for i in `seq 1 10`;
    do
         python tinyverse atari.py play -b 3 &
    done
  4. Spawn the trainer process. (The demo below runs on GPU; change to CPU if you have to.) A conceptual sketch of what the player and trainer processes do follows this list.

     THEANO_FLAGS=device=gpu python tinyverse atari.py train -b 10 &

  5. Evaluate results at any time (records a video to ./records):

     python tinyverse atari.py eval -n 5
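
For a rough picture of what the play and train commands in steps 3 and 4 do, the sketch below pairs a player loop with a trainer loop on top of the hypothetical ExperienceDB from the earlier sketch. None of this is the actual tinyverse code or API: the environment name and helper functions are placeholders, the player uses a random policy instead of a real agent, and the training update itself is elided.

    # Conceptual sketch of the two process types (NOT the actual tinyverse code).
    # ExperienceDB is the hypothetical Redis-backed store sketched earlier; the
    # random policy stands in for a real agent whose weights live in the database.
    import gym

    def play_one_session(env):
        # Run one episode and record (observation, action, reward) triples.
        session = []
        obs = env.reset()
        done = False
        while not done:
            action = env.action_space.sample()  # placeholder for the agent's policy
            next_obs, reward, done, _ = env.step(action)
            session.append((obs, action, reward))
            obs = next_obs
        return session

    def player_loop(db, env_name="PongDeterministic-v0", n_sessions=100):
        # Player process: interact with the environment and push sessions to the database.
        env = gym.make(env_name)
        for _ in range(n_sessions):
            db.save_session(play_one_session(env))
            weights = db.load_weights()  # a real player would load these into its agent

    def trainer_loop(db, batch_size=10, n_steps=1000):
        # Trainer process: sample stored sessions, update the model, publish new weights.
        for step in range(n_steps):
            batch = db.sample_sessions(batch_size)
            # ... run one training update on `batch` here (elided) ...
            db.save_weights({"step": step})  # placeholder for the real network weights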

Devs: see workbench.ipynb