Given that the multi-threading performance of MXNet looks pretty limited, we only have two choices that can fully utilize all the processors:
1. use a local-machine distributed training method;
2. switch our religion to TensorFlow.
However, the problem with choice 2 is that TensorFlow does not seem flexible and readable enough.
So we may choose the first one.
The main work here is to modify dmlc_local.py so that it supports user-created worker processes launched as multiprocessing.Process instances. I can investigate this later, after I finish tuning a continuous-control RL algorithm.
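For reference, here is a rough sketch of what such a launcher might look like. This is only an illustration under my assumptions: the DMLC_* environment variables are the ones the dmlc_local.py / PS-lite setup already uses, but launch_local, _run_role, and worker_fn are made-up names, and I haven't verified how importing mxnet under each role behaves in child processes (PS-lite and fork don't always mix, hence the 'spawn' context).

```python
import os
import multiprocessing


def _run_role(role, env, target=None):
    """Configure the DMLC environment for this role, then start it.

    For 'scheduler' and 'server', importing mxnet is expected to start
    the PS-lite node from the environment; for 'worker', we run the
    user-supplied function instead.
    """
    os.environ.update(env)
    os.environ['DMLC_ROLE'] = role
    import mxnet  # noqa: F401 -- reads the DMLC_* variables on import
    if role == 'worker' and target is not None:
        target()


def launch_local(worker_fn, num_workers=4, num_servers=1, port=9091):
    """Launch scheduler, servers, and user-created workers as
    multiprocessing.Process instances on the local machine."""
    env = {
        'DMLC_PS_ROOT_URI': '127.0.0.1',
        'DMLC_PS_ROOT_PORT': str(port),
        'DMLC_NUM_SERVER': str(num_servers),
        'DMLC_NUM_WORKER': str(num_workers),
    }
    # 'spawn' avoids forking a process that may already hold native-library state
    ctx = multiprocessing.get_context('spawn')
    procs = [ctx.Process(target=_run_role, args=('scheduler', env))]
    procs += [ctx.Process(target=_run_role, args=('server', env))
              for _ in range(num_servers)]
    procs += [ctx.Process(target=_run_role, args=('worker', env, worker_fn))
              for _ in range(num_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Each worker_fn would then create its own distributed kvstore, e.g. `kv = mxnet.kvstore.create('dist_sync')`, and run its training loop; with the 'spawn' context, the calling script also needs the usual `if __name__ == '__main__':` guard.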