@zmonoid dqn_demo should be removed. Thanks very much for pointing out these issues! We need to spend some time refactoring the code.
@zmonoid The CartpoleSwingupEnv environment is used to test the correctness of our actor-critic implementation, so we copied it from the rllab toolbox (https://github.com/rllab/rllab). We will replace it with gym in the future. For the Atari game part, shall we also move to gym? The current Atari game class is actually a little complicated. Perhaps we can split the replay memory part out of the game itself (a rough sketch of that idea follows).
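Only to make the idea concrete, a decoupled replay memory could look roughly like the sketch below; the class and method names are hypothetical, not current Arena code:

import numpy as np

class ReplayMemory(object):
    """Hypothetical standalone replay memory, decoupled from the game class."""
    def __init__(self, capacity, history_length, frame_shape=(84, 84)):
        self.capacity = capacity
        self.history_length = history_length
        self.frames = np.zeros((capacity,) + frame_shape, dtype=np.uint8)
        self.actions = np.zeros(capacity, dtype=np.int32)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.terminals = np.zeros(capacity, dtype=np.bool_)
        self.top = 0
        self.size = 0

    def append(self, frame, action, reward, terminal):
        # Overwrite the oldest slot once the buffer is full.
        self.frames[self.top] = frame
        self.actions[self.top] = action
        self.rewards[self.top] = reward
        self.terminals[self.top] = terminal
        self.top = (self.top + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        # Episode-boundary and wrap-around handling omitted for brevity.
        idx = np.random.randint(self.history_length, self.size, size=batch_size)
        states = np.stack([self.frames[i - self.history_length:i] for i in idx])
        next_states = np.stack([self.frames[i - self.history_length + 1:i + 1] for i in idx])
        return states, self.actions[idx], self.rewards[idx], next_states, self.terminals[idx]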
@sxjscience @flyers Thanks very much for your response.
I'd like to contribute fixes for the issues I proposed above (and more), but to avoid duplicating your work, please let me know what you are already working on, or assign me a task.
I'd like to revise the asynchronous one-step Q-learning to replicate the results in the paper.
@zmonoid Great! I'll next work on a new API for base and replay memory as well as implementing the natural policy gradient.
@sxjscience
I encountered some difficulty when implementing asynchronous Q-learning.
It seems that when I use multiple threads to update the same network, I get an error.
The error message is:
[2016-07-28 16:11:20,933] Making new env: Breakout-v0
[2016-07-28 16:11:20,954] Making new env: Breakout-v0
Thread 0 - Final epsilon: 0.5
Thread 1 - Final epsilon: 0.5
[16:11:27] /home/bzhou/mxnet/dmlc-core/include/dmlc/logging.h:235: [16:11:27] src/ndarray/ndarray.cc:227: Check failed: from.shape() == to->shape() operands shape mismatch
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/bzhou/anaconda/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/home/bzhou/anaconda/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "dqn_async_demo.py", line 180, in actor_learner_thread
dqn_reward=targets)
File "/home/bzhou/svn/Arena/arena/base.py", line 204, in forward
self.exe.arg_dict[k][:] = v
File "/home/bzhou/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 220, in __setitem__
value.copyto(self)
File "/home/bzhou/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 458, in copyto
return NDArray._copyto(self, out=other)
File "/home/bzhou/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 1133, in unary_ndarray_function
c_array(ctypes.c_char_p, [str(i).encode('ascii') for i in kwargs.values()])))
File "/home/bzhou/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/base.py", line 77, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
MXNetError: [16:11:27] src/ndarray/ndarray.cc:227: Check failed: from.shape() == to->shape() operands shape mismatch
('| Thread 01', '| Step', 295, '| Reward: 02', ' Qmax: 0.3533', ' Epsilon: 0.99963', ' Epsilon progress: 0.000737')
('| Thread 01', '| Step', 460, '| Reward: 00', ' Qmax: 0.3533', ' Epsilon: 0.99943', ' Epsilon progress: 0.001150')
('| Thread 01', '| Step', 629, '| Reward: 00', ' Qmax: 0.3533', ' Epsilon: 0.99921', ' Epsilon progress: 0.001572')
You may refer to my code here: https://github.com/zmonoid/Arena/blob/master/dqn_async_demo.py
This only happens when multiple threads update the network; setting num_thread to zero or setting is_train to false does not trigger the error.
@zmonoid It's strange. It should have passed the assertion here https://github.com/peterzcc/Arena/blob/master/arena/base.py#L201-L203 .
@sxjscience Indeed it is strange. I thought it was caused by conflicting multi-threaded operations, so I used a threading.Lock to block other threads around the following code:
lock.acquire()
outputs = qnet.forward(is_train=True,
                       data=states,
                       dqn_action=actions,
                       dqn_reward=targets)
qnet.backward()
qnet.update(updater=updater)
lock.release()
Well, this time, it does not pass the assertion any more:
bzhou@bzhou-Desktop ~/svn/Arena master ● python dqn_async_demo.py
[2016-07-29 15:15:08,093] Making new env: Breakout-v0
[2016-07-29 15:15:08,113] Making new env: Breakout-v0
Thread 0 - Final epsilon: 0.1
Thread 1 - Final epsilon: 0.5
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/bzhou/anaconda/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/home/bzhou/anaconda/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "dqn_async_demo.py", line 184, in actor_learner_thread
dqn_reward=targets)
File "/home/bzhou/svn/Arena/arena/base.py", line 203, in forward
%(k, str(self.exe.arg_dict[k].shape), str(v.shape))
AssertionError: Shape not match: key data, need (1L, 4L, 84L, 84L), received (5L, 4L, 84L, 84L)
But the error information is confusing, since I define qnet like this:
action_repeat = 4
n_threads = 2
history_length = 4
I_AsyncUpdate = 5
I_target = 40000
data_shapes = {'data': (I_AsyncUpdate, history_length) + (84, 84),
               'dqn_action': (I_AsyncUpdate,), 'dqn_reward': (I_AsyncUpdate,)}
optimizer = mx.optimizer.create(name='adagrad', learning_rate=0.01, eps=0.01,
                                clip_gradient=None,
                                rescale_grad=1.0, wd=0)
updater = mx.optimizer.get_updater(optimizer)
# Set up game environments (one per thread)
num_actions = envs[0].action_space.n
dqn_sym = dqn_sym_nature(num_actions)
qnet = Base(data_shapes=data_shapes, sym_gen=dqn_sym, name='QNet',
            initializer=DQNInitializer(factor_type="in"),
            ctx=ctx)
By definition, QNet requires data of shape (5, 4, 84, 84).
Is it because, during the forward pass used to get the action index from qnet, I pass in data of shape (1, 4, 84, 84), which then changes the input data shape of QNet?
@sxjscience @flyers Problem solved. The forward pass used for action selection needs to be locked as well; the error was caused by different batch sizes being fed into QNet concurrently. Now I am training with 8 threads; let's see whether it converges. >.<
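For anyone hitting the same thing, the working pattern is roughly the sketch below: the single-sample action-selection forward and the training forward/backward/update go through the same lock, so two threads can never switch the executor's input shape under each other. Variable names follow dqn_async_demo.py, and the exact return format of Base.forward is an assumption here:

# Action selection: batch size 1, also taken under the lock.
with lock:
    q_values = qnet.forward(is_train=False,
                            data=current_state.reshape((1, history_length, 84, 84)))[0].asnumpy()
action = np.random.randint(num_actions) if np.random.rand() < epsilon else int(q_values.argmax())

# ... collect I_AsyncUpdate transitions, then ...

# Training update: batch size I_AsyncUpdate, under the same lock.
with lock:
    outputs = qnet.forward(is_train=True,
                           data=states,
                           dqn_action=actions,
                           dqn_reward=targets)
    qnet.backward()
    qnet.update(updater=updater)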
@zmonoid Great! The logic of arena.Base is to store the different data shape combinations in a dictionary and fetch the corresponding executor. Before we compute the forward and backward passes, we need to call switch_bucket to fetch (or create) an executor: https://github.com/peterzcc/Arena/blob/master/arena/base.py#L195-L196
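In simplified form the pattern is something like the sketch below; the names are hypothetical and parameter sharing between executors is omitted, so this is not the actual arena.Base code:

import mxnet as mx

class ShapeBucketedNet(object):
    """Sketch: keep one bound executor per input-shape combination and reuse it."""
    def __init__(self, sym, ctx):
        self.sym = sym
        self.ctx = ctx
        self.exe_pool = {}                      # maps sorted (name, shape) pairs -> executor

    def switch_bucket(self, **data_shapes):
        key = tuple(sorted(data_shapes.items()))
        if key not in self.exe_pool:
            # First time this shape combination is seen: bind a new executor.
            # (The real class also makes the new executor share its parameter
            # arrays with the existing ones; that part is left out here.)
            self.exe_pool[key] = self.sym.simple_bind(ctx=self.ctx, grad_req='write',
                                                      **data_shapes)
        return self.exe_pool[key]

Usage would then be switch_bucket(data=(1, 4, 84, 84)) before the action-selection forward and switch_bucket(data=(5, 4, 84, 84), dqn_action=(5,), dqn_reward=(5,)) before the training pass.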
@flyers @peterzcc @zmonoid I'm trying to revise the base class to enable more control of the executors. The new API will force the users to create/fetch executors by themselves when necessary.
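Just to illustrate what that could look like from the caller's side (purely hypothetical method names, not a committed API):

# Hypothetical sketch of the proposed explicit-executor workflow (method names made up).
train_exe = qnet.get_executor(data_shapes={'data': (5, 4, 84, 84),
                                           'dqn_action': (5,), 'dqn_reward': (5,)})
act_exe = qnet.get_executor(data_shapes={'data': (1, 4, 84, 84)})
# forward/backward would then run on an executor the caller chose explicitly,
# rather than Base picking a bucket implicitly.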
First, I tried dqn_dist_demo.py on my PC and it runs smoothly. However, when I try it on my server, it reports the following error:
It seems to be a SWIG version problem; according to the error message, swig needs to be upgraded to swig3.0+:
However, upgrading did not fix my problem.
I would suggest moving the game environments to openai/gym, which is much easier to install and use, and provides a uniform interface for various games, even the racing game TORCS: https://github.com/ugo-nama-kun/gym_torcs
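The gym interface is also minimal; a basic interaction loop looks roughly like this (random actions, just to show the API):

import gym

env = gym.make('Breakout-v0')
observation = env.reset()
for _ in range(1000):
    action = env.action_space.sample()          # random action, placeholder for the agent
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()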
Second, notice this segment of code in dqn_async.py:
It seems the program only runs asynchronously for these two lines:
which does not fit the idea of asynchronous training.
We may refer to this code to rewrite it: https://github.com/tflearn/tflearn/blob/master/examples/reinforcement_learning/atari_1step_qlearning.py
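The per-thread structure should look roughly like the sketch below, where the whole act/compute-target/update loop runs inside each thread rather than only the final forward/backward. This is a structural sketch only: preprocess (frame preprocessing and history stacking), the exact return value of Base.forward, and the target-network synchronization (omitted) are assumptions, not our current code:

import numpy as np

def actor_learner_thread(thread_id, env, qnet, target_qnet, updater, lock,
                         gamma=0.99, epsilon=0.1, i_async_update=5, t_max=1000000):
    state = preprocess(env.reset(), None)       # placeholder for frame preprocessing / history stacking
    batch_s, batch_a, batch_t = [], [], []
    for t in range(1, t_max + 1):
        # 1. epsilon-greedy action from the shared online network
        with lock:
            q = qnet.forward(is_train=False, data=state[np.newaxis])[0].asnumpy()
        action = np.random.randint(q.shape[1]) if np.random.rand() < epsilon else int(q.argmax())

        # 2. act in this thread's own environment
        frame, reward, done, _ = env.step(action)
        next_state = preprocess(frame, state)

        # 3. one-step target from the (periodically synced) target network
        with lock:
            next_q = target_qnet.forward(is_train=False, data=next_state[np.newaxis])[0].asnumpy()
        batch_s.append(state)
        batch_a.append(action)
        batch_t.append(reward if done else reward + gamma * float(next_q.max()))

        # 4. every i_async_update steps, push an update to the shared parameters
        if len(batch_s) == i_async_update:
            with lock:
                qnet.forward(is_train=True, data=np.asarray(batch_s),
                             dqn_action=np.asarray(batch_a), dqn_reward=np.asarray(batch_t))
                qnet.backward()
                qnet.update(updater=updater)
            batch_s, batch_a, batch_t = [], [], []

        state = preprocess(env.reset(), None) if done else next_state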
Third, currently we need to redirect the log to a file manually, e.g.:
python dqn_dist_demo.py >> log.txt
Consider saving the log file into dir_path automatically (see the sketch below).
Fourth, dqn_demo.py seems to be a duplicate of dqn_dist_demo.py with kv_type=None. Consider removing it, or is it intended for other purposes?
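On the third point, a minimal sketch of what automatic log saving could look like, assuming dir_path is the run's output directory (this helper is hypothetical, not existing Arena code):

import logging
import os

def setup_logging(dir_path):
    # Mirror everything sent through the logging module to dir_path/train.log,
    # in addition to the console, so no manual shell redirection is needed.
    if not os.path.exists(dir_path):
        os.makedirs(dir_path)
    handler = logging.FileHandler(os.path.join(dir_path, 'train.log'))
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)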
I will try to do it first. Best, zmonoid