The main reason was that when using ReplayMemory to keep track of the current state, the states visited during testing also make it into replay memory, and I don't think that's the correct thing to do. Basically we are allowing test data to leak into training. I'm not sure how it actually affects the results though. I'm assuming the previous approach made results slightly better, because during testing we use an exploration rate of 0.05, which might result in slightly better trajectories in replay memory.
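To illustrate the distinction, here is a simplified sketch of the idea (not the actual agent.py / state_buffer.py code; class and method names are condensed for brevity):

```python
import numpy as np

class StateBuffer(object):
    """Keeps only the last history_length screens to assemble the current state.
    Nothing stored here is ever sampled for training."""
    def __init__(self, history_length, screen_dims):
        self.buffer = np.zeros((history_length,) + screen_dims, dtype=np.uint8)

    def add(self, screen):
        self.buffer[:-1] = self.buffer[1:]   # drop the oldest screen
        self.buffer[-1] = screen             # append the newest one

    def getState(self):
        return self.buffer

# In the agent loop the current state is always built from the StateBuffer,
# but replay memory is only updated during the training phase, so screens
# seen while testing no longer leak into training data, e.g.:
#
#   state_buffer.add(screen)
#   if phase == 'train':
#       replay_memory.add(action, reward, screen, terminal)
```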
Thanks for the explanation. Maybe you could add this as a comment in the StateBuffer class, to clarify in case other people have the same confusion as me?
BTW, I have another question: I plan to change the original DQN for an experiment, like this:
path1 = original_DQN
path2 = some_extra_info
layers = [MergeMultistream(layers=[path1, path2], merge="stack"),
          Affine(nout=num_actions, init=init_norm)]
self.model = Model(layers=layers)
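For reference, a self-contained sketch of what such a two-path network could look like in Neon (the convolutional path mirrors the usual DQN layer stack; the size of the extra path, `num_actions`, and the batch size are placeholders, not values from the repo):

```python
from neon.backends import gen_backend
from neon.initializers import Gaussian
from neon.layers import Affine, Conv, MergeMultistream
from neon.models import Model
from neon.transforms import Rectlin

be = gen_backend(backend='cpu', batch_size=32)
init_norm = Gaussian(loc=0.0, scale=0.01)
num_actions = 4  # placeholder

# path1: the usual DQN convolutional stack over the screen history
path1 = [Conv((8, 8, 32), strides=4, init=init_norm, activation=Rectlin()),
         Conv((4, 4, 64), strides=2, init=init_norm, activation=Rectlin()),
         Conv((3, 3, 64), strides=1, init=init_norm, activation=Rectlin()),
         Affine(nout=512, init=init_norm, activation=Rectlin())]

# path2: a small fully connected path over the extra information
path2 = [Affine(nout=32, init=init_norm, activation=Rectlin())]

layers = [MergeMultistream(layers=[path1, path2], merge="stack"),
          Affine(nout=num_actions, init=init_norm)]
model = Model(layers=layers)
```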
My question is: how do I prepare the input data in self._setInput(states)?
Currently, states.shape == (self.batch_size, self.history_length, screen_height, screen_width).
I think after my modification the input data will be a tuple (screen, extra_info),
where screen.shape = self.screen_dim and extra_info.shape is arbitrary; I haven't decided it yet (maybe as simple as a scalar), but it will be unrelated to screen.shape.
1) How do I build a numpy array of shape (self.batch_size, self.history_length) whose elements are tuples (i.e. (screen, extra_info))?
2) If I pass such a state to self._setInput, will Neon work with an array of tuples? (One alternative representation is sketched below.)
Thanks!
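One way to represent such a minibatch without an array of tuples (numpy would have to fall back to dtype=object for tuples of unequal shapes, which cannot be copied to the device as a single tensor) is two parallel arrays indexed by the same sample index. A sketch with placeholder sizes:

```python
import numpy as np

batch_size, history_length = 32, 4
screen_height, screen_width = 84, 84
extra_dim = 1  # placeholder; e.g. a single scalar of extra info per state

screens = np.zeros((batch_size, history_length, screen_height, screen_width),
                   dtype=np.uint8)
extras = np.zeros((batch_size, extra_dim), dtype=np.float32)

# sample i of the minibatch is the pair (screens[i], extras[i])
```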
Related question here:
https://groups.google.com/forum/#!topic/neon-users/hMTRAT4Hf4M
How to change this:
https://github.com/tambetm/simple_dqn/blob/master/src/deepqnetwork.py#L37
self.screen_input_shape = (self.history_length,) + self.screen_dim + (self.batch_size,)
self.extra_input_shape = (self.history_length,) + (N,) + (self.batch_size,)
self.input = (self.be.empty(self.screen_input_shape), self.be.empty(self.extra_input_shape))
self.input.lshape = (self.screen_input_shape, self.extra_input_shape) # HACK: needed for convolutional networks
The input will be a tuple, and self.input.lshape will be a tuple of tuples; will this work?
Ahh, I realized that if I separate the two inputs into two variables, it may work:
self.screen_input = ...
self.extra_input = ...
Comments?
And call fprop with the tuple
self.model.fprop((self.screen_input, self.extra_input))
as in
https://github.com/NervanaSystems/neon/blob/master/neon/data/imagecaption.py#L217
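A sketch of that two-buffer approach (not code from the repo; `extra_dim` and the axis ordering are assumptions that should be checked against the existing _setInput, which moves the batch axis to the end and scales screens to [0, 1]):

```python
import numpy as np
from neon.backends import gen_backend

be = gen_backend(backend='cpu', batch_size=32)
history_length, screen_dim, extra_dim = 4, (84, 84), 8  # placeholders

screen_input = be.empty((history_length,) + screen_dim + (be.bsz,))
extra_input = be.empty((extra_dim, be.bsz))

def set_input(screens, extras):
    """screens: (batch, history, height, width) uint8; extras: (batch, extra_dim)."""
    # move the batch axis last, as the existing _setInput does
    screen_input.set(np.transpose(screens, axes=(1, 2, 3, 0)).copy())
    screen_input[:] = screen_input / 255.0
    extra_input.set(np.ascontiguousarray(extras.T))

# forward pass with a tuple of inputs, as in the image caption example:
#   qvalues = model.fprop((screen_input, extra_input), inference=True)
```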
https://github.com/tambetm/simple_dqn/blob/master/src/deepqnetwork.py#L49
How do I change this then:
self.model.initialize(self.input_shape[:-1], self.cost)
I read the doc:
http://neon.nervanasys.com/docs/latest/_modules/neon/models/model.html#Model.initialize
def initialize(self, dataset, cost=None):
It is supposed to get a dataset; is this a bug?
Should it be:
self.model.initialize(self.input[:-1], self.cost)
If this is a bug, can I change it in my experiment to:
self.model.initialize((self.screen_input[:-1], self.extra_input[:-1]), self.cost)
i.e., pass a tuple as the dataset; will that work as expected?
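For what it's worth, the linked source suggests Model.initialize simply forwards its first argument to the layer container's configure(), which appears to accept either a dataset object or plain per-sample shapes; that would explain why the existing single-stream call passes self.input_shape[:-1] (the shape with the batch size stripped) rather than a dataset. Under that assumption, a multi-stream version would pass one per-sample shape per pathway. A hedged sketch, with placeholder shapes:

```python
# per-sample shapes, batch dimension excluded, one per MergeMultistream pathway
screen_shape = (history_length,) + screen_dim   # e.g. (4, 84, 84)
extra_shape = (extra_dim,)                      # e.g. (8,)

model.initialize((screen_shape, extra_shape), cost)
```

Whether MergeMultistream's configure accepts such a tuple of shapes should be verified against the Neon source; this is only what the image caption example and the existing simple_dqn call suggest.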
Added a comment for StateBuffer. For the other issues I think you should refer to Neon.
I just saw this in the new code:
https://github.com/tambetm/simple_dqn/blob/master/src/agent.py#L12
and I wonder why StateBuffer was added?
It duplicates data (the screens) that is already stored in replay_memory, maybe only to make getCurrentState() easier?
I think that's too small a reason to introduce a new class and store the same data twice in different structures.
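To make the alternative concrete, here is a hypothetical sketch of building the current state directly from the replay memory instead of a separate StateBuffer (attribute names such as screens, current, count, and history_length are assumptions for illustration, not quotes from src/replay_memory.py; it also assumes the memory already holds at least history_length screens):

```python
import numpy as np

def get_current_state(mem):
    """Return the last history_length screens from replay memory, oldest first."""
    indexes = [(mem.current - 1 - i) % mem.count for i in range(mem.history_length)]
    return np.stack([mem.screens[j] for j in reversed(indexes)])
```

The trade-off, as explained above, is that test-time screens would then have to be written into replay memory to be visible here, which is exactly the leakage the StateBuffer avoids.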