tambetm / simple_dqn

Simple deep Q-learning agent.
MIT License
695 stars 184 forks source link

What's the reason to add StateBuffer? #22

Closed mw66 closed 8 years ago

mw66 commented 8 years ago

I just saw this in the new code,

https://github.com/tambetm/simple_dqn/blob/master/src/agent.py#L12

and wonder why add StateBuffer?

It stores the duplicated data (screen) in replay_memory, maybe only to make getCurrentState() easier?

I think that's too small a reason to introduce a new class, and store the same data twice in different structures.

tambetm commented 8 years ago

The main reason was that when using ReplayMemory to keep track of current state, the states visited during testing also make it to replay memory and I don't think that's the correct thing to do. Basically we are allowing test data to leak into training. I'm not sure how it actually affects the result though. I'm assuming that previous approach made results slightly better, because during testing we're using exploration rate 0.05, which might result in slightly better trajectories in replay memory.

mw66 commented 8 years ago

Thanks for the explanation. Maybe you can put this as comments of StateBuffer class? to clarify if other people have same confusion as me.

BTW, I have another question: I plan to change the original DQN for some experiment, like this;

path1 = original_DQN path2 = some_extra_info

layers = [MergeMultistream(layers=[path1, path2], merge="stack"), Affine(nout=num_actions, init = init_norm)]

self.model = Model(layers=layers)

My question is how to prepare the input data in self._setInput(states) ?

currently, states.shape == (self.batch_size, self.history_length, screen_height, screen_width)

I think after my modification, the input data will be a tuple (screen, extra_info)

where screen.shape = self.screen_dim extra_info.shape is arbitrary, which I haven't decided yet (maybe as simple as a scalar), but will be unrelated to the screen.shape

1) how to write a numpy array (self.batch_size, self.history_length) of tuples (i.e. (screen, extra_info))? 2) if I pass such a state to self._setInput, will Neon work (for such array of tuples)?

Thanks!

On Sun, Jun 26, 2016 at 6:21 AM, Tambet Matiisen notifications@github.com wrote:

Closed #22 https://github.com/tambetm/simple_dqn/issues/22.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/tambetm/simple_dqn/issues/22#event-704238242, or mute the thread https://github.com/notifications/unsubscribe/ACArXn86T7pB1LWaMBut1dGA5a5OT3vYks5qPnzLgaJpZM4I-OpZ .

mw66 commented 8 years ago

related question here:

https://groups.google.com/forum/#!topic/neon-users/hMTRAT4Hf4M

mw66 commented 8 years ago

How to change this:

https://github.com/tambetm/simple_dqn/blob/master/src/deepqnetwork.py#L37

self.screen_input_shape = (self.history_length,) + self.screen_dim + (self.batch_size,)
self.extra_input_shape = (self.history_length,) + (N,) + (self.batch_size,)
self.input = (self.be.empty(self.screen_input_shape), self.be.empty(self.extra_input_shape))
self.input.lshape = (self.screen_input_shape,  self.extra_input_shape) # HACK: needed for convolutional networks

the input will be a tuple, and self.input.lshape will be tuple of tuples, will this work?

mw66 commented 8 years ago

Ahh, I realized that if I separate two inputs by using two variables, it may work:

self.screen_input = ... self.extra_input = ...

comments?

mw66 commented 8 years ago

And call fprop with the tuple

self.model.fprop((self.screen_input, self.extra_input))

as in

https://github.com/NervanaSystems/neon/blob/master/neon/data/imagecaption.py#L217

mw66 commented 8 years ago

https://github.com/tambetm/simple_dqn/blob/master/src/deepqnetwork.py#L49

how to change this then:

self.model.initialize(self.input_shape[:-1], self.cost)

I read the doc:

http://neon.nervanasys.com/docs/latest/_modules/neon/models/model.html#Model.initialize def initialize(self, dataset, cost=None):

it suppose to get a dataset, is this a bug?

mw66 commented 8 years ago

should it be:

self.model.initialize(self.input[:-1], self.cost)

if this is a bug, in my experiment, can I change it to:

self.model.initialize((self.screen_input[:-1], self.extra_input[:-1]), self.cost)

i.e. pass a tuple as dataset, will it work as expected?

tambetm commented 8 years ago

Added comment for StateBuffer. For the other issues I think you should refer to Neon.