[Low priority] Building off of deep q learning example

Hey there! First of all: thank you so much for releasing this, documenting things, putting it on pypi, etc. etc., really appreciate it :)

I've been trying to get a fun "search and rescue" example working, where a drone with a search radius explores a map until it finds the objective. Right now I'm having trouble passing the state in correctly, assuming I've understood the rest. It seems like I should keep calling DQNAgent.learn over and over until I'm satisfied? I was also confused by the DQN example with all the socketIO stuff, and wasn't sure how that was driving the learning.

# some pseudocode
import numpy as np

height = width = 40
map = np.zeros((height, width))
ACTIONS = ('up', 'down', 'left', 'right')

agent = DQNAgent(height * width, len(ACTIONS))

while True:
    state = map.copy()
    action = agent.get_action(state)
    reward = 0  # not sure what to set this to on the first "learn"

    drone.do_action(ACTIONS[action])

    # Get the state after the action has changed it
    next_state = map.copy()

    reward = drone.get_current_reward()

    agent.learn(state, action, reward, next_state)

However, the problem is that I get:

Wrong number of dimensions: expected 2, got 3 with shape (1, 40, 40).

If you're feeling crazy here's the actual source.

I may have this all ass backwards, apologies if this is a silly question.
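On the "keep calling learn over and over" question, here is a minimal sketch of the loop shape, not deepy's own training driver. `StubAgent` and `run_episode` are hypothetical names; the stub only mirrors the `get_action` and `learn` calls used in the pseudocode and learns nothing. It assumes the agent wants a flat vector of length `height * width`, and it reads the reward only after the action, so no placeholder reward is needed before the first `learn`.

```python
import random
import numpy as np

ACTIONS = ('up', 'down', 'left', 'right')
MOVES = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}


class StubAgent(object):
    """Hypothetical stand-in for the real agent; it picks random actions."""

    def __init__(self, state_num, action_num):
        self.state_num = state_num
        self.action_num = action_num
        self.learn_calls = 0

    def get_action(self, state):
        # Assumption: the agent expects a flat vector, not a 2-D grid.
        assert state.shape == (self.state_num,)
        return random.randrange(self.action_num)

    def learn(self, state, action, reward, next_state):
        self.learn_calls += 1


def run_episode(agent, size=5, max_steps=200):
    """Run one episode on a toy grid and return the number of steps taken."""
    grid = np.zeros((size, size))
    row, col = 0, 0
    goal = (size - 1, size - 1)
    grid[row, col] = 1.0
    for step in range(1, max_steps + 1):
        state = grid.flatten()                    # copy taken before the action
        action = agent.get_action(state)
        d_row, d_col = MOVES[ACTIONS[action]]
        grid[row, col] = 0.0
        row = min(max(row + d_row, 0), size - 1)  # clamp to the map edges
        col = min(max(col + d_col, 0), size - 1)
        grid[row, col] = 1.0
        reward = 1.0 if (row, col) == goal else 0.0
        agent.learn(state, action, reward, grid.flatten())
        if reward > 0:                            # objective found, episode ends
            break
    return step


agent = StubAgent(5 * 5, len(ACTIONS))
steps = [run_episode(agent) for _ in range(3)]
```

Each episode resets the map and ends when the objective is found or the step limit is hit, so "until I am satisfied" becomes a question of how many episodes to run.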

Hey there, to summarize my immediate problem: what kind of "state" should I be using? I am passing an array with shape (40, 40), so I had to modify the example, changing:

action = self.model.compute([state])   # gives us (1, 40, 40) when we want (40, 40) state

to:

action = self.model.compute(state)  # gives us (40, 40)

And now I get:

ValueError: Shape mismatch: x has 40 cols (and 40 rows) but y has 1600 rows (and 100 cols)
Apply node that caused the error: Dot22(x, W_dense1)
Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix)]
Inputs shapes: [(40, 40), (1600, 100)]
Inputs strides: [(320, 8), (800, 8)]
Inputs values: ['not shown', 'not shown']

Maybe I'm doing something else wrong, and I don't want to poke around too much in the deepy codebase, but how should I be setting up the state properly?
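Going only by the shapes in the error message, the first dense layer multiplies its input by a (1600, 100) weight matrix, so each state would need to be a flat vector of 1600 values rather than a (40, 40) grid. The NumPy-only sketch below illustrates that arithmetic under that assumption. `W_dense1` here is a zero-filled stand-in with the shape from the error, not the real weights, and nothing in it calls deepy.

```python
import numpy as np

height = width = 40
grid = np.zeros((height, width))            # the 2-D map used as the state

# Stand-in for the first dense layer's weights; shape taken from the error message.
W_dense1 = np.zeros((height * width, 100))

# Wrapping the 2-D grid in a list gives 3 dimensions, matching the first error.
wrapped_grid = np.array([grid])             # shape (1, 40, 40)

# Flattening first gives a batch of one 1600-value row, which the dot product accepts.
flat_state = grid.flatten()                 # shape (1600,), a copy of the grid
batch = np.array([flat_state])              # shape (1, 1600)
hidden = batch.dot(W_dense1)                # shape (1, 100)
```

If that reading is right, the original `self.model.compute([state])` line could stay as it was, with `state = self.map.flatten()` in place of `self.map.copy()`.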

Full output:

$ python src/run.py deepq
Starting experiment...
  state_num = 1600
> /Users/eric/src/plithos/src/plithos/deep_q_learner.py(57)get_action()
     56                 import ipdb; ipdb.set_trace()
---> 57                 action = self.model.compute(state)
     58             return int(action[0].argmax())

ipdb> c
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
src/run.py in <module>()
     48         drone_count=args.drone_count,
     49     )
---> 50     experiment.start()
     51 
     52 

/Users/eric/src/plithos/src/plithos/simulations/dqn_single_drone.py in start(self)
     21 
     22             state = self.map.copy()
---> 23             action = self.agent.get_action(state)
     24             reward = 0
     25 

/Users/eric/src/plithos/src/plithos/deep_q_learner.pyc in get_action(self, state)
     55             with self.thread_lock:
     56                 import ipdb; ipdb.set_trace()
---> 57                 action = self.model.compute(state)
     58             return int(action[0].argmax())
     59 

/Users/eric/.virtualenvs/plithos/lib/python2.7/site-packages/deepy/networks/network.pyc in compute(self, *x)
    143         """
    144         self._compile()
--> 145         return self._compute(*x)
    146 
    147     @property

/Users/eric/.virtualenvs/plithos/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    604                         self.fn.nodes[self.fn.position_of_error],
    605                         self.fn.thunks[self.fn.position_of_error],
--> 606                         storage_map=self.fn.storage_map)
    607                 else:
    608                     # For the c linker We don't have access from

/Users/eric/.virtualenvs/plithos/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    593         t0_fn = time.time()
    594         try:
--> 595             outputs = self.fn()
    596         except Exception:
    597             if hasattr(self.fn, 'position_of_error'):

ValueError: Shape mismatch: x has 40 cols (and 40 rows) but y has 1600 rows (and 100 cols)
Apply node that caused the error: Dot22(x, W_dense1)
Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix)]
Inputs shapes: [(40, 40), (1600, 100)]
Inputs strides: [(320, 8), (800, 8)]
Inputs values: ['not shown', 'not shown']

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
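The two hints above refer to Theano configuration flags. One way to set them, sketched here on the assumption that nothing has imported theano yet, is through the `THEANO_FLAGS` environment variable at the very top of the entry script:

```python
import os

# Must run before the first `import theano`, because Theano reads this variable at import time.
os.environ['THEANO_FLAGS'] = 'optimizer=fast_compile,exception_verbosity=high'
```

The same string can be exported in the shell before running `python src/run.py deepq`.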
