kracwarlock / action-recognition-visual-attention

Action recognition using soft attention based deep recurrent neural networks
http://www.cs.toronto.edu/~shikhar/projects/action-recognition-attention

Some questions about the input and output of the LSTM #10

Closed Ouya-Bytes closed 8 years ago

Ouya-Bytes commented 8 years ago

Hello @kracwarlock. I want to know: does l_{t,i} represent a scalar? And what is the shape of x_t = sum_{i=1}^{K^2} l_{t,i} X_{t,i}? Is x_t a vector, a matrix, or a cube?

kracwarlock commented 8 years ago

Hi @Ouya-Bytes

Yes, l_{t,i} is a scalar. It is the attention weight the model assigns to the i-th location at timestep t.

kracwarlock commented 8 years ago

In the paper: x_t for a single data point has shape (D,). We had D=1024 since we used GoogLeNet features. It is a vector for a datapoint: the average of the feature cube slices, weighted by the location probabilities.
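
As an illustration, here is a minimal NumPy sketch of the single-datapoint case (not the repo's Theano code; K=7 and D=1024 follow the paper's GoogLeNet setup):

```python
import numpy as np

# Illustrative sketch of the soft-attention readout for one datapoint.
# The feature cube at timestep t has K*K = 49 locations, D = 1024 features each.
K, D = 7, 1024

X_t = np.random.randn(K * K, D)               # feature cube, flattened over locations
logits = np.random.randn(K * K)
l_t = np.exp(logits) / np.exp(logits).sum()   # softmax: one scalar weight l_{t,i} per location

x_t = (l_t[:, None] * X_t).sum(axis=0)        # x_t = sum_i l_{t,i} X_{t,i}
assert x_t.shape == (D,)                      # a (D,) vector, as stated above
```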

In the code: since we use batches, this becomes (batchsize, D) for each minibatch of examples. It is represented by the variable `ctx`: https://github.com/kracwarlock/action-recognition-visual-attention/blob/master/src/actrec.py#L278
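
A corresponding NumPy sketch of the batched computation, matching the (batchsize, D) shape of `ctx` described above (again illustrative; the actual code is Theano):

```python
import numpy as np

# Batched soft-attention readout: one weight vector per example in the minibatch.
batchsize, K, D = 32, 7, 1024

X = np.random.randn(batchsize, K * K, D)       # minibatch of flattened feature cubes
logits = np.random.randn(batchsize, K * K)
alpha = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # row-wise softmax

ctx = np.einsum('bk,bkd->bd', alpha, X)        # weighted average per example
assert ctx.shape == (batchsize, D)
```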