ottokart / punctuator2

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
http://bark.phon.ioc.ee/punctuator
MIT License

Output values of output_recurrence #59

Open helga-lvl opened 4 years ago

helga-lvl commented 4 years ago

Hey, I'm trying to understand the outputs of the `output_recurrence` function in models.py. Right now, I'm not seeing the updates I would expect. I've gone through the function step by step, and at some point it just starts multiplying everything by 0 and loses all values. All the fusion parameters seem to be initialized at 0, and I don't see where they're updated.

My first question:

`weights_const`: it's defined as np.ones, but then every time it's used all the values are multiplied by 0, giving only an output of zeroes?
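For reference, this is roughly how I understand that helper (my own numpy sketch, not the actual code in models.py, and ignoring the theano.shared wrapping; the real signature may differ):

```python
import numpy as np

def weights_const(i, o, const=1):
    # My reading of the helper: an (i, o) matrix of ones scaled by `const`.
    # With const=0, as for the late fusion and output parameters,
    # this is simply a zero matrix.
    return np.ones((i, o), dtype=np.float32) * const

Wf_c = weights_const(512, 256, const=0)  # all zeros at initialization
print(Wf_c.any())                        # False: any product with it is zero
```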

My second question:

Output recurrence: I went step by step through the theano.scan, and below I give the results at the third update.

I have:

minibatch_size = 128
n_hidden = 256
x_vocabulary_size = 15695
y_vocabulary_size = 8

Inputs in the theano.scan:

x_t: context[3]: hidden layer forward and backward concatenated shape: (50,128,512)

h_tm1: starts at GRU.h0 = np.zeros((128, 256)) in the first iteration, then updated with the GRU.step function. shape: (128, 256)

Attention parameters

Wa_h: d = sqrt(6 / (256 + 512)), random between -d and d. shape: (256, 512)

Wa_y: d = sqrt(6 / (512 + 1)), random between -d and d. shape: (512, 1)
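By "d = sqrt(6 / (fan_in + fan_out)), random between -d and d" I mean Glorot/Xavier-style uniform initialization, roughly like this (my own numpy sketch, not the code from models.py):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=np.random):
    # Glorot/Xavier uniform init: d = sqrt(6 / (fan_in + fan_out)),
    # values drawn uniformly from [-d, d], so the matrix is non-zero.
    d = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(low=-d, high=d, size=(fan_in, fan_out)).astype(np.float32)

Wa_h = glorot_uniform(256, 512)  # shape (256, 512)
Wa_y = glorot_uniform(512, 1)    # shape (512, 1)
```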

Late fusion parameters

Wf_h: np.zeros((256,256)) shape: (256, 256)

Wf_c: np.zeros((512,256)) shape: (512,256)

Wf_f: np.zeros((256,256)) shape: (256,256)

bf: np.zeros((1,256)) shape: (1,256)

Output model

Wy: np.zeros((256,8)) shape: (256,8)

by: np.zeros((1,8)) shape: (1,8)

context: forward and backward sequences shape: (50,128,512)

projected_context: T.dot(context, Wa_c) + ba = (context * randomly initialized Wa_c) + 0. shape: (50,128,512)

Inside the function:

h_a: values between -1 and 1 shape: (50,128,512)

alphas (1): T.exp(T.dot(h_a, Wa_y)): random values shape: (50,128)

alphas (2): alphas reshaped to keep only the first two dimensions (extra question: I only have two dimensions here, should I have three?). shape: (50,128)

alphas (3): Normalized alphas shape: (50,128)

weighted_context: (context * alphas[:,:,None]).sum(axis=0), random values. shape: (128,512)
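To make sure I'm tracing the attention part correctly, this is the plain numpy version of what I think these steps compute (my own sketch using the shapes above; projected_context is of course computed once outside the scan, and the real code uses Theano ops):

```python
import numpy as np

T_steps, batch, ctx_dim, hid = 50, 128, 512, 256
rng = np.random.RandomState(0)

context = rng.randn(T_steps, batch, ctx_dim).astype(np.float32)
Wa_c = rng.uniform(-0.1, 0.1, (ctx_dim, ctx_dim)).astype(np.float32)
ba = np.zeros(ctx_dim, dtype=np.float32)
Wa_h = rng.uniform(-0.1, 0.1, (hid, ctx_dim)).astype(np.float32)
Wa_y = rng.uniform(-0.1, 0.1, (ctx_dim, 1)).astype(np.float32)
h_tm1 = np.zeros((batch, hid), dtype=np.float32)  # first iteration: GRU.h0

# computed once, outside the scan
projected_context = context.dot(Wa_c) + ba                      # (50, 128, 512)

# inside the scan, for one time step
h_a = np.tanh(projected_context + h_tm1.dot(Wa_h))              # (50, 128, 512), in (-1, 1)
alphas = np.exp(h_a.dot(Wa_y))                                  # (50, 128, 1)
alphas = alphas.reshape(T_steps, batch)                         # (50, 128)
alphas = alphas / alphas.sum(axis=0, keepdims=True)             # normalized over time
weighted_context = (context * alphas[:, :, None]).sum(axis=0)   # (128, 512)

print(h_a.shape, alphas.shape, weighted_context.shape)
```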

h_t: GRU.step, the hidden state at this time step that is used in the recurrence; the initial value is h0, and updated values have gone through sigmoid and tanh activations.
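By GRU.step I just mean the usual GRU update, something like the sketch below (a textbook GRU in numpy; the exact gate layout and inputs of the repo's GRU.step may differ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_tm1, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    # Standard GRU update: update gate z, reset gate r, candidate state h_tilde.
    z = sigmoid(x_t.dot(Wz) + h_tm1.dot(Uz) + bz)
    r = sigmoid(x_t.dot(Wr) + h_tm1.dot(Ur) + br)
    h_tilde = np.tanh(x_t.dot(Wh) + (r * h_tm1).dot(Uh) + bh)
    return (1.0 - z) * h_tm1 + z * h_tilde  # new hidden state h_t

rng = np.random.RandomState(0)
W = lambda i, o: (rng.randn(i, o) * 0.01).astype(np.float32)
Wz, Uz, bz = W(512, 256), W(256, 256), np.zeros(256, dtype=np.float32)
Wr, Ur, br = W(512, 256), W(256, 256), np.zeros(256, dtype=np.float32)
Wh, Uh, bh = W(512, 256), W(256, 256), np.zeros(256, dtype=np.float32)
x_t = rng.randn(128, 512).astype(np.float32)
h_tm1 = np.zeros((128, 256), dtype=np.float32)
print(gru_step(x_t, h_tm1, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh).shape)  # (128, 256)
```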

Late fusion (now I stop understanding)

lfc: T.dot(weighted_context, Wf_c). Wf_c is a matrix of zeros, so lfc becomes np.zeros((128,256)). shape: (128,256)

fw (fusion weights): T.nnet.sigmoid(T.dot(lfc, Wf_f) + T.dot(h_t, Wf_h) + bf). Wf_f, lfc, Wf_h and bf are all zero matrices, so the sigmoid argument is zero and I get a matrix of 0.5. shape: (128,256)

hf_t (weighted fused context + hidden state): lfc * fw + h_t. lfc is 0, fw is 0.5 and h_t is just the hidden state, so hf_t = h_t in all steps. shape: (128,256)

z: T.dot(hf_t, Wy) + by. Wy is a zero matrix and by a zero vector, so z becomes np.zeros((128,8)). shape: (128,8)

y_t: T.nnet.softmax(z). Softmax of zeroes at every single step, so every one of the 128 rows of the result is [0.125] * y_vocabulary_size. shape: (128,8)
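Putting the late fusion and output steps together, this numpy sketch is the dead end I keep arriving at: with all fusion and output parameters at zero, hf_t collapses to h_t and every row of y_t is the uniform 0.125 distribution (again my own sketch, not the Theano code):

```python
import numpy as np

batch, hid, ctx_dim, n_classes = 128, 256, 512, 8
rng = np.random.RandomState(0)

weighted_context = rng.randn(batch, ctx_dim).astype(np.float32)  # non-zero
h_t = np.tanh(rng.randn(batch, hid)).astype(np.float32)          # non-zero

# all late fusion / output parameters initialized to zero
Wf_c = np.zeros((ctx_dim, hid), dtype=np.float32)
Wf_h = np.zeros((hid, hid), dtype=np.float32)
Wf_f = np.zeros((hid, hid), dtype=np.float32)
bf = np.zeros((1, hid), dtype=np.float32)
Wy = np.zeros((hid, n_classes), dtype=np.float32)
by = np.zeros((1, n_classes), dtype=np.float32)

lfc = weighted_context.dot(Wf_c)                                   # zeros, (128, 256)
fw = 1.0 / (1.0 + np.exp(-(lfc.dot(Wf_f) + h_t.dot(Wf_h) + bf)))   # sigmoid(0) = 0.5 everywhere
hf_t = lfc * fw + h_t                                              # 0 * 0.5 + h_t = h_t
z = hf_t.dot(Wy) + by                                              # zeros, (128, 8)
y_t = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)             # every entry is 0.125

print(np.allclose(hf_t, h_t), np.allclose(y_t, 1.0 / n_classes))   # True True
```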

I want to ask: shouldn't I be able to update this? Are the fusion values all supposed to stay zero throughout the process? Thanks. :)

Disclaimer: where I write that values are "random", I mean they come from parameters that were initialized randomly; those values are of course being carefully updated during training.